Workflow Detection with Improved Phase Discriminability

doi:10.4316/AECE.2024.02003

2/2024 - 3

ZHANG, M. , HU, H. , LI, Z.

Author keywords
intelligent manufacturing, workflow detection, self-attention mechanism, graph relation reasoning, transformer

About this article
Date of Publication: 2024-05-31
Volume 24, Issue 2, Year 2024, On page(s): 21 - 30
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2024.02003
Web of Science Accession Number: 001242091800003
SCOPUS ID: 85195645436

Abstract

Workflow detection is a challenging problem in Industry 4.0 and plays a crucial role in intelligent production. However, it suffers from inaccurate phase classification and imprecise boundary localization, neither of which has been well resolved in previous work. To address both, this paper develops a temporal-aware workflow detection framework (TransGAN) that exploits the complementarity between the Transformer and the graph attention network to improve phase discriminability. Specifically, temporal self-attention is first designed to learn the relationships between different positions of the feature sequence. Then, a multi-scale Transformer is introduced to encode pyramid features, fusing multiple context cues into a discriminative feature representation. Finally, contextual and surrounding relations are learned in a graph attention network for refined phase classification and boundary localization. Comprehensive experiments are performed to verify the effectiveness of the method. Compared with the advanced AFSD, accuracy improves by 2.3% and 2.1% at tIoU = 0.5 on the POTFD and THUMOS-14 datasets, respectively. An empirical study of running speed indicates that the proposed TransGAN can be deployed in real-world industrial environments for workflow detection.
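The abstract reports results at a tIoU threshold of 0.5 without defining the term. As a minimal sketch, assuming the usual definition of temporal intersection over union (overlap length divided by union length of two time segments), the metric can be computed as below. The names `Segment`, `temporal_iou` and `is_match` are hypothetical and are not taken from the paper, and a full evaluation would also require the predicted phase label to agree with the ground truth.

```python
from typing import Tuple

# A segment is a (start, end) pair in seconds or frames (assumed convention).
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Overlap length divided by union length of two 1-D segments."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is clipped at zero so that disjoint segments score 0.
    inter = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Segment, gt: Segment, threshold: float = 0.5) -> bool:
    """True when the predicted segment overlaps the ground truth enough."""
    return temporal_iou(pred, gt) >= threshold
```

For example, a predicted phase spanning frames 10 to 20 against a ground-truth phase spanning frames 12 to 22 overlaps for 8 frames out of a union of 12, so it would count as a match at the 0.5 threshold, whereas a prediction of 0 to 10 against a ground truth of 5 to 20 (overlap 5, union 20) would not.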


Copyright ©2001-2024
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania

All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.

Permission for other use: The copyright owner's consent does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific written permission must be obtained from the Editor for such copying. Direct linking to files hosted on this website is strictly prohibited.

Disclaimer: Whilst every effort is made by the publishers and editorial board to see that no inaccurate or misleading data, opinions or statements appear in this journal, they wish to make it clear that all information and opinions formulated in the articles, as well as linguistic accuracy, are the sole responsibility of the author.
