Structure-aware Heatmap and Boundary Map Regression Based Robust Face Alignment
HUANG, L., WU, Y.
Author keywords
distance learning, image analysis, neural network, pattern analysis, supervised learning
About this article
Date of Publication: 2023-05-31
Volume 23, Issue 2, Year 2023, On page(s): 3 - 10
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2023.02001
Web of Science Accession Number: 001009953400001
SCOPUS ID: 85164342223
Abstract
Large head pose variations and severe occlusion are challenging problems for face alignment. In this paper, we propose a Structure-aware Heatmap and Boundary map Regression Network (SHBRN), consisting of a rough estimation network and a refinement network, to account for the structural geometry of faces via the boundary map. Specifically, in the rough estimation network, a structure-aware module is designed to capture low-level features rich in structural information, and both heatmaps and boundary maps are predicted by the hourglass network. In this way, the network not only estimates the initial locations of the keypoints but also implicitly takes the geometric structure into consideration. In the refinement network, the boundary maps and heatmaps are fused with the features extracted in the rough stage via an attention mechanism. As a result, the network can combine global information with local appearance to obtain complete face representations, and can also optimize the spatial relationships among keypoints. The proposed network outperforms existing methods on the 300W, COFW, and AFLW datasets, especially in challenging situations, which demonstrates the effectiveness and robustness of the model.
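To make the two ideas in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows (a) how a keypoint is commonly encoded as a Gaussian heatmap and recovered by taking the per-channel maximum, and (b) one simple way predicted maps could reweight features through a spatial gate. The function names `gaussian_heatmap`, `decode_keypoints`, and `attention_fuse`, the sigmoid gate, the map sizes, and the value of `sigma` are all assumptions chosen for illustration; the abstract does not specify how SHBRN implements its attention mechanism.

```python
import numpy as np


def gaussian_heatmap(height, width, center_xy, sigma=1.5):
    """Render one keypoint as a 2D Gaussian peak (a common heatmap target)."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def decode_keypoints(heatmaps):
    """Return the (x, y) location of the maximum of each heatmap channel."""
    num_points, height, width = heatmaps.shape
    flat_idx = heatmaps.reshape(num_points, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (height, width))
    return np.stack([xs, ys], axis=1)


def attention_fuse(features, heatmaps, boundary_maps):
    """Reweight features with a spatial gate built from the predicted maps.

    Hypothetical fusion rule (an assumption, not taken from the paper):
    sum all maps into one evidence map, squash it with a sigmoid, and
    multiply every feature channel by the resulting gate.
    """
    evidence = heatmaps.sum(axis=0) + boundary_maps.sum(axis=0)  # shape (H, W)
    gate = 1.0 / (1.0 + np.exp(-evidence))                       # values in (0, 1)
    return features * gate[None, :, :]                           # broadcast over channels


# Toy usage: three keypoints on a 64x64 grid and one horizontal boundary segment.
true_points = [(12, 20), (40, 33), (51, 8)]
heatmaps = np.stack([gaussian_heatmap(64, 64, p) for p in true_points])
boundary_maps = np.zeros((1, 64, 64))
boundary_maps[0, 20, 12:41] = 1.0
features = np.ones((8, 64, 64))

decoded = decode_keypoints(heatmaps)
fused = attention_fuse(features, heatmaps, boundary_maps)
print(decoded)
print(fused.shape)
```

In this toy setting the gate is larger near keypoints and along the boundary segment, so features at those positions are attenuated less than features elsewhere. A learned attention module would replace the fixed sigmoid rule with trainable layers.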
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.