Structure-aware Heatmap and Boundary Map Regression Based Robust Face Alignment
HUANG, L., WU, Y.
Author keywords
distance learning, image analysis, neural network, pattern analysis, supervised learning
About this article
Date of Publication: 2023-05-31
Volume 23, Issue 2, Year 2023, On page(s): 3 - 10
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2023.02001
Web of Science Accession Number: 001009953400001
SCOPUS ID: 85164342223
Abstract
Large head pose variations and severe occlusion are challenging problems for face alignment. In this paper, we propose a Structure-aware Heatmap and Boundary map Regression Network (SHBRN), consisting of a rough estimation network and a refinement network, to account for the structural geometry of faces via the boundary map. Specifically, in the rough estimation network, a structure-aware module is designed to capture low-level features rich in structural information, and both heatmaps and boundary maps are predicted by the hourglass network. In this way, the network can not only estimate the initial locations of keypoints but also implicitly take the geometric structure into consideration. In the refinement network, the boundary maps and heatmaps are fused with the features extracted in the rough stage via an attention mechanism. As a result, the network can combine global information with local appearance to obtain complete face representations and also optimize the spatial relationships among keypoints. Our proposed network outperforms existing methods on the 300W, COFW, and AFLW datasets, especially in challenging situations, which demonstrates the effectiveness and robustness of our model.
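The abstract does not give implementation details, so the following is only a minimal sketch of the general ideas it names, not the SHBRN architecture itself. It shows how a keypoint can be represented as a Gaussian heatmap, how a location is read back from that heatmap, and how a feature map might be reweighted by a spatial mask built from a heatmap and a boundary map. The function names, the Gaussian rendering, and the fixed gating rule `features * (1 + max(heatmap, boundary))` are all assumptions made for illustration; the paper's attention mechanism is learned rather than hand-set.

```python
import math


def gaussian_heatmap(cx, cy, size=16, sigma=1.5):
    """Render one keypoint as a size x size Gaussian heatmap (rows are y, columns are x)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)]
            for y in range(size)]


def decode_keypoint(heatmap):
    """Return the (x, y) position of the heatmap maximum."""
    _, x, y = max((v, x, y)
                  for y, row in enumerate(heatmap)
                  for x, v in enumerate(row))
    return x, y


def attention_fuse(features, heatmap, boundary):
    """Reweight features with a spatial mask (hypothetical, hand-set gating rule).

    mask = max(heatmap, boundary); output = features * (1 + mask).
    A learned attention module would replace this fixed rule.
    """
    return [[f * (1.0 + max(h, b)) for f, h, b in zip(f_row, h_row, b_row)]
            for f_row, h_row, b_row in zip(features, heatmap, boundary)]


# Toy inputs: one keypoint at (5, 9) and a horizontal "boundary" along row 9.
heatmap = gaussian_heatmap(5, 9)
boundary = [[1.0 if y == 9 else 0.0 for _ in range(16)] for y in range(16)]
features = [[1.0] * 16 for _ in range(16)]

fused = attention_fuse(features, heatmap, boundary)
x, y = decode_keypoint(heatmap)
```

In this toy setup, feature responses along the boundary row are amplified while positions far from both the keypoint and the boundary are left almost unchanged. That is the intuition behind letting structural maps guide where the refinement stage looks.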
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.