Structure-aware Heatmap and Boundary Map Regression Based Robust Face Alignment
HUANG, L.
Author keywords
distance learning, image analysis, neural network, pattern analysis, supervised learning
References keywords
vision(46), face(32), alignment(26), recognition(25), pattern(23), facial(21), landmark(20), detection(18), cvpr(16), robust(11)
About this article
Date of Publication: 2023-05-31
Volume 23, Issue 2, Year 2023, On page(s): 3 - 10
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2023.02001
Web of Science Accession Number: 001009953400001
SCOPUS ID: 85164342223
Abstract
Large head pose variations and severe occlusion are challenging problems for face alignment. In this paper, we propose a Structure-aware Heatmap and Boundary map Regression Network (SHBRN), consisting of a rough estimation network and a refinement network, which accounts for the structural geometry of faces via the boundary map. Specifically, in the rough estimation network, a structure-aware module is designed to capture low-level features rich in structural information, and both heatmaps and boundary maps are predicted by the hourglass network. In this way, the network not only estimates the initial locations of the keypoints but also implicitly takes the geometric structure into consideration. In the refinement network, the boundary maps and heatmaps are fused with the features extracted in the rough stage via an attention mechanism. As a result, the network combines global information with local appearance to obtain complete face representations, and also optimizes the spatial relationships among different keypoints. The proposed network outperforms existing methods on the 300W, COFW, and AFLW datasets, particularly in challenging situations, which demonstrates the effectiveness and robustness of our model.
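The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: reading a keypoint location from a heatmap, and reweighting features with an attention mask built from heatmaps and boundary maps. Everything here is an assumption made for illustration, not the SHBRN implementation: the function names (`gaussian_heatmap`, `decode_heatmap`, `attention_fuse`), the Gaussian rendering, the argmax decoding, the sigmoid gate and its constants, and the use of a wide Gaussian as a stand-in for a boundary map.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Render one landmark at (cx, cy) as a 2D Gaussian heatmap (list of rows)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def decode_heatmap(heatmap):
    """Return the (x, y) position of the heatmap maximum (argmax decoding)."""
    best_value, best_xy = -math.inf, (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy

def attention_fuse(feature, heatmaps, boundary_maps):
    """Hypothetical fusion: scale a feature map by a spatial attention mask.

    The mask is a sigmoid gate over the strongest heatmap or boundary
    response at each pixel; this is an assumed stand-in for the paper's
    attention mechanism, which the abstract does not specify.
    """
    maps = heatmaps + boundary_maps
    fused = []
    for y in range(len(feature)):
        row = []
        for x in range(len(feature[0])):
            evidence = max(m[y][x] for m in maps)
            attention = 1.0 / (1.0 + math.exp(-(4.0 * evidence - 2.0)))
            row.append(feature[y][x] * (1.0 + attention))  # residual reweighting
        fused.append(row)
    return fused

# Toy usage on a 16x16 grid with one landmark at x=5, y=9.
heatmap = gaussian_heatmap(16, 16, cx=5, cy=9)
boundary = gaussian_heatmap(16, 16, cx=6, cy=9, sigma=3.0)  # stand-in boundary map
feature = [[1.0] * 16 for _ in range(16)]
fused = attention_fuse(feature, [heatmap], [boundary])
print(decode_heatmap(heatmap))  # (5, 9)
```

In this toy setup the fused feature is amplified near the landmark and the surrounding boundary response and left close to its original value elsewhere, which is the intuition behind using the maps to guide refinement.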
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.