Structure-aware Heatmap and Boundary Map Regression Based Robust Face Alignment

doi:10.4316/AECE.2023.02001

2/2023 - 1

HUANG, L., WU, Y.

Author keywords
distance learning, image analysis, neural network, pattern analysis, supervised learning

About this article
Date of Publication: 2023-05-31
Volume 23, Issue 2, Year 2023, On page(s): 3 - 10
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2023.02001
Web of Science Accession Number: 001009953400001
SCOPUS ID: 85164342223

Abstract

Large head pose variations and severe occlusion are challenging problems for face alignment. In this paper, we propose a Structure-aware Heatmap and Boundary map Regression Network (SHBRN), consisting of a rough estimation network and a refinement network, that accounts for the structural geometry of faces via the boundary map. Specifically, in the rough estimation network, a structure-aware module is designed to capture low-level features rich in structural information, and both heatmaps and boundary maps are predicted by the hourglass network. In this way, the network not only estimates the initial locations of the keypoints but also implicitly takes the geometric structure into consideration. In the refinement network, the boundary maps and heatmaps are fused with the features extracted in the rough stage via an attention mechanism. As a result, the network can combine global information with local appearance to obtain complete face representations, and can also optimize the spatial relationships among different keypoints. Our proposed network outperforms existing methods on the 300W, COFW, and AFLW datasets, especially in challenging cases, which demonstrates the effectiveness and robustness of our model.
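The abstract does not give implementation details, so the following is only a generic sketch of the two kinds of regression targets it mentions, not the authors' SHBRN code. It assumes that a heatmap is a 2D Gaussian centred on each keypoint, that a keypoint is decoded as the location of the strongest response, and that a boundary map is a Gaussian of the distance to a polyline joining neighbouring keypoints. All function names, the grid size, and the sigma values are hypothetical choices made for illustration.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.5):
    """Return a height x width grid that peaks (value 1.0) at keypoint (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def decode_heatmap(heatmap):
    """Recover the (x, y) pixel with the strongest response (simple argmax)."""
    best, best_val = (0, 0), -1.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_val:
                best, best_val = (x, y), value
    return best


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to the segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def boundary_map(width, height, polyline, sigma=1.0):
    """Gaussian of the distance to the nearest segment of a polyline.

    Assumes the polyline has at least two points.
    """
    segments = list(zip(polyline[:-1], polyline[1:]))
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            d = min(point_segment_distance(x, y, ax, ay, bx, by)
                    for (ax, ay), (bx, by) in segments)
            row.append(math.exp(-d * d / (2.0 * sigma ** 2)))
        grid.append(row)
    return grid


# Hypothetical example: three keypoints along one facial contour on a 16 x 8 grid.
keypoints = [(2, 5), (6, 3), (10, 5)]
heatmaps = [gaussian_heatmap(16, 8, x, y) for x, y in keypoints]
decoded = [decode_heatmap(h) for h in heatmaps]
bmap = boundary_map(16, 8, keypoints)
```

In this sketch each heatmap localises a single keypoint independently, whereas the boundary map responds along the whole contour between keypoints. That is one way to picture how a boundary map can carry geometric structure that individual heatmaps lack.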

Copyright ©2001-2025
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania

All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.
