Floating Point Multiple-Precision Fused Multiply Add Architecture for Deep Learning Computation on Artix 7 FPGA Board

doi:10.4316/AECE.2024.04010

4/2024 - 10

VINOTHENI, M. S., JAWAHAR SENTHIL KUMAR, V.

Author keywords
field programmable gate arrays, architecture, high performance computing, parallel processing, very large scale integration

About this article
Date of Publication: 2024-11-30
Volume 24, Issue 4, Year 2024, On page(s): 93 - 102
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2024.04010
Web of Science Accession Number: 001415806000010
SCOPUS ID: 85211338880

Abstract

Deep learning (DL) has become a transformative force across industries. Its success, however, depends on high-precision arithmetic units, which creates demand for capable high-precision arithmetic designs. This research therefore proposes a multiple-precision fused multiply-add (MPFMA) architecture for deep learning computation. In every clock cycle, the proposed MPFMA architecture can perform eight half-precision (HP) operations, four single-precision (SP) operations, two double-precision (DP) operations, or one quadruple-precision (QP) operation in parallel. The architecture is implemented with Xilinx Vivado 2022.2 on an Artix-7 FPGA Basys 3 board, which demonstrates its functionality and performance. The observed results indicate that the proposed design reduces area by 50% compared with the conventional FMA architecture while still meeting the precision requirements of deep learning tasks. With an error rate of 0.013% and an accuracy of 99.987%, the MPFMA not only enhances model performance in deep learning hardware but also contributes to energy conservation, making DL systems more sustainable and promising for future smart intelligence applications.
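The per-cycle operation counts in the abstract (eight HP, four SP, two DP, one QP) are consistent with one shared datapath split into equal-width lanes. The Python sketch below is a software illustration of that idea only, not the authors' hardware design. The 128-bit datapath width is an assumption inferred from those counts, and the names `DATAPATH_BITS`, `lanes`, `pack_word`, `unpack_word`, and `fma_lanes` are hypothetical. QP values are counted but not packed, because the Python standard library has no binary128 type.

```python
import struct
from fractions import Fraction

# Assumption (not stated in the abstract): a 128-bit shared datapath,
# inferred from 8 x 16 = 4 x 32 = 2 x 64 = 1 x 128 bits.
DATAPATH_BITS = 128

# IEEE 754 binary formats: mode -> (width in bits, struct format code).
FORMATS = {
    "HP": (16, "e"),
    "SP": (32, "f"),
    "DP": (64, "d"),
    "QP": (128, None),  # no struct code exists for binary128
}


def lanes(mode):
    """Number of independent operations that fit in one datapath word."""
    width, _ = FORMATS[mode]
    return DATAPATH_BITS // width


def pack_word(mode, values):
    """Pack one value per lane into a single 128-bit integer word."""
    _, code = FORMATS[mode]
    if code is None:
        raise NotImplementedError("QP packing is outside this sketch")
    if len(values) != lanes(mode):
        raise ValueError("expected %d values" % lanes(mode))
    raw = struct.pack("<%d%s" % (len(values), code), *values)
    return int.from_bytes(raw, "little")


def unpack_word(mode, word):
    """Split a 128-bit integer word back into its per-lane values."""
    _, code = FORMATS[mode]
    if code is None:
        raise NotImplementedError("QP unpacking is outside this sketch")
    raw = word.to_bytes(DATAPATH_BITS // 8, "little")
    return list(struct.unpack("<%d%s" % (lanes(mode), code), raw))


def fma_lanes(mode, a, b, c):
    """Lane-wise a*b + c, computed exactly and rounded afterwards.

    The exact result is first rounded to a Python float (binary64) and
    then narrowed to the lane format by struct, so HP and SP results can
    differ from a true single-rounding FMA in rare double-rounding cases.
    """
    exact = [Fraction(x) * Fraction(y) + Fraction(z)
             for x, y, z in zip(a, b, c)]
    rounded = [float(e) for e in exact]
    return unpack_word(mode, pack_word(mode, rounded))


if __name__ == "__main__":
    for mode in FORMATS:
        print(mode, lanes(mode))
    print(fma_lanes("DP", [2.0, 3.0], [4.0, 5.0], [1.0, 1.0]))
```

In DP mode the model rounds only once, because the exact rational result is converted directly to binary64. A bit-accurate HP or SP model would need an explicit round-to-nearest-even step into the narrower format.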

Copyright ©2001-2025
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania

All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.
