Floating Point Multiple-Precision Fused Multiply Add Architecture for Deep Learning Computation on Artix 7 FPGA Board
VINOTHENI, M. S., JAWAHAR SENTHIL KUMAR, V.
Author keywords
field programmable gate arrays, architecture, high performance computing, parallel processing, very large scale integration
About this article
Date of Publication: 2024-11-30
Volume 24, Issue 4, Year 2024, On page(s): 93 - 102
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2024.04010
Abstract
Deep learning (DL) has become a transformative force, revolutionizing industries across today's world. Its success, however, rests on high-precision arithmetic units, motivating the design of powerful high-precision arithmetic hardware. This research therefore proposes a multiple-precision fused multiply-add (MPFMA) architecture for computation-intensive deep learning applications. In every clock cycle, the proposed MPFMA architecture can perform eight parallel half-precision (HP) operations, four concurrent single-precision (SP) operations, two simultaneous double-precision (DP) operations, or one quadruple-precision (QP) operation. The architecture is implemented with Xilinx Vivado 2022.2 on an Artix-7 FPGA (Basys 3 board), demonstrating its functionality and performance. The observed results show that the proposed framework achieves a 50% area reduction relative to a conventional FMA architecture while still meeting the precision requirements of deep learning tasks. With a low error rate of 0.013% and an accuracy of 99.987%, the MPFMA enhances model performance in deep learning hardware and contributes to energy conservation, making DL systems more sustainable and well suited to future intelligent applications.
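To make the abstract's per-cycle throughput figures concrete, the following is a minimal C sketch; it is illustrative only, not the authors' RTL, and the precision enum and the lanes_for helper are hypothetical names introduced here. It demonstrates the two ideas the MPFMA relies on: a fused multiply-add evaluates a*b+c with a single rounding step, and one quadruple-precision-wide (128-bit) datapath can be partitioned into eight HP, four SP, or two DP lanes.

/* Minimal sketch of MPFMA lane partitioning and fused-rounding behavior.
 * Illustrative C only, assuming a 128-bit datapath; not the paper's RTL. */
#include <math.h>
#include <stdio.h>

enum precision { HP = 16, SP = 32, DP = 64, QP = 128 }; /* widths in bits */

/* A QP-wide datapath splits into 128/width independent lanes, matching the
 * 8xHP / 4xSP / 2xDP / 1xQP operations per clock cycle in the abstract. */
static int lanes_for(enum precision p) { return 128 / (int)p; }

int main(void) {
    /* fma() computes a*b+c with one rounding, as a hardware FMA does per
     * lane; the unfused form rounds twice and can lose the small residual. */
    double a = 1.0e16, b = 1.0e-16, c = -1.0;
    printf("fused  : %.17g\n", fma(a, b, c)); /* single rounding */
    printf("unfused: %.17g\n", a * b + c);    /* two roundings   */

    printf("HP lanes/cycle: %d\n", lanes_for(HP)); /* 8 */
    printf("SP lanes/cycle: %d\n", lanes_for(SP)); /* 4 */
    printf("DP lanes/cycle: %d\n", lanes_for(DP)); /* 2 */
    printf("QP lanes/cycle: %d\n", lanes_for(QP)); /* 1 */
    return 0;
}

Compiled with a C99 toolchain (link with -lm), the fused and unfused results differ in the small residual term, illustrating the extra rounding error that a fused unit removes.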
References
[1] H. Tan, L. Huang, Z. Zheng, H. Guo, Q. Yang, L. Shen, G. Chen, L. Xiao, N. Xiao, "A low-cost floating-point dot-product-dual-accumulate architecture for HPC-enabled AI," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 2, pp. 681-693, Feb. 2024.
[2] H. Tan, J. Zhang, L. Huang, X. He, Y. Wang, L. Xiao, "A low-cost floating-point FMA unit supporting package operations for HPC-AI applications," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 71, no. 7, pp. 3488-3492, 2023.
[3] W. Mao et al., "A configurable floating-point multiple-precision processing element for HPC and AI converged computing," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 30, no. 2, pp. 213-226, Feb. 2022.
[4] H. Zhang, D. Chen, S.-B. Ko, "New flexible multiple-precision multiply-accumulate unit for deep neural network training and inference," IEEE Transactions on Computers, vol. 69, no. 1, pp. 26-38, Jan. 2020.
[5] C. Wu, M. Wang, X. Chu, K. Wang, L. He, "Low-precision floating-point arithmetic for high-performance FPGA-based CNN acceleration," ACM Transactions on Reconfigurable Technology and Systems, vol. 15, no. 1, article 6, pp. 1-21, Nov. 2021.
[6] J. Zhang, L. Huang, H. Tan, L. Yang, Z. Zheng, Q. Yang, "Low-cost multiple-precision multiplication unit design for deep learning," in Proceedings of the Great Lakes Symposium on VLSI 2023 (GLSVLSI '23), pp. 9-14, Jun. 2023.
[7] R. Machupalli, M. Hossain, M. Mandal, "Review of ASIC accelerators for deep neural network," Microprocessors and Microsystems, vol. 89, Mar. 2022.
[8] A. Vasantharaj, S. Anbu Karuppusamy, N. Nandhagopal, A. Pillai, V. Pillai, "A low-cost in-tire-pressure monitoring SoC using integer/floating-point type convolutional neural network inference engine," Microprocessors and Microsystems, vol. 98, Mar. 2023.
[9] Z. Que, D. Holanda Noronha, R. Zhao, X. Niu, S. J. E. Wilton, W. Luk, "In-circuit tuning of deep learning designs," Journal of Systems Architecture, vol. 118, Sep. 2021.
[10] J. Park, Y. Jeong, J. Kim, S. Lee, J. Y. Kwak, J.-K. Park, I. Kim, "High dynamic range digital neuron core with time-embedded floating-point arithmetic," IEEE Transactions on Circuits and Systems I, vol. 70, no. 1, pp. 290-301, Jan. 2023.
[11] H. Zhang, D. Chen, S.-B. Ko, "Efficient multiple-precision floating-point fused multiply-add with mixed-precision support," IEEE Transactions on Computers, vol. 68, no. 7, pp. 1035-1048, Jul. 2019.
[12] L. Huang, S. Ma, L. Shen, Z. Wang, N. Xiao, "Low-cost Binary128 floating-point FMA unit design with SIMD support," IEEE Transactions on Computers, vol. 61, no. 5, pp. 745-751, May 2012.
[13] L. Huang, L. Shen, K. Dai, Z. Wang, "A new architecture for multiple-precision floating-point multiply-add fused unit design," in Proceedings of the 18th IEEE Symposium on Computer Arithmetic, pp. 69-76, 2007.
[14] N. Neves, P. Tomas, N. Roma, "Dynamic fused multiply-accumulate posit unit with variable exponent size for low-precision DSP applications," in 2020 IEEE Workshop on Signal Processing Systems (SiPS), pp. 1-6, 2020.
[15] Y. Li, Z. Huang, G. Cai, R. Chen, "A multi-precision floating-point multiplier structure applied to FPGA embedded DSP," in 6th International Conference on Artificial Intelligence and Pattern Recognition (AIPR 2023), pp. 932-939, Sep. 2023.
[16] K. Manolopoulos, D. Reisis, V. A. Chouliaras, "An efficient multiple precision floating-point multiply-add fused unit," Microelectronics Journal, vol. 49, 2016.
[17] L. Denisov, A. Galimberti, D. Cattaneo, G. Agosta, D. Zoni, "Design-time methodology for optimizing mixed-precision CPU architectures on FPGA," Journal of Systems Architecture, vol. 155, Oct. 2024.
[18] B. Zhou, G. Wang, G. Jie, Q. Liu, Z. Wang, "A high-speed floating-point multiply-accumulator based on FPGAs," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 10, pp. 1782-1789, Oct. 2021.
[19] IEEE Standard for Floating-Point Arithmetic, IEEE Std 754-2008, Aug. 2008.
[20] D. K. J. Rajanediran, C. Ganesh Babu, K. Priyadharsini, M. Ramkumar, "Hybrid radix-16 Booth encoding and rounding-based approximate Karatsuba multiplier for fast Fourier transform computation in biomedical signal processing application," Integration, the VLSI Journal, vol. 98, 2024.
[21] S. S. H. Krishnan, K. Vidhya, "Distributed arithmetic-FIR filter design using approximate Karatsuba multiplier and VLCSA," Expert Systems with Applications, vol. 249, part B, Sep. 2024.
[22] M. Mikaitis, "Monotonicity of multi-term floating-point adder," IEEE Transactions on Computers, vol. 73, no. 6, pp. 1531-1543, Jun. 2024.
[23] V. Sklyarov, I. Skliarova, "Hardware accelerators for data sort in all programmable systems-on-chip," Advances in Electrical and Computer Engineering, vol. 15, no. 4, pp. 9-16, 2015.
[24] A. HajiRassouliha, A. J. Taberner, M. P. Nash, P. M. F. Nielsen, "Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms," Signal Processing: Image Communication, vol. 68, 2018.
[25] S. H. Farghaly, S. M. Ismail, "Floating-point discrete wavelet transform-based image compression on FPGA," International Journal of Electronics and Communications, vol. 124, 2020.
[26] A. Khan, S. Wairya, "Efficient and power-aware design of a novel sparse Kogge-Stone adder using hybrid carry prefix generator adder," Advances in Electrical and Computer Engineering, vol. 24, no. 1, pp. 71-80, 2024.
[27] Y. Wang, X. Liang, S. Niu, C. Zhang, F. Lyu, Y. Luo, "FDM: Fused double-multiply design for low-latency and area- and power-efficient implementation," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 71, no. 1, pp. 450-454, Jan. 2024.
[28] V. Arunachalam, A. N. J. Raj, N. Hampannavar, C. B. Bidul, "Efficient dual-precision floating-point fused-multiply-add architecture," Microprocessors and Microsystems, vol. 57, pp. 23-31, 2018.
[29] T. Lang, J. D. Bruguera, "Floating-point multiply-add-fused with reduced latency," IEEE Transactions on Computers, vol. 53, no. 8, pp. 988-1003, Aug. 2004.
[30] G. Even, P. M. Seidel, "A comparison of three rounding algorithms for IEEE floating-point multiplication," IEEE Transactions on Computers, vol. 49, no. 7, pp. 638-650, 2000.
[31] A. A. Wahba, H. A. H. Fahmy, "Area efficient and fast combined binary/decimal floating point fused multiply add unit," IEEE Transactions on Computers, vol. 66, no. 2, pp. 226-239, Feb. 2017.
[32] M. Fasi, M. Mikaitis, "CPFloat: A C library for simulating low-precision arithmetic," ACM Transactions on Mathematical Software, vol. 49, no. 2, pp. 1-32, Jun. 2023.
[33] M. Dali, A. Guessoum, R. M. Gibson, A. Amira, N. Ramzan, "Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM," Advances in Electrical and Computer Engineering, vol. 17, no. 1, pp. 27-38, 2017.
[34] V. Sklyarov, I. Skliarova, "Hardware accelerators for data sort in all programmable systems-on-chip," Advances in Electrical and Computer Engineering, vol. 15, no. 4, pp. 9-16, 2015.
[35] T. Fernandez-Hart, J. C. Knight, T. Kalganova, "Posit and floating-point based Izhikevich neuron: A comparison of arithmetic," Neurocomputing, vol. 597, 2024.
[36] L. Gao, F. Zheng, R. Wei, J. Dong, N. Emmart, Y. Ma, J. Lin, C. Weems, "DPF-ECC: A framework for efficient ECC with double precision floating-point computing power," IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3988-4002, 2021.
[37] M. Kovač, L. Dragić, B. Malnar, F. Minervini, O. Palomar, C. Rojas, M. Olivieri, J. Knezović, M. Kovač, "FAUST: Design and implementation of a pipelined RISC-V vector floating-point unit," Microprocessors and Microsystems, vol. 97, Mar. 2023.
[38] H. A. Kermani, A. A. Emrani Zarandi, "An efficient multi-format low-precision floating-point multiplier," Sustainable Computing: Informatics and Systems, vol. 41, Jan. 2024.
[39] S. Ullah et al., "High-performance accurate and approximate multipliers for FPGA-based hardware accelerators," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 2, 2022.