4/2024 - 5
Comparative Analysis of the Robustness of 3-bit PoTQ and UQ and their Application in Post-training Quantization
NIKOLIC, J., PERIC, Z., TOMIC, S., JOVANOVIC, A., ALEKSIC, D., PERIC, S.
Author keywords
computer simulation, intelligent transportation systems, machine learning, routing protocols, vehicular ad hoc networks.
About this article
Date of Publication: 2024-11-30
Volume 24, Issue 4, Year 2024, On page(s): 47 - 56
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2024.04005
Abstract
In this paper, we propose a 3-bit quantizer model whose step sizes are obtained by multiplying the smallest step size by successive powers of 2, hence the name PoTQ (Power of Two Quantizer). Although this model is non-uniform, it is quite simple to design, much like UQ (Uniform Quantizer), the simplest and most widely used quantizer model. Motivated by this similarity, and by the lack of any analysis of how robust the SQNR (signal-to-quantization-noise ratio) is to changes in the variance of the data being quantized, we conduct a comparative theoretical analysis of both 3-bit quantizer models. In addition, we provide experimental results for the application of both quantizer models to post-training quantization of MLP (Multilayer Perceptron) weights. To illustrate the importance of our robustness analysis, we provide results both with and without normalization of the MLP weights, corresponding to the matched and heavily mismatched scenarios, respectively. We show that 3-bit PoTQ provides greater SQNR robustness than 3-bit UQ. We also show that PoTQ outperforms UQ in preserving the accuracy of the compressed MLP model. Given its simple design and superior performance, we anticipate that 3-bit PoTQ will become as widely used as UQ.
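The step-size rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a symmetric quantizer with four cells per side, midpoint representation levels, clipping at a support limit `x_max`, and a Laplacian test source, none of which is stated in the abstract. The function names and the value `x_max = 4.0` are hypothetical and chosen only for illustration.

```python
import math
import random


def make_thresholds(x_max, kind):
    """Positive-side decision thresholds of a symmetric 3-bit quantizer (4 cells per side)."""
    if kind == "uq":
        steps = [x_max / 4] * 4                # four equal steps
    elif kind == "potq":
        d = x_max / 15                         # smallest step, since 1 + 2 + 4 + 8 = 15
        steps = [d, 2 * d, 4 * d, 8 * d]       # each step is the previous one times 2
    else:
        raise ValueError(f"unknown quantizer kind: {kind}")
    thresholds = [0.0]
    for s in steps:
        thresholds.append(thresholds[-1] + s)
    return thresholds


def quantize(x, thresholds):
    """Map x to the midpoint of its cell (assumed reconstruction rule), clipping at x_max."""
    mag = min(abs(x), thresholds[-1])
    level = 0.0
    for lo, hi in zip(thresholds, thresholds[1:]):
        if mag <= hi:
            level = (lo + hi) / 2
            break
    return math.copysign(level, x)


def sqnr_db(samples, thresholds):
    """Empirical SQNR in dB: 10*log10(signal power / quantization noise power)."""
    signal = sum(x * x for x in samples)
    noise = sum((x - quantize(x, thresholds)) ** 2 for x in samples)
    return 10 * math.log10(signal / noise)


def laplacian_samples(n, sigma, rng):
    """Zero-mean Laplacian samples with standard deviation sigma (assumed source model)."""
    rate = math.sqrt(2) / sigma                # Laplacian scale b = sigma / sqrt(2)
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(rate) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    x_max = 4.0                                # hypothetical support limit, not optimized
    for sigma in (0.25, 1.0, 4.0):             # sigma != 1 emulates a variance mismatch
        data = laplacian_samples(20000, sigma, rng)
        uq = sqnr_db(data, make_thresholds(x_max, "uq"))
        potq = sqnr_db(data, make_thresholds(x_max, "potq"))
        print(f"sigma={sigma}: UQ {uq:.2f} dB, PoTQ {potq:.2f} dB")
```

Running the script prints the empirical SQNR of both quantizers as the source variance departs from the value the support limit was chosen for, which is the kind of mismatch the robustness analysis addresses. The figures it prints depend on the assumptions above and should not be read as the paper's results.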
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.