An Automatic Instruction-Level Parallelization of Machine Code

doi:10.4316/AECE.2018.01004

1/2018 - 4

MARINKOVIC, V. , POPOVIC, M. , DJUKIC, M.

Author keywords
parallel architectures, parallel programming, multicore processing, assembly, processor scheduling

About this article
Date of Publication: 2018-02-28
Volume 18, Issue 1, Year 2018, On page(s): 27 - 36
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2018.01004
Web of Science Accession Number: 000426449500004
SCOPUS ID: 85043242372

Abstract

The prevalence of multicore processors and the arrival of manycore processors pose a major present-day challenge: parallelizing embedded software that is still written sequentially. This paper considers automatic code parallelization, focusing on the development of a parallelization tool that works at the binary level and on the validation of that approach. It presents a novel instruction-level parallelization algorithm for assembly code that uses register names after conversion to static single assignment (SSA) form to find independent blocks of code, and then schedules those blocks with METIS to achieve good load balance. Sequential consistency is verified, and the approach is validated by measuring program execution time on the target architecture. Substantial speedup, the performance measure used in the validation, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g., MIPS, MicroBlaze). In particular, for 16 cores the average speedup is 7.92x, and in some cases it reaches 14x. The approach is useful to researchers and developers working on parallelization as a basis for further optimizations, as a compiler back end, or as a code parallelization tool for embedded systems.
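The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool. It assumes a hypothetical instruction list that is already in SSA form, treats two instructions as dependent only when one reads a register the other defines (memory dependences are ignored), and groups dependent instructions with union-find. The scheduling step uses a simple greedy heuristic as a stand-in for METIS, and all names (`INSTRUCTIONS`, `independent_blocks`, `greedy_schedule`, `parallel_efficiency`) are invented for illustration.

```python
from collections import defaultdict

# Hypothetical SSA-form instructions: (destination register, source registers).
# Each destination name is defined exactly once, as SSA requires.
INSTRUCTIONS = [
    ("r1", []),            # 0: r1 = load a
    ("r2", []),            # 1: r2 = load b
    ("r3", ["r1", "r2"]),  # 2: r3 = r1 + r2
    ("r4", []),            # 3: r4 = load c
    ("r5", ["r4"]),        # 4: r5 = r4 * 2
    ("r6", ["r3"]),        # 5: r6 = r3 - 1
    ("r7", []),            # 6: r7 = load d
]


def independent_blocks(instructions):
    """Group instruction indices into blocks that share no SSA register."""
    parent = list(range(len(instructions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    defined_by = {}
    for idx, (dest, sources) in enumerate(instructions):
        for src in sources:
            # A use depends on the single instruction that defines the name;
            # names with no definition here (live-in values) add no edge.
            if src in defined_by:
                parent[find(idx)] = find(defined_by[src])
        defined_by[dest] = idx

    blocks = defaultdict(list)
    for idx in range(len(instructions)):
        blocks[find(idx)].append(idx)
    return sorted(blocks.values())


def greedy_schedule(blocks, cores):
    """Assign the largest blocks first to the least-loaded core (not METIS)."""
    assignment = [[] for _ in range(cores)]
    loads = [0] * cores
    for block in sorted(blocks, key=len, reverse=True):
        target = loads.index(min(loads))
        assignment[target].append(block)
        loads[target] += len(block)
    return assignment, loads


def parallel_efficiency(speedup, cores):
    """Speedup divided by core count."""
    return speedup / cores


if __name__ == "__main__":
    blocks = independent_blocks(INSTRUCTIONS)
    print(blocks)
    print(greedy_schedule(blocks, cores=2))
    print(parallel_efficiency(7.92, 16))
```

Applied to the figures in the abstract, `parallel_efficiency(7.92, 16)` gives about 0.495, meaning the reported average speedup corresponds to roughly half of the ideal linear speedup on 16 cores.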

Copyright ©2001-2024
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania

All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.
