Index Dependent Nested Loops Parallelization with an Even Distributed Number of Steps
Abstract
Full Text:
PDFReferences
T. Schmidt, J. Stoye, Quadratic time algorithms for finding common intervals in two and more sequences, Proc of CPM 2004 LNCS 3109(2004) 347–358.doi:10.1007/978-3-540-27801-6_26.
C. Lengauer, Loop parallelization in the polytope model, CONCUR’93 (1993) 398–416.
K. Bondalapati, Parallelizing dsp nested loops on reconfigurable architectures using data context switching, Proceedings of the 38th DesignAutomation Conference (IEEE Cat. No.01CH37232) (2001).doi:10.1145/378239.378483.
J. Ramanujam, Optimal software pipelining of nested loops, in: Proceedings of 8th International Parallel Processing Symposium, 1994, pp.335–342.doi:10.1109/IPPS.1994.288280.
S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, P. Sadayappan, Effective automatic parallelization of stencilcomputations, Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (2007) 235–244doi:10.1145/1250734.1250761.
B. J. Gaska, N. Jothi, M. S. Mohammadi, K. Volk, Handling nested parallelism and extreme load imbalance in an orbital analysis code,arXiv:1707.09668 (2017).
B. Bylina, J. Bylina, Strategies of parallelizing nested loops on the multicore architectures on the example of the wz factorization for thedense matrices, Proceedings of the Federated Conference on Computer Science and Information Systems (2015) 629–639doi:10.15439/2015F354.
A. Kejariwal, P. D’Alberto, A. Nicolau, C. D. Polychronopoulos, A geometric approach for partitioning n-dimensional non-rectangular iterationspaces, in: Languages and Compilers for High Performance Computing, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 102–116.
V. Beletskyy, M. P. Parallelizing, Perfectly nested loops with non-uniform dependences (2002).
J. Liu, J. Wickerson, G. A. Constantinides, Loop splitting for efficient pipelining in high-level synthesis, Proceedings of the Federated Con-ference on Computer Science and Information Systems (2016).doi:10.1109/FCCM.2016.27.
S. D. Polychronopoulos, Loop coalescing: a compiler transformation for parallel machines, University of Illinois at Urbana-Champaign, Centerfor Supercomputing Research and Development, Technical Report CSRD-635 (1987).
M. K. Philippe Clauss, Ervin Altıntas, Automatic collapsing of non-rectangular loops, Parallel and Distributed Processing Symposium(IPDPS), Orlando, United States (2017) 778–787.
N. Kafri, J. A. Sbeih, Simple optimal partitioning approch to perfect tringular iteration space, Proceedings of the 2008 High Performance,Computing & Simulation Conference©ECMS, Waleed W. Smari (Ed.), ISBN: 978-0-9553018-7-2 / ISBN: 978-0-9553018-6-5 (CD) (2008)124–131.
R. Sakellariou, A compile-time partitioning strategy for non-rectangular loop nests, Proceedings of the 11th International Parallel ProcessingSymposium, IPPS’97 (Geneva), IEEE Computer Society Press (1997) 633–637.
A. Jackson, O. Agathokleous, Dynamic loop parallelisation, arXiv:1205.2367 (2012).
T. H. Tzen, L. M. Ni, Dependence uniformization: a loop parallelization technique, IEEE Transactions on Parallel and Distributed Systems4 (5) (1993) 547–558.doi:10.1109/71.224217.
B. Bylina, J. Bylina, Parallelizing nested loops on the intel xeon phi on the example of the dense wz factorization, Proceedings of the FederatedConference on Computer Science and Information Systems 8 (2016) 655–664.doi:10.15439/2016F436.
M. Wolf, M. Lam, A loop transformation theory and an algorithm to maximize parallelism, IEEE Transactions on Parallel and DistributedSystems 2 (4) (1991) 452–471.doi:10.1109/71.97902.
H. Blinn, How to solve a quadratic equation?, IEEE Computer Graphics and Applications (Volume: 25 , Issue: 6) (2005)
DOI: https://doi.org/10.31449/inf.v45i4.3130
This work is licensed under a Creative Commons Attribution 3.0 License.