Evaluating the Performance of OpenMP Offloading on the NEC SX-Aurora TSUBASA Vector Engine
The NEC SX-Aurora TSUBASA vector engine (VE) follows the tradition of long vector processors for high-performance computing (HPC). The technology combines the vector computing capabilities with the popularity of standard x86 architecture by integrating it as an accelerator. To decrease the burden of code porting for different accelerator types, the OpenMP specification is designed to be single parallel programming model for all of them. Besides the availability of compiler and runtime implementations, the functionality as well as the performance is important for the usability and acceptance of this paradigm. In this work, we present LLVM-based solutions for OpenMP target device offloading from the host to the vector engine and vice versa (reverse offloading). Therefore, we use our source-to-source transformation tool sotoc as well as the native LLVM-VE code path. We assess the functionality and present the first performance numbers of real-world HPC kernels. We discuss the advantages and disadvantage of the different approaches and show that our implementation is competitive to other GPU OpenMP runtime implementations. Our work gives scientific programmers new opportunities and flexibilities for the development of scalable OpenMP offloading applications for SX-Aurora TSUBASA.
Cramer, T., Kosmynin, B., Moll, S., Römmer, M., Focht, E., & Müller, M. S. (2021). Evaluating the Performance of OpenMP Offloading on the NEC SX-Aurora TSUBASA Vector Engine. Supercomputing Frontiers and Innovations, 8(2), 59-74. https://doi.org/10.14529/jsfi210204