Employing an on-chip network in a manycore system (to improve scalability) makes the latencies of data accesses issued by a core non-uniform, which significantly impacts application performance. This paper presents a compiler strategy that exposes architecture information to the compiler to enable optimized computation-to-core mapping. Our scheme takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. Our experiments with 12 multi-threaded applications reveal that, on average, our approach reduces the on-chip network latency in a 6x6 manycore system by 49.5% in the case of private LLCs and 52.7% in the case of shared LLCs. These improvements translate to execution time improvements of 14.8% and 15.2% for the private-LLC and shared-LLC based systems, respectively.
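The idea of a distance-aware computation-to-core mapping can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not spell out. It is a simple greedy heuristic under stated assumptions: a 6x6 mesh, hop count equal to Manhattan distance (as with XY routing), and a hypothetical per-computation profile giving how often each computation accesses each LLC bank or MC node. The names `hops`, `traffic_cost`, `map_computations`, and the `demo` profile are all illustrative.

```python
from itertools import product

MESH = 6  # 6x6 mesh, matching the system size quoted in the abstract


def hops(a, b):
    """Manhattan distance between two mesh nodes (assumes XY routing)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def traffic_cost(core, accesses):
    """Weighted hop count; accesses maps a target node (x, y) to an access count."""
    return sum(count * hops(core, node) for node, count in accesses.items())


def map_computations(computations, mesh=MESH):
    """Greedily place each computation on the free core with the lowest cost.

    computations: dict of name -> {target node: access count} (hypothetical profile).
    Computations with the most accesses are placed first.
    """
    free = set(product(range(mesh), repeat=2))
    mapping = {}
    order = sorted(computations, key=lambda c: -sum(computations[c].values()))
    for name in order:
        # sorted(free) makes tie-breaking deterministic
        best = min(sorted(free), key=lambda core: traffic_cost(core, computations[name]))
        mapping[name] = best
        free.remove(best)
    return mapping


# Hypothetical access profile: targets are LLC bank / MC positions on the mesh.
demo = {
    "block_a": {(0, 0): 100, (0, 5): 10},
    "block_b": {(5, 5): 80},
    "block_c": {(0, 0): 50},
}
placement = map_computations(demo)
```

In this sketch, `block_a` and `block_c` both prefer the node at `(0, 0)`; the heavier `block_a` gets it, and `block_c` falls back to a core one hop away. A real compiler pass would derive the access profile from program analysis and would likely use a more global optimization than this greedy ordering.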
|Title of host publication||Proceedings - 26th International Conference on Parallel Architectures and Compilation Techniques, PACT 2017|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||2|
|Publication status||Published - 2017 Oct 31|
|Event||26th International Conference on Parallel Architectures and Compilation Techniques, PACT 2017 - Portland, United States|
Duration: 2017 Sept 9 → 2017 Sept 13
|Name||Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT|
|Other||26th International Conference on Parallel Architectures and Compilation Techniques, PACT 2017|
|Period||17/9/9 → 17/9/13|
Bibliographical note
Publisher Copyright:
© 2017 IEEE.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Hardware and Architecture