Selectively GPU cache bypassing for un-coalesced loads

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

GPUs are widely used to accelerate general-purpose applications and can hide memory latency through massive multithreading. However, multithreading can increase contention for the L1 data cache (L1D). This problem is exacerbated when an application contains irregular memory references, which lead to un-coalesced memory accesses. In this paper, we propose a simple yet effective GPU cache Bypassing scheme for Un-Coalesced Loads (BUCL). BUCL makes bypassing decisions at two granularities. At the instruction level, when the number of memory accesses generated by a non-coalesced load instruction exceeds a threshold, referred to as the threshold of un-coalescing degree (TUCD), all the accesses generated by this load bypass the L1D. The rationale is that cache data filled by un-coalesced loads typically has a low probability of being reused. At the level of each individual memory access, when the L1D is stalled, the accessed data likely has low locality, and the utilization of the target memory sub-partition is not high, the memory access may also bypass the L1D. Our experiments show that BUCL achieves 36% and 5% performance improvement over the baseline GPU for memory-un-coalesced and memory-coherent benchmarks, respectively, and also significantly outperforms prior GPU cache bypassing and warp throttling schemes.
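The two-level decision described in the abstract can be sketched as a simple predicate. This is a minimal illustration based only on the abstract; all parameter names other than TUCD, and the exact conditions combined at the per-access level, are assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of BUCL's two-granularity bypass decision.
# Only TUCD is named in the paper's abstract; the other parameter
# names and the boolean combination are illustrative assumptions.

def should_bypass_l1d(num_accesses: int, tucd: int,
                      l1d_stalled: bool,
                      data_low_locality: bool,
                      subpartition_busy: bool) -> bool:
    """Decide whether a load's memory accesses should bypass the L1D."""
    # Instruction level: a load whose un-coalescing degree exceeds
    # the TUCD threshold bypasses L1D, since the lines it would fill
    # typically have a low probability of being reused.
    if num_accesses > tucd:
        return True
    # Per-access level: when the L1D is stalled, the accessed data
    # likely has low locality, and the target memory sub-partition
    # is under-utilized, the individual access may also bypass L1D.
    if l1d_stalled and data_low_locality and not subpartition_busy:
        return True
    return False
```

In this sketch, the instruction-level test dominates: once the un-coalescing degree crosses the threshold, every access from that load skips the cache, while the per-access test only fires opportunistically when bypassing is cheap for the memory subsystem.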

Original language: English
Title of host publication: Proceedings - 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
Editors: Xiaofei Liao, Robert Lovas, Xipeng Shen, Ran Zheng
Publisher: IEEE Computer Society
Pages: 908-915
Number of pages: 8
ISBN (Electronic): 9781509044573
DOIs
State: Published - 2 Jul 2016
Event: 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016 - Wuhan, Hubei, China
Duration: 13 Dec 2016 - 16 Dec 2016

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 0
ISSN (Print): 1521-9097

Conference

Conference: 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
Country/Territory: China
City: Wuhan, Hubei
Period: 13/12/16 - 16/12/16

Keywords

  • Cache Bypassing
  • Data Cache
  • GPU
  • Memory divergence
  • Un-Coalesced Load Instruction
