By Todd C. Mowry, Monica S. Lam and Anoop Gupta
Software-controlled data prefetching is a promising technique for improving the performance of the memory subsystem to match today's high-performance processors. While prefetching is useful in hiding the latency, issuing prefetches incurs an instruction overhead and can increase the load on the memory subsystem. As a result, care must be taken to ensure that such overheads do not exceed the benefits.
This paper proposes a compiler algorithm to insert prefetch instructions into code that operates on dense matrices. Our algorithm identifies those references that are likely to be cache misses, and issues prefetches only for them. We have implemented our algorithm in the SUIF (Stanford University Intermediate Form) optimizing compiler. By generating fully functional code, we have been able to measure not only the improvements in cache miss rates, but also the overall performance of a simulated system. We show that our algorithm significantly improves the execution speed of our benchmark programs; some of the programs improve by as much as a factor of two. When compared to an algorithm that indiscriminately prefetches all array accesses, our algorithm can eliminate many of the unnecessary prefetches without any significant decrease in the coverage of the cache misses.
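To make the contrast between indiscriminate and selective prefetching concrete, here is a minimal hand-written sketch in C. It is not the output of the SUIF compiler and not the paper's algorithm; it only illustrates the idea for a single sequential array traversal. The constants `LINE_DOUBLES` and `PREFETCH_AHEAD` and both function names are assumptions chosen for illustration (a 64-byte cache line and an arbitrary prefetch distance), and `__builtin_prefetch` is the GCC/Clang builtin rather than anything the paper specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed parameters for illustration only (not from the paper). */
#define LINE_DOUBLES   8                    /* 64-byte line / 8-byte double */
#define PREFETCH_AHEAD (8 * LINE_DOUBLES)   /* prefetch distance in elements */

/* Indiscriminate: issues a prefetch on every iteration, even though
 * consecutive elements usually share a cache line. */
double sum_prefetch_all(const double *a, size_t n)
{
    assert(a != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)          /* stay inside the array */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD]);
        s += a[i];
    }
    return s;
}

/* Selective: the loop is unrolled by one cache line, so only the first
 * reference of each line (the one likely to miss) gets a prefetch. */
double sum_prefetch_selective(const double *a, size_t n)
{
    assert(a != NULL);
    double s = 0.0;
    size_t i = 0;
    for (; i + LINE_DOUBLES <= n; i += LINE_DOUBLES) {
        if (i + PREFETCH_AHEAD < n)          /* stay inside the array */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD]);
        for (size_t j = 0; j < LINE_DOUBLES; j++)
            s += a[i + j];
    }
    for (; i < n; i++)                       /* leftover elements */
        s += a[i];
    return s;
}
```

Both functions compute the same sum in the same order; the selective version simply issues roughly one eighth as many prefetch instructions under the assumed line size. Whether that translates into a speedup depends on the machine and on how well the assumed prefetch distance matches its memory latency.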