N_LINE: Number of bits used to index the offset in a cache line.
N_SLICE: Number of slices in the LLC, which equals the number of CPU cores.
The mapping function determines which memory blocks are mapped to which cache lines.
The alignment of memory determines whether there is a need to fetch additional transactions or cache lines. In some situations a thread accesses a structure that begins with a header, so the first thread processes a memory address whose offset is not zero [12].
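The effect of a nonzero starting offset can be illustrated with a short calculation. This is a minimal sketch, not taken from the cited work: the function name `cache_lines_touched`, the 64-byte line size, and the 16-byte header are all assumptions chosen for illustration.

```python
def cache_lines_touched(address: int, size: int, line_size: int = 64) -> int:
    """Count the cache lines spanned by the byte range [address, address + size).

    line_size defaults to 64 bytes, which is an assumption, not a value
    given in the text.
    """
    if size <= 0:
        return 0
    first_line = address // line_size               # line holding the first byte
    last_line = (address + size - 1) // line_size   # line holding the last byte
    return last_line - first_line + 1


# An aligned 64-byte access fits in one line.
aligned = cache_lines_touched(0, 64)
# The same access shifted by a hypothetical 16-byte header straddles two lines.
shifted = cache_lines_touched(16, 64)
```

Under these assumptions, the shifted access needs one more cache line than the aligned one, which is the extra fetch the sentence above refers to.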
Figure 4(b) shows an example of how the prefetch filtering mechanism works, and Figure 5 summarizes the states and transitions of the prefetch bit and the saturation counter for a particular cache line.
Therefore, any two code fragments whose addresses differ by a multiple of the cache size are mapped to the same cache line, and they cannot both be present in a direct-mapped cache simultaneously.
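The conflict described above follows from how a direct-mapped cache computes its line index. The sketch below assumes a 32 KiB cache with 64-byte lines; both sizes and the function name `direct_mapped_line` are illustrative assumptions, not values from the source.

```python
def direct_mapped_line(address: int,
                       cache_size: int = 32 * 1024,
                       line_size: int = 64) -> int:
    """Return the line index an address maps to in a direct-mapped cache.

    cache_size and line_size are assumed values for illustration.
    """
    num_lines = cache_size // line_size
    # Drop the offset within the line, then wrap around the number of lines.
    return (address // line_size) % num_lines


cache_size = 32 * 1024
a = 0x1000
b = a + 3 * cache_size   # differs from a by a multiple of the cache size

same_line = direct_mapped_line(a) == direct_mapped_line(b)
```

Because `a` and `b` differ by a whole multiple of the cache size, the modulo yields the same index for both, so loading one evicts the other.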
--The block dimensions RCB, RB, and CB should all be multiples of the number of double-precision words that fit in a cache line (LNSZ/_P).
Their cache filter is the same as Puzak's, but in addition to recording all read misses, their reduced trace also includes the first write to any clean cache line. With both of these methods, the trace reduction factor is equal to the inverse of the cache miss ratio.
During this stage, the instruction fetcher retrieves two instructions at a time from the instruction cache (regardless of alignment), unless the address points to the last word of a cache line, in which case only a single word is returned.
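The fetch rule above can be sketched as a small function. The 4-byte word and 32-byte cache line are assumptions made only for this illustration, as is the name `words_fetched`; the source does not state either size.

```python
def words_fetched(address: int, word_size: int = 4, line_size: int = 32) -> int:
    """Return how many instruction words one fetch returns for this address.

    Two words are returned unless the address is the last word of its
    cache line. word_size and line_size are assumed values.
    """
    words_per_line = line_size // word_size
    word_in_line = (address // word_size) % words_per_line
    if word_in_line == words_per_line - 1:
        return 1   # last word of the line: the next word lies in another line
    return 2


first_word = words_fetched(0)    # start of a line
last_word = words_fetched(28)    # eighth (last) word of a 32-byte line
```

With these assumed sizes, only one address in every eight word slots yields a single-word fetch.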
In the new integrated roofline model, memory transfers between caches and DRAM are computed based on the cache line size.