next up previous
Next: Example Up: Timing Analysis for Data Previous: Results

 

Set-Associative Instruction Caches

Modern processors generally use instruction and data caches to bridge the increasing gap between ever-faster processors and only moderately faster memory. Most caches are split caches, i.e., instruction cache and data cache are separate. The level of associativity for such caches typically ranges between 1 and 8 [4].

The method of static cache simulation provides the means to predict the caching behavior of instruction and data references. This section formalizes the handling of set-associative instruction caches. An instruction is assigned a category for each loop level (i.e., always-hit, always-miss, first-hit or first-miss, as discussed previously in the context of data caches). The analysis for set-associative instruction caches is based on the following formal framework:



The traversal of every possible sequence of blocks leads to an exponential explosion. To avoid this overhead, we restrict the analysis to abstract cache states:



For direct-mapped caches, the ACS is a singleton set used to determine the category of an instruction describing the cache behavior. For an n-way set-associative cache, the ACS is an n-tuple of sets.

Given the control-flow information of a program and a cache configuration, the ACSs for each block have to be calculated. Using data-flow analysis (DFA), each block has an input state and an output state, corresponding to the ACS before and after the execution of the block, respectively. An iterative algorithm for the calculation of ACS' via DFA is given in [14]. Our DFA requires a time overhead comparable to that of inter-procedural DFA performed in optimizing compilers. The space overhead is O(pl * bb * fi * n), where pl, bb, fi, n denote the number of program lines, basic blocks, function instances, and cache associativity, respectively. Notice that set-associative caches impose a factor of n, which is typically very small () for instruction caches in contemporary architectures (for direct-mapped caches n=1). The correctness of iterative DFA has been discussed elsewhere [1]. Additional DFA is required to determine the linear cache state and the post-dominator set for each block before a definition for instruction categories can be given.



The forward control-flow graph is the acyclic graph resulting from the removal of back edges (backwards edges forming loops [1]) in the regular control-flow graph. Informally, the LCS represents the hypothetical cache state in the absence of loops. It will be used to determine whether a program line may be cached due to loops or due to the sequential control flow.




 

Informally, the post-dominator set describes the program lines certain to be reached from the current block, regardless of the taken paths in the control flow (see [1] for more details).

The instruction categories can now be formally defined (see Definition 5, at the top of the next page) and implemented rather efficiently once DFA has been performed. First, simple set operations on bit vectors suffice to test the conditions. Second, if one conjunct in a condition fails, the remaining ones are not tested. Third, the implementation orders the conjuncts such that the least likely ones are tested first. To motivate this definition, an informal description of the conditions shall be given (see [14] for algorithmic details).

Always-hit:
(on spatial locality within the program line) or ((the instruction is in cache in the absence of loops) and ((there are no conflicting instructions in the cache state) or (all conflicts fit into the remaining associativity levels))).

First-hit:
(the instruction was a first-hit for inner loops) or (it is potentially cached, even without loops and even for all loop preheaders, it is always executed in the loop, not all conflicts fit into the remaining associativity levels but conflicts within the loop fit into the remaining associativity levels for the loop headers, even when disregarding loops).

First-miss:
the instruction was a first-miss for inner loops, it is potentially cached, conflicts do not fit into the remaining associativity levels but the conflicts within the loop do.

Always-miss:
This is the conservative assumption for the prediction of worst-case execution time when none of the above conditions apply.




next up previous
Next: Example Up: Timing Analysis for Data Previous: Results

Robert Palmer
Mon May 19 10:18:36 EDT 1997