The iterative algorithm in Figure 2 was used to calculate the abstract cache states. Each UP has an input and output state of program lines that can potentially be in cache at that point. Initially, the input states of the top paths (entry paths of the main function) are set to all invalid lines. The input state of a path is calculated by taking the union of the output states of its immediate predecessors. The output state of a path is calculated by taking the union of its input state and the program lines accessed by the path and subtracting the program lines with which the path conflicts. This calculation includes the interprocedural propagation of abstract cache states (not explicitly shown in the algorithm): At a path p with a call to function f as the last instruction, the output state of the UP is propagated to the input states of the entry paths of the corresponding function instance f(i). Similarly, the union of the output states of f(i)'s exit paths provides the input state for p's successor paths.

Figure 2: Algorithm to Calculate Cache States
Figure 3: Example with Flow Graph
The algorithm is a variation of an iterative data-flow analysis
algorithm commonly used in optimizing compilers. Thus, the time
overhead of the algorithm is comparable to that of data-flow analysis
and the space overhead is O(pl*UPs*fi), where pl is the number of
program lines, UPs is the number of paths, and fi the number of
function instances. The correctness of the algorithm for
data-flow analysis is discussed in []. The calculation can be
performed for an arbitrary control-flow graph, even if it is
irreducible. In addition, the order of processing basic blocks is
irrelevant for the correctness of the algorithm. The reaching states
can be calculated using the same base algorithm with
input_state(main) = conf_lines(UP) =
.
Figure 3 depicts the calculation of input and output states. The paths are restricted to basic blocks to simplify the example. In the example, there are 4 cache lines and the line size is 16 bytes (4 instructions). Thus, program line 0 and 4 map into cache line 0, program line 1 and 5 map into cache line 1, program line 2 maps into cache line 2, and program line 3 maps into cache line 3. The immediate successor of a block with a call is the first block in that instance of the called function. Block 8a corresponds to the first instance of foo() called from block 1 and block 8b corresponds to the second instance of foo() called from block 5. Two passes are required to calculate the input and output states of the blocks, given that the blocks are processed in the order shown in Figure 3. Only the states of some blocks inside the loop change on the second pass. Pass 3 results in no more changes.
After determining the input states of all blocks, each instruction is
categorized based on its abstract cache state (derived from the input
state) and the reaching state.
By
inspecting the input states of each block, one can make some
observations that may not have been detected by a naive inspection of
only physically contiguous sequences of references. For instance, the
static simulation determined that the first instruction in block 7
will always be in cache (always hit) due to spatial locality. This can
be determined by observing that line 4 is in in(7) and no
conflicting program line is in in(7). It was also determined
that the first instruction in basic block 8b will always be in cache
(always hit) due to temporal locality. The static simulation
determined that the last instruction in block 3 will not be in cache
on its first reference, but will always be in cache on subsequent
references (first miss). This is indicated by in(3), which
includes program line 2 but also a conflicting program line
``invalid'' for cache line 3. Yet, the conflicting program line cannot
be reached. This is also true for the first instructions of block 5
and 6, though a miss will only occur on the first reference of either
one of the instructions. This is termed a group first miss.
Finally, the first instruction in block 3 is classified as a conflict
since it could either be a hit or a miss (due to the conditional call
to foo). This is indicated by in(3), which includes program
line 1 and a conflicting program line 5 that can still be reached.
The current implementation of the static cache simulator imposes some restrictions. First, only direct-mapped cache configurations are supported. Recent results have shown that direct-mapped caches have a faster access time for hits, which outweighs the benefit of a higher hit ratio in set-associative caches for large cache sizes []. Second, context switches cannot be simulated using this method. Third, recursive programs currently are not allowed since cycles in the call graph would complicate the generation of unique function instances. Finally, indirect calls are not handled since the static cache simulator must be able to generate an explicit call graph.