next up previous
Next: Bit-Encoding Approach Up: Static Cache Simulation Previous: Instruction Categorization

Implementation

The iterative algorithm in Figure 2 was used to calculate the abstract cache states.

  figure81
Figure 2: Algorithm to Calculate Cache States

Each basic block has an input and output state of program lines which can potentially be in cache at that point. Initially the top block's input state (entry block of the main function) is set to all invalid lines. The input state of a block is calculated by taking the union of the output states of its immediate predecessors. The output state of a block is calculated by taking the union of its input state and the program lines accessed by the block and subtracting the program lines with which the block conflicts. The calculation of these abstract cache states requires a time overhead comparable to that of data-flow analysis used in optimizing compilers and a space overhead linear to the number of program lines, basic blocks, and function instances. The correctness of the algorithm for data-flow analysis is discussed in []. The calculation can be performed for an arbitrary control-flow graph, even if it is irreducible. The order of processing basic blocks is irrelevant.

Figure 3 illustrates a simple example of calculating input and output states.

 
 

Figure 3: Example with Flow Graph

Assume there are 4 cache lines and the line size is 16 bytes (4 instructions). The immediate successor of a block with a call is the first block in that instance of the called function. Block 8a corresponds to the first instance of foo() called from block 1 and block 8b corresponds to the second instance of foo() called from block 5. Two passes are required to calculate the input and output states of the blocks, given that the blocks are processed in the order shown in Figure 3. Only the states of blocks 3, 4, and 5 changed during the second pass. Pass 3 results in no more changes. The reaching states are as follows: Block 7 cannot reach any program lines, and all other blocks can reach lines 1 to 5. The calculation of the reaching states can be performed by the same algorithm with input_state(top) = conf_lines(B) = tex2html_wrap_inline1091.

After determining the abstract cache states (input states) of all blocks, each instruction is categorized according to the criteria specified in the previous section. By inspecting the states of each block, one can make some observations that may not be detected by a naive inspection of only physically contiguous sequences of references. For instance, the static simulation determined that the first instruction in block 7 will always be in cache (always hit) due to spatial locality. It also determined that the first instruction in basic block 8b will always be in cache (always hit) due to temporal locality. The last instruction in block 3 will not be in cache on the first reference, but will always be in cache on subsequent references (first miss). This is also true for the first instruction of block 5 and the first instruction of block 6, though a miss will only occur on the first reference of either one of the instructions. This situation is termed a group first miss. The first instruction in block 3 is classified as a conflict since it could either be a hit or a miss. The line is in conflict with the second instruction of block 8b, an always miss, due to the conditional execution of the call to foo() in block 5.

The current implementation of the static simulator imposes some restrictions. First, only direct-mapped cache configurations are allowed. Recent results have shown that direct-mapped caches have a faster access time for hits, which outweighs the benefit of a higher hit ratio in set-associative caches for large cache sizes []. Another restriction is that recursive programs are not allowed since cycles in the call graph would complicate the generation of unique function instances. Finally, indirect calls are not handled since the static simulator must be able to generate an explicit call graph.


next up previous
Next: Bit-Encoding Approach Up: Static Cache Simulation Previous: Instruction Categorization

Robert Palmer
Mon May 19 10:08:14 EDT 1997