next up previous
Next: Pipeline Simulation Up: Real-Time Debugging by Minimal Previous: The Debugging Environment

Instruction Cache Simulation

One part of tracking the elapsed number of cycles is the simulation of instruction caches. We are assuming an on-chip instruction cache of a configurable size with a direct-mapped architecture. Recent results have shown that direct-mapped caches have a faster access time for hits, which outweighs the benefit of a higher hit ratio in set-associative caches for large cache sizes []. As a result, current processor designs incorporate direct-mapped caches more frequently than set-associative caches.

The static hardware simulator includes a static cache simulation that predicts the caching behavior of instructions. Each instruction in a program is determined to be in one of the following categories:

The categorization of each instruction is performed via data-flow analysis based on the control-flow information and the cache configuration. Once the static cache simulator has categorized all instructions, it can emit code to instrument the program. The instrumentation includes simple frequency counting to track always hits, always misses, and first misses. It also includes local state transitions similar to a finite state automaton to determine at execution time whether a conflict results in a cache hit or a cache miss. The details of static cache simulation can be found elsewhere [, , ].

Our prior work dealt with an ideal RISC processor that exhibits a single cycle overhead for an instruction execution on a cache hit and a constant overhead for a cache miss (estimated at 10 cycles). Our current work concentrates on applying the hardware simulation to an existing processor and enhance the simulation to include the effects of pipelining in a retargetable fashion.

The chosen target processor is the MicroSPARC I [], which is commonly used in Sun SPARCclassic workstations. The processor has an on-chip direct-mapped instruction cache of 4kB with a line size of 8 instructions. Assuming the absence of pipeline stalls (discussed in the next section), the following timing behavior can be assumed for cache hits and misses. A cache hit will generally take one cycle. A cache miss on a cache line causes the first and second instructions of the line to become available after a 7 and 8 cycle delay, respectively, then a dead cycle occurs, and the process is repeated for the remaining instructions of the line (with a repeated dead cycle). A cache miss results in bringing an entire program line into cache. This process cannot be interrupted, even if the instructions in the program line do not coincide with the control flow at some point.

For example, consider a sequence of instructions in a program line containing a label L1 for instruction 3 and an unconditional jump instruction for instruction 5 as depicted in Figure 2. Assume that the current instruction transfers control to label L1 via a branch. Furthermore, assume that the depicted program line is not in cache. The reference to instruction 3 at L1 will cause a cache miss. Thus, instruction 3 will be available after a 7 cycle delay, instruction 4 one cycle afterwards, followed by a dead cycle. The jump instruction 5 and instruction 6 in the delay slot will be available after 10 and 11 cycles, respectively. At this point, control is transferred to L2 but the rest of the program line is brought into cache until cycle 17. The execution may continue in parallel at L2 if the instruction reference at L2 results in a cache hit. Conversely, a cache miss at L2 cannot be resolved until the previous program line is cached. Thus, if a miss occurs for the instruction at L2, a memory fetch is requested only after cycle 17. The instruction at L2 becomes then available for the processor at cycle 24.

  figure48
Figure 2: Cache Miss

The timing overhead of a cache miss depends on the taken control-flow path, as illustrated by the example. The following cases have to be distinguished:

As mentioned earlier, the cycle accounting for conflict instructions is performed via state transitions at execution time. These transitions have to reflect all possible paths in the control flow if a cache miss occurs. These details can be automatically determined by the static cache simulator based on a configuration file containing the cache configuration and a description of the memory access protocol on cache misses.


next up previous
Next: Pipeline Simulation Up: Real-Time Debugging by Minimal Previous: The Debugging Environment

Robert Palmer
Mon May 19 10:21:18 EDT 1997