During cache hits, most instructions incur an execution overhead of one cycle. For some instructions, however, execution takes multiple cycles. Since the MicroSPARC I does not permit out-of-order execution, such instructions stall the instruction pipeline. An instruction timing file can be used to distinguish instructions with varying execution overhead.
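As an illustration only, the use of an instruction timing file can be sketched as follows. The file format (one mnemonic and cycle count per line), the names parse_timing and execution_cycles, and the cycle counts themselves are assumptions made for this sketch; they are not taken from the MicroSPARC I documentation.

```python
# Hypothetical timing-file format: one "<mnemonic> <cycles>" pair per line.
# The cycle counts are placeholders, NOT actual MicroSPARC I timings.
TIMING_FILE_TEXT = """\
# mnemonic cycles
ld 1
st 2
mul 5
"""

DEFAULT_CYCLES = 1  # instructions not listed are assumed to take one cycle


def parse_timing(text):
    """Return a dict mapping mnemonics to cycle counts."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip trailing comments
        if not line:
            continue  # skip blank and comment-only lines
        mnemonic, cycles = line.split()
        table[mnemonic] = int(cycles)
    return table


def execution_cycles(mnemonics, table):
    """Sum per-instruction cycles for an in-order sequence of cache hits."""
    return sum(table.get(m, DEFAULT_CYCLES) for m in mnemonics)


timing = parse_timing(TIMING_FILE_TEXT)
total = execution_cycles(["ld", "add", "mul", "st"], timing)
```

Because execution is in order, the per-instruction cycle counts simply add up for a sequence of cache hits; pipeline interlocks are not covered by this table and have to be accounted for separately.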
The main challenge of the simulation lies in capturing pipeline interlocks. These situations occur when an instruction in the pipeline stalls, either because a result cannot yet be made available or because of a resource conflict with another instruction in a later pipeline stage. For example, when a memory load into a register (assumed to be a cache hit) is followed by a reference to the same register, the referencing instruction will stall for one cycle until the value becomes available. This scenario is often referred to as a load-use interlock []. The MicroSPARC I interlocks the pipeline for a number of other instruction combinations [].
The traditional approach to detect pipeline conflicts employs resource vectors and reservation tables []. The resource vector for an instruction describes the processor resources used during each pipeline stage of the instruction processing. A reservation table is a sequence of resource vectors whose instruction processing is interleaved for as many pipeline stages as possible. The goal on a RISC processor is to process one instruction per cycle, unless two consecutive instructions try to access the same processor resource during a cycle. The reservation table can be used to detect these resource conflicts. Unfortunately, this traditional approach requires an instruction analysis at the level of pipeline stages. If this analysis were performed statically, one would have to generate reservation tables for instruction sequences along the possible control-flow paths. This approach would possibly impose a considerable overhead.
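The conflict check on resource vectors can be sketched as follows. This is a minimal illustration, not the simulator's implementation: the five-stage layout, the resource names, and the vectors for LOAD, ADD, and STORE are hypothetical and do not describe the actual MicroSPARC I pipeline.

```python
# Each resource vector lists, per pipeline stage, the set of resources used.
# Stage layout and resource names are hypothetical, for illustration only.
LOAD = [{"fetch"}, {"decode"}, {"alu"}, {"mem"}, {"wb"}]
ADD = [{"fetch"}, {"decode"}, {"alu"}, set(), {"wb"}]
STORE = [{"fetch"}, {"decode"}, {"alu"}, {"mem"}, {"mem"}]


def conflicts(first, second, offset=1):
    """True if `second`, issued `offset` cycles after `first`,
    needs a resource in a cycle in which `first` still holds it."""
    for stage, used in enumerate(second):
        cycle = stage + offset  # cycle index relative to the issue of `first`
        if cycle < len(first) and first[cycle] & used:
            return True
    return False


def stall_cycles(first, second):
    """Extra cycles `second` must wait beyond back-to-back issue."""
    offset = 1
    while conflicts(first, second, offset):
        offset += 1  # terminates: no overlap once offset >= len(first)
    return offset - 1


no_stall = stall_cycles(LOAD, ADD)     # vectors never overlap on a resource
one_stall = stall_cycles(STORE, LOAD)  # both need "mem" in the same cycle
```

Even this small sketch compares every stage of one instruction with every overlapping stage of its predecessor, which hints at the cost of performing the analysis for all instruction sequences along the control-flow paths.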
Our design involves a different approach. Pattern matching (similar to peephole optimization) can be applied to instruction patterns that match specified pipeline interlocks. The current environment includes a modified back-end of a compiler. The compiler back-end could be further modified to recognize patterns of instructions that cause pipeline interlocks. This approach has the advantage that instruction patterns can be determined once and for all for a given architecture and stored in a file. The size of the peephole window is bounded by the maximum delay possible for a pipeline conflict. The window may span instructions along the control-flow paths of the program. Upon detecting a pipeline stall, a pattern of instructions would be annotated with the number of cycles associated with the pipeline stall. The stall cycles could then be reported to the static hardware simulator to include them in the cycle accounting. For example, the patterns to describe a load-use resource conflict would be as follows.
ld *,reg[i] | ld *,reg[i]
* reg[i],*,* | * *,reg[i],*
The patterns describe a load instruction for register i, followed
by any instruction referencing register i, either as the first or
second operand.
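A matcher for these two patterns can be sketched as follows, assuming a window of two adjacent instructions within a basic block. The Instr representation and the function name load_use_stalls are hypothetical and chosen for illustration; the one-cycle stall is the value stated above for a load that hits the cache.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: opcode plus operand strings."""
    opcode: str
    operands: Tuple[str, ...]


LOAD_USE_STALL = 1  # cycles; the load is assumed to be a cache hit


def load_use_stalls(block):
    """Return (index, stall_cycles) for each instruction in `block`
    that matches one of the two load-use patterns."""
    stalls = []
    for idx in range(len(block) - 1):
        load, user = block[idx], block[idx + 1]
        # Pattern line 1: "ld *,reg[i]" (destination is the last operand).
        if load.opcode != "ld" or len(load.operands) != 2:
            continue
        dest = load.operands[-1]
        # Pattern line 2: three-operand instruction reading reg[i]
        # as its first or second operand.
        if len(user.operands) == 3 and dest in user.operands[:2]:
            stalls.append((idx + 1, LOAD_USE_STALL))
    return stalls


block = [
    Instr("ld", ("[%o0]", "%o1")),
    Instr("add", ("%o1", "%o2", "%o3")),  # reads %o1 right after the load
    Instr("sub", ("%o3", "%o4", "%o5")),  # no load precedes it
]
annotations = load_use_stalls(block)
```

The sketch covers only the two patterns shown and only adjacent instructions in straight-line code; other interlocks and windows that span control-flow paths would require further patterns and a traversal of the control-flow graph.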
During static hardware simulation, the interaction between pipeline stalls and caching for straight-line code (within a basic block) can be resolved statically as long as the caching behavior can also be determined statically. Pipeline stalls reaching across basic blocks or involving dynamically dependent caching behavior will have to be incorporated into the dynamic simulation process. This can be accommodated by additional state transitions but will impact the dynamic execution overhead.