The problem of determining the execution time of programs has been the subject of some research in the past. Sarkar [] suggested a framework to determine both average execution time and its variance. His work was based on the analysis of a program's interval structure and its forward control flow. He calculated a program's execution time for a specific set of input data by using a description of the architecture and the frequency information obtained by incrementing counters during a profiling run. He assumed that the execution order of instructions does not affect this calculation. Thus, his method cannot capture the impact of caching on execution time.
For real-time systems, several tools to predict the execution time of programs have been designed. The analysis has been performed at the level of source code [], at the level of intermediate code [], and at the level of machine code []. Only Harmon's tool took the impact of instruction caches into account for restrictive circumstances, i.e. only for small code segments which entirely fit into cache.
Niehaus outlined how the effects of caching can be taken into account in the prediction of execution time []. He suggested that caches be flushed on context switches to provide a consistent cache state at the beginning of each task execution. He provided a rough estimate of the benefit of caches for speedup and tried to determine the percentage of instruction cache references which can be predicted as hits. The level of analysis remained at a very abstract level though as it only dealt with spatial locality for sequential execution and some temporal locality for simple loops. No general method to analyze the call graph of a task and control flow for each function was given.
A few attempts have been made to improve on the predictability of caches by architectural modifications to meet the needs of real-time systems. Kirk [] outlined such a system which relied on the ability to segment the cache memory into a number of dedicated partitions, each of which can only be accessed by a dedicated task. But this approach introduced new problems such as exhibiting lower hit ratios due to the partitioning and increasing the complexity of scheduling analysis by introducing another resource (cache partitioning) as an additional degree of freedom in the allocation process.
Other suggested architectural modifications often dedicate a bit in the instruction encoding which is used by the compiler to affect the cache behavior. McFarling [] used such an approach to exclude instructions from cache that were not likely to be in cache on subsequent references. Chi and Dietz [] introduced a data cache bypass bit on load and store instructions which, when set, indicates that the processor should go directly to memory (without caching the value as a side-effect) or goes to the cache when clear. Their idea is to improve execution speed by keeping data values either in registers or in cache, thus avoiding storage mirroring among the fasted components in the memory hierarchy (registers and data caches). Our work emphasizes instruction caches rather than data caches. In contrast to McFarling's study and the work by Chi and Dietz, we are primarily concerned about the predictability of instruction caching and secondarily about execution speed.