Static cache simulation and timing analysis were performed for instruction caches for 1/2/4/8-way set-associative caches with 16/8/4/2 lines, respectively, and a line size of 16 bytes. Thus, each cache configuration has the equivalent storage capacity of 256 bytes, which was chosen to model a realistic ratio of program size and cache size (from 2:1 to 9:1). The estimated number of cycles for a program execution was derived from static cache simulation and timing analysis without program execution. This number is compared to the number of observed cycles obtained by a trace-driven cache simulation. In the latter case, the program was executed with its worst-case input data. The miss penalty was assumed to be 9 cycles [7]. (For the numbers reported here, pipeline simulation of the timing analyzer was intentionally disabled to isolate the effects of caching.)
Table 3 shows the results of WCET prediction for a 4-way associative cache with 8 lines. The other cache configurations mentioned before yield similar results in terms of the ratios and are therefore omitted. The programs are described in Table 2. The observed cycles during program execution (column 2) are slightly less than the number of cycles estimated by our tools (column 3). The ratio between estimated and observed cycles (column 4) shows that our method yields tight estimations, sometimes even exact ones. The naive ratio (column 5) simulates a disabled cache and was calculated by assuming that all data cache references were misses and dividing those cycles by the observed cycles. It shows that an overestimation of the WCET of 9.25 times on average for the naive cache is reduced to only a slight overestimation of 1.32 with our approach, i.e., when caches are enabled and included in the WCET prediction. The results for some programs require further explanation.
The timing analysis overestimates program Sorta due to a loop with a varying number of iterations as described in Section 3.5. Likewise, Des causes an overestimation due to a data dependency that was also previously described. For the programs Matcnta, Matsuma and Statsa, the number of cycles was slightly overestimated. The programs Matcnta and Matsuma contain conditional control flow and would require exhaustive analysis of all permutations of execution paths to yield more accurate results. Such an approach would result in exponential complexity. Instead, the timing analyzer approximates the execution times conservatively using a fix-point algorithm (see [14]). This trade-off between accuracy and feasible time complexity still results in relatively tight but not always precise estimations. The program Statsa suffers from an overly conservative categorization due to a program line crossing a function boundary. Nonetheless, the conservative category results in safe estimates that remain very tight.
Table 3: Worst-Case Execution Times
We also measured the average ratio between estimated and observed cycles for cache associativities between 1 and 8 and observed that this ratio is independent of the level of associativity. Furthermore, the distribution of the instruction categories, averaged over the test set, varied only insignificantly for different levels of associativity. Thus, the presented method for WCET predictions yields tight results regardless of the associativity of caches. Finally, we observed that the overhead of performing static cache analysis increases linearly with the level of cache associativity. The increase can be attributed to the overhead of bit-vector operations implementing the DFA. The performance overhead for direct-mapped caches is extremely low (about 200 ms) and is still respectable (about 1.7 sec) for n=8, the largest associativity found in today's processors [14]. Thus, static cache simulation is an adequate method to model caches for WCET predictions for contemporary architectures efficiently.