We have implemented the debugging environment for an ideal processor with a cache hit time of one cycle and a cache miss time of ten cycles []. We are extending this implementation, following the design described in this paper, to reflect the specifics of the MicroSPARC I instruction cache and to take pipelining into account. The correctness of the resulting virtual-time accounting during debugging will be verified by comparing it with the observed program timing on a stand-alone VME board with a MicroSPARC I running a non-preemptive embedded real-time operating system []. This operating system is designed to exhibit predictable execution behavior and to provide more accurate timing than general-purpose operating systems such as UNIX.
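The ideal-processor timing model above can be illustrated with a small sketch. It is not the instrumentation-based implementation described in this paper; it is a simple trace-driven model in which only the hit and miss costs (one and ten cycles) come from the text. The direct-mapped organization, the cache geometry (4 lines of 16 bytes), the 4-byte instruction size, and all names (`DirectMappedICache`, `virtual_time`) are assumptions made for illustration.

```python
HIT_CYCLES = 1    # from the text: cache hit time of one cycle
MISS_CYCLES = 10  # from the text: cache miss time of ten cycles


class DirectMappedICache:
    """Hypothetical direct-mapped instruction cache (geometry is assumed)."""

    def __init__(self, num_lines=4, line_bytes=16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # one tag per line, None = invalid

    def fetch(self, address):
        """Return the cycles charged for fetching the instruction at address."""
        block = address // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return HIT_CYCLES
        self.tags[index] = tag  # load the line on a miss
        return MISS_CYCLES


def virtual_time(addresses, cache):
    """Sum the fetch cycles over an instruction address trace."""
    return sum(cache.fetch(a) for a in addresses)


# A loop body of 8 four-byte instructions (addresses 0..28), executed 3 times.
loop_body = list(range(0, 32, 4))
trace = loop_body * 3
cycles = virtual_time(trace, DirectMappedICache())
print(cycles)  # 2 misses * 10 + 22 hits * 1 = 42
```

In this sketch the loop body occupies two cache lines, so only the first iteration incurs misses; two addresses that map to the same line (for example 0 and 64 with the assumed geometry) would instead evict each other on every access.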
At this point, an instrumented, optimized program runs at about 1-4 times the speed of the uninstrumented, unoptimized version that is typically used for debugging []. This factor varies with the ratio of program size to cache size. The additional work for minimal dynamic pipeline simulation is expected to increase this overhead, yet the overhead should remain well below that of conventional hardware simulators.
We are also working on the design of a simulator for data caching. Under certain restrictions (e.g., the absence of pointers and heap allocation), many addresses of data references can be calculated statically. These include the addresses of global data, local data allocated on the stack (in the absence of recursion), and certain patterns of array references. The effect of data caching should then be incorporated into the hardware simulation in a manner similar to the handling of instruction caching.
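As a rough illustration of statically calculable data addresses, the following sketch enumerates the addresses of a regular array reference pattern and the cache lines they touch. It is only an assumed example, not the planned simulator design: the base address `0x1000`, the 4-byte element size, the 16-byte line size, and the function names are all hypothetical.

```python
def array_reference_addresses(base, elem_bytes, start, stop, step=1):
    """Addresses touched by a[i] for i in range(start, stop, step)."""
    return [base + i * elem_bytes for i in range(start, stop, step)]


def cache_lines_touched(addresses, line_bytes=16):
    """Distinct cache line numbers, in first-touch order."""
    seen = []
    for address in addresses:
        line = address // line_bytes
        if line not in seen:
            seen.append(line)
    return seen


# Hypothetical global array of 4-byte integers at address 0x1000, read by
# a loop over a[0], a[2], a[4], ..., a[14].
addresses = array_reference_addresses(0x1000, 4, 0, 16, 2)
lines = cache_lines_touched(addresses)
print(lines)  # [256, 257, 258, 259]
```

Because the base address and the index pattern are fixed before execution, the set of referenced lines is known without running the program, which is the property a static treatment of data caching would rely on.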