Your program should test at least the following four cases.
- Baseline I: access each byte of a cache-sized block of memory in order.
- Baseline II: access the first byte of each cache line in a cache-sized
block of memory, in order.
- Cache Stress: force cache misses on every access.
- 2-way Nice: access memory in such a way that a 2-way set-associative
cache would not miss, but a direct-mapped cache would always miss.
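
The four access patterns can be sketched as lists of byte offsets and checked against a small cache model. This is only an illustration, not the benchmark that produced the timings below: the function names, the stride of one cache size, the `blocks=16` working set for the stress test, and the LRU replacement policy are all assumptions made for the example. It counts misses in a simulated cache rather than measuring time.

```python
from collections import OrderedDict


def baseline_1(cache_size, line_size):
    """Every byte of a cache-sized block, in order."""
    return list(range(cache_size))


def baseline_2(cache_size, line_size):
    """First byte of each cache line of a cache-sized block, in order."""
    return list(range(0, cache_size, line_size))


def cache_stress(cache_size, line_size, accesses, blocks=16):
    """Cycle through `blocks` lines spaced one cache size apart.

    Assumption: all of these lines fall into the same set, so any cache
    with fewer than `blocks` ways keeps evicting them.
    """
    return [(i % blocks) * cache_size for i in range(accesses)]


def two_way_nice(cache_size, line_size, accesses):
    """Alternate between two lines exactly one cache size apart."""
    return [(i % 2) * cache_size for i in range(accesses)]


def simulate(offsets, cache_size, line_size, ways):
    """Count misses in an LRU set-associative cache (ways=1: direct-mapped)."""
    num_sets = cache_size // (line_size * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for offset in offsets:
        block = offset // line_size
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict least recently used
            lines[block] = True
    return misses


if __name__ == "__main__":
    size, line = 65536, 64  # the Athlon XP figures from the results below
    nice = two_way_nice(size, line, 1000)
    print(simulate(nice, size, line, ways=1))  # 1000: every access misses
    print(simulate(nice, size, line, ways=2))  # 2: only the two cold misses
    stress = cache_stress(size, line, 1000)
    print(simulate(stress, size, line, ways=2))  # 1000: every access misses
```

In this model the 2-way Nice pattern misses on every access when the cache is direct-mapped, but only on its first two accesses when the cache is 2-way set-associative, which is the behaviour the fourth test case asks for.
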
For each processor, the time per loop increased from one test to the next, which is what we expected. The only exception was the Athlon XP, whose Baseline II performance was slightly better than its Baseline I performance. What was interesting was that, on the 2-way Nice test, the times would lead one to believe that every processor performed worse than on the Cache Stress test, when the Cache Stress test should have produced the worst times, especially since each processor implements a multi-way set-associative cache. What the time per loop does not account for is that the 2-way Nice test performs more addition operations per loop; taking this into account, its times were very quick.

AMD Athlon XP
Cache size: 65536 bytes; line size: 64 bytes

| Test Name | Total Time (s) | Time per Loop (ns) |
|---|---|---|
| Baseline I | 0.707 | 7.1 |
| Baseline II | 0.682 | 6.8 |
| Cache Stress | 1.706 | 17.1 |
| 2-way Nice | 3.304 | 33.0 |

Pentium 4
Cache size: 8192 bytes; line size: 64 bytes

| Test Name | Total Time (s) | Time per Loop (ns) |
|---|---|---|
| Baseline I | 0.896 | 9.0 |
| Baseline II | 0.902 | 9.0 |
| Cache Stress | 1.026 | 10.3 |
| 2-way Nice | 1.716 | 17.2 |

PowerPC G4
Cache size: 32768 bytes; line size: 32 bytes

| Test Name | Total Time (s) | Time per Loop (ns) |
|---|---|---|
| Baseline I | 5.187 | 51.9 |
| Baseline II | 5.753 | 57.5 |
| Cache Stress | 6.367 | 63.7 |
| 2-way Nice | 9.787 | 97.9 |
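
As a quick consistency check on the figures above, the total time divided by the time per loop gives the loop count each run implies, and dividing each total by the Baseline I total shows how much slower every test ran on that processor. The sketch below only redoes that arithmetic on the reported numbers. The helper names `implied_iterations` and `relative_to_baseline` are made up for the example, and the loop count is inferred from the tables rather than stated anywhere in the report.

```python
# (total seconds, nanoseconds per loop) exactly as reported above
RESULTS = {
    "AMD Athlon XP": {
        "Baseline I": (0.707, 7.1),
        "Baseline II": (0.682, 6.8),
        "Cache Stress": (1.706, 17.1),
        "2-way Nice": (3.304, 33.0),
    },
    "Pentium 4": {
        "Baseline I": (0.896, 9.0),
        "Baseline II": (0.902, 9.0),
        "Cache Stress": (1.026, 10.3),
        "2-way Nice": (1.716, 17.2),
    },
    "PowerPC G4": {
        "Baseline I": (5.187, 51.9),
        "Baseline II": (5.753, 57.5),
        "Cache Stress": (6.367, 63.7),
        "2-way Nice": (9.787, 97.9),
    },
}


def implied_iterations(total_s, ns_per_loop):
    """Loop count implied by a total time and a per-loop time."""
    return total_s / (ns_per_loop * 1e-9)


def relative_to_baseline(tests):
    """Each test's total time as a multiple of the Baseline I total."""
    base = tests["Baseline I"][0]
    return {name: total / base for name, (total, _) in tests.items()}


if __name__ == "__main__":
    for cpu, tests in RESULTS.items():
        print(cpu)
        ratios = relative_to_baseline(tests)
        for name, (total, ns) in tests.items():
            loops = implied_iterations(total, ns)
            print(f"  {name:<13} {loops:.3g} loops, {ratios[name]:.2f}x Baseline I")
```

Every row works out to roughly 100 million loops, so the per-loop times are directly comparable across tests on the same processor.
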