CS25 - Lab 2
General Write-Up

Abstract:

In this lab we tested various aspects of each computer's cache and architecture and compared the results across machines. Specifically, we stress-tested the L1 cache, determined the endianness of the computers (AMD, Pentium 4, G4), determined the cache structure of those computers (and of the UltraSparc), timed integer and floating-point operations at various precisions, and ran timing tests related to superscalar architecture.

How did you implement the cache stress programs and how do you quantify the effects of having a cache?

We implemented the cache stress programs by coding the four cases outlined in Part 3 of the lab. We forced cache misses by continuously requesting lines of memory across many cache blocks, so that by the time a line was requested a second time it was no longer in the L1 cache.

For each processor, the time per loop increased from one test to the next. The only exception was the Athlon XP, whose Baseline II performance was slightly better than its Baseline I performance. Interestingly, the per-loop times on the 2-way nice test would suggest that every processor performed worse there than on the cache stress test, even though the cache stress test should have produced the worst time, especially since each processor uses a multi-way set-associative cache. What the time per loop does not account for is that the 2-way nice test performs more additions per loop; once this is taken into account, its times were very fast.

The presence of a cache made a fairly significant difference in the performance of each computer system. The G4 showed the smallest difference: its cache stress test took only 22% longer per loop than its Baseline I test, whereas the cache stress test took over twice as long per loop on the Athlon and almost twice as long on the P4 as their respective Baseline I times.

This might have been due to differences in the L2 caches as well as in the overall cache architecture of each system.
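The comparison can be stated as a ratio of per-loop times for each machine; the values below simply restate the figures quoted in the preceding paragraph.

```latex
r = \frac{t_{\text{stress}}}{t_{\text{Baseline I}}}, \qquad
\Delta = (r - 1) \times 100\%
\\
r_{\text{G4}} \approx 1.22 \;(\Delta \approx 22\%), \quad
r_{\text{Athlon}} > 2, \quad
r_{\text{P4}} \lesssim 2
```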

How did you implement the superscalar test programs and how do you quantify the effects of having a superscalar architecture?

Since x86 assembly code won't run on a G4 or a SPARC, we wrote the code in plain C and checked the assembly output to make sure the compiler had not optimized the work away. The results were what one would expect from a superscalar architecture: the Baseline I test, which has no dependencies, ran fastest; the Dependency 2 test ran slowest; and the two tests with one dependency fell somewhere in between. Interestingly, the Dependency 3 test ran faster than Baseline II even though each has only one dependency.

How might what you learn in this lab affect your programming style in the future?

The row-major/column-major tests (Part 4) will definitely affect how I construct two-dimensional vectors in my future programs; the effect on system performance was very clear in the results. The tests also reinforced many of the programming rules we learn in introductory classes without ever being given an explanation. I also learned about the benefits and drawbacks of writing code that is optimized for one system rather than another.

Describe any lab extensions or further comparisons you decided to make to test computer performance.

For our lab extensions we ran all of our tests on at least one more machine than was required, which helped us better understand the effects of running under different architectures. We also disassembled our compiled benchmarks to see what code they were actually running.