Article - Feb 4, 1999.


The Iridium Experience - Performance Monitors

My job on the Iridium project was to help the systems engineers improve system software performance and predictability during overload. During development, many hardware/software performance monitors were implemented. In this article I examine the advantages and disadvantages of the Iridium tools, in the hope that future projects can learn from these experiences.

One area where UNIX lacks adequate tools is software performance analysis. Sure, there are "prof" and "gprof", but these tools only take statistical samples. If you have an inline function that is a pig, if your slow routines disable interrupts, or if a function's runtime varies by 1000x, you will never see it with these tools. They are ill-suited to real-time performance analysis.

Of all the software engineers on earth, real-time software engineers count cycles the most obsessively. Thus, it is surprising that most real-time kernels have little or no support for performance testing. About the only kernel with good performance tools, in my experience, is VxWorks. I guess the kernel vendors assume that most engineers have a good logic analyzer nearby. While a logic analyzer is critical for debugging kernel faults or MMU problems, most of them only permit performance analysis at the assembly level. In other words, there are no tools to find slow C++ functions to optimize.

The DAS Monitor

Tektronix makes a modular $50,000 logic analyzer called a DAS. These boxes monitor 64 to 512 wires simultaneously. The probes can trigger on a wide variety of boolean expressions, and up to 256k samples can be stored. There are two huge problems with the DAS. First, at $50,000, you aren't going to buy one for each of the 100 engineers on your project; and if your code runs on UNIX, the DAS can't help you there. Second, the DAS is controlled by a clumsy, slow, and needless X-windows GUI. Tektronix really should implement a system like that of VxWorks: low-level scriptable UNIX commands, plus high-level Tcl/Tk GUI's that can be reprogrammed. Tek should halt GUI development until the scriptable commands are perfected. GUI's on software tools are really just decelerators for expert programmers.

The Iridium systems engineers chose a well-known memory address to dedicate to DAS monitoring. A macro was written to write code words to this location. Then, when the DAS was running, the software would look like

  DAS_WRITE(xyz_class_function_start);
  xyz_class_function(arg1, arg2, arg3);
  DAS_WRITE(xyz_class_function_end);

and the DAS was programmed to record (bus write, time sample) tuples. The cost of taking a sample was about 4 instructions. This allowed us to calculate the cost of running xyz_class_function(). The DAS codewords were stored as large enum lists in a C++ class, one class per subsystem. There were several problems with this setup.
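A minimal sketch of this instrumentation scheme, assuming a global variable standing in for the well-known monitored address (the real Iridium address, codeword values, and class names are not public, so everything here is illustrative):

```cpp
#include <cstdint>

// Stand-in for the well-known monitored address. A real target build would
// cast a fixed bus address instead; this global is an assumption so the
// sketch runs on a workstation.
static volatile uint32_t das_port = 0;

#define DAS_WRITE(codeword) (das_port = (codeword))

// Codewords lived in large enum lists inside a C++ class, one class per
// subsystem (these names are illustrative, not the real identifiers).
struct XyzDasCodes {
    enum Code {
        xyz_class_function_start = 1000,
        xyz_class_function_end
    };
};

// An instrumented call site: bracket the function with two bus writes,
// roughly 4 instructions of overhead per sample.
static void xyz_class_function_instrumented() {
    DAS_WRITE(XyzDasCodes::xyz_class_function_start);
    // ... body of xyz_class_function(arg1, arg2, arg3) ...
    DAS_WRITE(XyzDasCodes::xyz_class_function_end);
}
```

On a real target, the write to the dedicated address appears on the bus, and the DAS trigger condition matches that address so each codeword is captured with a timestamp.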

The raw DAS log was pretty meaningless - it contained just a series of timestamps and codewords. A PERL tool was written to substitute symbolic names for the codewords, by parsing the C++ class enumerations to build a reverse map. The PERL software had several problems of its own. Clearly, the DAS and gcc made this technique cumbersome, and the PERL script was never completed to our satisfaction.
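The reverse-mapping step itself is simple once the enum values are extracted; a sketch of the codeword-to-name table (hand-written here, where the real tool generated it by parsing the enum lists with Perl; the names and values are illustrative):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical reverse map from codeword to symbolic name. The real tool
// built this table by parsing the C++ enum lists; here it is hand-written.
static const std::map<uint32_t, std::string> das_names = {
    {1000, "xyz_class_function_start"},
    {1001, "xyz_class_function_end"},
};

// Translate one raw DAS log entry into a readable form.
static std::string das_name(uint32_t codeword) {
    auto it = das_names.find(codeword);
    return it == das_names.end() ? "UNKNOWN_CODEWORD" : it->second;
}
```

The hard part in practice was not this lookup but reliably parsing the enums out of the C++ sources, which is where the Perl script fell short.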

Since the Iridium system runs on both the PowerPC processor and on UNIX, I wrote a UNIX software DAS using the gettimeofday() function. This DAS used a giant lookup table of struct timeval samples, indexed by codeword. Code under DAS analysis would macro-expand as follows:

{
  gettimeofday(&das_array[ ARD_MALLOC_START ], NULL);

  // do some work to allocate memory (non-blocking)

  gettimeofday(&das_array[ ARD_MALLOC_END ], NULL);
}

The _START and _END enum codewords were set 500 indices apart, so that pairs could be correlated on-line. A DAS_Print() function subtracted the paired samples and printed the most recent runtime deltas for functions on UNIX. This let us tune primitives under UNIX, with its much better tool set, before porting to pSOS. Samples corrupted by context switches were easy to spot and discard by eye. This UNIX performance analyzer helped tune the code several times.
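The pairing and subtraction can be sketched as follows. The 500-index offset and the ARD_MALLOC names come from the article; the table size, function name, and the rest are assumptions:

```cpp
#include <sys/time.h>

// _START and _END codewords sit 500 indices apart so a pair can be matched
// on-line. DAS_TABLE_SIZE is illustrative, not the real table size.
enum {
    ARD_MALLOC_START = 0,
    ARD_MALLOC_END   = ARD_MALLOC_START + 500,
    DAS_TABLE_SIZE   = 1024
};

static struct timeval das_array[DAS_TABLE_SIZE];

// Elapsed microseconds between a _START sample and its paired _END sample -
// the subtraction a DAS_Print()-style report performs for each function.
static long das_delta_usec(int start_code) {
    const struct timeval &a = das_array[start_code];
    const struct timeval &b = das_array[start_code + 500];
    return (b.tv_sec - a.tv_sec) * 1000000L + (b.tv_usec - a.tv_usec);
}
```

Because gettimeofday() measures wall-clock time, a context switch between the two samples inflates the delta, which is why outliers had to be discarded by eye.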

In general, these performance analyzers were useful. In a few cases we found software bugs when a routine used an unexpected amount of time. Some routines that were supposed to be deterministic and fast were not, which pointed to bugs in our implementation.

(C) 1999, Donald W. Gillies. All Rights Reserved.
