Documentation of Interactive CACTI

Web-Based Interface for the CACTI Cache Access and Cycle Time Model

1. INTRODUCTION

The WWW has become a popular and comprehensive medium for conveying information. Because it is widely available and visually attractive, many companies and Internet users publish information on the WWW so that their intended audience can access it. The objective of this project is to create a graphical version of CACTI, a model for estimating cache access and cycle times developed by Dr. Wilton and researchers at the DEC Western Research Lab, and to allow users to run the model directly from a WWW page.

In 1993, Dr. Wilton and Dr. Jouppi worked at the DEC Western Research Lab on a project to improve the cache access time model available at the time, called the Wada model. A careful study of the Wada model revealed several shortcomings that led to errors in the estimated cache access time. The CACTI model aims to improve on Wada’s model by correcting the assumed cache structure and the modeling techniques. In addition, CACTI also calculates the cache cycle time, which gives a broader basis for comparing cache designs. Currently, CACTI is the most popular model for cache access and cycle time estimation in both industrial and academic research labs worldwide.

This project is a continuation of that work. The goal is to improve the user interface and availability of the CACTI program. To accomplish this, the Internet programming language Java was chosen because it is similar to C, has a comprehensive graphics library, and allows its applets to run on the WWW and even to communicate with each other. By using these advantages of Java, we are able to make the CACTI program available on the Internet and to create a more interactive user interface that features an output table, a bar chart and a cache picture for displaying the results of CACTI.

In the following sections, a brief introduction to the CACTI model is presented first, and the remainder of the report focuses on the methodology, design and implementation employed in this project.

2. INTRODUCTION TO THE CACTI MODEL

The CACTI model is an analytical model that predicts the access and cycle time of both direct-mapped and set-associative caches as a function of different cache parameters, process parameters and array organization parameters. In this section, we will discuss the CACTI model followed by its applications and online graphical program.

2.1 The CACTI Model

There are five input parameters used in the CACTI model:

C: Cache size in bytes
B: Block size in bytes
A: Associativity
b_o: Output width in bits
b_addr: Address width in bits

Figure 1(a) shows a cache array, where B is the block size (in bytes), A is the associativity, and S is the number of sets (S = C / (B * A)). In order to minimize the access time, six array organization parameters determine how the cache array can best be divided. For example, Ndwl indicates how many times the data array is cut with vertical lines (making more, but shorter, wordlines), and Ndbl indicates how many times the data array is cut with horizontal lines (making shorter bitlines). The total number of subarrays is Ndwl * Ndbl. The parameter Nspd, as illustrated in Figure 1(b), indicates how many sets are mapped to a single wordline in the data array and allows the overall access time of the array to be changed without splitting it into smaller subarrays. There are also three corresponding tag array parameters: Ntwl, Ntbl and Ntspd.

Figure 1: Cache Array Organization¹
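The bookkeeping in the paragraph above can be sketched in a few lines of Java. This is only an illustration: the class and method names are ours, and the per-subarray row and column counts are our reading of the description (Nspd sets share a wordline, Ndbl cuts divide the rows, Ndwl cuts divide the columns) rather than code taken from the CACTI program.

```java
/** Illustrative sketch of the data-array bookkeeping; names are hypothetical. */
final class CacheGeometry {

    /** Number of sets: S = C / (B * A). */
    static int sets(int cacheBytes, int blockBytes, int assoc) {
        return cacheBytes / (blockBytes * assoc);
    }

    /** Number of data subarrays produced by the cuts: Ndwl * Ndbl. */
    static int dataSubarrays(int ndwl, int ndbl) {
        return ndwl * ndbl;
    }

    /** Rows (wordlines) per data subarray, assuming Nspd sets share one wordline. */
    static int rowsPerSubarray(int cacheBytes, int blockBytes, int assoc, int ndbl, int nspd) {
        return cacheBytes / (blockBytes * assoc * ndbl * nspd);
    }

    /** Columns (bits per wordline) per data subarray under the same assumption. */
    static int columnsPerSubarray(int blockBytes, int assoc, int ndwl, int nspd) {
        return 8 * blockBytes * assoc * nspd / ndwl;
    }

    public static void main(String[] args) {
        // A 16 KB, 2-way set-associative cache with 32-byte blocks.
        System.out.println("sets      = " + sets(16384, 32, 2));
        System.out.println("subarrays = " + dataSubarrays(2, 4));
    }
}
```

Integer division is used throughout, so the sketch assumes all parameters are powers of two, as is usual for cache sizes.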

In the CACTI model, an SRAM cache is modeled to calculate the cache access and cycle times. The internal structure of an SRAM cache is shown in Figure 2.

Figure 2: Cache Structure²

In terms of access time, a cache can be divided into two major components, the data side and the tag side. The purpose of the data side is to output the block of data that the address specifies. The delay of the data side, for both direct-mapped and set-associative caches, is:

T_{dataside} = T_{decoder,data} + T_{wordline,data} + T_{bitline,data} + T_{sense_amp,data}

The purpose of the tag side is to select a portion of the data from the data side by comparing the address bits with the tag bits. The delay of the tag side for a direct-mapped cache is:

T_{tagside,dm} = T_{decoder,tag} + T_{wordline,tag} + T_{bitline,tag} + T_{sense_amp,tag} + T_{comparator} + T_{valid_output}

In a set-associative cache, the delay of the tag side is:

T_{tagside,sa} = T_{decoder,tag} + T_{wordline,tag} + T_{bitline,tag} + T_{sense_amp,tag} + T_{comparator} + T_{mux_driver} + T_{output_driver}

For a direct-mapped cache, the access time is the time of the longest path through the tag array or the data array:

T_{access,dm} = max(T_{dataside} + T_{output_driver,data}, T_{tagside,dm})

For a set-associative cache, the output driver writes the data signals only after the tag array is read. Thus, the access time of a set-associative cache is:

T_{access,sa} = max(T_{dataside}, T_{tagside,sa}) + T_{output_driver,data}

The cycle time of a cache is the access time plus the time needed to precharge the bitlines, comparator, and internal decoder bus. For both direct-mapped and set-associative caches, the cycle time is:

T_{cycle} = T_{access} + T_{precharge}
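The way the access and cycle time equations above combine the two sides can be written as a small Java sketch. The class and method names are hypothetical, and the inputs are assumed to be delays in nanoseconds that have already been computed elsewhere; nothing here reproduces how CACTI derives the individual components.

```java
/** Illustrative sketch of how the side delays combine; names are hypothetical. */
final class CacheTiming {

    /** Direct-mapped: the longer of (data side + output driver) and the tag side. */
    static double accessDirectMapped(double dataSide, double outputDriverData, double tagSideDm) {
        return Math.max(dataSide + outputDriverData, tagSideDm);
    }

    /** Set-associative: the output driver fires only after both sides are done. */
    static double accessSetAssociative(double dataSide, double tagSideSa, double outputDriverData) {
        return Math.max(dataSide, tagSideSa) + outputDriverData;
    }

    /** Cycle time: access time plus precharge time, for either kind of cache. */
    static double cycle(double access, double precharge) {
        return access + precharge;
    }

    public static void main(String[] args) {
        // Made-up delays (ns), chosen only to show which path dominates.
        double dm = accessDirectMapped(2.0, 0.5, 3.0);   // tag side is the longer path
        double sa = accessSetAssociative(2.0, 3.0, 0.5); // driver is added after the max
        System.out.println(dm + " " + sa + " " + cycle(sa, 1.0));
    }
}
```

With the same side delays, the set-associative access time is never shorter than the direct-mapped one, because the output driver delay is added after the maximum rather than inside it.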

To calculate the above time components, the cache circuit is decomposed into many equivalent RC circuits, and the delay of each is estimated with simple RC equations. In addition, the CACTI model uses the array organization parameters that give the shortest access time for each cache size.
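One simple way to pick the organization parameters with the shortest access time is to try every candidate and keep the best one. The Java sketch below does this for the three data-array parameters only. It is an assumption made for illustration: the report does not say how the search is carried out, the power-of-two candidate range is ours, and the delay function is supplied by the caller instead of being the real CACTI calculation.

```java
import java.util.function.ToDoubleFunction;

/** Illustrative exhaustive search over data-array organizations; names are hypothetical. */
final class OrganizationSearch {

    /** One candidate organization of the data array. */
    static final class Org {
        final int ndwl, ndbl, nspd;

        Org(int ndwl, int ndbl, int nspd) {
            this.ndwl = ndwl;
            this.ndbl = ndbl;
            this.nspd = nspd;
        }
    }

    /**
     * Tries every power-of-two combination from 1 up to maxValue (which must be
     * at least 1) and returns the candidate with the smallest estimated access time.
     */
    static Org best(int maxValue, ToDoubleFunction<Org> accessTime) {
        Org best = null;
        double bestTime = Double.POSITIVE_INFINITY;
        for (int ndwl = 1; ndwl <= maxValue; ndwl *= 2) {
            for (int ndbl = 1; ndbl <= maxValue; ndbl *= 2) {
                for (int nspd = 1; nspd <= maxValue; nspd *= 2) {
                    Org candidate = new Org(ndwl, ndbl, nspd);
                    double t = accessTime.applyAsDouble(candidate);
                    if (t < bestTime) {
                        bestTime = t;
                        best = candidate;
                    }
                }
            }
        }
        return best;
    }
}
```

A caller would pass in a function that evaluates the timing equations for a given candidate; the tag-array parameters Ntwl, Ntbl and Ntspd could be searched in the same way.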

2.2 Applications

The CACTI model has already been implemented as a C program that takes three input parameters: cache size, block size and associativity. When run, the program outputs not only the access and cycle times, but also the individual time components of the data and tag sides and the array organization parameters that give the smallest access time.

The data produced by the CACTI program is especially helpful in designing a cache architecture. Figures 3 and 4 show how the cache size affects the access and cycle times of a direct-mapped and a 4-way set-associative cache. Figure 5 shows how the access and cycle times are affected by block size (with the cache size held constant). Finally, Figure 6 shows how the associativity affects the access and cycle times of 16 KB and 64 KB caches.

Figure 3: Access/Cycle for Direct Mapped Cache³
Figure 4: Access/Cycle for Set-Associative Cache⁴

Figure 5: Access/Cycle as a Function of Block Size⁵
Figure 6: Access/Cycle as a Function of Associativity⁶

Comparison with an Hspice model showed the CACTI model to be accurate to within 10%. Because the model is computationally much simpler than Hspice, it was also shown to be over 100,000 times faster. Thus, the model can greatly ease the burden of circuit simulation and let cache designers concentrate on their design work.

2.3 Online Graphical CACTI Program
The goal of this project is to implement an online graphical CACTI program that still keeps all the features of the previous C program. In addition, the new program contains the following new features:

Execution on The WWW
Online Input Parameters Collection
Pull-Down Input Menu
A Table of Output Values
A Bar Chart of The Time Breakdown
A Picture of Physical Cache Layout

In the following section, we present the methods used in this project to provide these new features.

3. METHODOLOGY

In this section, we will discuss how the online CACTI model was developed. This includes why we use Java applets and how their graphical user interface helps our design. In addition, we will also discuss the runtime behavior and performance of our implementation.

3.1 Java Applets
The primary advantages of Java are that it is simple, portable and compatible. Its similarity to C/C++ lets programmers learn it quickly and eases code migration. In addition, the Java designers removed some features available in C/C++ to make the language secure, small and familiar. For instance, Java does not have pointers. Therefore, the programmer does not need to worry about dangling pointers, invalid pointer references, or memory leaks.

Because Java is a distributed language, it supports applications on networks. This feature meets the needs of a large number of contemporary web page applications. By including Java applets in an HTML document, one can provide interactive and executable content on a Web page. This makes Java suitable for our program implementation.

However, Java is also an interpreted language, which means it will not be as fast as a compiled language. In fact, Java is on average about 10 times slower than C.⁷ Nevertheless, this speed is often adequate for interactive, GUI and network-based applications, where the application is often idle, waiting for the user to do something or for data to arrive from the network.

While Java is not as fast as a compiled language, it does provide an architecture-neutral environment in which Java applications can run on the many different kinds of systems found on the Internet. Applications can even have the appropriate appearance and behavior for each platform. All of these advantages were important reasons for our decision to use Java.

3.2 Applet User Interface
The graphical user interface of Java applets is an important element in creating sophisticated images that improve both presentation and impression. The Abstract Window Toolkit (AWT), created by the Java designers, allows Java programmers to build GUIs very easily. AWT contains many built-in classes and subclasses. For example, the Component class has a large number of methods that are shared by all of its subclasses. These methods perform tasks such as making a control visible, enabling or disabling it, or resizing it. By making good use of these AWT classes, we can provide different kinds of graphics such as buttons, checkboxes, menus, scrollbars or even charts.

These graphics are essential for this online CACTI project because we must use a bar chart to illustrate the time components of the cache arrays, and shapes and polygons to show the revised cache array structures.

4. USER INTERFACE DESIGN

To enhance the user interface of the CACTI program, Java’s graphics library has been used to create interactive and user-friendly input and output displays. In this section, the input and output features of the program will be presented.

4.1 Input Features
There are five inputs to the program. We provide pull-down menus for the selection of cache size, block size, and cache associativity. In addition, the user can optionally enter the output width and address width. Figure 7 shows the input interface.

Figure 7: Input Interface of The CACTI Program

4.2 Output Features
Like the C version of CACTI, our Java version shows a table of results. Unlike the C version, our Java version also displays a bar chart showing a breakdown of the delay of each cache component and a picture of the physical cache layout. The bar chart and the picture of the cache layout make the table of results easier to interpret, so users will have a better understanding of the output. In the following, we will discuss how these additional output features improve the presentation of the CACTI model.

4.2.1 Table of Output Values
The new table of results follows the same format as that of the original CACTI program, but the order of the results is rearranged. For example, time components are organized into groups for the data side and the tag side. Also, by using different colors on screen, we separate the results into different categories: cache parameters, cache organization parameters, time, data side and tag side. Figure 8 illustrates the output interface.

Figure 8: Output Table of The CACTI Program

4.2.2 Bar Chart of Time Breakdown
The bar chart, as shown in Figure 9, compares the time breakdown of data and tag sides. The delay of the data side is indicated by the left bar and the delay of the tag side is indicated by the right bar.

Figure 9: Bar Chart of The CACTI Program

Each time component is shown in a different color. The following list gives the color of each time component.

Both Data and Tag Sides
Decoder: Blue
Wordline: Green
Bitline: Gray
Sense_Amp: Yellow

Data Side Only
Data Output: Orange

Tag Side Only
Compare: Cyan
Drive_Valid: Magenta (appears only when A = 1)
Drive_Mux: Magenta (appears only when A > 1)
Sel_Inverter: Pink

Since drive_valid appears only if A (associativity) equals 1 and drive_mux appears only if A is larger than 1, using the same color for both components does not cause confusion.

The height of each bar indicates the delay of each side. The total time (in nanoseconds) of each side is printed on top of each bar. A red line indicates the cache access time. For a set-associative cache, the data output component is shown between the two bars, and the cache access time is printed on top of the data output bar. In addition, for a set-associative cache only, a black line indicates the time until select (before the mux). Figure 10 shows a bar chart for a direct-mapped cache. Figure 11 shows a bar chart for a set-associative cache.

(cache size = 4096 bytes, block size = 32 bytes, associativity = 1)
Figure 10: Bar Chart of a Direct-Mapped Cache

(cache size = 524288 bytes, block size = 64 bytes, associativity = 4)
Figure 11: Bar Chart of a Set-Associative Cache

4.2.3 Picture of Physical Cache Layout

We also present a diagram showing the physical cache layout. In Figure 12, we use rectangles to represent the cache array structure. By dimensioning the rectangles, we give users a clear picture of how the original cache array should be revised to minimize the access time. As shown in Figure 13, the picture of the physical cache is redrawn when the inputs are changed.

Figure 12: A Picture of Physical Cache Layout of CACTI Program

(cache size = 1024 bytes, block size = 16 bytes, associativity = 2)
Figure 13: A Picture of Physical Cache Layout When Inputs are Changed

5. IMPLEMENTATION

One crucial element in our implementation of the online CACTI program is the communication between applets. Initially, we put all the code for both input and output in one applet. Later, we ran into problems when we designed the bar chart and the cache layout structure. The problems were that:

We needed to modify the main source code each time we wanted to add a chart or a picture.
All components (table of results, bar chart and cache layout) had to follow the same layout manager, which made the implementation extremely difficult.
The program was harder to understand and debug.

Applet communication is a solution to the above problems. However, there are some security restrictions on how applets can communicate with each other. First, the applets must be running on the same page, in the same browser window. Second, many applet viewers require that the applets originate from the same server. These restrictions do not cause a problem for our design, because a single-page web design is ideal for showing all the results at the same time. In addition, communication across browsers or servers is not required in this project.

According to the user interface design, the new CACTI program is logically split into four applets: the Input Applet, the Output Applet, the Bar Chart Applet and the Cache Layout Applet. These applets are interconnected. A brief description of each applet follows:

Input Applet

The Input Applet is responsible for taking inputs from users and then calling the Output Applet.

Output Applet

The Output Applet acts as a bridge between the Input Applet and the Bar Chart and Cache Layout Applets. After finishing its internal calculations, the Output Applet displays a table of results and simultaneously sends the required set of results to both the Bar Chart Applet and the Cache Layout Applet.

Bar Chart Applet

When the Output Applet has collected new results, it calls the Bar Chart Applet with the new time components. The Bar Chart Applet then draws the title, the chart and the labels. If the maximum value of all time components is less than 10, a bigger scale is used to draw both the data and tag bars; otherwise, a smaller scale is used. When the entire diagram has been drawn, the Bar Chart Applet waits for the next call from the Output Applet, and the whole process repeats.
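The scale selection described above can be sketched as follows. The threshold of 10 comes from the text, and we assume it is in nanoseconds like the values printed on the chart. The two pixels-per-nanosecond values and all names are hypothetical, chosen only to make the example concrete.

```java
/** Illustrative sketch of the bar chart scale selection; names and scales are hypothetical. */
final class BarScale {
    static final double THRESHOLD_NS = 10.0;    // from the text; unit assumed to be ns
    static final int LARGE_PIXELS_PER_NS = 20;  // hypothetical "bigger" scale
    static final int SMALL_PIXELS_PER_NS = 5;   // hypothetical "smaller" scale

    /** Picks one scale for both bars, based on the largest single time component. */
    static int pixelsPerNs(double[] componentsNs) {
        double max = 0.0;
        for (double c : componentsNs) {
            max = Math.max(max, c);
        }
        return max < THRESHOLD_NS ? LARGE_PIXELS_PER_NS : SMALL_PIXELS_PER_NS;
    }

    /** Height in pixels of one colored segment of a bar. */
    static int segmentHeight(double ns, int pixelsPerNs) {
        return (int) Math.round(ns * pixelsPerNs);
    }
}
```

Using a single scale for both bars keeps the data and tag sides visually comparable, which is what the chart is for.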

Cache Layout Applet

Values such as Ndbl, Ndwl and Nspd are sent by the Output Applet to the Cache Layout Applet, which uses them to compute the new dimensions of the revised cache array structure. The Cache Layout Applet then draws rectangles, which represent caches, with the corresponding dimensions shown on the screen. As with the Bar Chart Applet, the whole process repeats when there is a new call from the Output Applet.

Figure 14 summarizes how the applets interact with one another.

Figure 14: Communication Between Applets
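The flow between the four applets can be illustrated with plain Java classes standing in for the applets. This is a sketch of the pattern only, not the project's code: every name is hypothetical, the calculation is a placeholder, and the lookup of one applet by another is left out. In a real applet page that lookup would typically go through the standard `AppletContext.getApplet(String)` call, but the report does not say which mechanism was used.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the applet call flow; all names are hypothetical. */
final class AppletFlowSketch {

    /** Implemented by anything that consumes results (bar chart, cache layout). */
    interface ResultListener {
        void resultsReady(double[] timesNs, int[] organization);
    }

    /** Stands in for the Output Applet: computes, then forwards the results. */
    static final class OutputStage {
        private final List<ResultListener> listeners = new ArrayList<>();

        void addListener(ResultListener listener) {
            listeners.add(listener);
        }

        /** Called by the input stage with the user's parameters. */
        void submit(int cacheBytes, int blockBytes, int associativity) {
            // Placeholder values; the real CACTI calculation is not reproduced here.
            double[] timesNs = {0.0};
            int[] organization = {1, 1, 1};
            for (ResultListener listener : listeners) {
                listener.resultsReady(timesNs, organization);
            }
        }
    }

    public static void main(String[] args) {
        OutputStage output = new OutputStage();
        output.addListener((t, o) -> System.out.println("bar chart redraws"));
        output.addListener((t, o) -> System.out.println("cache layout redraws"));
        output.submit(4096, 32, 1); // the input stage hands over the parameters
    }
}
```

Because the output stage only knows the listener interface, a new chart or picture can be added without touching the calculation code, which is the problem the split into separate applets was meant to solve.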

6. CONCLUSION AND FUTURE WORK

By using Java, we were able to make the CACTI program available on the WWW. In addition, the user interface of this program was significantly improved by adding these features: online input menus, an output table, a bar chart and a cache picture. As a result, our objective for this project was achieved.

Even though the current CACTI program has met the goal of this project, some future work could make the program even better. For example, a picture of an SRAM cache structure could be drawn on the WWW page. In addition, the address and output widths could be changed interactively by the users directly on the cache picture. When the input parameters are submitted, the result for each time component would be shown beside the corresponding component on the cache picture. This feature would make the CACTI program more interactive and easier for users to understand.

REFERENCES

¹ S. J. E. Wilton and Norman P. Jouppi, "An Enhanced Access and Cycle Time Model for On-Chip Caches," DEC WRL Technical Report 93/5, 1994, p. 4.
² S. J. E. Wilton and Norman P. Jouppi, "An Enhanced Access and Cycle Time Model for On-Chip Caches," DEC WRL Technical Report 93/5, 1994, p. 3.
³ S. J. E. Wilton and Norman P. Jouppi, "An Enhanced Access and Cycle Time Model for On-Chip Caches," DEC WRL Technical Report 93/5, 1994, p. 45.
⁴ S. J. E. Wilton and Norman P. Jouppi, "An Enhanced Access and Cycle Time Model for On-Chip Caches," DEC WRL Technical Report 93/5, 1994, p. 46.
⁵ S. J. E. Wilton and Norman P. Jouppi, "An Enhanced Access and Cycle Time Model for On-Chip Caches," DEC WRL Technical Report 93/5, 1994, p. 47.
⁶ S. J. E. Wilton and Norman P. Jouppi, "An Enhanced Access and Cycle Time Model for On-Chip Caches," DEC WRL Technical Report 93/5, 1994, p. 49.
⁷ Dr. Greg Bond, IEEE Student Branch Java Seminar, 23 Jan 1997.