This spring I am introducing a new grad course! Details below.

EECE 571T - Compute Accelerator Architectures

Thursdays from 9:30 to 12:30 in FSC 1402 (first meeting Jan 10)


With the approaching end of Moore’s Law, computer system developers face the challenge of increasing computing performance without being able to rely on ever faster and more plentiful transistors. This course explores the leading approach to this problem that has emerged in both industry and academic research: the use of compute accelerator architectures.

This course will provide students with a foundation for understanding both programmable and fixed-function accelerator architectures. The first part of the course will cover graphics processing units (GPUs), which are commonly used today to train deep neural networks. The later part of the course will focus on more specialized accelerators, with an emphasis on machine learning accelerators.

The course will involve programming assignments (to become familiar with computer architecture simulators), research paper readings and presentations, and a final project.


Grading:
  • Assignments: 15%
  • Weekly paper reading quizzes: 25%
  • Presentations: 20%
  • Project: 40%


  1. Course overview

  2. Review of computer architecture
    • Instructions
    • Pipelining
    • Caches
    • Memory and memory access scheduling
    • Multicore and multithreading
  3. Graphics processing unit architectures
    • GPU programming model
    • GPU instruction set architecture
    • One-loop approximation (multithreading and SIMT model)
    • Two-loop approximation (register scoreboard, operand collector)
    • Three-loop approximation (caches, pending memory request tables, memory controller)
    • Introduction to the GPGPU-Sim simulator
  4. Machine learning accelerators
    • Brief review of deep neural networks
    • Linear regression and classification
    • Single-layer networks
    • Multilayer networks and backpropagation
    • Convolutional neural networks
    • Survey of some recent deep networks
    • Inference acceleration architectures
    • Approximation (bit-width reduction)
    • Ineffectual computations (skipping multiplication by zero)
    • Memory organization for faster acceleration
    • Industry examples (whatever is publicly known about them)
    • Semi-programmable ML accelerators
  5. Other compute accelerators
    • Media encoders/decoders (e.g., H.264)
    • Network switches and network processors
    • Digital signal processors