Professor Guy Lemieux

I am a Professor in the Department of Electrical and Computer Engineering at the University of British Columbia in Vancouver, British Columbia, Canada.

My research is concerned with programmable chips known as FPGAs, which is short for Field-Programmable Gate Arrays. These are universal chips, capable of emulating any other digital chip. Of course, this emulation capability comes with some overhead in the form of cost and performance -- a key goal of my research is to drive down the cost as well as improve the speed and power dissipation of these chips. I have done this through optimization at various levels including the transistor-level design and architecture (internal organization) of the device, as well as the CAD tools that map circuits into the device.

My latest work focuses on improving designer productivity, primarily by making FPGAs easier to use. I'm especially interested in compute-oriented applications. FPGAs have a reputation for being very difficult to "program", particularly among software-oriented designers, the key users in compute-oriented applications. To help, I am a strong advocate for the use of overlay architectures, which are digital circuits built on top of FPGAs that make them easier to program. Overlays are like a new type of FPGA, in that they themselves are programmable, but they are more application-specific and have fewer users, making them unlikely to be built as custom chips. Nevertheless, my research has shown that regular C programs can be easily mapped to processor-like overlays, and they can be accelerated significantly.

I co-founded VectorBlox Computing, which was acquired by Microchip in September 2019. At VectorBlox, we designed a vector accelerator system known as the VectorBlox MXP (MatriX Processor) that operates directly on 1D, 2D and 3D tensors. MXP gains efficiency from four key architectural features: scratchpad, hardware DMA, sub-word SIMD, and custom instructions. Instead of a traditional named vector register file, MXP uses an addressable scratchpad. Because the scratchpad is addressable, vectors are simply pointers in C; this allows any number of vectors of arbitrary length to be formed without any internal fragmentation, reduces the need for data duplication and data movement, and allows use of a stack-based ABI for nesting accelerated vector functions. Hardware DMA makes efficient use of a wide, dedicated path to external memory for transferring 1D and 2D tensors and operates concurrently with computation. Sub-word SIMD provides increased parallelism for operations on byte and halfword elements. Custom instructions make it easy to attach highly pipelined hardware into a C-programmed environment, where the hardware design effort focuses on data operations, not on data storage/staging/movement (which is left within C). Furthermore, the MXP is fully portable, allowing the use of almost any host processor and almost any C compiler without the need for any compiler modifications. We measured nearly 10,000 times speedup on an N-body physics problem. Reaching this level of acceleration demands careful planning of code structure and data layout, which are best done manually using compiler intrinsics, so compiler autovectorization is almost useless here.
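To make the "vectors are simply pointers" idea concrete, here is a minimal sketch in plain C. It is not the MXP API: the scratchpad is modelled as an ordinary array, and every name in it (`sp_alloc`, `sp_mark`, `sp_release`, `vec_add`, `scratchpad_demo`) as well as the scratchpad size is a hypothetical chosen for illustration. It only shows how sub-vectors can be formed by pointer arithmetic without copying, and how a nested function can take and return temporary scratchpad space in stack order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an addressable scratchpad: one flat array of words.
 * The size is an arbitrary assumption, not an MXP parameter. */
#define SP_WORDS 1024
static int32_t scratchpad[SP_WORDS];
static size_t sp_top = 0; /* index of the next free word; grows like a stack */

/* Carve a vector of `len` words out of the scratchpad (NULL if it is full). */
static int32_t *sp_alloc(size_t len) {
    if (len > SP_WORDS - sp_top) return NULL;
    int32_t *v = &scratchpad[sp_top];
    sp_top += len;
    return v;
}

/* Stack discipline: remember the top on entry, restore it on exit. */
static size_t sp_mark(void) { return sp_top; }
static void sp_release(size_t mark) {
    assert(mark <= sp_top);
    sp_top = mark;
}

/* Software stand-in for an element-wise vector add; in this model it is
 * just a loop over the pointed-to scratchpad words. */
static void vec_add(int32_t *dst, const int32_t *a, const int32_t *b, size_t len) {
    for (size_t i = 0; i < len; i++) dst[i] = a[i] + b[i];
}

/* A nested function that needs a temporary vector: it allocates on entry
 * and releases on exit, so callers see the scratchpad unchanged. */
static int32_t sum_of_pairwise_sums(const int32_t *a, const int32_t *b, size_t len) {
    size_t mark = sp_mark();
    int32_t *tmp = sp_alloc(len);
    assert(tmp != NULL);
    vec_add(tmp, a, b, len);
    int32_t total = 0;
    for (size_t i = 0; i < len; i++) total += tmp[i];
    sp_release(mark);
    return total;
}

int32_t scratchpad_demo(void) {
    sp_top = 0;
    int32_t *v = sp_alloc(8); /* one 8-element vector */
    assert(v != NULL);
    for (int32_t i = 0; i < 8; i++) v[i] = i;

    /* Two 4-element vectors formed by pointer arithmetic: no copy, no padding. */
    int32_t *lo = v;
    int32_t *hi = v + 4;

    int32_t total = sum_of_pairwise_sums(lo, hi, 4);
    assert(sp_top == 8); /* the callee's temporary has been released */
    return total;
}
```

Calling `scratchpad_demo()` fills one vector, views its two halves as separate vectors, and returns the sum of their element-wise sums. The design point is that vector length and placement are decided by the programmer through ordinary pointers rather than by a fixed set of named registers.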

My work on interconnect design for FPGAs resulted in a book, published in November 2003. I received a Best Paper Award at the 2004 IEEE International Conference on Field-Programmable Technology. My 2001 paper, "Using Sparse Crossbars within LUT Clusters", is included in FPGA20, the Top 25 contributions in the First 20 Years of the International Symposium on FPGAs between 1992 and 2011.

Some of my past work on multiprocessing can be found at the University of Toronto.


Google Scholar Profile

ACM Digital Library Author Profile


Contact Information

Links to Commercial FPGA/FPGA-like Vendors



Current Students
| Degree | Name | Email | Graduation | Thesis Topic |
|---|---|---|---|---|
| Ph.D. | Zhonghua (Sebastian) Zhou | tbd | est. 2021 | Machine Learning for ASIC Routing |
| M.A.Sc. | Mariko Tatsumi | tbd | est. 2021 | Machine Learning on FPGAs |
| M.A.Sc. | Fredy Augusto Maciel Alves | tbd | est. 2021 | Processor Design for FPGAs |
| M.A.Sc. | Caroline White | tbd | est. 2021 | tbd |
| M.A.Sc. | John Deppe | tbd | est. 2021 | tbd; co-supervised with Mieszko Lis |

Completed Students
| Degree | Name | Current Position | Completion | Thesis / Project |
|---|---|---|---|---|
| M.Sc. | May Young | Vancouver | April 2020 | Dynamic Race Detection for Non-Coherent Accelerators (pdf); primary supervisor was Alan Hu |
| Ph.D. | Hossein Omidian | Xilinx, San Jose | October 2018 | Automated Space/Time Scaling of Streaming Task Graphs on Field Programmable Gate Arrays (pdf) |
| M.A.Sc. | Maximilian Golub | Mercedes-Benz, Seattle | August 2018 | DropBack: Continuous Pruning During Deep Neural Network Training (pdf) |
| M.A.Sc. | Joseph Edwards | VectorBlox Computing, Vancouver | July 2018 | Real-time Computer Vision in Software using Custom Vector Overlays (pdf) |
| M.Eng. | Nathan van Woudenberg | Programming + Machine Learning Support in ECE Robotics Control Lab, UBC | May 2016 | n/a |
| M.Eng. | Gene Lai | unknown | May 2016 | n/a |
| Ph.D. | Ameer Abdelhadi | ameer | June 2016 | Architecture of Block-RAM-Based Massively Parallel Memory Structures: Multi-Ported Memories and Content-Addressable Memories (pdf) |
| M.A.Sc. | Keith Lee | Gumstix Inc., Vancouver | January 2016 | The DEVBOX development environment: an environment for introducing Verilog to young students (pdf); video demo |
| M.Eng. | Danting Li | unknown | December 2015 | n/a |
| Ph.D. | Aaron Severance | VectorBlox Computing Inc. | March 2015 | Broadening the Applicability of FPGA-based Soft Vector Processors (pdf) |
| M.A.Sc. | Michael (Xi) Yue | unknown | October 2014 | Rapid Overlay Building for FPGAs (pdf) |
| M.Eng. | Douglas (Hak Hian) Sim | Recon Instruments | May 2014 | n/a |
| M.A.Sc. | Alex Brant | Altera Toronto | November 2012 | Coarse and Fine Grain Programmable Overlay Architectures for FPGAs (pdf); please check out the open source repository for the ZUMA FPGA Overlay |
| M.A.Sc. | Zhiduo Liu | upon graduation: Altera San Jose; currently: Google, CA | September 2012 | Accelerator Compiler for the VENICE Vector Processor (pdf) |
| M.A.Sc. | Chris Wang | upon graduation: Xilinx, CA; currently: Google, CA | October 2011 | Scalable and Deterministic Timing-driven Parallel Placement for FPGAs (pdf) |
| Ph.D. | David Grant | Altera Toronto | August 2011 | CAD Algorithms and Performance of Malibu: An FPGA with Time-Multiplexed Coarse-Grained Elements (pdf) |
| Ph.D. | Usman Ahmed | Altera Toronto | April 2011 | Impact of custom interconnect masks on cost and performance of structured ASICs (pdf); co-supervised with Steve Wilton |
| M.A.Sc. | Chris Chou | PMC-Sierra | April 2010 | VIPERS II: A Soft-core Vector Processor with Single-copy Scratchpad Memory (pdf) |
| M.A.Sc. | Darius Chiu | Independent | Sept 2009 | Congestion-driven Re-clustering CAD Flow for Low-cost FPGAs (pdf) |
| M.A.Sc. | Johnny Ho | upon graduation: Ixia, CA; next: Quantlab, CA; currently: Microsoft, WA | Sept 2009 | PERG-Rx: An FPGA-based pattern-matching engine with limited regular expression support for large pattern databases (pdf) |
| M.A.Sc. | Patrick Dong | Xilinx San Jose | Sept 2009 | Period and Glitch Reduction via Clock Skew Scheduling, Delay Padding and GlitchLess (pdf) |
| M.A.Sc. | Paul Teehan | upon graduation: Ph.D. student, UBC; next: EnerNOC; currently: Travel Audience, Germany | October 2008 | Reliable High-throughput FPGA Interconnect using Source-synchronous Surfing and Wave Pipelining (pdf) |
| Ph.D. | Mehdi Alimadadi | Linear Technology | July 2008 | Recycling Clock Network Energy in High-performance Digital Designs using On-chip DC-DC Converters (pdf); 90nm chip layout (4MB bitmap); co-supervised with Patrick Palmer |
| M.A.Sc. | Jason Yu | Intel Canada | May 2008 | Vector Processing as a Soft-CPU Accelerator (pdf) |
| M.Eng. | Eric Lai | | April 2008 | n/a |
| M.A.Sc. | Mark Yamashita | upon graduation: IBM Canada; next: Oxford MBA; currently: at large | November 2007 | A Combined Clustering and Placement Algorithm for FPGAs (pdf) |
| M.Eng. | Shirley Ma | McKesson Canada | December 2007 | n/a |
| M.Eng. | David Yeager | upon graduation: IBM Canada; currently: Dynimize | December 2006 | Interconnect Estimation for FPGAs (pdf) |
| M.A.Sc. | David Leong | Nokia Canada | December 2006 | Incremental Placement for FPGAs (pdf) |
| M.Eng. | Wilson Lo | unknown | November 2006 | Power Model for Small Custom Embedded Memories; supervised by André Ivanov |
| M.A.Sc. | Edmund Lee | Altera Toronto | Summer 2006 | Interconnect Driver Design for Long Wires in Field-Programmable Gate Arrays (pdf); co-supervised with Shahriar Mirabbasi |
| M.A.Sc. | Marvin Tom | Major Tech Firm in USA | Spring 2006 | Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays (pdf) |
| M.A.Sc. | Anthony Yu | Intel Canada | Fall 2005 | Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy (pdf) |
| M.A.Sc. | Victor Aken'Ova | PMC-Sierra | Spring 2005 | Bridging the Gap between Soft and Hard eFPGA Design (pdf); supervised by Resve Saleh |

e-mail addresses above are (unless otherwise noted)


Financial support and donations from the following organizations are gratefully acknowledged.


You might also enjoy my old home pages as a graduate student at the University of Toronto.

Try searching Library and Archives Canada. They have Canadian theses and other publications.

If you are looking for information about my book, try here

According to data in this list, my Erdős number is 4.
Lemieux → SevcikKlaweErdős