Professor Guy Lemieux

Guy is a Professor in Computer Engineering at the University of British Columbia, where he teaches advanced digital design and computer systems/architecture courses. His research focuses on improving FPGA devices and CAD tools, in particular making them easier to use and more efficient for computing tasks. His research has shown how to design FPGA interconnect to be more efficient, that CAD tool performance can be enhanced through parallelism, and that overlays are a much easier way to program FPGAs. His latest research aims to make machine learning and artificial intelligence applications more efficient on FPGAs through low-precision arithmetic, such as TinBiNN for binary neural networks, and through custom accelerator interfaces.

Prof. Lemieux graduated from the University of Toronto, where he was part of a team that designed NUMAchine, a cache-coherent multiprocessor built from scratch using MIPS R4400 CPUs, custom PCBs and FPGAs. Throughout his career, he has designed or co-designed various soft processors and accelerators (MIPS and NIOS clones, VIPERS, VEGAS, VENICE, ORCA RISC-V, and Saturn-V) and co-founded VectorBlox Computing, which developed the MXP (Matrix Processor), a tensor accelerator. Within RISC-V International, he is an elected member of the Technical Steering Committee and chair of the SoftCPU Special Interest Group. Throughout the COVID-19 pandemic, he worked with a small team from the SIG to develop new technology for managing custom instructions without namespace collisions in the ISA, resulting in Google's CFU-Playground as well as CFU interfaces in Lattice's RISC-V RX IP core and Efinix's Titanium FPGAs. After this work was extended to provide virtualization and protection (like virtual memory), it is being offered as a basis specification for the new Composable Extensions (CX) Task Group, which he helped launch. He was also a member of the RISC-V Vector and the early Cache Management Operations committees. Finally, he serves as a Voting Member and Editor within the Working Group for IEEE P3109 Standard for Arithmetic Formats for Machine Learning, which has been releasing updates through its Interim Report.

FPGAs, short for Field-Programmable Gate Arrays, are universal logic chips capable of emulating any other digital chip. Of course, this emulation capability comes with some overhead in cost and performance. A key goal of Prof. Lemieux's research is to drive down the cost and improve the speed and power dissipation of these chips. He has done this through optimization at various levels, including the transistor-level design, the architecture (internal organization) of the device, and the CAD tools that map circuits into the device.

Prof. Lemieux is especially interested in the acceleration of compute-oriented applications using custom hardware. For this purpose, FPGAs have shown much promise, but they also have a reputation for being very difficult to "program", particularly among software-oriented designers, the key users in such applications. To help, he is a strong advocate for the use of overlay architectures, which are digital circuits built on top of FPGAs that make them easier to program. Overlays are like a new type of FPGA, in that they themselves are programmable, but they are more application-specific and have fewer users, making them unlikely to be built as custom chips. Nevertheless, his research has shown that regular C programs can be easily mapped to processor-like overlays and offer levels of performance that are competitive with GPUs.

His startup, VectorBlox Computing, was acquired by Microchip in September 2019, and its technology is now called the VectorBlox Accelerator SDK. VectorBlox designed a vector accelerator system known as the VectorBlox MXP (MatriX Processor) that operates directly on 1D, 2D and 3D tensors. MXP provides four key architectural features that provide gains in efficiency: scratchpad, hardware DMA, sub-word SIMD, and custom instructions. Instead of traditional named vector registers, MXP uses an addressable scratchpad. Because the scratchpad is addressable, vectors are simply pointers in C; this allows any number of vectors of arbitrary length to be formed without any internal fragmentation, reduces the need for data duplication and data movement, and allows use of a stack-based ABI for nesting accelerated vector functions. Hardware DMA makes efficient use of a wide, dedicated path to external memory for transferring 1D and 2D tensors and operates concurrently with computation. Sub-word SIMD provides increased parallelism for operations on byte and halfword elements. Custom instructions make it easy to attach highly pipelined hardware into a C-programmed environment, where the hardware design effort focuses on data operations, not on data storage/staging/movement (which is left within C). Furthermore, the MXP is fully portable, allowing the use of almost any host processor and almost any C compiler without the need for any compiler modifications. VectorBlox measured a speedup of nearly 10,000 times on an N-body physics problem. Reaching this level of acceleration demands careful planning of code structure and data layout, which is best done manually using compiler intrinsics; compiler autovectorization is almost useless for this purpose.
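
The "vectors are simply pointers" idea can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions: every name in it (sp_alloc, sp_mark, sp_release, dma_copy, vec_add_u8, add_arrays) is hypothetical and is not part of the VectorBlox MXP API, memcpy stands in for hardware DMA, and a plain loop stands in for the parallel byte lanes that sub-word SIMD would provide in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of an addressable scratchpad.
   None of these names come from the real MXP API. */
#define SP_BYTES 4096

static uint8_t scratchpad[SP_BYTES];
static size_t sp_top = 0;   /* stack-style allocation pointer */

/* Reserve nbytes of scratchpad. A "vector" is just the returned pointer,
   so vectors of any length can be packed back to back. */
void *sp_alloc(size_t nbytes) {
    assert(nbytes <= SP_BYTES - sp_top);
    void *p = &scratchpad[sp_top];
    sp_top += nbytes;
    return p;
}

/* Save and restore the allocation pointer, so a nested accelerated
   function can free its temporaries on return (stack discipline). */
size_t sp_mark(void) { return sp_top; }

void sp_release(size_t mark) {
    assert(mark <= sp_top);
    sp_top = mark;
}

/* Stand-in for a DMA transfer between external memory and scratchpad. */
void dma_copy(void *dst, const void *src, size_t nbytes) {
    memcpy(dst, src, nbytes);
}

/* Element-wise byte add. In hardware the byte lanes would run in
   parallel; this loop only models the result. */
void vec_add_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(a[i] + b[i]);   /* wraps modulo 256 */
    }
}

/* An "accelerated" function: out[i] = x[i] + y[i], computed in scratchpad. */
void add_arrays(uint8_t *out, const uint8_t *x, const uint8_t *y, size_t n) {
    size_t mark = sp_mark();
    uint8_t *va = sp_alloc(n);
    uint8_t *vb = sp_alloc(n);
    dma_copy(va, x, n);            /* external memory -> scratchpad */
    dma_copy(vb, y, n);
    vec_add_u8(va, va, vb, n);     /* in place: no extra copy needed */
    dma_copy(out, va, n);          /* scratchpad -> external memory */
    sp_release(mark);              /* pop this function's vectors */
}
```

Calling add_arrays(out, x, y, n) from ordinary C code leaves the allocation pointer where it found it, which is what lets such functions nest. In this model the transfers and the computation run one after another; the concurrency between DMA and computation described above is not modelled.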

His work on interconnect design for FPGAs resulted in a book, published in November 2003. He received a Best Paper Award at the 2004 IEEE International Conference on Field-Programmable Technology. His 2001 paper Using Sparse Crossbars within LUT Clusters is included as part of FPGA20, the Top 25 contributions in the First 20 Years of the International Symposium on FPGAs between 1992 and 2011. His 2017 paper Real-time Object Detection in Software with Custom Vector Instructions and Algorithm Changes was nominated for best paper. The accompanying face detection demo clearly demonstrates its overall accuracy and speed. This work shows how success can be achieved using a software-first overlay approach; the entire application was coded in C++ and used less than 300 lines of custom VHDL, but it outperformed all prior art, including custom-built chips dedicated to that purpose.

Affiliations

UBC System-on-Chip Research Group
Institute for Computing, Information & Cognitive Systems (ICICS)
CMC Microsystems
IEEE
ACM
Conference organization and program committees...
IEEE International Conference on Field-Programmable Custom Computing Machines (FCCM)
International Conference on Field-Programmable Technology (ICFPT)
International Symposium on Field-Programmable Logic and Applications (FPL)

Other Distinctions...
Associate Editor, Hindawi International Journal of Reconfigurable Computing
IEEE Senior Member (December 2007)
ACM Senior Member (December 2009)
Registered Professional Engineer with Association of Professional Engineers and Geoscientists of BC (APEGBC)

Students

Current Students

Degree	Name	Email	Graduation	Thesis Topic
Ph.D.	Zhonghua (Sebastian) Zhou	tbd	est. 2021	Machine Learning for ASIC Routing
M.A.Sc.	Mariko Tatsumi	tbd	est. 2021	Machine Learning on FPGAs
M.A.Sc.	Fredy Augusto Maciel Alves	tbd	est. 2021	Processor Design for FPGAs
M.A.Sc.	Caroline White	tbd	est. 2021	tbd
M.A.Sc.	John Deppe	tbd	est. 2021	tbd; co-supervised with Mieszko Lis

Completed Students

Degree	Name	Current Position	Completion	Thesis / Project
M.Sc.	May Young	Vancouver	April 2020	Dynamic Race Detection for Non-Coherent Accelerators pdf primary supervisor was Alan Hu
Ph.D.	Hossein Omidian	Xilinx, San Jose	October 2018	Automated Space/Time Scaling of Streaming Task Graphs on Field Programmable Gate Arrays pdf
M.A.Sc.	Maximilian Golub	Mercedes-Benz, Seattle	August 2018	DropBack: Continuous Pruning During Deep Neural Network Training pdf
M.A.Sc.	Joseph Edwards	VectorBlox Computing, Vancouver	July 2018	Real-time Computer Vision in Software using Custom Vector Overlays pdf
M.Eng.	Nathan van Woudenberg	Programming + Machine Learning Support in ECE Robotics Control Lab, UBC	May 2016	n/a
M.Eng.	Gene Lai	unknown	May 2016	n/a
Ph.D.	Ameer Abdelhadi	ameer	June 2016	Architecture of Block-RAM-Based Massively Parallel Memory Structures: Multi-Ported Memories and Content-Addressable Memories pdf
M.A.Sc.	Keith Lee	Gumstix Inc., Vancouver	January 2016	The DEVBOX development environment: an environment for introducing Verilog to young students pdf video demo
M.Eng.	Danting Li	unknown	December 2015	n/a
Ph.D.	Aaron Severance	VectorBlox Computing Inc.	March 2015	Broadening the Applicability of FPGA-based Soft Vector Processors pdf
M.A.Sc.	Michael (Xi) Yue	unknown	October 2014	Rapid Overlay Building for FPGAs pdf
M.Eng.	Douglas (Hak Hian) Sim	Recon Instruments	May 2014	n/a
M.A.Sc.	Alex Brant	Altera Toronto	November 2012	Coarse and Fine Grain Programmable Overlay Architectures for FPGAs pdf Please check out the open source repository for the ZUMA FPGA Overlay
M.A.Sc.	Zhiduo Liu	upon graduation: Altera San Jose currently: Google, CA	September 2012	Accelerator Compiler for the VENICE Vector Processor pdf
M.A.Sc.	Chris Wang	upon graduation: Xilinx, CA currently: Google, CA	October 2011	Scalable and Deterministic Timing-driven Parallel Placement for FPGAs pdf
Ph.D.	David Grant	Altera Toronto	August 2011	CAD Algorithms and Performance of Malibu: An FPGA with Time-Multiplexed Coarse-Grained Elements pdf
Ph.D.	Usman Ahmed	Altera Toronto	April 2011	Impact of custom interconnect masks on cost and performance of structured ASICs pdf co-supervised with Steve Wilton
M.A.Sc.	Chris Chou	PMC-Sierra	April 2010	VIPERS II: A Soft-core Vector Processor with Single-copy Scratchpad Memory pdf
M.A.Sc.	Darius Chiu	Independent	Sept 2009	Congestion-driven Re-clustering CAD Flow for Low-cost FPGAs pdf
M.A.Sc.	Johnny Ho	upon graduation: Ixia, CA next: Quantlab, CA currently: Microsoft, WA	Sept 2009	PERG-Rx: An FPGA-based pattern-matching engine with limited regular expression support for large pattern databases pdf
M.A.Sc.	Patrick Dong	Xilinx San Jose	Sept 2009	Period and Glitch Reduction via Clock Skew Scheduling, Delay Padding and GlitchLess pdf
M.A.Sc.	Paul Teehan	upon graduation: Ph.D. student, UBC next: EnerNOC currently: Travel Audience, Germany	October 2008	Reliable High-throughput FPGA Interconnect using Source-synchronous Surfing and Wave Pipelining pdf
Ph.D.	Mehdi Alimadadi	Linear Technology	July 2008	Recycling Clock Network Energy in High-performance Digital Designs using On-chip DC-DC Converters pdf 90nm chip layout (4MB bitmap) co-supervised with Patrick Palmer
M.A.Sc.	Jason Yu	Intel Canada	May 2008	Vector Processing as a Soft-CPU Accelerator pdf
M.Eng.	Eric Lai	Amazon.com	April 2008	n/a
M.A.Sc.	Mark Yamashita	upon graduation: IBM Canada next: Oxford MBA currently: at large	November 2007	A Combined Clustering and Placement Algorithm for FPGAs pdf
M.Eng.	Shirley Ma	McKesson Canada	December 2007	n/a
M.Eng.	David Yeager	upon graduation: IBM Canada currently: Dynimize	December 2006	Interconnect Estimation for FPGAs pdf
M.A.Sc.	David Leong	Nokia Canada	December 2006	Incremental Placement for FPGAs pdf
M.Eng.	Wilson Lo	unknown	November 2006	Power Model for Small Custom Embedded Memories supervised by André Ivanov
M.A.Sc.	Edmund Lee	Altera Toronto	Summer 2006	Interconnect Driver Design for Long Wires in Field-Programmable Gate Arrays pdf co-supervised with Shahriar Mirabbasi
M.A.Sc.	Marvin Tom	Major Tech Firm in USA	Spring 2006	Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays pdf
M.A.Sc.	Anthony Yu	Intel Canada	Fall 2005	Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy pdf
M.A.Sc.	Victor Aken'Ova	PMC-Sierra	Spring 2005	Bridging the Gap between Soft and Hard eFPGA Design pdf supervised by Resve Saleh

e-mail addresses above are @ece.ubc.ca (unless otherwise noted)

Funding

Financial support and donations from the following organizations are gratefully acknowledged.

Links

You might also enjoy my old home pages as a graduate student at the University of Toronto.

Try searching Library and Archives Canada. They have Canadian theses and other publications.

If you are looking for information about my book, try here

According to data in this list, my Erdős number is 4.
Lemieux → Sevcik → Klawe → Erdős
