Wilson Fung's homepage

Wilson Fung
Actually, just to distinguish myself from other "Wilson Fung"s in the world, my full name is Wilson Wai Lun Fung :-)
I am currently a PhD student in Computer Engineering at the University of British Columbia.
I have been awarded the NVIDIA Graduate Fellowship for 2012.
~~I will be graduating later this year and am currently available on the job market. Here is my resume.~~

Email: wwlfung(at)ece(dot)ubc(dot)ca
Office: KAIS 4075

Research Interests

Being part of the computer architecture research lab, I am interested in anything that can improve computing power through architecture (and so enable new applications for computers). There are just too many interesting problems in computer architecture waiting to be solved. Here is a list of topics that interest me (the most!):

General Purpose Computing on GPU - In the past, the GPU was an application-specific processor that rendered 3D graphics in a computer, but it has now evolved into a massively parallel computing machine whose throughput can be 100x that of the fastest general-purpose CPU on the market. Is it possible to harness this computing power for purposes other than 3D rendering? Can we modify the GPU architecture so that it does well in both 3D rendering and other general-purpose computing?

GPU Performance Simulator - This is related to the problem above. Basically, simulation is the answer to the question: how can we know that this architecture is better without building the chip first? So, to design a GPU, we need to build a GPU simulator first. Unlike CPU architecture, the general framework of a GPU architecture often undergoes significant changes between generations, so the simulator should allow users to reconfigure the whole architecture with minimal effort.

I have been one of the main contributors to the development of GPGPU-Sim, an open-source GPU microarchitecture simulator used widely in GPU-related architecture research.

Architecture for QCA - Quantum-Dot Cellular Automata (QCA) is a special device that uses quantum tunneling and electron coupling to perform logic operations. This allows high-level computing techniques to scale beyond transistors! Despite this compatibility, architectures used in older devices (transistors) may not be optimal for QCA. This opens a quest for the architecture that would best suit the long-pipeline nature of QCA. Taking this to the extreme, could QCA be made into a practical architecture at all?

Bio

Education
Master of Applied Science in Computer Engineering, The University of British Columbia - Completed October 2008.
Bachelor of Applied Science in Computer Engineering, The University of British Columbia - Completed August 2006.

Recent Employment
May 2009 - August 2009: Architecture Intern, NVIDIA Co., Santa Clara
May 2008 - August 2008: Architecture Intern, NVIDIA Co., Santa Clara
June 2004 - May 2005: Researcher Internship, SANYO Electric Co. Ltd., Tokyo
January 2004 - April 2004: Test Engineer, PMC Sierra Inc., Vancouver
September 2002 - April 2003: Software Developer, UBC Department of Electrical and Computer Engineering, Vancouver

Publications

Wilson W. L. Fung, Tor M. Aamodt. Energy Efficient GPU Transactional Memory via Space-Time Optimizations. 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46), 2013. [PowerPoint slides]

Hadi Jooybar, Wilson W. L. Fung, Mike O'Connor, Joseph Devietti, Tor M. Aamodt. GPUDet: A Deterministic GPU Architecture. 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2013), 2013.

Inderpreet Singh, Arrvindh Shriraman, Wilson W. L. Fung, Mike O'Connor, Tor M. Aamodt. Cache Coherence for GPU Architectures. 19th IEEE International Symposium on High-Performance Computer Architecture (HPCA-19), 2013. Selected for IEEE Micro Top Picks

Wilson W. L. Fung, Inderpreet Singh, Andrew Brownsword, Tor M. Aamodt. Kilo TM: Hardware Transactional Memory for GPU Architectures. IEEE Micro, Special Issue: Micro's Top Picks from 2011 Computer Architecture Conferences. Volume 32, No. 3, pp. 7-16, May/June 2012.

Wilson W. L. Fung, Inderpreet Singh, Tor M. Aamodt. Kilo TM Correctness: ABA Tolerance and Validation-Commit Indivisibility. Technical Report, University of British Columbia, 24 May 2012.

Wilson W. L. Fung, Inderpreet Singh, Andrew Brownsword, Tor M. Aamodt. Hardware Transactional Memory for GPU Architectures. 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44), 2011. Selected for IEEE Micro Top Picks

Wilson W. L. Fung, Tor M. Aamodt. Thread Block Compaction for Efficient SIMT Control Flow. 17th IEEE International Symposium on High-Performance Computer Architecture (HPCA-17), 2011. [slides]

Aaron Ariel, Wilson W. L. Fung, Andrew E. Turner, Tor M. Aamodt. Visualizing Complex Dynamics in Many-Core Accelerator Architectures. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2010): 164-174

Wilson W. L. Fung, Ivan Sham, George Yuan, Tor M. Aamodt. Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware. ACM Transactions on Architecture and Code Optimization (TACO). Volume 6, Issue 2, Article 7. (June 2009), 1-37.

Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, Tor M. Aamodt. Analyzing CUDA Workloads Using a Detailed GPU Simulator. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2009): 163-174

Wilson W. L. Fung, Ivan Sham, George Yuan, Tor M. Aamodt. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007): 407-420

Wilson Wai Lun Fung, Akiomi Kunisa. Rotation, Scaling, and Translation-Invariant Multi-Bit Watermarking based on Log-Polar Mapping and Discrete Fourier Transform. ICME 2005

Robert Rohling, Wilson Fung, Pedram Lajevardi. PUPIL: Programmable Ultrasound Platform and Interface Library. MICCAI (2) 2003: 424-431

Resume

<pdf>

Handy Tools

color-make download
This is a wrapper around "make" that adds color to its output for easier debugging. Why is this a better solution than color-gcc? This script will run anywhere with a standard Perl installation and zero setup (i.e. you do not need root access to install it!).

perlSPICE download
This is a handy (and SIMPLE!) script that I wrote when I was in a course doing SPICE simulation. While I am sure that there are tools allowing you to generate circuits automatically, nothing beats the flexibility to use perl code inside a netlist file to generate some circuit using a mini program.

To run the code,
>./perlspice.pl <spice netlist file containing perl code>

To prevent confusion with the generated SPICE netlist, I decided to make the script work only on input files with the extension .spl. Further usage notes can be found inside the script. Be sure to check that the path to spice is correct.

Optimization Tips

This is an optimization tip discovered by Henry Wong:
There is a maximum operator (>?=) in GNU's gcc. It can be much faster than a normal if-else statement because it gets optimized into a cmov (rather than a conditional branch such as jne) on x86. Too bad it is deprecated, as it is not part of the ANSI C standard, but here is a form that meets the standard and is translated to the same code.

Just replace a >?= b; with:

a = (a >= b) ? a : b;

and to avoid the extra typing, just define a macro :-)

#define MAX(x, y) (((x) >= (y))? (x) : (y))

Update: The problem with this macro is that each argument can be evaluated twice, so if an input (x or y) comes with an increment/decrement operator (++ or --), the side effect may be applied twice and the macro returns an unexpected value. In the end, one should just know what the compiler (and the preprocessor) is doing...
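
Here is a minimal sketch of that pitfall, assuming plain int arguments. The names max_int, macro_side_effect_demo and function_side_effect_demo are hypothetical and only for illustration; they are not part of the original tip. The function version is one possible way around the problem, since a function evaluates each argument exactly once.

```c
#include <assert.h>

/* The macro from the tip above: an argument may be evaluated twice. */
#define MAX(x, y) (((x) >= (y)) ? (x) : (y))

/* Hypothetical alternative: a function evaluates each argument once. */
static inline int max_int(int x, int y)
{
    return (x >= y) ? x : y;
}

/* Starts with i = 5, applies MAX(i++, 0), and returns the final i. */
static int macro_side_effect_demo(void)
{
    int i = 5;
    /* Expands to ((i++) >= (0)) ? (i++) : (0), so i++ runs twice. */
    int m = MAX(i++, 0);
    assert(m == 6);  /* the second i++ yields 6, not the intended 5 */
    return i;        /* i has been incremented twice */
}

/* Same experiment with the function: i++ runs exactly once. */
static int function_side_effect_demo(void)
{
    int i = 5;
    int m = max_int(i++, 0);
    assert(m == 5);  /* the value before the increment, as intended */
    return i;        /* i has been incremented once */
}
```

Whether the function version still ends up as a cmov depends on the compiler and optimization flags, so it is worth checking the generated assembly rather than assuming.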