Below is a list of my publications in reverse chronological order.
My Google Scholar profile is here.

2023

Deval Shah, Zi Yu Xue, Karthik Pattabiraman, Tor Aamodt, Characterizing and Improving Resilience of Accelerators to Memory Errors in Autonomous Robots, To appear in ACM Transactions on Cyber-Physical Systems.
Yuan Hsi Chou, Tyler Nowicki, Tor M. Aamodt, Treelet Prefetching For Ray Tracing, In proceedings of the 56th IEEE/ACM International Symposium on Microarchitecture (MICRO 2023), Toronto, ON, Canada, Oct 28-Nov 1, 2023. (acceptance rate: 101/424 ≈ 23.8%) code
Lufei Liu, Mohammadreza Saed, Yuan Hsi Chou, Davit Grigoryan, Tyler Nowicki, Tor M. Aamodt, LumiBench: A Benchmark Suite for Hardware Ray Tracing, In proceedings of the 2023 IEEE International Symposium on Workload Characterization (IISWC 2023), Ghent, Belgium, Oct 1-3, 2023. (acceptance rate: 15/47 ≈ 31.9%) Best Paper Award code.
Deval Shah, Ningfeng Yang, Tor M. Aamodt, Energy-Efficient Realtime Motion Planning, In proceedings of the IEEE/ACM International Symposium on Computer Architecture (ISCA 2023), Orlando, FL, USA, June 17-21, 2023. (acceptance rate: 79/372 ≈ 21.2%)
Deval Shah, Tor M. Aamodt, Learning Label Encodings for Deep Regression, In proceedings of the 11th International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, May 1-5, 2023. (Spotlight presentation)

2022

Gideon Uchehara, Tor M. Aamodt, Olivia Di Matteo, Rotation-Inspired Circuit Cut Optimization, In proceedings of the 3rd IEEE/ACM International Workshop on Quantum Computing Software (in conjunction with SC 2022), Dallas, TX, November 13, 2022 video
Mohammadreza Saed, Yuan Hsi Chou, Lufei Liu, Tyler Nowicki, Tor M. Aamodt, Vulkan-Sim: A GPU Architecture Simulator for Ray Tracing, In proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO 2022), Chicago, IL, October 1-5, 2022 (acceptance rate: 83/369 ≈ 22.5%) code video
Jonathan Lew, Yunpeng Liu, Wenyi Gong, Negar Goli, R. David Evans, Tor M. Aamodt, Anticipating and Eliminating Redundant Computations in Accelerated Sparse Training, In proceedings of the IEEE/ACM International Symposium on Computer Architecture (ISCA 2022), New York City, New York, USA, June 11–15, 2022. (acceptance rate: 67/400 ≈ 17%) video
Deval Shah, Zi Yu Xue, Tor Aamodt, Label Encoding for Regression Networks, In proceedings of the Tenth International Conference on Learning Representations (ICLR 2022), Virtual Conference, Apr 25-29, 2022. (Spotlight presentation) video

2021

R Dave Evans, Tor M. Aamodt, AC-GC: Lossy Activation Compression with Guaranteed Convergence, In proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021), Virtual-only Conference, Dec 6-14, 2021. (acceptance rate: 26%) code video
Lufei Liu, Wesley Chang, Francois Demoullin, Yuan Hsi Chou, Mohammadreza Saed, David Pankratz, Tyler Nowicki, Tor M. Aamodt, Intersection Prediction for Accelerated GPU Ray Tracing, In proceedings of the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO 2021), Athens, Greece, October 16-20, 2021 (acceptance rate: 94/430 ≈ 21.9%) code video
Vijay Kandiah, Scott Peverelle, Mahmoud Khairy, Junrui Pan, Amogh Manjunath, Timothy G. Rogers, Tor M. Aamodt, Nikos Hardavellas, AccelWattch: A Power Modeling Framework for Modern GPUs, In proceedings of the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO 2021), Athens, Greece, October 16-20, 2021 (acceptance rate: 94/430 ≈ 21.9%)

2020

Md Aamir Raihan, Tor M. Aamodt, Sparse Weight Activation Training, In proceedings of the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual-only Conference, Dec 6-12, 2020. (acceptance rate: 1900/9454 ≈ 20.1%) Presentation

Yuan Hsi Chou, Christopher Ng, Shaylin Cattell, Jeremy Intan, Matthew D. Sinclair, Joseph Devietti, Timothy G. Rogers, Tor M. Aamodt, Deterministic Atomic Buffering, In proceedings of the 53rd IEEE/ACM International Symposium on Microarchitecture (MICRO 2020), Global Online Event, October 17-21, 2020 (acceptance rate: 82/422 ≈ 19.4%) Simulator source code, Videos: Full talk, Lightning talk

Jiho Kim, Sanghun Cho, Minsoo Rhu, Ali Bakhoda, Tor M. Aamodt, John Kim, Bandwidth Bottleneck in Network-on-Chip for High-Throughput Processors, In proceedings of the ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2020), Oct 5-7, 2020. (poster presentation)

Negar Goli, Tor M. Aamodt, ReSprop: Reuse Sparsified Backpropagation, In proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), pp. 1548-1558, Seattle, WA, June 16-18, 2020. (Oral presentation) Video of Negar's CVPR Talk

R. David Evans, Lufei Liu, Tor M. Aamodt, JPEG-ACT: Accelerating Deep Learning via Transform-based Lossy Compression, In proceedings of the 2020 IEEE/ACM International Symposium on Computer Architecture (ISCA 2020), Valencia, Spain, May 30-June 3, 2020. Video of Dave's ISCA Talk

Mahmoud Khairy, Zhesheng Shen, Tor M. Aamodt, Timothy G. Rogers, Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling, In proceedings of the 2020 IEEE/ACM International Symposium on Computer Architecture (ISCA 2020), Valencia, Spain, May 30-June 3, 2020.

Milad Mohammadi, Song Han, Ehsan Atoofian, Amirali Baniasadi, Tor M. Aamodt, William J. Dally, Energy Efficient On-Demand Dynamic Branch Prediction Models, IEEE Transactions on Computers, pp. 453 - 465, v69, n3, March 1, 2020.

2019

Tayler Hetherington, Maria Lubeznov, Deval Shah, Tor M. Aamodt, EDGE: Event-Driven GPU Execution, In proceedings of the ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2019), Seattle, WA, September 21-25, 2019.

Ayub Gubran, Tor M. Aamodt, Emerald: Graphics Modeling for SoC Systems, In proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA'19), pp. 169-182, Phoenix, Arizona, June 22-26, 2019. (slides)

Md Aamir Raihan, Negar Goli, Tor M. Aamodt, Modeling Deep Learning Accelerator Enabled GPUs, In proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 79-92, Madison, Wisconsin, March 24-26, 2019.
Mahmoud Khairy, Akshay Jain, Tor Aamodt, Timothy G. Rogers, A Detailed Model for Contemporary GPU Memory Systems, In proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Madison, Wisconsin, March 24-26, 2019. (Poster)
Jonathan Lew, Deval Shah, Suchita Pati, Shaylin Cattell, Mengchi Zhang, Amruth Sandhupatla, Christopher Ng, Negar Goli, Matthew Sinclair, Timothy Rogers, Tor Aamodt, Analyzing Machine Learning Workloads Using a Detailed GPU Simulator, In proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Madison, Wisconsin, March 24-26, 2019. (Poster)

2018

Tor M. Aamodt, Wilson Wai Lun Fung, Timothy G. Rogers, General-Purpose Graphics Processor Architectures, Morgan & Claypool Publishers, 140 pages, May 2018.

Andreas Moshovos, Jorge Albericio, Patrick Judd, Alberto Delmas Lascorz, Sayeh Sharify, Zissis Poulos, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Exploiting Typical Values to Accelerate Deep Learning, IEEE Computer, Volume 51, Issue 5, May 2018.

Ahmed ElTantawy, Tor M. Aamodt, Warp Scheduling for Fine-Grained Synchronization, In proceedings of the 24th IEEE International Symposium on High-Performance Computer Architecture (HPCA-24), February 24-28 2018, Vienna, Austria.

Andreas Moshovos, Jorge Albericio, Patrick Judd, Alberto Delmas Lascorz, Sayeh Sharify, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Value-Based Deep-Learning Acceleration, IEEE Micro, Volume 38, Issue 1, January/February 2018.

2017

Milad Mohammadi, Tor M. Aamodt, William J. Dally, CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance, ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 4, December 2017.

Shadi Asadi, Jennifer Ongko, Tor M. Aamodt, A State Machine Block for High-Level Synthesis, In proceedings of the IEEE International Conference on Field Programmable Technology (FPT), Melbourne, Australia, December 11-13, 2017.

Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Raquel Urtasun, Andreas Moshovos, Proteus: Exploiting Precision Variability in Deep Neural Networks, Parallel Computing, Volume 73, April 2018.

2016

Ahmed ElTantawy, Tor M. Aamodt, MIMD Synchronization on SIMT Architectures, In proceedings of the ACM/IEEE Int'l Symposium on Microarchitecture (MICRO'16), Taipei, Taiwan, Oct. 15-19, 2016. (acceptance rate: 61/288 ≈ 21.2%)

Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M. Aamodt, Andreas Moshovos, Stripes: Bit-Serial Deep Neural Network Computing, in proceedings of the ACM/IEEE Int'l Symposium on Microarchitecture (MICRO'16), Taipei, Taiwan, Oct. 15-19, 2016. (acceptance rate: 61/288 ≈ 21.2%) IEEE Micro Top Picks "Honorable Mention"

Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Andreas Moshovos, Cnvlutin: Ineffectual-Neuron-Free Deep Convolutional Neural Network Computing, in proceedings of the ACM/IEEE Int'l Symposium on Computer Architecture (ISCA'16), Seoul, Korea, June 18-22 2016. (acceptance rate: 54/288 ≈ 18.8%)

Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Andreas Moshovos, Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks, in proceedings of the ACM International Conference on Supercomputing (ICS 2016), Istanbul, Turkey, June 1-3, 2016. (acceptance rate: 43/183 ≈ 23.5%)

Dongdong Li, Tor M. Aamodt, Inter-core Locality Aware Memory Scheduling, in IEEE Computer Architecture Letters, vol. 15, no. 1, pp. 25-28, Jan.-June 2016.

William J. Dally, R. Curtis Harting, Tor M. Aamodt, Digital Design Using VHDL: A Systems Approach, Cambridge University Press, 721 pages, 2016.

Subhasis Das, Tor M. Aamodt, William J. Dally, Reuse distance-based probabilistic cache replacement, ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 4, January 2016.

2015

Tayler H. Hetherington, Mike O'Connor, Tor M. Aamodt, MemcachedGPU: Scaling-up Scale-out Key-value Stores, in proceedings of the ACM Symposium on Cloud Computing (SoCC'15), pp. 43-57, Kohala Coast, Hawaii, August 27-29, 2015. (acceptance rate: 34/157 ≈ 21.7%)

Subhasis Das, Tor M. Aamodt, William J. Dally, SLIP: Reducing Wire Energy in the Memory Hierarchy, in proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA 2015), pp. 349-361, Portland, OR, June 13-17, 2015. (acceptance rate: 58/305 ≈ 19.0%)

Milad Mohammadi, Song Han, Tor M. Aamodt, William J. Dally, On-Demand Dynamic Branch Prediction, in IEEE Computer Architecture Letters, vol. 14, no. 1, pp. 50-53, June 2015.

2014

Timothy G. Rogers, Mike O'Connor, Tor M. Aamodt, Learning Your Limit: Managing Massively Multithreaded Caches Through Scheduling, in Communications of the ACM, vol. 57, no. 12, pp. 91-98, December 2014.

Inderpreet Singh, Arrvindh Shriraman, Wilson W. L. Fung, Mike O'Connor, Tor M. Aamodt, Cache Coherence for GPU Architectures, IEEE Micro, Special Issue: Micro's Top Picks from 2013 Computer Architecture Conferences, Vol. 34, No. 3, pp. 69-79, May/June 2014.

Ahmed ElTantawy, Jessica Wenjie Ma, Mike O'Connor, Tor M. Aamodt, A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow, In proceedings of the 20th IEEE International Symposium on High-Performance Computer Architecture (HPCA-20), pp. 248 - 259, Orlando, FL, February 15-19, 2014.

2013

Wilson W. L. Fung, Tor M. Aamodt, Energy Efficient GPU Transactional Memory via Space-Time Optimizations, in proceedings of the 46th IEEE/ACM International Symposium on Microarchitecture (MICRO-46), pp. 408-420, Davis, CA, December 7-11, 2013. (acceptance rate: 39/239 ≈ 16.3%), simulator code, benchmarks, slides

Timothy G. Rogers, Mike O'Connor, Tor M. Aamodt, Divergence-Aware Warp Scheduling, in proceedings of the 46th IEEE/ACM International Symposium on Microarchitecture (MICRO-46), pp. 99-110, Davis, CA, December 7-11, 2013. (acceptance rate: 39/239 ≈ 16.3%), slides

Ali Bakhoda, John Kim, Tor M. Aamodt, Designing On-Chip Networks for Throughput Accelerators, ACM Transactions on Architecture and Code Optimization (TACO), Vol. 10, No. 3, Article 21, September 2013.

Jingwen Leng, Tayler Hetherington, Ahmed ElTantawy, Syed Gilani, Nam Sung Kim, Tor M. Aamodt, Vijay Janapa Reddi, GPUWattch: Enabling Energy Optimizations in GPGPUs, In proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA 2013), pp. 487-498, Tel-Aviv, Israel, June 23-27, 2013. (acceptance rate: 56/288 ≈ 19.4%), GPUWattch is included in GPGPU-Sim 3.2.1 onward

Timothy G. Rogers, Mike O'Connor, Tor M. Aamodt, Cache-Conscious Thread Scheduling for Massively Multithreaded Processors, IEEE Micro, Special Issue: Micro's Top Picks from 2012 Computer Architecture Conferences, Vol. 33, No. 3, pp. 78-85, May/June 2013.

Vitaly Zakharenko, Tor M. Aamodt, Andreas Moshovos, Characterizing the Performance Benefits of Fused CPU/GPU Systems Using FusionSim, Design, Automation and Test in Europe (DATE), pp. 685-688, Grenoble, France, 18-22 March, 2013. (interactive presentation) FusionSim website

Hadi Jooybar, Wilson W. L. Fung, Mike O'Connor, Joseph Devietti, Tor M. Aamodt, GPUDet: A Deterministic GPU Architecture, In proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2013), pp. 1-12, Houston, Texas, March 16-20, 2013. (acceptance rate: 44/191 ≈ 23.0%) slides, simulator code+benchmarks

Inderpreet Singh, Arrvindh Shriraman, Wilson W. L. Fung, Mike O'Connor, Tor M. Aamodt, Cache Coherence for GPU Architectures, In proceedings of the 19th IEEE International Symposium on High-Performance Computer Architecture (HPCA-19), pp. 578-590, Shenzhen, China, February 23-27, 2013. simulator code, benchmarks, slides (acceptance rate: 51/249 ≈ 20.5%) Selected for IEEE Micro Top Picks

2012

Jimmy Kwa, Tor M. Aamodt, Small Virtual Channel Routers on FPGAs Through Block RAM Sharing, In proceedings of the IEEE International Conference on Field Programmable Technology (FPT), pp. 71-79, Seoul, Korea, December 10-12, 2012. download RTL (acceptance rate: 24/114 ≈ 21.1%)

Timothy G. Rogers, Mike O'Connor, Tor M. Aamodt, Cache-Conscious Wavefront Scheduling, In proceedings of the 45th IEEE/ACM International Symposium on Microarchitecture (MICRO-45), pp. 72-83, Vancouver, BC, December 1-5, 2012. (acceptance rate: 40/228 ≈ 17.5%) Best paper runner up, Selected for IEEE Micro Top Picks, CACM Research Highlight, simulator code + benchmarks

Marcel Gort, Flavio M. De Paula, Johnny J.W. Kuan, Tor M. Aamodt, Alan J. Hu, Steven J.E. Wilton, Jin Yang, Formal-Analysis-Based Trace Computation for Post-Silicon Debug, IEEE Transactions on Very Large Scale Integration Systems, Vol. 20, No. 11, pp. 1997-2010, November 2012.

Xi E. Chen and Tor M. Aamodt, Modeling Cache Contention and Throughput of Multiprogrammed Manycore Processors, IEEE Transactions on Computers, Vol. 61, No. 7, pp. 913-927, July 2012.

Wilson W. L. Fung, Inderpreet Singh, Andrew Brownsword, Tor M. Aamodt, Kilo TM: Hardware Transactional Memory for GPU Architectures, IEEE Micro, Special Issue: Micro's Top Picks from 2011 Computer Architecture Conferences, Vol. 32, No. 3, pp. 7-16, May/June 2012.

Tayler H. Hetherington, Timothy G. Rogers, Lisa Hsu, Mike O'Connor, Tor M. Aamodt, Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems, In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 88-98, New Brunswick, NJ, April 1-3, 2012. download code, slides. (acceptance rate: 20/65 ≈ 30.8%)

2011

Wilson W. L. Fung, Inderpreet Singh, Andrew Brownsword, Tor M. Aamodt, Hardware Transactional Memory for GPU Architectures, In proceedings of the 44th IEEE/ACM International Symposium on Microarchitecture (MICRO-44), pp. 296-307, Porto Alegre, Brazil, December 3-7, 2011. slides, longer talk, simulator as used in MICRO 2011 paper, simulator with recent changes to GPGPU-Sim 3.x, benchmarks (acceptance rate: 44/209 ≈ 21.0%) Selected for IEEE Micro Top Picks

Xi E. Chen and Tor M. Aamodt, Hybrid Analytical Modeling of Pending Cache Hits, Data Prefetching, and MSHRs, ACM Transactions on Architecture and Code Optimization (TACO), Vol. 8, No. 3, Article 10 (October 2011), 28 pages.

Johnny J.W. Kuan, Tor M. Aamodt, Progressive-BackSpace: Efficient Predecessor Computation for Post-Silicon Debug, In proceedings of the 12th IEEE International Workshop on Microprocessor Test and Verification (MTV 2011), Austin, TX, December 5-7, 2011.

Wilson W. L. Fung, Tor M. Aamodt, Thread Block Compaction for Efficient SIMT Control Flow, In proceedings of the 17th IEEE International Symposium on High-Performance Computer Architecture (HPCA-17), pp. 25-36, San Antonio, Texas, February 12-16 2011. pre-print, slides, simulator code (acceptance rate: 42/227 ≈ 18.5%)

2010

Ali Bakhoda, John Kim, Tor M. Aamodt, Throughput-Effective On-Chip Networks for Manycore Accelerators, In proceedings of the 43rd IEEE/ACM International Symposium on Microarchitecture (MICRO-43), pp. 421-432, Atlanta, Georgia, December 4-8, 2010. pre-print, BibTeX (acceptance rate: 45/248 ≈ 18.1%)

Tor M. Aamodt, Hong Wang, Per Hammarlund, John P. Shen, Steve Shih-wei Liao, Perry H. Wang, Method and apparatus for efficient resource utilization for prescient instruction prefetch, United States Patent #7,818,547, Issued October 19, 2010. Assignee: Intel Corporation.

Hong Wang, Tor M. Aamodt, Pedro Marcuello, Jared W. Stark, John P. Shen, Antonio Gonzalez, Per Hammarlund, Gerolf F. Hoflehner, Perry H. Wang, Steve Shih-wei Liao, Speculative multi-threading for instruction prefetch and/or trace pre-build, United States Patent #7,814,469, Issued October 12, 2010. Assignee: Intel Corporation.

Ali Bakhoda, John Kim, Tor M. Aamodt, On-Chip Network Design Considerations for Compute Accelerators, In Nineteenth International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 535-536, Vienna, Austria, September 11-15, 2010. pre-print, BibTeX Best poster award, 2nd place

Aaron Ariel, Wilson W. L. Fung, Andrew Turner, Tor M. Aamodt, Visualizing Complex Dynamics in Many-Core Accelerator Architectures, In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 164-174, White Plains, NY, March 28-30, 2010. pre-print, BibTeX (acceptance rate: 22/64 ≈ 34.4%)

Johnny Kuan, Steve J. E. Wilton, Tor M. Aamodt, Accelerating Trace Computation in Post-Silicon Debug, In Proceedings of the 11th IEEE International Symposium on Quality Electronic Design (ISQED 2010), pp. 244-249, San Jose, CA, March 22-24, 2010. pre-print, BibTeX (poster presentation)

Hong Wang, Tor Aamodt, Per Hammarlund, John P. Shen, Xinmin Tian, Milind Girkar, Perry Wang, Steve Shih-wei Liao, Safe store for speculative helper threads, United States Patent #7,657,880, Issued February 2, 2010. Assignee: Intel Corporation.

2009

George L. Yuan, Ali Bakhoda, Tor M. Aamodt, Complexity Effective Memory Access Scheduling for Many-Core Accelerator Architectures, In proceedings of the 42nd IEEE/ACM International Symposium on Microarchitecture (MICRO-42), pp. 34-44, New York, NY, December 12-16, 2009. slides pre-print, BibTeX (acceptance rate: 52/209 ≈ 24.9%)

Tor M. Aamodt, Architecting Graphics Processors for Non-Graphics Compute Acceleration, In proceedings of the 2009 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Special Session on Computer Architecture (PACRIM-09), Victoria, BC, August 23-26, 2009. (invited paper)

Wilson W. L. Fung, Ivan Sham, George Yuan, and Tor M. Aamodt, Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware, ACM Transactions on Architecture and Code Optimization (TACO), Vol. 6, No. 2, Article 7 (June 2009), 37 pages. BibTeX

Henry Wong and Tor M. Aamodt, The Performance Potential for Single Application Heterogeneous Systems, 8th Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD 2009), (in conjunction with ISCA 2009), Austin, Texas, June 21, 2009. slides

George L. Yuan and Tor M. Aamodt, A Hybrid Analytical DRAM Performance Model, 5th Workshop on Modeling, Benchmarking and Simulation (MoBS 2009), (in conjunction with ISCA 2009), Austin, Texas, June 21, 2009.

Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, Tor M. Aamodt, Analyzing CUDA Workloads Using a Detailed GPU Simulator, In proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163-174, Boston, MA, April 26-28, 2009. slides pre-print, simulator, BibTeX (acceptance rate: 24/86 ≈ 27.9%)

Tor M. Aamodt, Hong Wang, John P. Shen, Per Hammarlund, Methods and apparatus for generating speculative helper thread spawn-target points, United States Patent #7,523,465, Issued April 21, 2009. Assignee: Intel Corporation.

Xi E. Chen and Tor M. Aamodt, A First-Order Fine-Grained Multithreaded Throughput Model, In proceedings of the 15th IEEE International Symposium on High-Performance Computer Architecture (HPCA-15), pp. 329-340, Raleigh, North Carolina, February 14-18, 2009. HPCA pre-print, BibTeX (acceptance rate: 35/184 ≈ 19.0%) -- journal version

2008

Xi E. Chen and Tor M. Aamodt, Hybrid Analytical Modeling of Pending Cache Hits, Data Prefetching, and MSHRs, In proceedings of the 41st IEEE/ACM International Symposium on Microarchitecture (MICRO-41), pp. 59-70, Lake Como, Italy, November 8-12, 2008. pre-print, BibTeX (acceptance rate: 40/210 ≈ 19.0%)

Henry Wong, Anne Bracy, Ethan Schuchman, Tor M. Aamodt, Jamison D. Collins, Perry H. Wang, Gautham Chinya, Ankur Khandelwal Groen, Hong Jiang, and Hong Wang, Pangaea: A Tightly-Coupled IA32 Heterogeneous Chip Multiprocessor, In proceedings of the 17th IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 52-61, Toronto, ON, October 25-29, 2008. pre-print, BibTeX (acceptance rate: 30/159 ≈ 18.9%)

Tor M. Aamodt, Hong Wang, Per Hammarlund, John P. Shen, Steve Shih-wei Liao, Perry H. Wang, Method and apparatus for efficient utilization for prescient instruction prefetch, United States Patent #7,404,067, Issued July 22, 2008. Assignee: Intel Corporation.

Xi E. Chen and Tor M. Aamodt, An Improved Analytical Superscalar Microprocessor Memory Model, 4th Workshop on Modeling, Benchmarking and Simulation (MoBS 2008), (in conjunction with ISCA 2008), pp. 7-16, Beijing, China, June 22, 2008.

Ali Bakhoda and Tor M. Aamodt, Extending the Scalability of Single Chip Stream Processors with On-chip Caches, 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMP-MSI 2008), (in conjunction with ISCA 2008), 9 pages, Beijing, China, June 22, 2008.

Tor M. Aamodt, Paul Chow, Compile-Time and Instruction Set Methods for Improving Floating- to Fixed-Point Conversion Accuracy, ACM Transactions on Embedded Computing Systems (TECS), Vol. 7, No. 3, Article 26 (April 2008), 27 pages.

2007

Wilson W. L. Fung, Ivan Sham, George Yuan, and Tor M. Aamodt, Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, In proceedings of the 40th IEEE/ACM International Symposium on Microarchitecture (MICRO-40), pp. 407-418, Chicago, IL, December 1-5, 2007. slides. pre-print, BibTeX (acceptance rate: 35/166 ≈ 21.1%)

Tor M. Aamodt and Paul Chow, Optimization of Data Prefetch Helper Threads with Path-Expression Based Statistical Modeling, In proceedings of the 21st ACM International Conference on Supercomputing (ICS), pp. 210-221, Seattle, WA, June 16-20, 2007. BibTeX (acceptance rate: 29/123 ≈ 23.6%)

2004

Tor M. Aamodt, Paul Chow, Per Hammarlund, Hong Wang, and John P. Shen, Hardware Support for Prescient Instruction Prefetch, In proceedings of the 10th IEEE International Symposium on High Performance Computer Architecture (HPCA-10), pp. 84-95, Madrid, Spain, February 14-18, 2004. (acceptance rate: 27/153 ≈ 17.6%) BibTeX

2003

Tor M. Aamodt, Pedro Marcuello, Paul Chow, Antonio Gonzalez, Per Hammarlund, Hong Wang, and John P. Shen, A Framework for Modeling and Optimization of Prescient Instruction Prefetch, In proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2003), pp. 13-24, San Diego, CA, June 10-14, 2003. slides. BibTeX (acceptance rate: 26/222 ≈ 11.7%)

2002

Tor Aamodt, Pedro Marcuello, Paul Chow, Per Hammarlund, and Hong Wang, Prescient Instruction Prefetch, MTEAC-6, (in conjunction with MICRO-35), pp. 3-10, Istanbul, Turkey, November 2002. Best student paper award

2001

Tor Aamodt, Andreas Moshovos, and Paul Chow, The Predictability of Computations that Produce Unpredictable Outcomes, MTEAC-5, (in conjunction with MICRO-34), pp. 23-34, Austin, Texas, December 2001. slides

2000

Tor Aamodt and Paul Chow, Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation, In proceedings of the 3rd ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES-2000), pp. 128-137, San Jose, CA, November 17-18, 2000. slides. (acceptance rate: 25/56 ≈ 44.6%)

1999

Tor Aamodt and Paul Chow, Numerical Error Minimizing Floating-Point to Fixed-Point ANSI C Compilation, MPDSP-1 (in conjunction with MICRO-32), pp. 3-12, Haifa, Israel, November 1999. slides

Technical Reports

Wilson W. L. Fung, Inderpreet Singh, and Tor M. Aamodt, Kilo TM Correctness: ABA Tolerance and Validation-Commit Indivisibility, Technical Report, University of British Columbia, 24 May 2012.

Owen Kirby, Shahriar Mirabbasi, and Tor M. Aamodt, Mixed-Signal Neural Network Branch Prediction, Technical Report, University of British Columbia, 8 June 2007.

Tor Aamodt, Andreas Moshovos, and Paul Chow, The Predictability of Computations that Produce Unpredictable Outcomes, Technical Report #TR-01-08-01, EECG, University of Toronto, August 2001.

Tor M. Aamodt, Modeling and Optimization of Speculative Threads, Doctoral Thesis, University of Toronto, 2006.

Tor Aamodt, Floating-Point to Fixed-Point Compilation and Embedded Architectural Support, Masters Thesis, University of Toronto, January 2001.

Tor Aamodt, Intelligent Control via Reinforcement Learning: State Transfer and Stabilization of a Rotational Inverted Pendulum, Bachelors Thesis, University of Toronto, April 1997.