
Monil Shah

 |  News  |  Experience  |  Publications  |  Projects  |  Contact  |  Responsibilities  |  Skills  | 

I am a graduate student at the University of California - San Diego, where I'm pursuing my M.S. in Computer Engineering. I am currently doing a summer internship at Qualcomm. Alongside my master's, I am also a Research Assistant under Professor Tajana Rosing. I am mainly interested in computer architecture, including CPU/GPU architectures, Network-on-Chips and processor design.

Prior to joining graduate school, I worked in industry for 3+ years at NVIDIA in their PCIe hardware team at the Bangalore office. I designed and verified PCIe transaction layer features for multiple generations of the protocol.

Before working full-time, I was pursuing my bachelor's at BITS (Pilani), Hyderabad in Electrical and Electronics Engineering. During the pre-final year of my undergraduate studies, I interned in the PCIe team at NVIDIA Bangalore, working on workflow automation, coverage closure, regression debugging and testbench fixes. In my sophomore year, I worked as a researcher under Professor Soumya J at BITS (Pilani), Hyderabad on fault-tolerant routing for Network-on-Chips.

In my junior years, I worked as a freelance web developer and contributed to many projects (college fest management system, college fest websites, startup websites, personal projects such as an expense tracker, and backend handling for a college Android application). I also interned at Ethnus Consultancy in the backend team, working on Redis and PHP, in Summer '16.

Feel free to check out my CV or drop me an e-mail if you want to chat with me!


 ~  Email  |  CV  |  Google Scholar  |  Github  |  LinkedIn  |  Facebook  ~ 


Jun '22  

Joined Qualcomm for a summer internship. (Until Sep '22)

Apr '22  

Teaching Assistant for CSE 140L: Digital Systems Laboratory under Prof. Bryan Chin. (Until Jun '22)

Jan '22  

Teaching Assistant for CSE 140: Digital Systems under Prof. Tajana Simunic Rosing. (Until Mar '22)

Sep '21  

Started Fall 2021 Quarter - U.C. San Diego!

Aug '21  

Last working day (6 Aug) at NVIDIA before leaving for my master's.

Jul '21  

Completed 3 years as a full-time engineer at NVIDIA.

Jun '21  

Promoted to Senior ASIC Engineer at NVIDIA.

Apr '20  

Joined PCIe Design Team for a year.

Sep '19  

Promoted to ASIC Engineer II at NVIDIA.

Jul '18  

Joined NVIDIA as a full-time engineer in the PCIe Verification Team.

Jun '18  

Graduated from BITS(Pilani) Hyderabad.

May '18  

Worked under Prof. Soumya J on fault-tolerant routing for Network-on-Chips.

Dec '17  

Completed my internship with the PCIe Verification Team at NVIDIA and returned to finish my degree.

Jul '17  

Joined NVIDIA as PCIe Verification Team Intern.

Aug '16  

Appointed Head of the Department of Technical Arts for the ATMOS technical fest.

May '16  

Joined Ethnus Consultancy Pvt Ltd. as an intern in the backend team. (2.5-month summer internship)

Aug '14  

Joined BITS (Pilani) Hyderabad for Bachelors in Electrical and Electronics Engineering.

University of California - San Diego

Master of Science | Electrical & Computer Engineering
Major: Computer Engineering
Sep '21 - Present

Relevant Coursework:
(Spring 22) CSE-260: Parallel Computing
(Spring 22) CSE-240B: Parallel Computer Architecture
(Winter 22) CSE-240C: Advanced Microarchitecture
(Winter 22) ECE-277: CUDA Programming
(Winter 22) CSE-120: Operating Systems
(Fall 21) ECE-284: VLSI Implementation of Machine Learning Algorithms
(Fall 21) ECE-260A: VLSI System Design
(Fall 21) CSE-240A: Principles of Computer Architecture

Birla Institute of Technology and Science (Pilani), Hyderabad

Bachelor of Engineering (Hons) | Electrical and Electronics Engineering
Aug '14 - May '18
Relevant Coursework:
Embedded System Design
Microprocessor and Interfacing
FPGA Programming
Principles of Computer Architecture


Graphics Research Intern | Qualcomm, San Diego
Jun '22 - Present

  • OpenCL benchmark analysis and workload profiling for Adreno GPUs
  • OpenCL kernel optimization for specific workloads


  • Graduate Research Assistant (UCSD) | Prof. Tajana Simunic Rosing
    Sep '21 - Present

    Working under the supervision of Prof. Tajana Simunic Rosing.

  • Responsible for the design (SystemVerilog), verification (SystemVerilog) and synthesis (Design Compiler) of a CNN hardware accelerator (PatterNet) for tapeout
  • Exploring gem5 modifications to collect configuration statistics for prediction


  • Graduate Teaching Assistant (UCSD) | CSE 140L and 140
    Jan '22 - Jun '22

    Worked under the supervision of Prof. Bryan Chin and Prof. Tajana Simunic Rosing at UC San Diego. CSE 140L covers Verilog-based programming assignments, and CSE 140 covers basic digital design concepts (Boolean algebra, combinational and sequential logic, HLSM).

  • Held office hours to resolve individual students' questions
  • Led discussion sessions to explain the assignments in more detail
  • Designed, solved and graded homework assignments and exams
  • Monitored the Q/A forum


  • ASIC Design and Verification Engineer | NVIDIA Bangalore
    Jul '18 - Aug '21

    RTL Design
  • Designed PCIe protocol-defined features such as error reporting and downstream port containment for the root port IP
  • Responsible for timing fixes, clock gating fixes (SLCG), code coverage analysis, flow automation, top-level connections and cluster debug
  • Interacted with Security, Driver, Cluster, Fullchip, SOCD and PD teams

    RTL Verification
  • Worked on various aspects of verification methodology, such as test planning, microarchitecture discussions, infrastructure setup, functional coverage closure and regression debug
  • Extended the UVM-based infrastructure in SystemVerilog to verify PCIe Transaction Layer features such as ResetWidth checks, MSI ordering checks, Address Blocker IP, Error Detection and Reporting (AER), Register Access Security and RAS Error Injection
  • Responsible for ISO 26262 (the automotive industry's functional safety standard) verification closure of the PCIe IP within the GPU



  • Teaching Assistant (BITS) | Microprocessors and Interfacing
    Jan '18 - May '18

  • One of the teaching assistants for the Microprocessors course, with around 400 students, under Professor Soumya J. Responsible for lab supervision and mentoring, and assisted with end-semester grading.

  • Head of Department of Technical Arts (BITS) | ATMOS '16
    Aug '16 - May '17

  • Managed a department of over 60 students focused on design creatives, website development and application development for the college fest. We helped achieve a footfall of over 10,000 and a social outreach of 100,000.

  • Backend Developer | Ethnus Consultancy Bangalore
    May '16 - Jul '16

  • As a backend developer, I created and tested a framework supporting operations for a video call service, designed an admin dashboard (both frontend and backend) using PHP and Redis, and tested an audio call service using its API. My primary focus was to learn new things while adding value to the organisation.



  • My Google Scholar profile can be found here

  • B. Khaleghi, U. Mallappa, D. Yaldiz, H. Yang, M. Shah, J. Kang, T. Rosing "PatterNet: Explore and Exploit Filter Patterns for Efficient Deep Neural Networks", DAC, 2022.

  • Multi-application Based Network-on-Chip Design for Mesh-of-Tree Topology Using Global Mapping and Reconfigurable Architecture, Mohit Upadhyay, Monil Shah, P Veda Bhanu, J Soumya, Linga Reddy Cenkeramaddi, 32nd International Conference on VLSI Design, VLSID 2019 DOI

  • A Novel Fault-Tolerant Routing Technique for Mesh-of-Tree based Network-on-Chip Design, Mohit Upadhyay, Monil Shah, P Veda Bhanu, J Soumya, Linga Reddy Cenkeramaddi, Henning Idsøe, IEEE TENCON 2018 DOI

  • Fault Tolerant Routing Methodology for Mesh-of-Tree based Network-on-Chips using Local Reconfiguration, Mohit Upadhyay, Monil Shah, P Veda Bhanu, J Soumya, Linga Reddy Cenkeramaddi, International Conference on High Performance Computing and Simulation, HPCS 2018 DOI

  • A Novel Fault-Tolerant Routing Algorithm for Mesh-of-Tree Based Network-on-Chips, Monil Shah, Mohit Upadhyay, P Veda Bhanu, J Soumya, Linga Reddy Cenkeramaddi, 22nd International Symposium on VLSI Design and Test, VDAT 2018 DOI

  • Parallel Computer Architecture

    C++, Coherence, Simulator | Apr. 2022 - Jun. 2022

  • Built a simulator for single-level, bus-snooping cache coherence in an L1 cache and main memory system using cache starter code [report]
  • Parallel Computing

    K80, T4, AVX2, C, CUDA, MPI, Expanse, nvprof, tau | Apr. 2022 - Jun. 2022

  • Accelerated matrix multiplication in C using blocking and vectorization on Intel AVX2, from a naive implementation at 2 Gflops to 23 Gflops for a 1024x1024 matrix [report]
  • Accelerated matrix multiplication in CUDA using blocking and shared memory on K80 and T4, from a naive implementation at 95 Gflops on K80 to 500+ Gflops [report]
  • Implemented an Aliev-Panfilov solver using C++ and MPI on the Expanse supercomputer [report]
  • Advanced Microarchitecture

    Prefetchers, Branch Predictors, Cache Replacement, Spectre, Meltdown, ChampSim | Jan. 2022 - Mar. 2022

  • Performed workload analysis and design space exploration for prefetchers and cache replacement policies on the ChampSim simulator [report]
  • Explored microarchitectural optimizations to create gadgets similar to Spectre and Meltdown for security attacks [report]
  • CUDA Programming

    CUDA, Multithreaded Programming, Heterogeneous Programming | Jan. 2022 - Mar. 2022

  • Used reinforcement learning to implement a Minesweeper game using a multithreaded programming model
  • Implemented seed table construction on GPU for finding arbitrary-length k-mers in a DNA sequence [report]
  • Modelling Branch Predictors and L1 Cache

    C , Docker | Sep. 2021 - Dec. 2021

  • Modelled branch predictors such as Gshare, Tournament, Perceptron and TAGE [Predictor code]
  • Modelled an L1 cache structure with a FIFO replacement policy [Cache code]
  • Hardware Implementation of Machine Learning

    PyTorch, Verilog | Sep. 2021 - Dec. 2021

  • Mapped layers of VGGNet by tiling onto an 8x8 2D systolic array designed in Verilog. Performed optimizations such as quantization and pruning to implement clock gating and reduce dynamic power [report]
  • Fault Tolerant Routing for Network-on-Chips

    C | Jan. 2018 - May 2018 [report]

  • Devised and modelled a static scheduling algorithm to support packet routing on a Mesh-of-Tree topology based NoC under router faults, ensuring scalability in topology size.
  • Modelled a two-phase approach of mapping and reconfiguration using additional hardware, which reduced the distance between communicating cores and thereby the intercommunication latency.
  • 5 Stage pipelined processor

    Verilog, Vivado | Jan. 2017 - May 2017 [report]

  • Designed a 5-stage pipelined processor in Verilog, based on a limited MIPS ISA, that handles forwarding and stalling for control and data hazards.

  • Head of Department of Technical Arts | ATMOS '16

    Aug. 2016 - May 2017

  • As Head of the Technical Arts Department, I coordinated the design creatives, website and Android app for our technical fest, ATMOS, which secured a footfall of 10,000+ and a social reach of 100,000+. My role focused on meeting tight deadlines for the creatives, despite large volumes to deliver and a small workforce, without compromising quality.
  • Head Boy | Students' Council DPS Surat

    Mar. 2013 - Feb. 2014

  • As Head Boy and a member of the Student Representative Council for the academic year 2013-14, I was the point of contact between students and the administration, and worked closely with teachers and students to coordinate curricular and extracurricular activities.

  • Languages: Verilog, SystemVerilog, Perl, Bash, C, C++
  • Tools: ChampSim, gem5, Design Compiler, Vivado

  • This template is a modification of Jon Barron's website. Find the source code for my website here.