I’m a graduate student in the College of Computing at the Georgia Institute of Technology, advised by Dr. Rich Vuduc.
I work on high-performance computing, focusing primarily on accelerated computing and programming models for heterogeneous architectures. My research helps HPC developers write speed-of-light applications and kernels that take advantage of modern hardware capabilities without making them want to pull their hair out.
My interests mostly lie at the intersection of hardware and software for accelerated computing. To this end, I have worked at various hardware shops, including Arm, Cerebras Systems, and NVIDIA, on hardware modeling, high-performance kernels, and library design. The good folks at Oak Ridge National Laboratory’s OLCF remain my long-time collaborators on application-level projects.
Currently, I work at NVIDIA full time while I finish my PhD. At NVIDIA, I collaborate closely with Cris Cecka from NVR to lead the design of next-generation linear algebra libraries, namely CUTLASS 3.0, a project I have been working on since its inception. I also work on exposing next-generation tensor core hardware features in the CUDA C++ programming model for peak developer productivity without performance compromises.
News
After working as an intern at NVIDIA for nearly a year and a half, I have decided to join full time as a compute architect on the fast kernels team in the DL architecture group!
I will continue my collaboration with Cris Cecka from NVR PSA to lead the design of the next-generation linear algebra library, CUTLASS 3.0, on which I will also be basing much of my PhD thesis material.
I am happy to announce that I will be joining NVIDIA Research for an extended internship starting January 2022!
Publications
[DBLP]
Conference Papers
Exaflops Biomedical Knowledge Graph Analytics.
Supercomputing 2022. ACM Gordon Bell award finalist.
Scalable all-pairs shortest paths for huge graphs on multi-GPU clusters.
Scalable knowledge graph analytics at 136 petaflop/s.
Supercomputing 2020. ACM Gordon Bell award finalist.
Conditioning deep generative raw audio models for structured automatic music.