About Me

PhD student at Stony Brook University, NY, USA, advised by Michael Ryoo.
Former Student Researcher at Google Research with Srikumar Ramalingam.
Former Research Intern at Meta with Tsung-Yu Lin.
Former Research Intern at Apple with Jonathon Shlens and Alexander Toshev.
Prior to PhD, Research Assistant at MBZUAI with Salman Khan, Muzammal Naseer, and Fahad Khan.
Interested in Computer Vision and Machine Learning, with a focus on the subdomains of Self-Supervised Learning, Video Understanding, and Vision-Language Representations.
Enjoy ballroom dancing, cooking, and theatre during my leisure time.

Have a look at my Curriculum Vitae for more details and Google Scholar for a full list of papers.

Updates

LangRepo was accepted at ACL 2025!
LangToMo for language-conditioned robot control is now on arXiv.
Check out LatentCRF for efficient text-to-image generation.

Selected Publications

April 2025: Understanding Long Videos with Multimodal Language Models, ICLR 2025.
April 2025: LLaRA: Large Language and Robotics Assistant, ICLR 2025.
June 2024: Localization in Visual-LLMs Improves Reasoning, CVPR 2024.
May 2023: Language-based Video Self-Supervised Learning, NeurIPS 2023.
November 2022: T2I Diffusion Models are Zero-Shot Segmentors, CVPR Workshop 2023.
October 2022: Perceptual Grouping in Contrastive VLMs, ICCV 2023.
November 2021: Self-supervised Video Transformers, CVPR 2022 (oral).
July 2021: Adversarial Transferability of Vision Transformers, ICLR 2022 (spotlight).
May 2021: Intriguing Properties of Vision Transformers, NeurIPS 2021 (spotlight).
March 2021: Orthogonal Projection Loss, ICCV 2021.
January 2021: Conditional Generative Modeling, ICLR 2021.
September 2019: Activity Recognition in Videos, TCSVT journal.

Featured Work

Our Diffusion Illusions work (CVPR ’23 Best Demo) was featured on Stony Brook News.