About Me
Google Scholar | CV Link | GitHub | Email
- PhD student at Stony Brook University, NY, USA, advised by Michael Ryoo.
- Research Intern at Salesforce AI Research with Juan Carlos Niebles.
- Former Intern at Google Research with Srikumar Ramalingam, Meta with Tsung-Yu Lin, and Apple with Jonathon Shlens and Alexander Toshev.
- Former Researcher at MBZUAI with Salman Khan, Muzammal Naseer, and Fahad Khan.
- Interested in Computer Vision and Machine Learning, with a focus on Video Understanding, Vision-Language Representations, and Robot Learning.
- Enjoy ballroom dancing, cooking, and theatre in my leisure time.
Have a look at my Curriculum Vitae for more details and Google Scholar for a full list of papers.
Updates
- LangToMo for language-conditioned robot control is now on arXiv.
- Check out LatentCRF for efficient text-to-image generation.
Selected Publications
- July 2025: Language Repository for Long Video Understanding, ACL Findings 2025.
- April 2025: Understanding Long Videos with Multimodal Language Models, ICLR 2025.
- April 2025: LLaRA: Large Language and Robotics Assistant, ICLR 2025.
- June 2024: Localization in Visual-LLMs Improves Reasoning, CVPR 2024.
- May 2023: Language-based Video Self-Supervised Learning, NeurIPS 2023.
- November 2022: T2I Diffusion Models are Zero-Shot Segmentors, CVPR workshop 2023.
- October 2022: Perceptual Grouping in Contrastive VLMs, ICCV 2023.
- November 2021: Self-supervised Video Transformers, CVPR 2022 (oral).
- July 2021: Adversarial Transferability of Vision Transformers, ICLR 2022 (spotlight).
- May 2021: Intriguing Properties of Vision Transformers, NeurIPS 2021 (spotlight).
- March 2021: Orthogonal Projection Loss, ICCV 2021.
- January 2021: Conditional Generative Modeling, ICLR 2021.
- September 2019: Activity Recognition in Videos, TCSVT journal.
Featured Work
- Our Diffusion Illusions work (CVPR ’23 Best Demo) was featured on Stony Brook News.