Edson Araujo

I'm a PhD student at the University of Tübingen, working with Prof. Hilde Kuehne and co-advised by Dr. Jim Glass (MIT CSAIL). Our work is part of the MIT-IBM Watson AI Sight and Sound Project, where I focus on audio-visual reasoning, multimodal large language models, and test-time adaptation.

I did my Master's in Computer Science at UFMG under the supervision of Prof. Erickson Nascimento. During that time I collaborated on research topics such as video summarization and image descriptors.

Email:

News

05.2026 Accepted to the CVPR 2026 Doctoral Consortium!

04.2026 Recognized as a 'Top 200' reviewer at ICLR 2026.

12.2025 We are organizing the fifth edition of the "What is Next in Multimodal Foundation Models?" Workshop (CVPR 2026).

08.2025 Omni-R1 was accepted to ASRU 2025 and shortlisted for Best Student Paper!

05.2025 Omni-R1, our latest work from the MIT-IBM Watson AI Sight and Sound Project, is out on arXiv!

05.2025 CAV-MAE Sync will also be presented at the LatinX, MMFM, and Sight and Sound Workshops at CVPR 2025!

02.2025 CAV-MAE Sync was accepted to CVPR 2025 as a poster presentation. The paper is on arXiv.

Research

I'm interested in audio-visual reasoning, multimodal large language models, self-supervised learning, and test-time adaptation. Some papers are highlighted.

Selected Publications

TTA-Vid framework diagram for test-time adaptation on instructional videos
TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning
Soumya Shamarao Jahagirdar*, Edson Araujo*, Anna Kukleva, M. Jehanzeb Mirza, Saurabhchand Bhati, Samuel Thomas, Brian Kingsbury, Rogerio Feris, James R. Glass, Hilde Kuehne

Adapts video-language models at test time using step-by-step frame reasoning and multi-armed bandit frame selection. No labels required.