SPRING 2025

Post-generation ASR Hypothesis Reranking Utilizing Visual Contexts (3.0)

Marcus Au * , Tommy Shu * , Yirui Song , Zitong Huang , Catherine Lu
* Project Lead   

Speech Recognition models are capable of producing multiple transcripts. Often, the best transcript is not the one the model ranks first, but a different transcript among the top-k generated candidates. Typical approaches to re-scoring these transcripts to maximize ASR performance rely on language modeling. This project aims to improve the re-ranking process using an additional modality: visual input, specifically in the embodied agent domain. This work extends previous CAIS++ work on speech recognition for embodied agents, building on the Automatic Speech Recognition (ASR) pipeline proposed in 'Multimodal Speech Recognition for Language-Guided Embodied Agents' (Chang et al.).

Recovering Text from Scans of Preserved Scrolls (2.0)

Leslie Moreno * , Jaiv Doshi * , Aditya Kumar , Aryan Gulati , Allison Lim
* Project Lead   

Efforts to decode the ancient Herculaneum scrolls, preserved in the aftermath of Mount Vesuvius, have made significant strides through machine learning and computer vision, with the 2023 Vesuvius Challenge awarding over $1,000,000 for breakthroughs such as recovering Epicurean philosophy from Scroll 1. However, current ink detection methods, reliant on crackle patterns, fail to generalize to other scrolls. This project seeks to scale these techniques by leveraging unlabeled segmented data through self-supervised denoising and by fine-tuning on labeled fragments. The virtual unwrapping process, which transforms 3D X-ray volumes into 2D surfaces, serves as the foundation for training models that generalize across scrolls, ultimately unlocking the historical knowledge hidden within.

Nanopore Signal for Radiation Damage Detection (2.0)

Vidur Mushran * , Mo Jiang , Pratyush Jaishanker , Ryan Nene , Vayun Mathur , Aram Modrek * , Khoi Huynh * 
* Project Lead    * Project Advisor

Assessing radiation exposure and DNA damage in human cells currently relies on methods that lack the ability to provide real-time, quantitative insights into molecular changes. Nanopore sequencing offers a promising approach by enabling high-resolution analysis of DNA and RNA structures through electrical signal detection. However, interpreting nanopore signals for radiation-induced DNA damage remains a complex and underexplored challenge. This project focuses on developing a machine learning-based model to predict radiation exposure levels and detect specific biomarkers of DNA damage, facilitating applications in precision radiotherapy dosing, exposure monitoring in space and nuclear environments, ecological assessments after nuclear incidents, and early cancer prevention through timely intervention.

Fire Localization

Maia Piechocki * , Yash Gupta , Sam Shindich , Vanya Shrivastava , Andrew Choi , Arjun Bedi , Barath Raghavan * 
* Project Lead    * Project Advisor

Wildfires are increasingly frequent and devastating, yet current detection systems often suffer from high latency and poor coverage in remote regions. Building on the foundation of 'FireLoc: Low-Latency Multi-Modal Wildfire Geolocation' (Raghavan et al.), which demonstrated robust wildfire presence prediction in low-information environments, this project focuses on developing a full-stack pipeline for real-world deployment. The system will integrate low-power hardware installations at potential ignition sites, optimize machine learning inference for environments with limited computational resources, and implement a scalable alert system capable of rapidly notifying authorities upon fire detection. This end-to-end solution aims to enable earlier interventions, minimize wildfire spread, and protect vulnerable ecosystems and communities.

Quadruped Manipulation & Navigation

Leyaa George * , Kyle Macasilli-Tan * , Cole Sevier , Freddie Liang , Aidan Parris , Daniel Seita * 
* Project Lead    * Project Advisor

Quadruped robots offer promising mobility in complex environments, but their reliance on fragile sensors and hardware leaves them vulnerable to failure. Preliminary efforts have explored multimodal sensing for cross-modal compensation in manipulation tasks and framed sensor failure correction as a reinforcement learning challenge. This project builds on that foundation by extending cross-modal compensation techniques to navigation and investigating collaborative multi-agent reinforcement learning frameworks to enhance fault tolerance. By developing resilient quadrupeds capable of adapting to sensory and hardware disruptions, this work advances reliable autonomous operation in real-world, unpredictable conditions.

Indexing Global Testimonies

Shreeya Chand * , Sanjana Ilango , Annie Gao , Angela Zhuang , Anura Deshpande , Maddie Muller , Sandra Aguilar * 
* Project Lead    * Project Advisor

The USC Shoah Foundation houses a vast video archive of 58,000 survivor testimonies across 42 languages, preserving the lived experiences of individuals impacted by the Holocaust and other genocides. Traditionally, indexing these testimonies—assigning terms from a curated thesaurus of 72,000 entries to one-minute video segments—has required intensive manual effort. This project aims to automate the indexing process using AI, developing models that analyze transcript data and accurately assign relevant terms to corresponding video segments. By enabling scalable, consistent tagging across the archive, this work will enhance accessibility, deepen historical scholarship, and ensure that these critical stories remain discoverable for future generations.
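As a minimal illustration of the indexing task, the sketch below assigns terms to transcript segments by keyword overlap. The toy thesaurus and keywords are hypothetical placeholders; the real archive uses a curated thesaurus of 72,000 entries, and the project's learned models would replace this baseline.

```python
def index_segments(segments, thesaurus):
    """Assign thesaurus terms to one-minute transcript segments by keyword
    overlap -- a naive baseline a learned multi-label classifier would replace."""
    tags = []
    for text in segments:
        words = set(text.lower().split())
        # A term applies if any of its keywords appears in the segment.
        tags.append({term for term, keywords in thesaurus.items()
                     if words & keywords})
    return tags

# Hypothetical two-term thesaurus for illustration only.
thesaurus = {"deportation": {"deported", "transport"},
             "liberation": {"liberated", "liberation"}}
segments = ["we were deported by train",
            "the camp was liberated in spring"]
tags = index_segments(segments, thesaurus)
```

Here the first segment would be tagged `{"deportation"}` and the second `{"liberation"}`; a trained model would generalize beyond exact keyword matches.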

Knowledge Graphs for Story Generation

Alvin Tan * , Jessica Fu , Naina Panjwani , Jay Campanell , Rida Faraz , Richa Misra , Joel Walsh * 
* Project Lead    * Project Advisor

While large language models (LLMs) have significantly advanced story generation, they often struggle with maintaining long-term coherence, character consistency, and causal structure. To address these limitations, prior work has incorporated knowledge graphs to explicitly model event sequences, entity relationships, and world-building constraints. This project explores a knowledge-graph-driven framework for interactive story generation, where an evolving knowledge graph guides narrative construction in real-time. Users can modify the graph mid-generation, enabling the story to adapt dynamically while preserving logical consistency. Beyond creative writing, this approach enables personalized and supplemental learning by allowing users—especially students—to generate stories tailored to specific educational domains, reinforcing concepts introduced in traditional curricula through narrative exploration and active engagement.
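A toy sketch of the graph-guided idea, under stated assumptions: the `StoryGraph` class, its relation names, and the single-location constraint are illustrative inventions, not the project's actual schema. It shows how an evolving triple store can enforce one consistency rule while users modify it mid-generation.

```python
class StoryGraph:
    """Toy knowledge graph of (subject, relation, object) triples with one
    example constraint: a character occupies a single location at a time."""

    def __init__(self):
        self.triples = set()

    def add(self, subj, rel, obj):
        if rel == "located_in":
            # Consistency rule: moving a character overwrites the old location.
            self.triples = {t for t in self.triples
                            if not (t[0] == subj and t[1] == "located_in")}
        self.triples.add((subj, rel, obj))

    def query(self, subj, rel):
        """Return all objects related to subj by rel."""
        return {o for s, r, o in self.triples if s == subj and r == rel}

g = StoryGraph()
g.add("Ava", "located_in", "forest")
g.add("Ava", "ally_of", "Finn")
g.add("Ava", "located_in", "castle")   # mid-generation edit by the user
```

A generator would call `query()` before emitting each sentence, so after the edit the narrative places Ava in the castle, not the forest.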

Multi-Agent Reinforcement Learning with Physics

Rajakrishnan Somou * , Spencer Tran * , Justin Yang , Sascha Manalili , Michael Blumberg , Brice Patchou , Bernice (Bingling) Huang * 
* Project Lead    * Project Advisor

Simulation-to-reality (sim-to-real) transfer remains a critical bottleneck in deploying AI systems for robotics and physical modeling. Traditional approaches often categorize physical knowledge by domain, limiting generalization across tasks. This project proposes integrating multi-agent reinforcement learning (MARL) with physics-informed constraints, organizing physical priors by underlying mathematical structures such as Differential-Algebraic Equations (DAE), Boundary-Parameterized Conditions (BPC), and Pressure-Potential-Velocity (PPV) relationships. Using fluid dynamics as an initial testbed, the project aims to develop AI agents capable of robust generalization across real-world robotic control, material simulation, and computational fluid dynamics applications, advancing the reliability and transferability of AI-driven physical systems.

FALL 2024

Substance Abuse Treatment Engagement

Naina Panjwani * , Jay Campanell * , Andrew Choi , Joanne Lee , Jimena Arce Cantu , Daniel Yang
* Project Lead   

Existing approaches to predicting treatment success in individuals with substance use disorders predominantly utilize model-specific explainable AI (XAI) techniques, which are constrained by their reliance on specific model architectures and thus limit generalizability. This project advances prior research by investigating the application of model-agnostic and post-hoc XAI methods, which provide model-independent explanations and are applied after model training, respectively. Through the integration of random forest models with these sophisticated XAI methodologies, the study aims to identify key factors that influence treatment success or failure. The findings are intended to optimize resource allocation, enable timely interventions, and enhance treatment outcomes for individuals with substance use disorders.
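One widely used model-agnostic, post-hoc XAI method is permutation importance: shuffle one feature and measure the drop in accuracy. The sketch below implements it in plain NumPy on synthetic data; the stand-in `model` lambda is a hypothetical substitute for the project's trained random forest.

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Model-agnostic importance: mean accuracy drop when a feature column
    is shuffled, breaking its link to the labels."""
    rng = np.random.default_rng(seed)
    base = np.mean(model(X) == y)            # accuracy on intact data
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])            # permute feature j in place
            drops[j] += base - np.mean(model(Xp) == y)
    return drops / n_repeats

# Synthetic check: feature 0 fully determines the label, feature 1 is noise.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 2)).astype(float)
y = X[:, 0]
model = lambda X: (X[:, 0] > 0.5).astype(float)  # stand-in for a trained forest
imp = permutation_importance(model, X, y)
```

Because the procedure only calls the model's predict function, it applies unchanged to any architecture, which is the appeal of model-agnostic methods here.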

Digital Well-being for College Students

Darius Mahjoob * , Rachita Jain * , Spencer Tran , Kailin Xia , Catherine He , Yixue Zhao * , Wei Xuan * 
* Project Lead    * Project Advisor

Understanding college student mental health often relies on self-reported surveys, which lack depth and real-time insights. The College Experience Dataset, a five-year mobile sensing study, combines demographics, sensor data, and ecological momentary assessments (EMA) to explore mental health and resilience. Analyzing patterns like the pandemic's impact, gender differences, and anomalies in behaviors such as sleep or phone usage remains complex. This project aims to use machine learning to uncover meaningful insights, enhancing understanding and guiding interventions.

Nanopore Signal for Radiation Damage Detection (1.0)

Vidur Mushran * , Mo Jiang , Pratyush Jaishanker , Ryan Nene , Anisha Chitta , Vayun Mathur , Aram Modrek * , Khoi Huynh * 
* Project Lead    * Project Advisor

Assessing radiation exposure and DNA damage in human cells currently relies on methods that lack the ability to provide real-time, quantitative insights into molecular changes. Nanopore sequencing offers a promising approach by enabling high-resolution analysis of DNA and RNA structures through electrical signal detection. However, interpreting nanopore signals for radiation-induced DNA damage remains a complex and underexplored challenge. This project focuses on developing a machine learning-based model to predict radiation exposure levels and detect specific biomarkers of DNA damage, facilitating applications in precision radiotherapy dosing, exposure monitoring in space and nuclear environments, ecological assessments after nuclear incidents, and early cancer prevention through timely intervention.

Recovering Text from Scans of Preserved Scrolls (1.0)

Leslie Moreno * , Jaiv Doshi * , Aditya Kumar , Aryan Gulati
* Project Lead   

Efforts to decode the ancient Herculaneum scrolls, preserved in the aftermath of Mount Vesuvius, have made significant strides through machine learning and computer vision, with the 2023 Vesuvius Challenge awarding over $1,000,000 for breakthroughs such as recovering Epicurean philosophy from Scroll 1. However, current ink detection methods, reliant on crackle patterns, fail to generalize to other scrolls. This project seeks to scale these techniques by leveraging unlabeled segmented data through self-supervised denoising and by fine-tuning on labeled fragments. The virtual unwrapping process, which transforms 3D X-ray volumes into 2D surfaces, serves as the foundation for training models that generalize across scrolls, ultimately unlocking the historical knowledge hidden within.

Multimodal Hate and Misinformation Detection Toolkit

Jonathan Aydin * , Siddarth Rudraraju * , Youqi Huang , Arjun Bedi , Malina Freeman , Jonathan Gomez * 
* Project Lead    * Project Advisor

Current tools for detecting hate speech and misinformation on social media often lack transparency and adaptability, making it challenging for researchers and policymakers to address these issues effectively. This project aims to develop an explainable, open-source toolkit for analyzing multimodal social media video content, leveraging a dataset of over 32TB of videos, 200 million posts, and 1 million images from the Parler platform. While existing models focus primarily on single-modal data or closed systems, this toolkit will integrate advanced machine learning techniques to analyze text, audio, and visual components simultaneously. By prioritizing explainability, the project seeks to uncover patterns in hate speech and misinformation, fostering a better understanding of their spread and enabling more effective intervention strategies.

Post-generation ASR Hypothesis Reranking Utilizing Visual Contexts (2.0)

Marcus Au * , Tommy Shu * , Yirui Song , Zitong Huang , Catherine Lu
* Project Lead   

The Automatic Speech Recognition (ASR) pipeline proposed in "Multimodal Speech Recognition for Language-Guided Embodied Agents" (Chang et al.) processes both unimodal (audio-only) and multimodal (audiovisual) data to generate multiple ranked transcript hypotheses for a given utterance. However, the model often fails to rank the hypothesis with the lowest Word Error Rate (WER) against the ground truth as its top choice. To address this issue, we propose a multimodal reranking pipeline that leverages the same visual cues used in the ASR process.
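The gap between the top-ranked and lowest-WER hypothesis can be sketched as follows. The `rerank` weighting and the visual scores are hypothetical placeholders, not the project's actual model; the WER function is the standard word-level edit distance.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def rerank(hypotheses, asr_scores, visual_scores, alpha=0.7):
    """Pick the hypothesis maximizing a weighted blend of ASR confidence
    and a (hypothetical) visual-context agreement score."""
    combined = [alpha * a + (1 - alpha) * v
                for a, v in zip(asr_scores, visual_scores)]
    return hypotheses[max(range(len(combined)), key=combined.__getitem__)]
```

For example, if the ASR model slightly prefers "pick up the night" (score 0.7) over "pick up the knife" (0.6), but the agent's camera shows a knife (visual scores 0.1 vs. 0.9), the blended score flips the ranking toward the lower-WER transcript.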

Computer Vision and Machine Learning on Optical Coherence Tomography for Middle Ear Pathology Detection (3.0)

Claude Yoo * , Lucia Zhang * , Irika Katiyar , Will Dolan , Matthew Rodriguez , Lauren Sun , Sana Jayaswal , Brian Applegate * 
* Project Lead    * Project Advisor

Current diagnostic methods for middle ear diseases in otology are primarily qualitative and limited to examining only the surface of the tympanic membrane (TM). Optical Coherence Tomography (OCT) offers a non-invasive, quantitative imaging technique that enables three-dimensional reconstruction of the TM and middle ear, providing more detailed information than traditional methods. However, manually interpreting OCT scans can be time-consuming and challenging, and while OCT-based disease detection models are well-established in retinal imaging and ophthalmology, their application in otology remains relatively unexplored. This project focuses on creating a multi-classification machine learning model capable of identifying conditions such as retraction pockets, perforations, and cholesteatomas, and distinguishing them from healthy ear scans.