FALL 2022

Toward Robust Multimodal Speech Recognition for Embodied Agents

Allen Chang *,  Xiaoyuan Zhu,  Aarav Monga,  Seoho Ahn
* Project Lead

In Vision Language Navigation (VLN) tasks, an embodied agent must navigate a 3D environment using both natural language instructions given by an oracle and visual observations of its surroundings. Due to the difficulty of the task, VLN agents are trained under the assumption that the oracle offers standard instructions: for text-based instructions, this means commands contain few content errors and are grammatically well-formed. Adding speech introduces another layer of complexity. Since speech input varies widely between oracles, VLN agents can struggle to decode meaning from this form of instruction, which makes training a VLN agent on speech particularly challenging. Yet in VLN, the agent has access to visual observations from the environment, which can help it determine plausible meanings when instructions are ambiguous. Our solution expands on this intuition: we will develop a robust Automatic Speech Recognition (ASR) model that uses visual context to recover semantic meaning from corrupted commands. Ultimately, our project aims to make VLN agents more robust to non-standard speech instructions.

ProjectX: Stress Recognition for Health Workers using Human Activity Recognition

Jordan Cahoon *,  Josheta Srinivasan,  Armando Chirinos,  Jonathan Qin
* Project Lead

ProjectX is the world’s largest undergraduate machine learning research competition, with teams from top universities around the world. The winning team for each of the three subtopics will be awarded a cash prize of CAD $25,000, and all participants will be invited to attend the annual UofT AI Conference in January 2023, where the ProjectX award ceremony will take place. Last year we had ~800 participants, and our keynote speaker was Geoffrey Hinton.

Computer Vision for Quality Assurance

Eric Cheng *,  Jaiv Doshi *,  Jarret Spino,  Seena Pourzand,  Irika Katiyar,  Leo Zhuang
* Project Lead

We are partnering with Wintec Industries to introduce an automated system for quality assurance of their manufactured computer modules, including PCBs, SSDs, and other hardware components. Currently, quality assurance is performed mainly through manual inspection, which has a few key limitations: 1) throughput is slow; 2) reliability is variable, especially when considering worker fatigue; 3) manual, repetitive labor can be costly. We want to introduce a computer vision system that automatically detects damage to these modules. There are two main components to this project: 1) consulting with Wintec to advise on the optimal hardware for data collection; 2) constructing a deep learning model on the collected data to recognize damage and defects.
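To illustrate the defect-detection idea in its simplest form (this is a hypothetical classical baseline, not the project's deep learning model; it assumes pre-aligned grayscale images), one can compare each captured board image against a "golden" reference and flag patches whose pixel deviation exceeds a threshold:

```python
import numpy as np

def flag_defects(reference: np.ndarray, sample: np.ndarray,
                 threshold: float = 0.2, patch: int = 8):
    """Return (row, col) coordinates of non-overlapping patches where
    `sample` deviates from the aligned `reference`. Both arrays are
    grayscale with values in [0, 1]."""
    diff = np.abs(sample.astype(float) - reference.astype(float))
    h, w = diff.shape
    defects = []
    # Scan the absolute-difference map patch by patch and flag any
    # patch whose mean deviation exceeds the threshold.
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            if diff[r:r + patch, c:c + patch].mean() > threshold:
                defects.append((r, c))
    return defects

# Toy example: a clean 32x32 "board" and a copy with a bright blemish.
golden = np.zeros((32, 32))
damaged = golden.copy()
damaged[8:16, 16:24] = 1.0  # simulated scratch / solder defect
print(flag_defects(golden, damaged))  # → [(8, 16)]
```

A learned model replaces the fixed threshold with features that tolerate lighting changes, misalignment, and normal part-to-part variation, which is where the deep learning component comes in.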

Indigenous Language Translation with Sparse Data

Aryan Gulati *,  Leslie Moreno *,  Abhinav Gupta,  Nathan Huh,  Zaid Abdulrehman
* Project Lead

Imperialism has led to the loss of many indigenous cultures and, with them, their languages. Based on the NeurIPS 2022 Competition “Second AmericasNLP Competition: Speech-to-Text Translation for Indigenous Languages of the Americas,” this project aims to use machine translation (MT) and automatic speech recognition (ASR) approaches to develop a translator for endangered or extinct indigenous languages. Because data on these languages is sparse, this will involve finding and/or building an appropriately sized corpus and using it to train the MT and ASR models.