Indigenous Language Translation with Sparse Data (3.0)
Aryan Gulati*, Leslie Moreno*, Abhinav Gupta, Aditya Kumar, Jonathan May*
* Project Lead
* Project Advisor
Imperialism has led to the loss of many indigenous cultures and, with them, their languages. Based on the NeurIPS 2022 Competition “Second AmericasNLP Competition: Speech-to-Text Translation for Indigenous Languages of the Americas,” this project aims to use machine translation (MT) and automatic speech recognition (ASR) approaches to develop a translator for endangered or extinct indigenous languages. Because data on these languages is sparse, this will involve finding and/or building an appropriately sized corpus and using it to train the MT and ASR models.