Broadly, I am interested in how human perception can be modeled, with a focus on the auditory domain. Speech is something we perceive and use effortlessly, yet it remains nontrivial for machine learning: despite decades of effort, machine speech processing still struggles in low-resource and noisy scenarios. In my PhD, I study speech processing (in machines) and auditory perception (in humans), and how advances in one can inform the other.

In my MS paper in Computer Science, I use differentiability to bring auditory neuroscience models into deep learning frameworks. In other words, I implement neuroscience models as differentiable signal-processing steps, so that their parameters can be updated through back-propagation and physics-based models can be used as part of the deep learning pipeline.
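
To make the idea concrete, here is a minimal sketch, under my own assumptions rather than the paper's actual code: a gammatone-like filterbank in PyTorch whose center frequencies and bandwidths are `nn.Parameter`s, so the auditory front end can be trained jointly with whatever network sits on top of it.

```python
# Hypothetical illustration only: a differentiable auditory filterbank
# whose physiologically meaningful parameters receive gradients.
import math
import torch
import torch.nn as nn

class DifferentiableGammatone(nn.Module):
    """Gammatone-like filterbank rebuilt each forward pass from learnable
    parameters, so back-propagation reaches the auditory-model parameters."""

    def __init__(self, n_filters=32, kernel_size=512, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Learnable center frequencies and bandwidths (Hz), initialized
        # log-spaced over a speech-relevant range (illustrative values).
        cf = torch.logspace(math.log10(100.0), math.log10(6000.0), n_filters)
        self.center_freq = nn.Parameter(cf)
        self.bandwidth = nn.Parameter(0.2 * cf)

    def filters(self):
        t = torch.arange(self.kernel_size) / self.sample_rate  # time (s)
        cf = self.center_freq.unsqueeze(1)   # (n_filters, 1)
        bw = self.bandwidth.unsqueeze(1)
        # 4th-order gammatone impulse response: t^3 e^(-2*pi*b*t) cos(2*pi*f*t)
        env = t.pow(3) * torch.exp(-2 * math.pi * bw * t)
        ir = env * torch.cos(2 * math.pi * cf * t)
        return ir / ir.norm(dim=1, keepdim=True)  # unit-norm filters

    def forward(self, wav):                        # wav: (batch, samples)
        kernels = self.filters().unsqueeze(1)      # (n_filters, 1, kernel)
        return torch.nn.functional.conv1d(wav.unsqueeze(1), kernels)

fb = DifferentiableGammatone()
feats = fb(torch.randn(2, 16000))       # (2, 32, 16000 - 512 + 1)
feats.pow(2).mean().backward()          # gradients flow to cf / bandwidth
```

Because the impulse responses are rebuilt from the parameters on every forward pass, the optimizer updates interpretable quantities (center frequencies, bandwidths) rather than opaque convolution weights.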

In a PhD project in Cognitive Science, I used various machine learning models to simulate the human ability to discriminate certain language pairs but not others, and found that temporal regularities are not important to any of the models, and may therefore not be what drives the human behavior either.

As an undergrad, I was involved in a project using machine learning models to study early phonetic learning. I led the investigation of training models under varying input conditions in an audiobook corpus I constructed, and found that models trained on an uneven distribution of speakers, perhaps more similar to infants' learning environments, do not generalize well to new settings. This work was presented at CogSci 2020.

I also contributed actively to building a speech corpus of Mandarin-accented English. The corpus contains both native American English and Mandarin-accented English speech, covering isolated words as well as connected speech. A preliminary report of the corpus can be found here. Using this corpus, I also learned to apply (Bayesian) ideal adaptor models to model a listener's adaptation to such an accent; a toy sketch of the underlying idea follows.
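
As a rough illustration of the ideal adaptor idea (a toy sketch with hypothetical numbers, not the actual corpus data or model code), a listener's Gaussian belief about a phonetic cue can be updated after hearing accented tokens via a conjugate normal-normal update:

```python
# Toy sketch: belief updating over the mean of a phonetic cue (e.g., the
# voice onset time of /t/). All numbers below are illustrative.

def update_belief(prior_mean, prior_var, obs, obs_var):
    """Conjugate normal-normal update of the category-mean belief,
    assuming a known observation variance obs_var."""
    n = len(obs)
    obs_mean = sum(obs) / n
    # Posterior precision is the sum of prior and data precisions.
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + n * obs_mean / obs_var)
    return post_mean, post_var

# Native-English prior over /t/ VOT (ms); accented tokens come in shorter.
mean, var = 80.0, 100.0
accented_vots = [55.0, 60.0, 58.0, 52.0]          # hypothetical observations
mean, var = update_belief(mean, var, accented_vots, obs_var=150.0)
print(f"adapted mean VOT: {mean:.1f} ms, variance: {var:.1f}")
```

With each batch of accented tokens, the belief shifts toward the talker's realizations and sharpens, which is the basic mechanism by which an ideal adaptor captures rapid perceptual adaptation to an accent.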

Publications and Presentations (updated April 2024)

2024

Famularo, R. L., Aboelata, A., Schatz, T., Feldman, N.H., 2024. Language discrimination may not rely on rhythm: A computational study. To be presented at CogSci 2024. [paper]

2022

Li, R.¹, Schatz, T., Feldman, N.H., 2022. Modeling rhythm in speech as in music: Towards a unified cognitive representation, in: Cognitive Computational Neuroscience (CCN). [paper]

2020

Li, R., Schatz, T., Matusevych, Y., Goldwater, S., Feldman, N.H., 2020. Input matters in the modeling of early phonetic learning, in: Proceedings of the Annual Conference of the Cognitive Science Society. oral presentation, online. [paper]

2019

Li, R., Schatz, T., Matusevych, Y., Goldwater, S., Feldman, N.H., 2019. Modeling early phonetic learning: The effect of input size and speaker distribution, in: The Young Female Researchers in Speech Workshop. poster, Graz, Austria. [abstract | poster]

2018

Li, R., Xie, X., Jaeger, T.F., 2018. A corpus of native and non-native speech for speech production research, in: The 24th Annual Conference on Architectures and Mechanisms for Language Processing (AMLaP). poster, Berlin, Germany. [abstract | poster]

Xie, X., Li, R., Jaeger, T.F., 2018. Disentangling intra- and inter-talker variability in L2 phonetic production: L2 speech, but not talkers, is more variable, in: The 24th Annual Conference on Architectures and Mechanisms for Language Processing (AMLaP). poster, Berlin, Germany.

Xie, X., Li, R., Jaeger, T.F., 2018. Perceiving native- and foreign-accented speech: A problem of probabilistic inference under uncertainty, in: International Max Planck Research School for the Language Sciences Conference (IMPRS). poster, Nijmegen, Netherlands.

Li, R., 2018. Speech variability across talkers and accents, in: The National Conference of Undergraduate Research (NCUR). oral presentation, Oklahoma City, OK.

¹ Published under my maiden name (Ruolan Li); the same applies to the entries below.