Luke D. Gessler
Hi, I'm Luke! I'm an assistant professor at Indiana University's Department of Linguistics. Previously, I was a postdoc in the NALA Group at CU Boulder, and I got my Ph.D. in computational linguistics at Georgetown University with the Corpling lab and NERT.
I am interested in low-resource NLP and language resource development, particularly in the context of endangered language documentation. Linguists are often asked how many languages they speak, so for the record, mine include English (native), Latin (reading), Hindi-Urdu (conversational), Sahidic Coptic (reading), and bits and pieces of others.
Research
Modern natural language processing (NLP) methods developed over the past decade are powerful but require large amounts of data. This has created a performance gap between a handful of high-resource languages that have enough data to fully exploit these models' capabilities (like Russian or English) and low-resource languages (like Mohawk or Uyghur), which lack the data needed to realize the models' full potential. This gap is regrettable for two reasons. First, poorer-performing models can create or deepen inequities, such as socioeconomic ones, for speakers of low-resource languages. Second, most of the world's languages may no longer be spoken by the end of this century, and they need high-quality language technologies to support efforts to document and revitalize them.
My work broadly aims to close this performance gap between high- and low-resource languages by developing linguistically sophisticated resources and algorithms that can help low-resource languages overcome the onerous data demands of the deep learning methods that have become dominant in NLP. Most of my work falls into at least one of the following three threads:
- Language resource development: creation and maintenance of natural language corpora enriched with linguistic analyses.
- NLP-capable language documentation systems: developing systems for language documentation, i.e., the process of collecting and describing data in a particular language, with an emphasis on deep integration with NLP systems to facilitate the documentary process.
- Low-resource NLP: development of methods specifically for languages with little data.
Online
Here is my…
- Email: lukegessler@gmail.com
- CV
- GitHub
- ACL Anthology page
- Google Scholar page
- Hacker News account
Also, I maintain the Map of Applications for Linguistic Annotation.