Luke D. Gessler

Hi, I'm Luke! Starting in August 2024, I will join Indiana University's Department of Linguistics as an assistant professor. Currently, I am a postdoc in the NALA Group with Katharina von der Wense at CU Boulder, which I joined after completing my Ph.D. in computational linguistics at Georgetown University.

I am interested in low-resource NLP and language resource development, particularly in the context of endangered language documentation. Linguists are often asked how many languages they speak, so for the record, mine include English (native), Latin (reading), Hindi-Urdu (conversational), Sahidic Coptic (reading), and bits and pieces of others.


Modern natural language processing (NLP) methods developed over the past decade are powerful but require large amounts of data. This has created a performance gap between a handful of high-resource languages that have enough data to fully exploit these models' capabilities (like Russian or English) and low-resource languages (like Mohawk or Uyghur) that lack the data volume needed to realize the models' full potential. This is regrettable for two reasons: poorer-performing models may induce many inequities (such as socioeconomic ones), and most of the world's languages may no longer be spoken by the close of the century, so they are in need of high-quality language technologies to aid efforts to document and revitalize them.

My work is broadly aimed at closing this performance gap between high- and low-resource languages by developing linguistically sophisticated resources and algorithms that can help low-resource languages overcome the onerous data demands of the deep learning methods that have become dominant in NLP. Most of my work belongs to at least one of the following three threads:

  • Language resource development: creation and maintenance of natural language corpora enriched with linguistic analyses.
  • NLP-capable language documentation systems: developing systems for language documentation, i.e., the process of collecting and describing data in a particular language, with an emphasis on deep integration with NLP systems to facilitate the documentary process.
  • Low-resource NLP: development of methods specifically for languages with little data.


Recent work
