Luke Gessler

Hi, I’m Luke! I’m an assistant professor at Indiana University’s Department of Linguistics, with adjunct appointments in Computer Science and Middle Eastern Languages and Cultures. Previously, I was a postdoc in the NALA Group at CU Boulder, and I got my Ph.D. in computational linguistics at Georgetown University with the Corpling lab and NERT.
I am interested in low-resource NLP and language resource development, particularly in the context of endangered language documentation. Linguists are often asked how many languages they speak, so let me share that mine include English (native), Latin (reading), Hindi-Urdu (conversational), Sahidic Coptic (reading), and bits and pieces of others.
Research
Modern methods in natural language processing (NLP) are powerful but require great amounts of data. This has led to a performance gap between a handful of high-resource languages (like Russian or English), which have enough data to fully exploit models' capabilities, and low-resource languages (like Mohawk or Uyghur), which lack the data volume required to fully realize models' potential. This is a regrettable circumstance: poorer-performing models may induce many inequities (such as socioeconomic ones), and most of the world’s languages may no longer be spoken by the close of the century and need high-quality language technologies to aid efforts to document and revitalize them.
My work is broadly aimed at addressing this performance gap between high- and low-resource languages by developing linguistically sophisticated resources and algorithms which can help low-resource languages overcome the onerous demands of the deep learning methods which have become dominant in NLP. Most of my work can be seen as belonging to at least one of the following three threads:
- Language resource development: creation and maintenance of natural language corpora enriched with linguistic analyses.
- NLP-capable language documentation systems: developing systems aimed at language documentation, i.e. the process of collecting and describing data in a particular language, with an emphasis on deep integration with NLP systems to facilitate the documentary process.
- Low-resource NLP: development of methods specifically for languages with little data.
Online
Here is my…
Also, I’m webmaster of the langdoc.net discussion forum, a place for anyone interested in language documentation and language technology. Come join us!
Publications
- Ali Marashian, Enora Rice, Luke Gessler, Alexis Palmer, and Katharina von der Wense. 2025. From priest to doctor: Domain adaptation for low-resource neural machine translation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7087–7098, Abu Dhabi, UAE. Association for Computational Linguistics.
- Luke Gessler. 2024. PrOnto: Language model evaluations for 859 languages. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13243–13256, Torino, Italia. ELRA & ICCL.
- Luke Gessler and Katharina von der Wense. 2024. NLP for language documentation: Two reasons for the gap between theory and practice. In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024), pages 1–6, Mexico City, Mexico. Association for Computational Linguistics.
- Enora Rice, Ali Marashian, Luke Gessler, Alexis Palmer, and Katharina von der Wense. 2024. TAMS: Translation-assisted morphological segmentation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6752–6765, Bangkok, Thailand. Association for Computational Linguistics.
- Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, and Luke Gessler. 2024. eRST: A signaled graph theory of discourse relations and organization. Computational Linguistics, pages 1–50.
- Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica Lin, Yang Janet Liu, Siyao Peng, Yilun Zhu, and Amir Zeldes. 2023. GENTLE: A genre-diverse multilayer challenge set for English NLP and linguistic evaluation. In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), pages 166–178, Toronto, Canada. Association for Computational Linguistics.
- Luke Gessler and Nathan Schneider. 2023. Syntactic inductive bias in transformer language models: Especially helpful for low-resource languages? In Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL), pages 238–253, Singapore. Association for Computational Linguistics.
- Luke Gessler. 2022. Closing the NLP gap: Documentary linguistics and NLP need a shared software infrastructure. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 119–126, Dublin, Ireland. Association for Computational Linguistics.
- Luke Gessler, Austin Blodgett, Joseph C. Ledford, and Nathan Schneider. 2022. Xposition: An online multilingual database of adpositional semantics. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1824–1830, Marseille, France. European Language Resources Association.
- Luke Gessler, Lauren Levine, and Amir Zeldes. 2022. Midas loop: A prioritized human-in-the-loop annotation for large scale multilayer data. In Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022, pages 103–110, Marseille, France. European Language Resources Association.
- Luke Gessler and Amir Zeldes. 2022. MicroBERT: Effective training of low-resource monolingual BERTs through parameter reduction and multitask learning. In Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL), pages 86–99, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Luke Gessler, Shabnam Behzad, Yang Janet Liu, Siyao Peng, Yilun Zhu, and Amir Zeldes. 2021. DisCoDisCo at the DISRPT2021 shared task: A system for discourse segmentation, classification, and connective detection. In Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021), pages 51–62, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Luke Gessler and Nathan Schneider. 2021. BERT has uncommon sense: Similarity ranking for word sense BERTology. In Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 539–547, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Aryaman Arora, Luke Gessler, and Nathan Schneider. 2020. Supervised grapheme-to-phoneme conversion of orthographic schwas in Hindi and Punjabi. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7791–7795, Online. Association for Computational Linguistics.
- Luke Gessler, Siyao Peng, Yang Janet Liu, Yilun Zhu, Shabnam Behzad, and Amir Zeldes. 2020. AMALGUM – a free, balanced, multilayer English web corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5267–5275, Marseille, France. European Language Resources Association.
- Luke Gessler, Shira Wein, and Nathan Schneider. 2020. Supersense and sensibility: Proxy tasks for semantic annotation of prepositions. In Proceedings of the 14th Linguistic Annotation Workshop, pages 117–126, Barcelona, Spain. Association for Computational Linguistics.
- Graham Neubig, Shruti Rijhwani, Alexis Palmer, Jordan MacKenzie, Hilaria Cruz, Xinjian Li, Matthew Lee, Aditi Chaudhary, Luke Gessler, Steven Abney, Shirley Anugrah Hayati, Antonios Anastasopoulos, Olga Zamaraeva, Emily Prud'hommeaux, Jennette Child, Sara Child, Rebecca Knowles, Sarah Moeller, Jeffrey Micher, Yiyuan Li, Sydney Zink, Mengzhou Xia, Roshan Sharma, and Patrick Littell. 2020. A summary of the first workshop on language technology for language documentation and revitalization. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pages 342–351, Marseille, France. European Language Resources Association.
- Mitchell Abrams, Luke Gessler, and Matthew Marge. 2019. B. rex: a dialogue agent for book recommendations. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 418–421, Stockholm, Sweden. Association for Computational Linguistics.
- Luke Gessler. 2019. Developing without developers: choosing labor-saving tools for language documentation apps. In Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers), pages 6–13, Honolulu. Association for Computational Linguistics.
- Luke Gessler, Yang Janet Liu, and Amir Zeldes. 2019. A discourse signal annotation system for RST trees. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, pages 56–61, Minneapolis, MN. Association for Computational Linguistics.