MALA
The Map of Applications for Linguistic Annotation
MALA
MALA is a curated list of applications for collecting and annotating natural language text.
Scope
MALA intends to be broad in scope, including apps used by the computer science/NLP community and those used by the linguistics community.
Contributing
If you have an application you'd like to add, or you want to submit a correction to an already-listed app, please see our GitHub page.
Classifications
Data Types
Apps are coarsely classified by supported data types. This classification describes what the app is capable of representing internally, NOT what the app is capable of exporting/importing. Recognized values are:
- tei: TEI XML
- tags: Annotations which are one-to-one aligned with tokens. Examples include POS tags and interlinear glosses.
- time-aligned: Time alignments between text/annotations and audio/video.
- chunk-spans: Spans (i.e., contiguous sets of tokens), which are NOT allowed to overlap or nest.
- overlapping-spans: Spans (i.e., contiguous sets of tokens), which MAY overlap or nest.
- constituency-tree: Constituency trees, e.g. for PTB-style syntax trees or Rhetorical Structure Theory trees.
- dependency-tree: Dependency trees, e.g. those used for Universal Dependencies treebanks.
- graph: Non-tree graph structures, e.g. for graph-based semantic representations like AMR or UCCA.
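The difference between chunk-spans and overlapping-spans can be made concrete with a small check. The following is a minimal Python sketch, not part of MALA itself: it assumes spans are given as hypothetical (start, end) pairs of token offsets with an exclusive end, and the function names are illustrative only.

```python
from typing import List, Tuple

# Assumption: a span is a (start, end) pair of token offsets, end exclusive,
# with start < end. This representation is illustrative, not defined by MALA.
Span = Tuple[int, int]


def spans_overlap_or_nest(spans: List[Span]) -> bool:
    """Return True if any two spans share at least one token."""
    furthest_end = None  # largest end offset seen so far
    # After sorting by start, a span collides with an earlier one exactly
    # when it starts before the furthest end seen so far.
    for start, end in sorted(spans):
        if furthest_end is not None and start < furthest_end:
            return True
        furthest_end = end if furthest_end is None else max(furthest_end, end)
    return False


def minimal_span_type(spans: List[Span]) -> str:
    """Name the least permissive span data type that can hold these spans."""
    if spans_overlap_or_nest(spans):
        return "overlapping-spans"
    return "chunk-spans"
```

For example, the adjacent spans (0, 2) and (2, 5) fit the chunk-spans type, while (0, 3) and (1, 2) nest and therefore need overlapping-spans.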
Import/Export Formats
Apps are also coarsely classified by what formats they are capable of importing from and exporting to. Recognized values are:
- plaintext: A textual representation which might or might not be formatted visually using whitespace, but does not contain any markup.
- proprietary: Some format which is not in wide use outside of the app, and is likely used only within the app itself.
- csv: A comma- or tab-delimited format.
- xml: Some XML format which may or may not be proprietary.
- json: Some JSON format which may or may not be proprietary.
- conllu: The CoNLL-U format.
- elan-xml: The XML format used by ELAN save files.
- flex-xml: The XML format used in exports by FieldWorks Language Explorer (FLEx).
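As an illustration of how these two vocabularies could be checked, here is a minimal Python sketch that reports labels falling outside the recognized values listed above. The entry layout (the keys data_types, import_formats, and export_formats) and the function name are assumptions made for this example; they are not taken from MALA's actual data files.

```python
from typing import Dict, List

# Recognized values, copied from the two lists above.
DATA_TYPES = frozenset({
    "tei", "tags", "time-aligned", "chunk-spans", "overlapping-spans",
    "constituency-tree", "dependency-tree", "graph",
})
FORMATS = frozenset({
    "plaintext", "proprietary", "csv", "xml", "json",
    "conllu", "elan-xml", "flex-xml",
})

# Hypothetical field names mapped to the vocabulary each one must use.
FIELD_VOCABULARIES = {
    "data_types": DATA_TYPES,
    "import_formats": FORMATS,
    "export_formats": FORMATS,
}


def unrecognized_labels(entry: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each field to its sorted labels that are not recognized values."""
    problems = {}
    for field, vocabulary in FIELD_VOCABULARIES.items():
        unknown = sorted(set(entry.get(field, [])) - vocabulary)
        if unknown:
            problems[field] = unknown
    return problems
```

An entry that uses only recognized labels yields an empty dictionary, while a misspelled label such as "conll-u" is reported under the field where it appears.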