Annotations


UCE is compatible with a variety of annotations, provided they exist within the UIMA format. Within UCE, these annotations are used situationally for features or search enhancements, depending on the annotation.

Below you will find an ever-expanding list of importable and compatible annotations within UCE, ranging from standard Named-Entity annotations to more situational taxon or time annotations. All of these annotations can be generated and annotated within the corpus through the Docker Unified UIMA Interface.


OCR

Since much of the literature has yet to be digitized, UCE provides support for corpora containing documents that have undergone Optical Character Recognition (OCR) extraction. These annotations assist in reconstructing the physical layout of the pages within UCE.

More Details

Sentence

Divides the documents into their respective sentences.

More Details

Named-Entity

Extracts named entities from a document, categorizing them into four types: organization (ORG), person (PER), location (LOC), and miscellaneous (MISC).

More Details

Lemma, POS & Morphological Features

Lemmatization reduces inflected words to their root form. Within UCE, searches are enhanced by considering these root forms.

More Details

Semantic Role Labels (SRL)

SRL identifies semantic relations between the lexical constituents of a sentence, assigning labels to words or phrases that indicate their semantic roles, such as agent, goal, or result.

More Details

Time

Extracts temporal expressions, including time and date formats, from a document, analogous to Named-Entity Recognition tasks.

More Details

UceDynamicMetadata

Offers a dynamic and easy way to annotate key-value filters, which are then imported and used within UCE for the creation of custom filters.

More Details

Taxon

The recognition of unambiguous names of biological entities is referred to as a taxon.

More Details

WikiLinks

Maps potential words and phrases to their corresponding Wikidata URLs, facilitating the retrieval and access of additional information.

More Details

UnifiedTopic

Extracts topics from a document in the form of a list of keywords or categories, which can be used to summarize the content or identify its main theme. The list of categories depends on the model used for annotation.

More Details

GeoNames

The recognition of locations within texts and their annotation with hierarchical data, alternate and historical names, and tagging with unique identifiers. (Under construction)

More Details