Configurations
UCE has three configuration levels, each with its own config file. These levels are:
- INSTANCE
- CORPUS
- DEVELOPER
If you are only setting up UCE via Docker, the DEVELOPER level is of no interest to you. The following sections outline each configuration and how UCE uses it.
UCE Configuration (INSTANCE)
UCE is customizable in a variety of ways, including color scheme, corporate identity, metadata, and more. To apply your configuration to a UCE instance, use the uceConfig.json file. It models the UCE instance in JSON, and you pass it to the Web Portal through the command line.
You can copy the example uceConfig.json below and create your own configuration from it.
uceConfig.json
{
"meta": {
"name": "John Doe Lab",
"version": "1.0.0",
"description": "The John Doe Lab works in the field of finance analysis and, in this context, gathers large amounts of data for their sentiment or entailment tasks. This data is made available through the <b>Finance</b> corpus. Herein, ..."
},
"corporate": {
"team": {
"description": "The team behind the Finance corpus is part of the <a target='_blank' href='https://www.john-doe-lab.org/'>John Doe Lab</a> of the Doe-University.",
"members": [
{
"name": "Prof. John Doe",
"role": "Supervisor",
"description": "Mr. Doe is the supervisor of the lab.",
"contact": {
"name": "Prof. Dr. John Doe",
"email": "doe@doe-university.de",
"website": "https://john-doe.org/team/john-doe/",
"address": "Doe-Street 10<br/>11111 Doe"
},
"image": "FILE::https://upload.wikimedia.org/wikipedia/commons/9/99/Sample_User_Icon.png"
}
]
},
"contact": {
"name": "John Doe Lab",
"email": "doe@doe-university.de",
"website": "https://www.john-doe-lab.org/contact",
"address": "Doe-Street 10<br/>11111 Doe"
},
"website": "https://www.john-doe-lab.org",
"logo": "FILE::https://upload.wikimedia.org/wikipedia/commons/9/99/Sample_User_Icon.png",
"name": "John Doe Lab",
"primaryColor": "#00618f",
"secondaryColor": "rgba(35, 35, 35, 1)"
},
"settings": {
"rag": {
"model": "ChatGPT",
"apiKey": ""
}
}
}
Property | Description |
---|---|
name | Name of your project or your lab, shown on the front page of the web portal. |
version | Your own version number for this UCE instance. |
description | A description shown on the front page of the portal. Use it to describe your UCE instance. |
Property | Description |
---|---|
team | Outline and display your team in a dedicated Teams-Tab within your UCE instance. |
team.description | Describe the team working on this project. |
team.members | Create a list of member-objects to model your team and each member. |
contact | The contact information shown in the footer of the web portal. Provide details such as name, website, and email so that others can contact you through the UCE instance. |
website | The website of your lab or corporation. |
logo | The logo is shown in the top left of the web portal. You can inject the logo via a file path FILE::{PATH} (works with online paths as well) or directly through Base64-encoded images BASE64::data:image/png;base64,{BASE64} . |
name | The name of your lab or corporation. |
primaryColor | Set the primary color for the UCE web portal and model your color scheme. |
secondaryColor | Set the secondary color for the UCE web portal and model your color scheme. |
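The BASE64:: form of the logo property can be produced from a local image file. The following is a minimal sketch, assuming the logo is a PNG; the function name `png_to_logo_value` is hypothetical and not part of UCE.

```python
import base64


def png_to_logo_value(png_bytes: bytes) -> str:
    """Build a logo string in the BASE64::data:image/png;base64,{BASE64} form."""
    # Assumption: the bytes are a PNG image, matching the prefix shown in the table.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "BASE64::data:image/png;base64," + encoded
```

The returned string can be pasted as the value of "logo" in the "corporate" object, for example after reading the image with `Path("logo.png").read_bytes()`.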
Property | Description |
---|---|
rag | Settings for the RAGBot (if applicable). |
rag.model | The language model that powers the RAGBot. Currently, only ChatGPT is supported out of the box. |
rag.apiKey | The API key, required if the RAGBot uses an LLM that is not hosted locally. For ChatGPT, for example, fill in your OpenAI API key. |
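To keep the API key out of a shared or version-controlled uceConfig.json, one option is to fill it in just before starting the portal. This is only a sketch: the helper `with_api_key` and the environment variable name `OPENAI_API_KEY` are assumptions, not something UCE reads on its own.

```python
import copy
import os


def with_api_key(config: dict, env_var: str = "OPENAI_API_KEY") -> dict:
    """Return a copy of the config with settings.rag.apiKey taken from the environment."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    rag = updated.setdefault("settings", {}).setdefault("rag", {})
    # An unset variable results in an empty key, as in the example config.
    rag["apiKey"] = os.environ.get(env_var, "")
    return updated
```

The result can be written back with `json.dump` to the file that is passed to the web portal.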
Within the source code, you will also find a defaultUceConfig.json that you can mirror. This is also the configuration UCE uses if no explicit config is provided. Inject the uceConfig.json into the UCE web portal by means of command line arguments, as outlined in earlier sections.
Corpus Configuration (CORPUS)
As the name suggests, the corpusConfig.json holds metadata about a single corpus within UCE. Unlike the uceConfig.json, the corpus config is obligatory and needs to be imported by the Corpus-Importer.
You can copy the example corpusConfig.json below and create your own configuration from it.
corpusConfig.json
{
"name": "Corpus_Name",
"author": "University Doe",
"language": "de-DE/en-EN/...",
"description": "The corpus was gathered as part of the John Doe project.",
"addToExistingCorpus": true,
"annotations": {
"annotatorMetadata": false,
"OCRPage": false,
"OCRParagraph": false,
"OCRBlock": false,
"OCRLine": false,
"srLink": false,
"lemma": false,
"namedEntity": false,
"sentence": false,
"taxon": {
"annotated": false,
"//comment": "[Are the taxons annotated with biofid onthologies through the 'identifier' property?]",
"biofidOnthologyAnnotated": false
},
"time": false,
"wikipediaLink": false
},
"other": {
"//comment": "[Is this corpus also available on https://sammlungen.ub.uni-frankfurt.de/? Either true or false]",
"availableOnFrankfurtUniversityCollection": false,
"includeKeywordDistribution": false,
"enableEmbeddings": false,
"enableRAGBot": false
}
}
Property | Description |
---|---|
name | The name assigned to the corpus. |
author | The entity or institution that created the corpus. |
language | Languages included in the corpus, specified in locale format (e.g., "de-DE", "en-EN"). |
description | A brief overview of the corpus and its purpose. |
addToExistingCorpus | Boolean flag indicating whether to append this data to an existing corpus *(matched by name)* or to create a new corpus. |
annotations | Object outlining how the corpus was annotated. |
annotations.annotatorMetadata | Boolean flag indicating if metadata about the annotator is included. |
annotations.OCRPage | Boolean flag indicating if OCR data at the page level is included. |
annotations.OCRParagraph | Boolean flag indicating if OCR data at the paragraph level is included. |
annotations.OCRBlock | Boolean flag indicating if OCR data at the block level is included. |
annotations.OCRLine | Boolean flag indicating if OCR data at the line level is included. |
annotations.srLink | Boolean flag indicating if semantic role links are annotated. |
annotations.lemma | Boolean flag indicating if lemmatization is performed. |
annotations.namedEntity | Boolean flag indicating if named entities are annotated. |
annotations.sentence | Boolean flag indicating if sentence boundaries are annotated. |
annotations.taxon | Object containing details about taxon annotations. |
annotations.taxon.annotated | Boolean flag indicating if taxons are annotated. |
annotations.taxon.biofidOnthologyAnnotated | Boolean flag indicating if taxons are annotated with biofid ontologies through the 'identifier' property. |
annotations.time | Boolean flag indicating if temporal expressions are annotated. |
annotations.wikipediaLink | Boolean flag indicating if Wikipedia links are included. |
other | Object containing additional properties related to the corpus. The following flags require the setup of the RAG-Service. |
other.includeKeywordDistribution | Boolean flag indicating if keyword distribution data is included. If enabled, the Corpus-Importer will create and cache those upon import. |
other.enableEmbeddings | Boolean flag indicating if embeddings should be enabled. If enabled, the Corpus-Importer will create and cache those upon import. |
other.enableRAGBot | Boolean flag indicating if the RAGBot feature should be enabled. |
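Before running an import, it can help to list which annotation flags a corpusConfig.json actually enables. The following is a minimal sketch over an already parsed config; the function `enabled_annotations` is hypothetical and only illustrates the structure described above, including the nested taxon object.

```python
from typing import List


def enabled_annotations(corpus_config: dict) -> List[str]:
    """Return the dotted names of all annotation flags that are set to true."""
    enabled = []
    for name, value in corpus_config.get("annotations", {}).items():
        if isinstance(value, dict):
            # Nested objects such as "taxon" carry their own boolean sub-flags.
            for sub_name, sub_value in value.items():
                if sub_value is True:
                    enabled.append(f"{name}.{sub_name}")
        elif value is True:
            enabled.append(name)
    return enabled
```

String entries such as "//comment" are skipped because only values that are exactly `true` are collected.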
Common Configuration (DEVELOPER)
In the source code's uce.common module, you'll find a common.conf file. In it, you can adjust any configuration needed to run the application, such as DB connection strings, API endpoints, and the like. To run UCE properly in a development setting, make sure that all local connection strings match your setup. The most relevant ones are:
Property | Description |
---|---|
rag.webserver.base.url | The base URL of the RAG-Service's webserver (if set up), e.g.: http://localhost:5678/ |
sparql.host | The base URL of the SPARQL service's webserver (if set up), e.g.: http://localhost:3030/ |
sparql.endpoint | The endpoint of the SPARQL service's webserver, e.g.: my-ontology/sparql |
postgresql.hibernate.connection.url | The connection string for the PostgreSQL database service, e.g.: jdbc:postgresql://localhost:5433/uce |
You'll also find two more files, common-release.conf and common-debug.conf. Since most connections in a release differ from the local setup, you can keep your local and release settings in these separate files and copy the needed configuration into common.conf depending on the case. Only the common.conf file is used within UCE; the other two are ignored.
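Switching between the two setups by hand is easy to get wrong, so the copy step could be scripted. The sketch below is one possible approach, not part of UCE: the function `activate_profile` is hypothetical, the directory holding the files is passed in by the caller, and it replaces common.conf wholesale rather than copying individual settings.

```python
import shutil
from pathlib import Path


def activate_profile(conf_dir: str, profile: str) -> Path:
    """Overwrite common.conf with common-<profile>.conf and return the target path."""
    if profile not in ("debug", "release"):
        raise ValueError(f"unknown profile: {profile}")
    directory = Path(conf_dir)
    source = directory / f"common-{profile}.conf"
    target = directory / "common.conf"
    # Assumption: the whole file is swapped; anything only present in common.conf is lost.
    shutil.copyfile(source, target)
    return target
```

Calling `activate_profile(conf_dir, "debug")` before a local run and `activate_profile(conf_dir, "release")` before a release build keeps common.conf in line with the chosen setup.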