ARCHER’s first evaluation campaign starts next Monday, September 15th. Participants will then have until September 26th to submit their hypotheses. This campaign covers four tasks: automatic speech recognition (ASR), machine translation (MT), named entity recognition (NER), and optical character recognition (OCR). It focuses on French and English.
The ASR task consists of transcribing human speech into written text. It combines transcription with diarization: each segment of speech is tagged with a speaker label.
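As an illustration only, a diarized transcript can be thought of as a list of timed segments, each carrying a speaker label and its text. The sketch below is not the campaign's submission format; the `Segment` fields, the `render` helper, and the `spk1`/`spk2` labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start time, in seconds
    end: float     # segment end time, in seconds
    speaker: str   # diarization label (hypothetical naming scheme)
    text: str      # transcription of the segment


def render(segments):
    """Return one display line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return [f"[{s.start:.2f}-{s.end:.2f}] {s.speaker}: {s.text}" for s in ordered]


# Toy data, invented for this example.
segments = [
    Segment(2.5, 4.0, "spk2", "Bonjour."),
    Segment(0.0, 2.5, "spk1", "Hello, welcome."),
]
lines = render(segments)
```

Keeping the speaker label on each segment, rather than in a separate file, means that transcription and diarization stay aligned by construction.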
MT aims to translate human-written text from one language (source) to another (target). In this campaign, participants will have to translate documents from English to French.
The NER task consists of identifying named entities, i.e. segments of text referring to concrete entities, and assigning each one to a predefined category. Examples of named entities include people, locations (real or fictional), dates, and amounts.
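One common way to represent such annotations is as character spans over the input text, each paired with a category. The sketch below assumes that representation purely for illustration; the `Entity` class, the `entity_texts` helper, and the label names are hypothetical and do not describe the campaign's actual annotation scheme.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    start: int   # offset of the first character of the span
    end: int     # offset one past the last character of the span
    label: str   # category name (hypothetical label set)


def entity_texts(text, entities):
    """Return (surface form, label) pairs for each annotated span."""
    return [(text[e.start:e.end], e.label) for e in entities]


# Toy sentence and annotations, invented for this example.
text = "Marie Curie moved to Paris in 1891."
entities = [
    Entity(0, 11, "PERSON"),
    Entity(21, 26, "LOCATION"),
    Entity(30, 34, "DATE"),
]
pairs = entity_texts(text, entities)
```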
OCR is the task of transcribing typed, handwritten, or printed text into machine-encoded text. The input is a set of images scanned from single- or multi-page documents, and the expected output is the text of each page in reading order.
This first campaign is a dry run designed to allow participants to familiarize themselves with the data formats and evaluation tools.