Informática Aplicada a la Traducción
Building and Using Translation Memories

4. Building a Parallel Corpus

4.1 What is a Parallel Corpus?
• A “Parallel Corpus” consists of a set of sentences (or other segments of text) in one language, each linked to its translation in another language.
• In the context of a Translation Memory system, such parallel corpora are called “Translation Memories”.
• In this class and the next, we will explore:
  – Various tools for converting two texts into a parallel corpus.
  – Various uses for parallel corpora apart from TMs.

4.2 Software for Building a Parallel Corpus
• There are various tools available for converting two texts into a parallel corpus:
  1. Manual editing in a spreadsheet (e.g., Microsoft Excel)
  2. Commercial sentence alignment systems
     a) WinAlign (part of Trados, thus expensive, but good) http://blog.quillslanguage.com/2008/11/trados-winalign/
     b) DejaVu contains a sentence aligner
  3. Free/Open Source systems
     a) Microsoft Bilingual Sentence Aligner
     b) LF Aligner http://sourceforge.net/projects/aligner/
     c) Bitext2tmx http://bitext2tmx.sourceforge.net
     d) More at: http://www.cse.unt.edu/~rada/wa/#softwareSA

[Figure: WinAlign]

[Figure: DejaVu]

Automatic Sentence Alignment
• Given two texts, the system works out which segment of text 1 corresponds to which segment of text 2:

  Segment of text 1:
    When iPhone is locked, nothing happens if you touch the screen.
  Candidate segments of text 2:
    ? Bloquear el iPhone: Pulse el botón de encendido/apagado.
    ? Cuando el iPhone está bloqueado, no ocurre nada si toca la pantalla.
    ? El iPhone puede seguir recibiendo llamadas, mensajes de texto y otras actualizaciones.

• Basically, among the sentences close to the position of the source sentence, find the one that contains the most words that translate words of the source sentence (using a translation dictionary).

  Example 1:
    When iPhone is locked, nothing happens if you touch the screen.
    Bloquear el iPhone: Pulse el botón de encendido/apagado.
    – 3 out of 11 English words have a translation in the Spanish sentence
    – 3 out of 9 Spanish words have a translation in the English sentence
    – Thus, weak match (about 30%, averaging the two proportions)

  Example 2:
    When iPhone is locked, nothing happens if you touch the screen.
    Cuando el iPhone está bloqueado, no ocurre nada si toca la pantalla.
    – 10 out of 11 English words have a translation in the Spanish sentence
    – 10 out of 12 Spanish words have a translation in the English sentence
    – Thus, strong match (about 87%, averaging the two proportions)

4.3 Various uses for parallel corpora apart from TMs
• Aligned translation corpora are the input to Translation Memory systems.
• However, the translator can also use them in other ways:
  – In place of a terminological dictionary: look up a term in one language to see how it is translated into the other language, together with its context of use (i.e., how it is used in a sentence as a whole).
  – As a source of word frequency data: you can ask corpus management software to tell you which words are “key” to this corpus (most important to this kind of text). These are terms that you should probably put in your translation lexicon.
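
As a rough illustration of the dictionary-based matching described in 4.2, the sketch below scores each candidate Spanish sentence against the English sentence and picks the highest-scoring one. It is not the algorithm of WinAlign or any other tool listed above. The dictionary TOY_DICT, the function names, and the choice to average the two proportions are all assumptions made for this example, so the scores will differ from the slide figures depending on what the dictionary contains.

```python
import re

# Hypothetical toy English -> Spanish dictionary, for illustration only.
# A real aligner would rely on a full bilingual lexicon.
TOY_DICT = {
    "when": {"cuando"}, "iphone": {"iphone"}, "is": {"está", "es"},
    "locked": {"bloqueado"}, "nothing": {"nada"}, "happens": {"ocurre"},
    "if": {"si"}, "you": {"usted"}, "touch": {"toca"},
    "the": {"el", "la"}, "screen": {"pantalla"},
}

def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())

def match_score(source, target, dictionary):
    """Average of two proportions: source words that have a translation in
    the target, and target words that translate some source word."""
    src, tgt = tokenize(source), tokenize(target)
    if not src or not tgt:
        return 0.0
    tgt_set = set(tgt)
    src_hits = sum(1 for w in src if dictionary.get(w, set()) & tgt_set)
    translations = set().union(*(dictionary.get(w, set()) for w in src))
    tgt_hits = sum(1 for w in tgt if w in translations)
    return (src_hits / len(src) + tgt_hits / len(tgt)) / 2

def best_match(source, candidates, dictionary):
    """Return (index, score) of the highest-scoring candidate."""
    scores = [match_score(source, c, dictionary) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]

SOURCE = "When iPhone is locked, nothing happens if you touch the screen."
CANDIDATES = [
    "Bloquear el iPhone: Pulse el botón de encendido/apagado.",
    "Cuando el iPhone está bloqueado, no ocurre nada si toca la pantalla.",
    "El iPhone puede seguir recibiendo llamadas, mensajes de texto y otras actualizaciones.",
]

if __name__ == "__main__":
    index, score = best_match(SOURCE, CANDIDATES, TOY_DICT)
    print(f"Best candidate: {CANDIDATES[index]} (score {score:.2f})")
```

With this toy dictionary, the second candidate (“Cuando el iPhone está bloqueado, ...”) scores highest. A real aligner would also restrict the candidates to sentences near the expected position in the target text, which this sketch leaves out.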