Discovering bilingual collocations in parallel corpora: a first attempt at using distributional semantics

Bibliographic Details
Authors: García González, Marcos; García Salido, Marcos; Alonso Ramos, Margarita
Resource type: book chapter
Publication date: 2019
Country: Spain
Institution: Universidad de Santiago de Compostela (USC)
Repository: Minerva. Repositorio Institucional de la Universidad de Santiago de Compostela
Language: English
OAI Identifier: oai:minerva.usc.gal:10347/45906
Online access: https://hdl.handle.net/10347/45906
Access level: open access
Keywords: Learning
Conventionalized lexical combinations
Collocations
Bilingual collocation equivalents
Description
Summary: This chapter presents a method that exploits parallel corpora to automatically extract bilingual collocation equivalents. First, we use dependency parsing and statistical measures to identify collocation candidates in corpora. Then, we leverage the parallel corpora to extract bilingual word embeddings. Finally, we use these distributional models as probabilistic dictionaries in order to identify bilingual collocation equivalents. To evaluate our strategy, we carry out a set of experiments in Portuguese and Spanish focusing on verb-object collocations, for example, “reach the maturity” (“atingir a maturidade” in Portuguese, “alcanzar la madurez” in Spanish). The results of our experiments show that this method can automatically identify thousands of bilingual collocation equivalents, achieving a precision of 86%.
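
The pipeline in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The summary does not name the statistical measure or the similarity function, so pointwise mutual information (PMI) and cosine similarity are assumptions here, and the function names, toy dependency pairs, and three-dimensional vectors are all hypothetical. The sketch also assumes that words from both languages already live in one shared embedding space.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Score (verb, object) dependency pairs with pointwise mutual information.

    PMI is an assumed stand-in for the unspecified statistical measures.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    obj_counts = Counter(obj for _, obj in pairs)
    total = len(pairs)
    return {
        (verb, obj): math.log2((count * total) / (verb_counts[verb] * obj_counts[obj]))
        for (verb, obj), count in pair_counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_equivalent(source, targets, vectors):
    """Return the target collocation closest to the source, with its score.

    The score averages verb-verb and object-object similarity, using the
    shared embedding space as a rough bilingual dictionary.
    """
    def score(target):
        verb_sim = cosine(vectors[source[0]], vectors[target[0]])
        obj_sim = cosine(vectors[source[1]], vectors[target[1]])
        return (verb_sim + obj_sim) / 2

    best = max(targets, key=score)
    return best, score(best)


# Toy verb-object pairs, as a dependency parser might emit them (hypothetical).
pt_pairs = (
    [("atingir", "maturidade")] * 3
    + [("ter", "maturidade")]
    + [("ter", "casa")] * 4
)
pt_pmi = pmi_scores(pt_pairs)

# Made-up vectors in an assumed shared Portuguese-Spanish space.
vectors = {
    "atingir": [0.9, 0.1, 0.0],
    "alcanzar": [0.8, 0.2, 0.1],
    "comer": [0.0, 0.9, 0.3],
    "maturidade": [0.1, 0.0, 0.9],
    "madurez": [0.2, 0.1, 0.8],
    "manzana": [0.7, 0.6, 0.0],
}
es_candidates = [("alcanzar", "madurez"), ("comer", "manzana")]
match, similarity = best_equivalent(("atingir", "maturidade"), es_candidates, vectors)
print(pt_pmi[("atingir", "maturidade")], match, round(similarity, 3))
```

Averaging the two component similarities is only one possible way to combine them. A real system would also need a threshold on the final score to decide whether a candidate pair counts as an equivalent at all.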