Measuring Language Distance of Isolated European Languages


Bibliographic Details
Authors: Gamallo, Pablo; Pichel Campos, José Ramom; Alegría Loinaz, Iñaki
Resource type: article
Publication date: 2020
Country: Spain
Institution: Universidad del País Vasco
Repository: Addi. Archivo Digital para la Docencia y la Investigación
OAI identifier: oai:addi.ehu.eus:10810/42972
Online access: http://hdl.handle.net/10810/42972
Access level: open access
Keywords: language distance
phylogenetics
perplexity
clustering
Kullback-Leibler divergence
Description
Summary: Phylogenetics is a sub-field of historical linguistics whose aim is to classify a group of languages by considering their distances within a rooted tree that stands for their historical evolution. A few European languages do not belong to the Indo-European family or are otherwise isolated in the European rooted tree. Although it is not possible to establish phylogenetic links using basic strategies, it is possible to calculate the distances between these isolated languages and the rest using simple corpus-based techniques and natural language processing methods. The objective of this article is to select some isolated languages and measure the distance between them and from the other European languages, so as to shed light on the linguistic distances and proximities of these controversial languages without considering phylogenetic issues. The experiments were carried out with 40 European languages including six languages that are isolated in their corresponding families: Albanian, Armenian, Basque, Georgian, Greek, and Hungarian.