Non-subjective evaluation of the efficacy for similarity functions

Bibliographic Details
Author: Camps Pare, Rafael
Resource type: technical report
Publication date: 2002
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: Spanish
OAI Identifier: oai:upcommons.upc.edu:2117/371197
Online access: https://hdl.handle.net/2117/371197
Access level: open access
Keywords: Computer programming
Programació (Ordinadors)
Àrees temàtiques de la UPC::Informàtica
Description
Summary: Within the context of the Approximate Personal Name Matching problem, we present some metrics to evaluate and compare the efficacy of similarity functions (or distances). Usually the evaluation is done by asking people for relevance judgements, in order to determine for each pair of names whether it is a pair-with-error or not. That approach to evaluation is typical of the IR field, but its results are mainly subjective and often contradictory. Here we present a more objective method, which consists of measuring the contrast between a real file of pairs-with-error and a file of pairs-without-error. The metrics are mainly the same as those traditionally used in the IR field, but without human relevance judgements. The percentage of existing names that are actually similar to the pattern being searched is a very important factor to include in the evaluation of efficacy.
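
The summary does not spell out the metrics themselves, so the following is only a minimal sketch of how such a contrast could be measured, not the report's actual procedure. It assumes a fixed acceptance threshold, uses Python's `difflib.SequenceMatcher` as a stand-in similarity function, and treats the two files as in-memory lists of name pairs. The names `similarity`, `accept_rate`, `evaluate`, and `prevalence` are hypothetical. Recall is estimated from the pairs-with-error, the false-alarm rate from the pairs-without-error, and precision is derived from both, weighted by the assumed share of existing names that are truly similar to the pattern.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the report's function."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def accept_rate(pairs, threshold, sim=similarity):
    """Fraction of pairs whose similarity reaches the threshold."""
    if not pairs:
        return 0.0
    return sum(sim(a, b) >= threshold for a, b in pairs) / len(pairs)


def evaluate(pairs_with_error, pairs_without_error, threshold, prevalence,
             sim=similarity):
    """Contrast the two sets of pairs without any human relevance judgements.

    prevalence (hypothetical parameter): assumed fraction of existing names
    that are actually similar to the pattern being searched.
    """
    # Pairs-with-error should be accepted: their accept rate estimates recall.
    recall = accept_rate(pairs_with_error, threshold, sim)
    # Pairs-without-error should be rejected: their accept rate is the false-alarm rate.
    false_alarm = accept_rate(pairs_without_error, threshold, sim)
    # Expected precision when true matches make up `prevalence` of the candidates.
    hits = recall * prevalence
    noise = false_alarm * (1.0 - prevalence)
    precision = hits / (hits + noise) if hits + noise > 0 else 0.0
    return {"recall": recall, "false_alarm": false_alarm, "precision": precision}


if __name__ == "__main__":
    # Illustrative data only; a real evaluation would load both files of pairs.
    with_error = [("Camps", "Camsp"), ("Rafael", "Rafel")]
    without_error = [("Camps", "Garcia"), ("Rafael", "Montserrat")]
    print(evaluate(with_error, without_error, threshold=0.8, prevalence=0.01))
```

Passing a different `sim` callable allows two similarity functions to be compared on the same pair of files. Lowering `prevalence` shows why the share of truly similar names matters: with the same recall and false-alarm rate, the expected precision drops as true matches become rarer.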