Non-subjective evaluation of the efficacy for similarity functions
| Author: | |
|---|---|
| Resource type: | technical report |
| Publication date: | 2002 |
| Country: | Spain |
| Institution: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | Spanish |
| OAI Identifier: | oai:upcommons.upc.edu:2117/371197 |
| Online access: | https://hdl.handle.net/2117/371197 |
| Access Level: | open access |
| Keywords: | Computer programming; Programació (Ordinadors); Àrees temàtiques de la UPC::Informàtica |
| Summary: | Within the context of the Approximate Personal Name Matching problem, we present some metrics to evaluate and compare the efficacy of similarity functions (or distances). Evaluation is usually done by asking people for their opinions (relevance judgements) in order to determine, for each pair of names, whether it is a pair-with-error or not. That approach to evaluation is typical of the IR field, but its results are mainly subjective and often contradictory. Here we present a more objective method, which consists of measuring the contrast between a real file of pairs-with-error and a file of pairs-without-error. The metrics are mainly the same as those traditionally used in the IR field, but without human relevance judgements. The percentage of existing names that are actually similar to the pattern being searched is a very important factor to include in the evaluation of efficacy. |
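
The contrast idea in the summary can be illustrated with a minimal sketch. It is not the report's actual procedure: the similarity function (Python's `difflib.SequenceMatcher`), the threshold, the toy name pairs, and the names `acceptance_rate`, `contrast_metrics`, and `prevalence` are all assumptions made for illustration. Recall is estimated on the pairs-with-error, fallout on the pairs-without-error, and precision is then derived from an assumed share of names that are truly similar to the searched pattern.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; any similarity function could be plugged in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def acceptance_rate(pairs, threshold, sim=similarity):
    """Fraction of pairs whose similarity reaches the threshold."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    accepted = sum(1 for a, b in pairs if sim(a, b) >= threshold)
    return accepted / len(pairs)


def contrast_metrics(with_error, without_error, threshold, prevalence, sim=similarity):
    """IR-style metrics computed from two labelled files, with no human judgements.

    prevalence is the assumed fraction of existing names that are truly
    similar to the pattern being searched (a hypothetical parameter).
    """
    recall = acceptance_rate(with_error, threshold, sim)      # true pairs accepted
    fallout = acceptance_rate(without_error, threshold, sim)  # unrelated pairs accepted
    hits = prevalence * recall
    false_alarms = (1.0 - prevalence) * fallout
    total = hits + false_alarms
    precision = hits / total if total > 0 else 0.0
    return {"recall": recall, "fallout": fallout, "precision": precision}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    pairs_with_error = [("Catherine", "Katherine"), ("Gonzalez", "Gonzales")]
    pairs_without_error = [("Catherine", "Gonzalez"), ("Martin", "Rodriguez")]
    for p in (0.5, 0.01):
        print(p, contrast_metrics(pairs_with_error, pairs_without_error, 0.8, p))
```

Running the sketch with a smaller `prevalence` shows why that percentage matters: recall and fallout stay the same, but the derived precision drops whenever any unrelated pair is accepted.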