Finding relevant variables in PAC model with membership queries

Bibliographic Details
Authors: Guijarro Guillem, David; Tarui, Jun; Tsukiji, Tatsuie
Resource type: technical report
Publication date: 1999
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/93089
Online access: https://hdl.handle.net/2117/93089
Access level: open access
Keywords: AI
Data mining
Artificial intelligence
Membership queries
Unknown distribution
Arbitrary distribution
Uniform distribution
Àrees temàtiques de la UPC::Informàtica::Informàtica teòrica
Description
Summary: A new research frontier in AI and data mining seeks to develop methods to automatically discover relevant variables among many irrelevant ones. In this paper, we present four algorithms that output such crucial variables in the PAC model with membership queries. The first algorithm executes the task under any unknown distribution by measuring the distance between virtual and real targets. The second algorithm exhausts the virtual version space under an arbitrary distribution. The third algorithm exhausts the universal set under the uniform distribution. The fourth algorithm measures the influence of variables under the uniform distribution. Knowing the number $r$ of relevant variables, the first algorithm runs in time almost linear in $r$. The second and the third ones use fewer membership queries than the first one, but run in time exponential in $r$. The fourth one enumerates highly influential variables in time quadratic in $r$.
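
As a rough illustration of the idea behind the fourth algorithm, the sketch below estimates the influence of each variable of a Boolean function under the uniform distribution using only membership queries. It is not the procedure from the report: it assumes the standard definition of influence (the probability that flipping one input bit changes the output), and the names `estimate_influence`, `influential_variables`, `target`, the sample count, and the threshold are all hypothetical choices made for this example.

```python
import random

def estimate_influence(f, n, i, samples, rng):
    """Estimate Pr_x[f(x) != f(x with bit i flipped)] for uniform x in {0,1}^n.

    Each trial issues two membership queries to f (an assumed black-box oracle).
    """
    changed = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = list(x)
        y[i] ^= 1  # flip only the i-th bit
        if f(x) != f(y):
            changed += 1
    return changed / samples

def influential_variables(f, n, threshold=0.05, samples=2000, seed=0):
    """Return indices whose estimated influence reaches the (assumed) threshold."""
    rng = random.Random(seed)
    return [i for i in range(n)
            if estimate_influence(f, n, i, samples, rng) >= threshold]

# Hypothetical target over 10 variables; only x[2], x[5], x[7] are relevant.
def target(x):
    return x[2] & (x[5] ^ x[7])

if __name__ == "__main__":
    print(influential_variables(target, 10))
```

An irrelevant variable never changes the output when flipped, so its estimate is exactly zero here; a relevant variable with very low influence could still fall below the threshold, which is why this kind of test finds only the highly influential variables.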