SkyMap: a generative graph model for GNN benchmarking

Bibliographic Details
Authors: Wassington, Axel; Abadal Cavallé, Sergi (ORCID 0000-0003-0941-0260); Higueras, Raúl
Resource type: article
Publication date: 2024
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/419897
Online access: https://hdl.handle.net/2117/419897
https://dx.doi.org/10.3389/frai.2024.1427534
Access level: open access
Keywords: Graph Neural Network (GNN)
Machine learning datasets
Graph generation model
Mixing matrix
Degree distribution
Benchmark
UPC subject areas::Computer science::Artificial intelligence::Machine learning
Description
Summary: Graph Neural Networks (GNNs) have gained considerable attention in recent years. Despite the surge in innovative GNN architecture designs, research heavily relies on the same 5-10 benchmark datasets for validation. To address this limitation, several generative graph models like ALBTER or GenCAT have emerged, aiming to fix this problem with synthetic graph datasets. However, these models often struggle to mirror the GNN performance of the original graphs. In this work, we present SkyMap, a generative model for labeled attributed graphs with a fine-grained control over graph topology and feature distribution parameters. We show that our model is able to consistently replicate the learnability of graphs on graph convolutional, attention, and isomorphism networks better (64% lower Wasserstein distance) than ALBTER and GenCAT. Further, we prove that by randomly sampling the input parameters of SkyMap, graph dataset constellations can be created that cover a large parametric space, hence making a significant stride in crafting synthetic datasets tailored for GNN evaluation and benchmarking, as we illustrate through a performance comparison between a GNN and a multilayer perceptron.