Next-generation LLM inference: scalable, multimodal, and composable

Antoñanzas Acero, Jesús Maria

Bibliographic Details
Author:	Antoñanzas Acero, Jesús Maria
Resource type:	master's thesis
Publication date:	2025
Country:	Spain
Institution:	Universitat Politècnica de Catalunya (UPC)
Repository:	UPCommons. Portal del coneixement obert de la UPC
Language:	English
OAI Identifier:	oai:upcommons.upc.edu:2117/448586
Online access:	https://hdl.handle.net/2117/448586
Access level:	embargoed
Keywords:	Machine learning; Sistemes d'IA; Inferència amb LLMs; Intel·ligència artificial; Aprenentatge automàtic; Systems for AI; LLM inference; Artificial intelligence; Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic

Description
Summary:	Global interest in LLM research and development has sparked a race in both algorithmic and systems development. The pre-training paradigm, brought forth by the scaling revolution, seems to be reaching the point of diminishing returns. As such, inference is becoming increasingly important, enabling the next generation of models: multimodal reasoners with massive context lengths. In this work, we present a production-grade LLM inference system created for the coming generation of AI applications. Designed and built from the ground up, it is uncompromisingly efficient and scalable, supporting extreme context lengths and dynamically allocating resources across hundreds of GPUs. Featuring elegant abstractions, our system is robust yet simple, and it advances the state of the art in inference systems with novel engineering solutions.
