Evaluating the impact of task aggregation in workflows with shared resources environments
| Author: | |
|---|---|
| Resource type: | master's thesis |
| Publication date: | 2023 |
| Country: | Spain |
| Institution: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/404041 |
| Online access: | https://hdl.handle.net/2117/404041 |
| Access Level: | open access |
| Keywords: | Workflow -- Management; Earth sciences; high performance computing; workflows; shared resources environments; task aggregation; batch scheduling; queuing systems; scientific computing; UPC subject areas::Computer science::Computer architecture |
| Summary: | We study the relative impact of task aggregation, or wrapping, a technique for computational workflows that bundles several jobs into a single submission to a remote scheduler. Experiments in the Earth Science community can be lengthy and comprise several steps with many dependencies. The community has traditionally focused on increasing the performance of the models, but the overall execution of the workflow, including queue time, has received little attention. Aiming to reduce the time spent in queue, the developers of Autosubmit, a workflow manager for climate, weather forecast, and air quality simulations, introduced task aggregation, or wrapping. Our objective is to assess whether this technique does indeed reduce the total queue time of the workflow. The complex interplay between the dynamic usage of the machine and the scheduler policy plays a central role in our analysis and is the main challenge of this work. We therefore study the scheduling policy of the popular Slurm Workload Manager in detail and statistically characterize the usage of both simulated machines: LUMI and cea-curie. Building on this, we run two sets of experiments: a simulation of a workflow composed of multiple jobs under dynamic workloads (where job arrival times play a role), and a static workload (where all jobs are submitted at the same time) in which we vary the job and user factors that influence scheduling. Results show that aggregation is beneficial in the majority of cases for vertically organized workflows (a chain of submissions in which each job depends on the previous one), whereas for horizontally arranged workflows (in which jobs have no dependencies) it may increase queue time, depending on the user's past usage and the machine's current state. |
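The contrast the summary draws between vertical and horizontal workflows can be illustrated with a deliberately simple toy model. The sketch below is hypothetical and is not the simulator or the Slurm policy studied in the thesis: `Job`, `toy_wait`, `vertical`, and `horizontal` are illustrative names, and the wait model (a fixed base wait plus penalties proportional to requested nodes and walltime) is an assumption. It also assumes that a vertical wrapper requests the widest member's nodes for the summed walltime, while a horizontal wrapper requests the summed nodes for the longest member's walltime. Fair-share priority and machine load, which the thesis identifies as decisive for horizontal workflows, are not modeled.

```python
# Hypothetical toy model of queue time with and without wrapping (not the thesis's simulator).
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    nodes: int        # nodes requested
    walltime: float   # requested walltime, in hours


# A wait model maps a resource request (nodes, walltime) to hours spent in queue.
WaitModel = Callable[[int, float], float]


def toy_wait(nodes: int, walltime: float) -> float:
    # Hypothetical: fixed base wait plus penalties for wider and longer requests.
    return 2.0 + 0.5 * nodes + 0.1 * walltime


def vertical(jobs: List[Job], wait: WaitModel, wrapped: bool) -> float:
    """Total queue time of a chain in which each job depends on the previous one."""
    if wrapped:
        # Assumed wrapper request: widest member's nodes for the summed walltime.
        return wait(max(j.nodes for j in jobs), sum(j.walltime for j in jobs))
    # Unwrapped: each job is submitted only after its predecessor ends, so waits add up.
    return sum(wait(j.nodes, j.walltime) for j in jobs)


def horizontal(jobs: List[Job], wait: WaitModel, wrapped: bool) -> float:
    """Queue time of independent jobs submitted together."""
    if wrapped:
        # Assumed wrapper request: summed nodes for the longest member's walltime.
        return wait(sum(j.nodes for j in jobs), max(j.walltime for j in jobs))
    # Unwrapped: jobs wait concurrently, so the workflow waits for the slowest one.
    return max(wait(j.nodes, j.walltime) for j in jobs)


if __name__ == "__main__":
    jobs = [Job(nodes=2, walltime=3.0) for _ in range(4)]
    for label, model in (("vertical", vertical), ("horizontal", horizontal)):
        unwrapped = round(model(jobs, toy_wait, wrapped=False), 1)
        wrapped = round(model(jobs, toy_wait, wrapped=True), 1)
        print(label, unwrapped, wrapped)
    # prints "vertical 13.2 4.2" then "horizontal 3.3 6.3"
```

Under these assumed numbers, wrapping replaces four sequential queue waits with a single one in the vertical case, while in the horizontal case it turns four concurrent small requests into one larger request that waits longer. With a different wait model, user history, or machine state, the horizontal outcome could change, which is why the summary describes it as dependent on those factors.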