Dialnet


Abstract of A grid-hypergraph load balancing approach for agent based applications in HPC systems

Claudio Daniel Márquez Pérez

  • Many scientific and engineering problems can now be studied and solved thanks to High Performance Computing (HPC) systems. HPC environments make it possible to tackle computing problems as they become more complex and demand more computing power. Among HPC applications, a particular case is Agent-Based Modelling and Simulation (ABMS), which allows the emergent properties of a system to be analysed by simulating the interactions among autonomous entities called agents. Agent interactions and behaviour are defined by their procedural rules, their characteristic parameters, and the evolution of the whole population and the simulated environment.

    With the emergence of ABMS platforms intended for HPC environments, real systems can be modelled, analysed and simulated more accurately by including larger numbers of agents and more complex rules that represent more system details, parameters and interactions. This leads to very complex agent-based (AB) models with a high computational burden and heavy use of computational resources. Simulating complex and realistic AB models is therefore feasible in a reasonable time only if the simulation is executed in parallel on an HPC environment.

    In terms of application structure, single program multiple data (SPMD) is the dominant approach because of its scalability and simplicity: the same program is executed on all processing elements (PEs), each on a different subset of the domain. However, in large and complex SPMD ABMS applications, improper data partitioning policies and the dynamic creation and elimination of agents introduce uneven CPU computing and network communication overhead, which delays the simulation and may propagate across all PEs. In such cases the initial decomposition cannot offer an efficient solution, and the computing and communication workload needs to be handled dynamically. An efficient dynamic mechanism for readjusting the workload is therefore highly beneficial.

    This thesis proposes a methodology that enables dynamic performance enhancements for SPMD ABMS applications and spatially-explicit AB models. The methodology introduces a tuning strategy that dynamically minimises the differences in computing and communication workload between PEs. As a result, the application is able to process a large number of agents with complex rules as quickly and efficiently as possible. The strategy adjusts the global simulation workload by migrating groups of agents among the PEs according to their computation workload and their interconnectivity, which is modelled using a hypergraph. A hypergraph is a generalisation of a graph in which an edge can connect any number of vertices; in this case it allows the interactions of the agent system to be modelled more accurately. The hypergraph is then partitioned using a parallel partitioning algorithm to decide on a proper workload distribution.
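
    As a rough illustration of the hypergraph model described above, the following Python sketch scores a candidate assignment of agent groups to PEs. It is not the thesis's partitioning algorithm: it only evaluates a given assignment, using per-PE computation load and the common "connectivity minus one" cut metric as a stand-in for communication cost. The function names, the group labels and the weights are all hypothetical.

```python
def part_loads(vertex_weight, assignment, num_parts):
    """Sum the computation weight assigned to each PE (part)."""
    loads = [0.0] * num_parts
    for vertex, weight in vertex_weight.items():
        loads[assignment[vertex]] += weight
    return loads


def connectivity_cut(hyperedges, assignment):
    """Sum over hyperedges of (number of distinct PEs spanned - 1).

    A hyperedge whose vertices all live on one PE contributes 0;
    every extra PE it touches is assumed to add one unit of
    communication cost.
    """
    total = 0
    for pins in hyperedges:
        parts = {assignment[v] for v in pins}
        total += len(parts) - 1
    return total


# Hypothetical data: four groups of agents with computation weights,
# and two hyperedges describing which groups interact together.
vertex_weight = {"A": 4.0, "B": 2.0, "C": 3.0, "D": 3.0}
hyperedges = [{"A", "B", "C"}, {"C", "D"}]
assignment = {"A": 0, "B": 0, "C": 1, "D": 1}

loads = part_loads(vertex_weight, assignment, num_parts=2)
cut = connectivity_cut(hyperedges, assignment)
```

    Here both PEs carry the same computation load, and only the hyperedge joining A, B and C crosses a PE boundary. A partitioner would search for assignments that keep the loads balanced while keeping this cut small.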

    The approach defines two phases: monitoring and tuning. In the monitoring phase, the application workload is measured at runtime in order to identify and evaluate performance problems against a permitted imbalance value; when necessary, the load balancing strategy is applied in the tuning phase. The load balancing strategy has three main components: agent system representation (ASR), tuning decisions and agent migration. Through a clustering algorithm, the ASR represents groups of agents as grids and models the workload of the grids as a global hypergraph, which is partitioned in order to determine a more balanced grid distribution. Based on this distribution, groups of agents are migrated to adjust the global workload. Additionally, we provide an efficient agent communication mechanism, based on the ASR, to determine the recipient PEs.
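
    The monitoring step and the grid-based grouping described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: agents are assumed to have 2D positions, grouping is done by fixed-size square cells rather than by the clustering algorithm the thesis uses, and imbalance is assumed to be defined as the maximum load over the mean load, minus one. All names and the tolerance value are hypothetical.

```python
from collections import Counter


def grid_cell(position, cell_size):
    """Map a 2D agent position to the index of its square grid cell."""
    x, y = position
    return (int(x // cell_size), int(y // cell_size))


def cell_workloads(agent_positions, cell_size):
    """Count agents per grid cell (assumed proxy for a cell's workload)."""
    return Counter(grid_cell(p, cell_size) for p in agent_positions)


def imbalance(loads):
    """Assumed metric: max load relative to mean load, minus one."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean - 1.0 if mean > 0 else 0.0


def needs_tuning(loads, tolerance):
    """Trigger the tuning phase only when imbalance exceeds the tolerance."""
    return imbalance(loads) > tolerance


# Hypothetical example: four agents grouped into unit-sized cells,
# and per-PE loads measured during monitoring.
positions = [(0.5, 0.5), (1.5, 0.2), (0.9, 0.1), (2.5, 2.5)]
cells = cell_workloads(positions, cell_size=1.0)
trigger = needs_tuning([6.0, 2.0, 4.0], tolerance=0.2)
```

    Working with cells rather than individual agents keeps the number of hypergraph vertices small, and the tolerance avoids paying the cost of repartitioning and migration for minor fluctuations.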

    We have evaluated our strategy using a real HPC ABMS platform, the Flexible Large Scale Agent Modelling Environment (Flame), simulating three real AB models: Susceptible-Infected-Removed, Colorectal Tumour Growth and Keratinocyte Colony Formation. Evaluating individual aspects of our methodology, as well as the methodology as an integrated whole, we obtained significant performance gains and hence an important reduction in total execution times.

