Commit 2ccb253

Merge pull request #789 from marcwitasee/source
Project Updates 2025/26 - E2CLab
2 parents af00689 + 7310d53 commit 2ccb253

3 files changed

Lines changed: 36 additions & 2 deletions

_bibliography/external/e2clab_project.bib

Lines changed: 13 additions & 0 deletions
@@ -168,4 +168,17 @@ @online{Engage2024
 year = {2024},
 }
 
+@online{ContinuumRI2025,
+addendum = {(accessed: 02.16.2026)},
+title = {{ContinuumRI Workshop 2025}},
+url = {https://sites.google.com/view/continuumri2025},
+year = {2025},
+}
+
+@online{ContinuumRIPhotos2025,
+addendum = {(accessed: 02.16.2026)},
+title = {{ContinuumRI Workshop 2025 - Photos}},
+url = {https://photos.google.com/share/AF1QipMVDV6AStZ1iFAJYIAHOYhinylXhXQyZvPyGiMIbH7LzFzK8zvAqVeHb_hs6YyT-g?key=NHl1dnFVWm1fSzE3MmxjVURRZmVtQUQ2UFg5UEtR},
+year = {2025},
+}

_bibliography/jlesc.bib

Lines changed: 11 additions & 0 deletions
@@ -1877,4 +1877,15 @@ @inproceedings{cappello2025support
 booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
 pages={1966--1979},
 year={2025}
+}
+
+@misc{keahey_2025_15306610,
+author = {Keahey, Kate and Richardson, Marc and Tolosana Calasanz, Rafael and Hunold,
+Sascha and Lofstead, Jay and Malik, Tanu and Perez, Christian},
+doi = {10.5281/zenodo.15306610},
+month = {apr},
+publisher = {Zenodo},
+title = {{On Challenges of Practical Reproducibility for Systems and HPC Computer Science}},
+url = {https://doi.org/10.5281/zenodo.15306610},
+year = {2025},
 }

collections/_projects/e2clab_project.md

Lines changed: 12 additions & 2 deletions
@@ -2,7 +2,7 @@
 layout: post
 title: Advancing Chameleon and Grid'5000 testbeds II
 date: 2022-07-15
-updated: 2025-01-31
+updated: 2026-02-16
 navbar: Research
 subnavbar: Projects
 project_url:
@@ -176,7 +176,17 @@ This research work started during the summer internship of Cédric Prigent (INRI
 
 We started this project with the objective of deploying and studying the behavior of Federated Learning (FL) systems on real-world edge devices. For this purpose, our target was to deploy air-quality stations across the UChicago campus. The first step was to effectively design such air-quality stations. We built a set of prototypes based on Raspberry Pis and air-quality sensors (monitoring particulate matter concentrations, temperature, humidity and air pressure), and developed a set of micro-services to collect air quality measurements, pre-process air-quality data, and perform FL local training tasks. Then, with the help of technical staff from UChicago, we identified suitable locations and deployed our devices in 8 fresh air intakes from the university buildings.
 
-After deployments of our air-quality stations, our objective was to study the behaviour of FL systems at the Edge. More specifically, we were interested in investigating the differences between simulation, emulation and real-world deployments with regard to performance and reproducibility of experiments. We deployed the same training tasks in different settings i.e., real-world air-quality stations, emulation on distributed testbeds (Grid’5000 and Chameleon), and simulation on a single compute node, and drew several conclusions regarding the type of infrastructure that could be used to validate different experimental aspects, as well as the advantages and shortcomings of such real-world deployments. Simulation provides good results for reproducing model based performance metrics (e.g., model convergence, accuracy), whereas emulation on distributed testbeds can more accurately reproduce system related metrics (e.g., execution time, CPU and memory usage). The results of this work were put in a paper submitted to the IEEE CCGrid’2025 conference.
+After deployments of our air-quality stations, our objective was to study the behaviour of FL systems at the Edge. More specifically, we were interested in investigating the differences between simulation, emulation and real-world deployments with regard to performance and reproducibility of experiments. We deployed the same training tasks in different settings i.e., real-world air-quality stations, emulation on distributed testbeds (Grid'5000 and Chameleon), and simulation on a single compute node, and drew several conclusions regarding the type of infrastructure that could be used to validate different experimental aspects, as well as the advantages and shortcomings of such real-world deployments. Simulation provides good results for reproducing model based performance metrics (e.g., model convergence, accuracy), whereas emulation on distributed testbeds can more accurately reproduce system related metrics (e.g., execution time, CPU and memory usage). The results of this work were put in a paper submitted to the IEEE CCGrid'2025 conference.
+
+## Results for 2025/2026
+
+In 2025, the collaboration between Inria and ANL continued to deepen across multiple fronts, focusing on edge-to-cloud infrastructure, practical reproducibility, and community building.
+
+To advance the collaboration on edge-to-cloud infrastructure and build community around the computing continuum, Christian Perez, Gabriel Antoniu, and Kate Keahey organized the ContinuumRI workshop on May 19th, 2025, co-located with CCGRID2025. The workshop brought together around 40 attendees from Europe and the USA and fostered lively discussion on challenges and solutions facing this research community {% cite ContinuumRI2025 --file external/e2clab_project.bib %}. Workshop pictures can be seen online {% cite ContinuumRIPhotos2025 --file external/e2clab_project.bib %}. Given the success of the first workshop, the organizers will follow up with a 2026 edition co-located with CCGRID2026, for which the call for papers is currently out.
+
+On the reproducibility front, Kate Keahey and Christian Perez co-organized a workshop on practical reproducibility in systems science and co-authored a report resulting from the workshop {% cite keahey_2025_15306610 --file jlesc.bib %}. The report characterizes reproducibility challenges and potential solutions, including suggestions on how to use AI to advance the practice of reproducibility. The report is publicly available at reproduciblehpc.org.
+
+As a follow-up activity to Cédric Prigent's work reported last year, interoperability tests were realized at Inria in the KerData team in the context of the STEEL project of the PEPR Cloud program. These tests enabled the use of E2Clab for the deployment of federated learning workflows across a hybrid infrastructure combining Grid'5000 and the Scaleway commercial cloud. Similar experiments are planned with another cloud provider (OVHcloud), further demonstrating the versatility and interoperability of the E2Clab platform across diverse cloud environments.
 
 ## Visits and meetings
