---
title: "Managing complex CI testing matrices for research software"
search_exclude: true
description: "Efficiently test research software across multiple compilers, libraries, architectures, and platforms while managing CI resource constraints and maintaining fast feedback loops"
contributors: []
page_id: complex_ci_testing_matrix
related_pages:
  your_tasks: [ci_cd, task_automation_github_actions, task_automation_gitlab_ci_cd]
training:
  - name: "Pairwise Testing with allpairspy"
    registry: "PyPI"
    url: "https://pypi.org/project/allpairspy/"
  - name: "GitLab Dynamic Child Pipelines"
    registry: "GitLab"
    url: "https://docs.gitlab.com/ee/ci/pipelines/downstream_pipelines.html#dynamic-child-pipelines"
  - name: "Docker Multi-stage Builds"
    registry: "Docker"
    url: "https://docs.docker.com/build/building/multi-stage/"
---

## How can I efficiently test my research software across multiple compiler versions, library dependencies, and target platforms?

### Description

Research software, particularly performance-portable libraries and simulation codes, often needs to support extensive combinations of compilers, library versions, target architectures, and runtime environments. For example, accelerator abstraction libraries like [Alpaka](https://github.com/alpaka-group/alpaka) require testing across multiple GCC, Clang, CUDA SDK, CMake, and Boost versions. A naive approach that tests all combinations can create thousands of test jobs, making CI pipelines impractically long and resource-intensive.

Consider this real-world example from accelerator development:

- 4 GCC compiler versions
- 6 Clang compiler versions
- 10 CUDA SDK versions
- 4 CMake versions
- 7 Boost library versions

This results in **2,800 potential combinations** (10 compilers × 10 CUDA versions × 4 CMake versions × 7 Boost versions), requiring approximately **9.3 hours** of compute time at 6 minutes per job, even with 30 parallel runners.

### Considerations

- Combinatorial explosion: The number of possible combinations multiplies with each additional parameter (compilers × architectures × libraries × versions)
- Resource constraints: CI runners have limited capacity, and excessive parallelization can monopolize shared infrastructure across multiple projects
- Time constraints: Full test matrices can take many hours to complete, creating bottlenecks in development workflows
- Hardware diversity: Different combinations may require specific hardware (NVIDIA GPUs, AMD GPUs, ARM processors, PowerPC architectures)
- Invalid combinations: Some parameter combinations may be incompatible (e.g., older CUDA versions with newer GCC compilers)
- Coverage vs efficiency: Adequate test coverage is needed without redundant or meaningless test combinations
- Maintenance overhead: Large test matrices become difficult to update and debug when new versions are released

### Solutions

#### Pairwise Testing Implementation

- Use dynamic child pipelines: Dynamic child pipelines allow programmatic generation of CI configurations at runtime from a computed test matrix, enabling optimization based on available resources; they are the natural mechanism for implementing pairwise testing in CI systems that support them.
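
For instance, a parent pipeline job can run a small generator script and publish its output as the child pipeline configuration. The sketch below is a minimal, stdlib-only illustration; the job names, image tags, and `./ci/run_tests.sh` script are assumptions, not part of any real project:

```python
# Minimal sketch of a dynamic child pipeline generator (GitLab CI).
# A parent job runs this script, saves the output as child-pipeline.yml,
# and a trigger job launches it as a child pipeline.

def generate_child_pipeline(combinations):
    """Render a GitLab CI YAML document with one test job per combination."""
    lines = ["stages:", "  - test", ""]
    for combo in combinations:
        job_name = "test_" + combo["compiler"].replace("-", "_").replace(".", "_")
        lines += [
            f"{job_name}:",
            "  stage: test",
            f"  image: {combo['image']}",
            "  script:",
            f"    - ./ci/run_tests.sh --compiler {combo['compiler']}",
            "",
        ]
    return "\n".join(lines)

combinations = [
    {"image": "gcc:11", "compiler": "gcc-11"},
    {"image": "gcc:12", "compiler": "gcc-12"},
]
print(generate_child_pipeline(combinations))
```

In practice, the `combinations` list would come from the pairwise generator described below rather than being written by hand.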

- Implement pairwise testing algorithms: Use mathematical approaches that ensure every combination of two parameter values appears in at least one test job, dramatically reducing the number of jobs while covering all two-way interactions.

- Use specialized job generation libraries: Implement dynamic job generators using tools like `allpairspy`:

```python
from allpairspy import AllPairs

parameters = {
    "host_compiler": ["gcc-8", "gcc-9", "gcc-10", "gcc-11"],
    "device_compiler": ["clang-10", "clang-11", "clang-12", "clang-13", "clang-14", "clang-15"],
    "cuda_sdk": ["cuda-10.2", "cuda-11.0", "cuda-11.1", "cuda-11.2", "cuda-11.3",
                 "cuda-11.4", "cuda-11.5", "cuda-11.6", "cuda-11.7", "cuda-11.8"],
    "cmake": ["cmake-3.18", "cmake-3.19", "cmake-3.20", "cmake-3.21"],
    "boost": ["boost-1.68", "boost-1.70", "boost-1.72", "boost-1.74",
              "boost-1.75", "boost-1.76", "boost-1.78"],
}

keys = list(parameters.keys())
values = [parameters[key] for key in keys]

for i, pairs in enumerate(AllPairs(values)):
    config = dict(zip(keys, pairs))
    print(f"Job {i + 1}: {config}")

# This reduces thousands of combinations to roughly 70-100 jobs:
# every pair of parameter values (e.g., compiler + CUDA version)
# appears in at least one job.
```

- Develop domain-specific combination rules: Create libraries that encode your project's specific compatibility requirements and testing priorities, such as the [Alpaka](https://github.com/alpaka-group/alpaka) approach.

- Implement exclusion logic: Define rules to automatically exclude known incompatible combinations:

```yaml
# Example exclusion rules for GPU computing
exclusions:
  - cuda_version: "11.0"
    gcc_version: "gcc-11"   # incompatible combination
  - architecture: "ppc64le"
    cuda_version: "*"       # CUDA not available on PowerPC
```
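
Rules like these can also be enforced directly in the matrix generator. A simplified stdlib-only sketch, with parameter values and rules that are illustrative and mirror the YAML above:

```python
from itertools import product

parameters = {
    "cuda_version": ["11.0", "11.8"],
    "gcc_version": ["gcc-9", "gcc-11"],
    "architecture": ["x86_64", "ppc64le"],
}

def is_excluded(combo):
    """Return True for known-incompatible combinations."""
    if combo["cuda_version"] == "11.0" and combo["gcc_version"] == "gcc-11":
        return True  # older CUDA rejects newer GCC as host compiler
    if combo["architecture"] == "ppc64le":
        return True  # no CUDA toolchain in the PowerPC images
    return False

keys = list(parameters)
matrix = [
    combo
    for combo in (dict(zip(keys, vals)) for vals in product(*parameters.values()))
    if not is_excluded(combo)
]
print(len(matrix))  # 3 valid combinations remain out of 8
```

The same `is_excluded` predicate can be passed to a pairwise generator (e.g., as `allpairspy`'s filter function) so invalid pairs never enter the matrix.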

#### Mathematical Optimization Analysis

| Testing Approach    | Total Jobs | Estimated Runtime (30 parallel runners, 6 min/job) | Coverage Type          |
|---------------------|------------|----------------------------------------------------|------------------------|
| Full matrix (naive) | 2800       | ~9.3 hours                                         | 100% of combinations   |
| Pairwise testing    | ~70–100    | ~20–30 minutes                                     | All 2-way interactions |
| Random sampling     | ~200       | ~40 minutes                                        | Statistical coverage   |
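
The runtime estimates in the table follow from a simple back-of-the-envelope calculation:

```python
# Estimated wall-clock runtime for a test matrix, assuming every runner
# stays busy: total_jobs * minutes_per_job / parallel_runners.
def runtime_minutes(total_jobs, minutes_per_job=6, parallel_runners=30):
    return total_jobs * minutes_per_job / parallel_runners

print(f"Full matrix: {runtime_minutes(2800) / 60:.1f} hours")  # 9.3 hours
print(f"Pairwise:    {runtime_minutes(100):.0f} minutes")      # 20 minutes
print(f"Sampling:    {runtime_minutes(200):.0f} minutes")      # 40 minutes
```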

## How can I optimize CI pipeline performance while maintaining comprehensive testing?

### Description

Even with reduced test matrices, complex research software CI pipelines face performance challenges. Multiple optimization strategies are needed to provide fast developer feedback while maintaining thorough testing coverage across diverse computing environments.

### Considerations

- Build time bottlenecks: Repeatedly compiling large dependency sets (such as HPC libraries, scientific computing frameworks, or large C++ template libraries) wastes significant time
- Resource competition: Simultaneous job execution can overwhelm shared CI infrastructure, affecting other projects
- Failure feedback delays: Critical bugs may not be detected until late in pipeline execution
- Development vs production workflows: Full test suites may be unnecessary during iterative development
- Storage and bandwidth: Large scientific computing containers and datasets impact transfer times
- Platform-specific testing: Different hardware platforms may have varying performance characteristics

### Solutions

#### Container Optimization Strategies

- Implement pre-built container strategies: Create and maintain container images with pre-compiled dependencies. Multi-stage builds separate dependency installation from application code, producing smaller, faster final images by copying only the necessary artifacts from the build stage into the runtime stage:

```dockerfile
# Multi-stage build for scientific computing dependencies
# (image tags are examples; pick the CUDA base matching your toolchain)
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y \
    gcc-10 g++-10 clang-12 \
    cmake libboost-all-dev \
    libomp-dev libfftw3-dev
# ... compile and install dependencies into /usr/local here ...

# Start the runtime stage from a slimmer base and copy only the artifacts
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04 AS runtime
COPY --from=builder /usr/local /usr/local
# Application-specific layers added dynamically
```

- Deploy container registry optimization: Host container images in the same data center as your CI runners to minimize transfer times and bandwidth costs. Use container registries that support layer caching (reusing unchanged layers between builds) and delta compression (transferring only the changed parts of images); check your registry's documentation to see whether it supports these features. Most modern registries, such as the GitLab Container Registry, Harbor, and AWS ECR, do.

- Optimize container layer caching: Structure container builds to maximize reuse of intermediate layers and minimize rebuild times, grouping frequently changing components in separate layers from stable dependencies. For best practices, see [Docker's layer caching guide](https://docs.docker.com/build/cache/); for hands-on learning, see the [Docker Layer Caching tutorial by Earthly](https://docs.earthly.dev/earthly-0.6/docs/guides/cache).
#### Wave Scheduling Implementation

Running all jobs simultaneously can overwhelm shared infrastructure and delay results. Wave scheduling groups jobs into sequential stages ("waves") so that critical tests run first and resources are released between waves for other projects, providing early feedback while preventing resource monopolization.

- Use wave scheduling for resource management: Distribute jobs across pipeline stages to periodically release CI resources:

```yaml
# GitLab CI wave scheduling example
stages:
  - wave1_critical
  - wave2_compatibility
  - wave3_performance
  - wave4_extended

# Critical tests run first for fast feedback
test_core_functionality:
  stage: wave1_critical
  script:
    - ./ci/run_core_unit_tests.sh        # placeholder for your test command

# Extended testing runs after resources are freed
test_gpu_performance:
  stage: wave4_extended
  script:
    - ./ci/run_performance_benchmarks.sh # placeholder for your benchmark command
```
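
The wave layout itself can be generated rather than hand-maintained. A small sketch that buckets a priority-ordered job list into fixed-size waves; the priorities and job names are made up for illustration:

```python
def assign_waves(jobs, wave_size):
    """Split jobs into sequential waves; lower priority value runs earlier."""
    ordered = sorted(jobs, key=lambda job: job["priority"])
    return [ordered[i:i + wave_size] for i in range(0, len(ordered), wave_size)]

jobs = [
    {"name": "gpu_benchmark", "priority": 2},
    {"name": "style_check", "priority": 0},
    {"name": "compat_clang15", "priority": 1},
    {"name": "core_unit_tests", "priority": 0},
    {"name": "compat_gcc11", "priority": 1},
]
for n, wave in enumerate(assign_waves(jobs, wave_size=2), start=1):
    print(f"Wave {n}: {[job['name'] for job in wave]}")
```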

- Implement intelligent job prioritization: Order jobs to maximize early failure detection:
  - Place strict compiler configurations in early waves
  - Run compatibility tests with cutting-edge tool versions first
  - Schedule resource-intensive performance tests in later stages

- Visualize the wave scheduling structure:

```
Wave 1 ─────▶ Fast compile checks, style, small matrix

Wave 2 ─────▶ Medium-sized combinations, functional tests

Wave 3 ─────▶ Full matrix, slowest GPU/HPC tests
```

This structure helps the pipeline fail early and frees resources for other users.

#### Development Workflow Optimization

- Enable selective testing during development: Let developers run targeted subsets of the CI pipeline using commit-message-based filtering, avoiding full pipeline runs during iterative development and reducing pipeline load:

```yaml
# GitLab CI example - run tests based on commit message tags
rules:
  - if: '$CI_COMMIT_MESSAGE =~ /\[cuda-only\]/'
    variables:
      TEST_FILTER: "cuda"
  - if: '$CI_COMMIT_MESSAGE =~ /\[cpu-only\]/'
    variables:
      TEST_FILTER: "cpu"
```

```bash
# Example usage - commit message to run only CUDA tests
git commit -m "Add CUDA kernel optimization [cuda-only]"
```

## How can I manage the infrastructure complexity required for multi-platform research software testing?

### Description

Supporting comprehensive test matrices for research software requires sophisticated CI infrastructure that can handle diverse hardware requirements, manage resources efficiently across multiple projects, and provide reliable service for computationally intensive workloads.

### Considerations

- Hardware diversity requirements: Research software often targets HPC systems, requiring testing on multiple CPU architectures (x86, ARM, PowerPC), GPU vendors (NVIDIA, AMD), and specialized accelerators
- Resource scheduling complexity: Balancing competing demands from multiple research projects while ensuring fair resource allocation
- Performance benchmarking: Validating not just correctness but also performance characteristics across different hardware configurations
- HPC system integration: Connecting CI pipelines with production HPC environments for realistic performance testing
- Cost and sustainability: Managing infrastructure costs while supporting open-source research software development
- Reliability at scale: Maintaining consistent performance as research groups add more complex testing requirements

### Solutions

#### Performance Testing Integration

- Implement performance regression detection: Integrate performance benchmarking into CI pipelines to catch performance regressions early:

```yaml
# Example performance testing job
performance_benchmark:
  stage: performance
  script:
    - cmake --build build --target benchmark
    - python benchmark_analysis.py --baseline previous_results.json
    - python performance_regression_check.py
  artifacts:
    reports:
      performance: performance_results.json
```

- Configure performance thresholds: Establish automated performance regression detection with configurable thresholds for different hardware configurations and algorithm implementations.
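
A threshold check of this kind can be as simple as comparing current timings against a stored baseline. A sketch, in which the benchmark names and the 10% threshold are illustrative:

```python
def find_regressions(current, baseline, threshold_pct=10.0):
    """Return benchmarks whose runtime grew more than threshold_pct percent."""
    limit = 1 + threshold_pct / 100
    return {
        name: (base, current[name])
        for name, base in baseline.items()
        if name in current and current[name] > base * limit
    }

baseline = {"fft_1d": 1.20, "stencil_3d": 4.50}  # seconds
current = {"fft_1d": 1.25, "stencil_3d": 5.40}   # stencil_3d is 20% slower
regressions = find_regressions(current, baseline)
print(regressions)  # only stencil_3d exceeds the 10% threshold
```

In a CI job, a non-empty result would set a failing exit code so the regression blocks the pipeline.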

#### Comprehensive Testing Strategy Implementation

- Monitor and profile pipeline performance: Track job duration, resource usage, and failure patterns to continuously optimize the pipeline structure:

| Metric               | Target                        | Monitoring Method          |
|----------------------|-------------------------------|----------------------------|
| Job duration         | <10 minutes average           | Pipeline analytics         |
| Queue time           | <5 minutes                    | Runner utilization metrics |
| Failure rate         | <5% for stable configurations | Historical trend analysis  |
| Resource utilization | 70-90% of capacity            | Real-time monitoring       |

These metrics can be obtained from your CI platform's analytics dashboards ([GitLab CI/CD Analytics](https://docs.gitlab.com/ee/user/analytics/ci_cd_analytics.html), GitHub Actions insights), from the platform APIs, or from third-party monitoring tools such as Prometheus and Grafana with GitLab Runner exporters.
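
Whichever source you use, the raw job records reduce to these metrics with a few lines of code. A stdlib-only sketch over illustrative records:

```python
from statistics import mean

def pipeline_metrics(job_records):
    """Compute average duration and failure rate from CI job records."""
    durations = [job["duration_min"] for job in job_records]
    failed = sum(1 for job in job_records if job["status"] == "failed")
    return {
        "avg_duration_min": mean(durations),
        "failure_rate_pct": 100 * failed / len(job_records),
    }

jobs = [
    {"duration_min": 5.0, "status": "success"},
    {"duration_min": 7.0, "status": "success"},
    {"duration_min": 9.0, "status": "failed"},
    {"duration_min": 6.0, "status": "success"},
]
print(pipeline_metrics(jobs))  # avg 6.75 min, 25% failure rate
```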

- Use matrix optimization libraries: Leverage existing tools and libraries for combinatorial testing, such as specialized job matrix libraries developed for performance-portable software testing.

## How can I implement this approach for my research software project?

### Description

Transitioning from simple CI testing to comprehensive multi-platform testing matrices requires careful planning, tool selection, and gradual implementation to avoid disrupting existing development workflows.

### Considerations

- Current CI maturity: Existing testing infrastructure and team familiarity with CI/CD concepts
- Project complexity: Size of the parameter space and critical compatibility requirements
- Resource availability: Access to diverse hardware platforms and CI infrastructure budgets
- Team expertise: Developer familiarity with containerization, CI configuration, and testing strategies
- Integration requirements: Compatibility with existing development tools and workflows

### Solutions

- Start with parameter identification: Systematically catalog all dimensions that require testing validation:

```python
# Example parameter definition for a scientific computing library
testing_parameters = {
    'compilers': ['gcc-9', 'gcc-10', 'gcc-11', 'clang-12', 'clang-13', 'clang-14'],
    'cuda_versions': ['11.0', '11.2', '11.4', '11.6', '11.8', '12.0'],
    'cmake_versions': ['3.18', '3.20', '3.22', '3.24'],
    'boost_versions': ['1.72', '1.75', '1.78', '1.80', '1.82'],
    'architectures': ['x86_64', 'arm64'],
    'build_types': ['Release', 'Debug'],
}
```
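
Multiplying the list lengths shows why the full product of even this modest catalog is already impractical:

```python
from math import prod

# Size of the full matrix for the parameter lists defined above.
testing_parameters = {
    'compilers': ['gcc-9', 'gcc-10', 'gcc-11', 'clang-12', 'clang-13', 'clang-14'],
    'cuda_versions': ['11.0', '11.2', '11.4', '11.6', '11.8', '12.0'],
    'cmake_versions': ['3.18', '3.20', '3.22', '3.24'],
    'boost_versions': ['1.72', '1.75', '1.78', '1.80', '1.82'],
    'architectures': ['x86_64', 'arm64'],
    'build_types': ['Release', 'Debug'],
}
total = prod(len(values) for values in testing_parameters.values())
print(total)  # 2880 combinations before any pairwise reduction
```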

- Implement a gradual migration strategy:
  1. Begin with core compatibility testing using pairwise algorithms
  2. Add specialized hardware testing incrementally
  3. Introduce performance testing for stable configurations
  4. Expand to full multi-platform validation

- Use established toolchains: Leverage proven solutions from successful research software projects:
  - Job matrix generation: Implement using libraries like `allpairspy` or domain-specific tools
  - Container strategies: Base images on established scientific computing containers
  - CI integration: Use GitLab dynamic child pipelines or GitHub Actions matrix strategies

- Document testing rationale: Maintain clear documentation explaining testing parameter choices and exclusion rules to facilitate maintenance and onboarding.

- Consider resource sustainability: Even with optimized matrices, extensive testing consumes computational resources and energy. Balance thoroughness with environmental impact by running full matrices only when necessary (e.g., before releases) and using smaller subsets for regular development work, weighing the trade-off between coverage and efficiency when designing the matrix and scheduling jobs.

## References

This approach was successfully implemented by the Helmholtz-Zentrum Dresden-Rossendorf for the Alpaka performance-portability library and the PIConGPU particle-in-cell simulation code, demonstrating significant reductions in CI resource usage while maintaining comprehensive testing coverage across multiple compilers, accelerator platforms, and HPC architectures.

Further resources:

- [Continuous Integration in Complex Research Software - Handling Complexity](https://zenodo.org/records/14643958)
- [PIConGPU](https://github.com/ComputationalRadiationPhysics/picongpu)
- [Alpaka](https://github.com/alpaka-group/alpaka)
- [Alpaka Job Matrix Library](https://github.com/alpaka-group/alpaka-job-matrix-library)
- [Container Registry for CI Images](https://codebase.helmholtz.cloud/crp/alpaka-group-container)
- [Dynamic CI Pipelines in GitLab](https://docs.gitlab.com/ee/ci/pipelines/downstream_pipelines.html#dynamic-child-pipelines)
