Commit 06a9409

ci: align integration test runners with downstream CIs, add multi-GPU peft
Key changes after digging into each downstream project's own CI:

Runner updates:
- transformers: T4 → A10G (bandb-aws-g5-4xlarge-plus). Current upstream transformers quantization CI runs on g5.4xlarge (A10G); our earlier T4 choice came from a stale Feb-2024 fork.
- peft (single GPU): A10 → L4 (bandb-aws-g6-4xlarge-plus). Matches peft's aws-g6-4xlarge-plus runner group exactly.

PEFT filter:
- Switched from `-m "single_gpu_tests and bitsandbytes"` (both test files) to Benjamin Bossan's recommendation: `-m single_gpu_tests -k PeftBnbGPUExampleTests tests/test_gpu_examples.py`. Narrower scope (20 vs. 86 tests) focused on the end-to-end QLoRA-style integration signal, with less noise from tests where bnb is incidental.

New multi-GPU peft job:
- Uses bandb-aws-g6-12xlarge-plus (4× L4, CUDA_VISIBLE_DEVICES=0,1), mirroring the legacy peft nightly-bnb.yml deleted in peft#2858.
- Filter: `-m multi_gpu_tests -k PeftBnbGPUExampleTests`.
- Note: this runner is still being provisioned by infra; the job will fail to pick up a runner until that's done.

Accelerate:
- Added `-rs` to surface skip reasons. The previous run showed 26 silent skips that produced a false "pass"; `-rs` prints the reason for each.

Report job's `needs:` updated to include test-peft-multigpu.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
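The 86 → 20 narrowing above comes from combining pytest's `-m` marker filter with a `-k` name filter. A minimal self-contained sketch of that interaction (toy test names, not peft's real suite; assumes pytest is installed):

```python
# Toy demonstration of pytest -m / -k selection (hypothetical test names,
# not peft's actual suite). -m selects by marker; -k then narrows by
# name substring, mirroring the 86 -> 20 narrowing described above.
import os
import subprocess
import sys
import tempfile
import textwrap

TEST_SRC = textwrap.dedent("""\
    import pytest

    @pytest.mark.single_gpu_tests
    class TestPeftBnbGPUExample:   # matched by -k below
        def test_qlora(self):
            pass

    @pytest.mark.single_gpu_tests
    class TestOtherSingleGpu:      # marker matches, but -k filters it out
        def test_misc(self):
            pass
""")

# Register the custom marker so pytest doesn't warn about it.
INI = "[pytest]\nmarkers =\n    single_gpu_tests: demo marker\n"


def collected(*extra_args):
    """Return how many tests pytest would collect with the given filters."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "test_demo.py"), "w") as f:
            f.write(TEST_SRC)
        with open(os.path.join(d, "pytest.ini"), "w") as f:
            f.write(INI)
        out = subprocess.run(
            [sys.executable, "-m", "pytest", "--collect-only", "-q", *extra_args],
            capture_output=True, text=True, cwd=d,
        ).stdout
        # Each selected test prints as path::Class::test_name.
        return out.count("::test_")


print(collected("-m", "single_gpu_tests"))                                 # all marked tests
print(collected("-m", "single_gpu_tests", "-k", "TestPeftBnbGPUExample"))  # narrowed
```

Against peft's actual suite the same narrowing is what drops the selection from 86 to the 20 PeftBnbGPUExampleTests cases.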
1 parent 2479b41 commit 06a9409

1 file changed: .github/workflows/tests-integration-nightly.yml (67 additions & 8 deletions)
@@ -43,9 +43,9 @@ jobs:
   # This reduces spurious failures from expected values calibrated on their runners.
 
   test-transformers:
-    name: Transformers bnb tests
+    name: Transformers bnb tests (single GPU)
     if: github.repository == 'bitsandbytes-foundation/bitsandbytes'
-    runs-on: bandb-aws-g4dn-4xlarge-plus-use1-public-80 # T4
+    runs-on: bandb-aws-g5-4xlarge-plus-use1-public-80 # A10G (matches transformers CI)
     steps:
       - name: Show GPU information
         run: nvidia-smi
@@ -140,7 +140,7 @@ jobs:
         run: |
           mkdir -p ${GITHUB_WORKSPACE}/reports
           python -m pytest tests/test_quantization.py \
-            -s -v \
+            -s -v -rs \
             -k "not multi_device" \
             --junitxml=${GITHUB_WORKSPACE}/reports/accelerate.xml \
             -o junit_logging=all \
@@ -155,9 +155,9 @@ jobs:
           retention-days: 7
 
   test-peft:
-    name: PEFT bnb tests
+    name: PEFT bnb tests (single GPU)
     if: github.repository == 'bitsandbytes-foundation/bitsandbytes'
-    runs-on: bandb-aws-g5-4xlarge-plus-use1-public-80 # A10
+    runs-on: bandb-aws-g6-4xlarge-plus-use1-public-80 # L4 (matches peft CI)
     steps:
       - name: Show GPU information
         run: nvidia-smi
@@ -196,8 +196,9 @@ jobs:
         run: |
           mkdir -p ${GITHUB_WORKSPACE}/reports
           python -m pytest \
-            -m "single_gpu_tests and bitsandbytes" \
-            tests/test_gpu_examples.py tests/test_common_gpu.py \
+            -m single_gpu_tests \
+            -k PeftBnbGPUExampleTests \
+            tests/test_gpu_examples.py \
             -v \
             --junitxml=${GITHUB_WORKSPACE}/reports/peft.xml \
             -o junit_logging=all \
@@ -211,6 +212,64 @@ jobs:
           path: reports/
           retention-days: 7
 
+  test-peft-multigpu:
+    name: PEFT bnb tests (multi GPU)
+    if: github.repository == 'bitsandbytes-foundation/bitsandbytes'
+    runs-on: bandb-aws-g6-12xlarge-plus-use1-public-80 # 4× L4
+    steps:
+      - name: Show GPU information
+        run: nvidia-smi
+
+      - uses: actions/checkout@v4
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ env.PYTHON_VERSION }}
+
+      - name: Install torch + bnb (from continuous-release)
+        run: |
+          pip install torch==${TORCH_VERSION} --index-url ${PYPI_INDEX}
+          pip install "bitsandbytes[test] @ ${BNB_WHEEL_URL}"
+
+      - name: Install peft and clone matching tag
+        run: |
+          pip install peft transformers accelerate datasets
+          PEFT_VERSION=$(pip show peft | awk '/^Version:/ {print $2}')
+          echo "Installed peft v${PEFT_VERSION}"
+          git clone --depth=1 --branch "v${PEFT_VERSION}" \
+            https://github.com/huggingface/peft.git /tmp/peft
+
+      - name: Show environment
+        run: |
+          pip list
+          python -m torch.utils.collect_env
+
+      - name: Run peft bnb tests
+        working-directory: /tmp/peft
+        env:
+          IS_GITHUB_CI: "1"
+          CUDA_VISIBLE_DEVICES: "0,1"
+        shell: bash -o pipefail {0}
+        run: |
+          mkdir -p ${GITHUB_WORKSPACE}/reports
+          python -m pytest \
+            -m multi_gpu_tests \
+            -k PeftBnbGPUExampleTests \
+            tests/test_gpu_examples.py \
+            -v \
+            --junitxml=${GITHUB_WORKSPACE}/reports/peft-multigpu.xml \
+            -o junit_logging=all \
+            2>&1 | tee ${GITHUB_WORKSPACE}/reports/peft-multigpu.log
+
+      - name: Upload JUnit XML and log
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: reports-peft-multigpu
+          path: reports/
+          retention-days: 7
+
   # ─── Consolidated report ──────────────────────────────────────────────────
   # Runs after all three test jobs finish (success or failure).
   # Downloads the JUnit XMLs, runs our report script, writes to the job
@@ -221,7 +280,7 @@ jobs:
 
   report:
     name: Consolidated report
-    needs: [test-transformers, test-accelerate, test-peft]
+    needs: [test-transformers, test-accelerate, test-peft, test-peft-multigpu]
     if: always() && github.repository == 'bitsandbytes-foundation/bitsandbytes'
     runs-on: ubuntu-22.04
     steps:
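The `-rs` flag added to the accelerate invocation makes pytest print each skip's reason in the short test summary, so a run dominated by skips can no longer look like a clean pass. A small self-contained sketch of the difference (throwaway test file with an illustrative name, not from accelerate's suite; assumes pytest is installed):

```python
# Demonstrates pytest's -rs flag: skip *reasons* only appear in the
# short test summary when -r s is requested. The test file here is a
# throwaway demo, not part of accelerate's real suite.
import os
import subprocess
import sys
import tempfile
import textwrap

SRC = textwrap.dedent("""\
    import pytest

    @pytest.mark.skip(reason="requires 2 GPUs")
    def test_multi_device():
        pass
""")


def run_pytest(*extra_args):
    """Run pytest on a one-test file and return its stdout."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "test_skips.py"), "w") as f:
            f.write(SRC)
        return subprocess.run(
            [sys.executable, "-m", "pytest", "-q", *extra_args],
            capture_output=True, text=True, cwd=d,
        ).stdout


print("requires 2 GPUs" in run_pytest())       # reason hidden -> False
print("requires 2 GPUs" in run_pytest("-rs"))  # reason surfaced -> True
```

In CI, the surfaced reasons make the 26 previously silent accelerate skips visible in the job log.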
