Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix)#4954

Merged
danielhanchen merged 8 commits into main from fix/rocm-bnb-prerelease
Apr 10, 2026

Conversation

@danielhanchen
Contributor

Summary

  • Pin bitsandbytes on ROCm hosts to the continuous-release_main wheel from the upstream bnb GitHub release, which contains the CDNA/RDNA 4-bit GEMV fix in bnb PR #1887 (merged 2026-03-09, post-0.49.2).
  • Falls back to PyPI bitsandbytes>=0.49.1 when the pre-release URL is unreachable (offline installs, firewalled hosts, or architectures not covered by the pre-release wheels).
  • Applies to both install.sh and studio/install_python_stack.py; gated on the ROCm torch index + platform.machine(), so NVIDIA / CPU / Mac / Windows paths are untouched.

Why

bitsandbytes 0.49.2 on PyPI ships with a broken 4-bit GEMV kernel on every ROCm target:

  • CDNA (gfx90a / gfx942 / gfx950 = MI210 / MI300X / MI350): broken blocksize=32/64 warp64 GEMV kernel. The corresponding tests were explicitly skipped with ROCM_WARP_SIZE_64 guards in 0.49.2 because the code was known broken.
  • RDNA3 / RDNA3.5 (gfx1100-gfx1103 / gfx1150-gfx1152): compile-time BNB_WARP_SIZE macro in host-side dispatch resolves to 64 when the multi-arch wheel is compiled with CDNA as the primary target, so num_blocks is wrong on RDNA and half the GEMV output is never written.

At decode shape (batch=1, seq_len=1, hidden) both bugs produce NaN. Training is unaffected because training shapes are (batch, seq_len > 1, hidden) and never touch the GEMV path -- it's a GEMM at training shapes and works correctly.
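
The shape distinction the bug hinges on can be illustrated without bitsandbytes at all -- a plain NumPy sketch (dimensions arbitrary) of why decode collapses to a vector-matrix product (GEMV) while training shapes are full GEMMs:

```python
import numpy as np

hidden, out_features = 256, 128
W = np.random.randn(out_features, hidden).astype(np.float32)

# Decode shape (batch=1, seq_len=1, hidden): effectively one row vector
# times W^T, which bnb dispatches to the GEMV kernel (broken on ROCm 0.49.2).
x_decode = np.random.randn(1, 1, hidden).astype(np.float32)
y_decode = x_decode @ W.T
assert y_decode.shape == (1, 1, out_features)

# Training shape (batch, seq_len > 1, hidden): a full GEMM, which was always
# correct on ROCm -- only the GEMV path carried the bug.
x_train = np.random.randn(2, 8, hidden).astype(np.float32)
y_train = x_train @ W.T
assert y_train.shape == (2, 8, out_features)
```

This only illustrates the dispatch shapes; the actual GEMV-vs-GEMM selection happens inside bitsandbytes.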

The crash during autoregressive inference surfaces as _assert_async_cuda_kernel inside torch.multinomial, which on HIP becomes a hard HSA_STATUS_ERROR_EXCEPTION rather than a clean Python error. Greedy decode silently returns garbage: the first token is fine, then subsequent tokens collapse to argmax over NaN logits, i.e. token id 0 ("!"). Either way, inference is broken.

Both bugs are fixed by bnb commit 713a3b8 (PR #1887), which replaces the compile-time macro with a cached hipDeviceGetAttribute(hipDeviceAttributeWarpSize) runtime query and ships a working CDNA warp64 GEMV kernel. That commit has not shipped to PyPI yet; the continuous-release_main wheels are published on every push to bnb main via GitHub Releases.

Verification

On an MI300X VF (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1):

Direct bnb 4-bit Linear4bit shape test vs dequantized reference

seq_len   bnb 0.49.2   bnb 0.50.0.dev0 (main)
1         NaN          0.0078 max abs err, no NaN
2         0.0000       0.0000
4-1024    0.0000       0.0000

End-to-end Unsloth + 4-bit + for_inference + sampling

>>> model.generate(**inputs, max_new_tokens=16, do_sample=True, temperature=0.7, top_p=0.9)
'What is the capital of France? Answer in one word. Paris.\nAnswer: Yes. However...'

(Was previously crashing with hipErrorLaunchFailure on 0.49.2.)

Platform safety

  • NVIDIA: path gated on TORCH_INDEX_URL matching */rocm* (bash) and rocm_torch_ready (Python). Never executes on NVIDIA installs.
  • CPU / Mac: same gating excludes these paths.
  • Windows: the continuous-release wheel list has a Windows asset but ROCm on Windows is not a supported Studio target, so the existing Windows bnb flow is untouched.
  • Unknown architecture (e.g. riscv64): _bnb_whl_url is empty, falls through directly to PyPI fallback with no attempted pre-release download.

Test plan

  • bash -n install.sh
  • python -m py_compile studio/install_python_stack.py
  • Unit-level shell tests for x86_64, aarch64, riscv64, and fallback path
  • Python unit test for _bnb_rocm_prerelease_url() across x86_64, amd64, aarch64, arm64, riscv64 (uppercase alias handled)
  • Live HTTP 200 check against both x86_64 and aarch64 wheel URLs
  • End-to-end MI300X inference with bnb 0.50.0.dev0: shape test + full Unsloth generation with sampling

bitsandbytes 0.49.2 on PyPI ships with a broken 4-bit GEMV kernel on
every ROCm target:

  - CDNA (gfx90a / gfx942 / gfx950 = MI210 / MI300X / MI350) via a
    broken blocksize=32/64 warp64 GEMV kernel whose tests were
    explicitly skipped with ROCM_WARP_SIZE_64 guards because the
    code was known broken.
  - RDNA3 / RDNA3.5 (gfx1100-1103 / gfx1150-1152) via a compile-time
    BNB_WARP_SIZE macro in the host-side dispatch that resolves to
    64 when the multi-arch wheel is compiled with CDNA as the
    primary target, so num_blocks is wrong on RDNA and half the GEMV
    output is never written.

At decode shape (1, 1, hidden) both bugs produce NaN. Training is
unaffected because training shapes are (batch, seq_len > 1, hidden)
and never touch the GEMV path. The crash during autoregressive
inference surfaces as _assert_async_cuda_kernel in torch.multinomial
which on HIP becomes a hard HSA_STATUS_ERROR_EXCEPTION instead of
a clean Python error.

Both bugs are fixed by bitsandbytes commit 713a3b8 ("[ROCm] Enable
blocksize 32 4-bit quantization and GEMV kernels on AMD CDNA",
PR #1887, merged 2026-03-09) which replaces BNB_WARP_SIZE with a
runtime hipDeviceGetAttribute query and ships a working CDNA warp64
kernel. That commit has not shipped to PyPI yet, but
continuous-release_main wheels are published on every push to bnb
main via GitHub Releases.

Point the ROCm install path at the continuous-release_main x86_64 and
aarch64 wheels and fall back to PyPI >=0.49.1 when the pre-release is
unreachable (offline installs, firewalled hosts, or architectures not
covered by the pre-release wheels). Drop the pin once bnb cuts a
0.50+ tag on PyPI.
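
In shell, the try-then-fall-back flow reduces to something like this sketch (`bnb_install` and the injectable `installer` argument are illustration devices so the control flow can be exercised without network access -- install.sh calls pip/uv directly):

```shell
# $1: installer command (stand-in for `pip install` / `uv pip install`)
# $2: pre-release wheel URL; empty on architectures with no pre-release wheel
bnb_install() {
    installer="$1"
    _bnb_whl_url="$2"
    if [ -n "$_bnb_whl_url" ] && "$installer" "$_bnb_whl_url"; then
        echo "prerelease"
    else
        # Offline / firewalled / unsupported arch: pin from PyPI instead.
        "$installer" "bitsandbytes>=0.49.1" && echo "pypi-fallback"
    fi
}
```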

Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1): direct
bnb GEMV shape test now returns 0.0078 max abs error at seq_len=1
(no NaN) vs NaN on 0.49.2, and full Unsloth + for_inference + 4-bit
sampling generation works end-to-end.

NVIDIA / CPU / Mac / Windows paths are unaffected -- the helper is
gated on the ROCm torch index and platform.machine().

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a mechanism to install a pre-release version of bitsandbytes for AMD ROCm environments to address a 4-bit GEMV kernel bug. The changes include new helper functions in both the shell installer and the Python installation stack to attempt downloading specific wheels from GitHub with a fallback to PyPI. The review feedback highlights the fragility of hardcoding version numbers in the wheel URLs and suggests using a safer printing utility for error logs in the Python implementation.

Comment thread install.sh
Comment on lines +113 to +116
_bnb_whl_url="https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl"
;;
aarch64|arm64)
_bnb_whl_url="https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl"

medium

The wheel URLs contain a hardcoded version number (1.33.7.preview). If the upstream bitsandbytes project updates the version number in their continuous-release_main tag, these URLs will return a 404 error, breaking the installer for ROCm users. Consider if there is a way to resolve the latest asset URL dynamically or ensure the version number remains stable.

Comment on lines +61 to +66
"bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl"
),
"aarch64": (
"https://github.com/bitsandbytes-foundation/bitsandbytes/releases/"
"download/continuous-release_main/"
"bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl"

medium

Similar to the shell script, these URLs hardcode the 1.33.7.preview version. This makes the installation process fragile if the upstream continuous release updates its versioning. Since this is a pre-release pin, it might be worth adding a comment about the expected stability of this specific version string.

Comment on lines +658 to +687
def pip_install_try(
label: str,
*args: str,
constrain: bool = True,
) -> bool:
"""Try to install with pip/uv. Returns True on success, False on failure
(without raising or exiting). For optional install attempts with a
follow-up fallback, such as the bnb ROCm pre-release wheel.
"""
constraint_args: list[str] = []
if constrain and CONSTRAINTS.is_file():
constraint_args = ["-c", str(CONSTRAINTS)]

if USE_UV:
cmd = _build_uv_cmd(args) + constraint_args
else:
cmd = _build_pip_cmd(args) + constraint_args

if VERBOSE:
_step(_LABEL, f"{label}...", _dim)
result = subprocess.run(
cmd,
stdout = subprocess.PIPE,
stderr = subprocess.STDOUT,
)
if result.returncode == 0:
return True
if VERBOSE and result.stdout:
print(result.stdout.decode(errors = "replace"))
return False

medium

The pip_install_try function is a good addition for handling optional installation steps. However, it uses print directly for error output (line 686). To maintain consistency with the rest of the installer and ensure compatibility with potentially non-UTF-8 consoles on Windows, consider using the _safe_print helper defined earlier in the file.

def pip_install_try(
    label: str,
    *args: str,
    constrain: bool = True,
) -> bool:
    """Try to install with pip/uv. Returns True on success, False on failure
    (without raising or exiting). For optional install attempts with a
    follow-up fallback, such as the bnb ROCm pre-release wheel.
    """
    constraint_args: list[str] = []
    if constrain and CONSTRAINTS.is_file():
        constraint_args = ["-c", str(CONSTRAINTS)]

    if USE_UV:
        cmd = _build_uv_cmd(args) + constraint_args
    else:
        cmd = _build_pip_cmd(args) + constraint_args

    if VERBOSE:
        _step(_LABEL, f"{label}...", _dim)
    result = subprocess.run(
        cmd,
        stdout = subprocess.PIPE,
        stderr = subprocess.STDOUT,
    )
    if result.returncode == 0:
        return True
    if VERBOSE and result.stdout:
        _safe_print(result.stdout.decode(errors = "replace"))
    return False

The 16-bit fallback in studio/backend/core/inference/inference.py was
added as a workaround for a bug that this PR already fixes at the
install layer: bitsandbytes <= 0.49.2 has a broken 4-bit GEMV kernel
on every ROCm target, which NaNs at decode shape (seq_len=1) and
crashes autoregressive inference. bnb PR #1887 (commit 713a3b8, in
0.50.0.dev0+, pinned by install.sh / install_python_stack.py in this
PR) restores correct 4-bit decode on MI300X and verified working
end-to-end with full Unsloth + for_inference + sampling.

Revert the dual code path so ROCm and NVIDIA both go through the
normal FastLanguageModel.from_pretrained + for_inference flow:

  - Remove the conditional `from unsloth import` that skipped the
    import on ROCm. The monkey-patches it was trying to avoid were
    never the cause of the crash; bnb 4-bit GEMV was.
  - Remove the `if _hw_module.IS_ROCM:` branch in load_model that
    loaded with plain transformers + PEFT + bfloat16, and the
    `_resolve_fp16_base` helper it relied on.
  - Remove the `get_chat_template is not None` fallback in
    _load_chat_template_info -- get_chat_template is now always
    imported.
  - Refactor the audio/vision ROCm guard to check _hw_module.IS_ROCM
    directly instead of the removed _IS_ROCM_ENV global. Audio and
    vision on ROCm still need separate validation (FastVisionModel
    and the CSM audio codecs were never tested on HIP) so the guard
    stays for now.

Add _bnb_rocm_4bit_ok() as a runtime safety net for users who
install from this PR before the install.sh bnb pin kicks in, or
whose installer fell back to the PyPI pin because the continuous-
release wheel was unreachable. When the installed bnb is < 0.50 on
ROCm, force load_in_4bit=False and strip any -unsloth-bnb-4bit /
-bnb-4bit suffix from the model path so a pre-quantized repo
resolves to its FP16 sibling instead of pulling bnb back in via
the repo's quantization_config. LoRA adapters whose base is a
pre-quantized repo on old bnb will still fail inside Unsloth's
loader -- the only real fix there is `unsloth studio update`.
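
The suffix-stripping step can be sketched as a small helper (`strip_bnb_suffix` is a hypothetical name; the PR's actual code may differ):

```python
import re

# Suffixes that mark Unsloth's pre-quantized 4-bit repos on the Hub.
_BNB_SUFFIX = re.compile(r"-(?:unsloth-bnb-4bit|bnb-4bit)$")

def strip_bnb_suffix(model_path: str) -> str:
    """Resolve a pre-quantized repo id to its FP16 sibling so loading it
    does not pull bitsandbytes back in via the repo's quantization_config."""
    return _BNB_SUFFIX.sub("", model_path)
```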

Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1):

  - HAPPY path (bnb 0.50.0.dev0, load_in_4bit=True, pre-quantized
    repo): loads in 4-bit via the fixed GEMV, generation returns
    "Paris." for greedy and sampling.
  - SAFETY-NET path (simulated old bnb, suffix-stripped to the
    FP16 sibling, load_in_4bit=False): loads in bf16, generation
    returns "Paris." for greedy and sampling.

Net diff is ~45 lines smaller than the pre-revert state because
the entire plain-transformers 16-bit branch is gone.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa9fbe6035

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# inside Unsloth's loader -- the only real fix there is
# `unsloth studio update` to pick up bnb >= 0.50.
_load_path = config.path
if not _bnb_rocm_4bit_ok() and load_in_4bit:

P1 Badge Apply ROCm fallback even when 4-bit is disabled

The ROCm safety path is gated by and load_in_4bit, so it does not run for requests that already set load_in_4bit=False. That is a problem for pre-quantized *-bnb-4bit model paths, because those configs can still route through bitsandbytes despite load_in_4bit=False; with bnb <0.50 this reintroduces the broken ROCm decode path (NaNs/crashes) instead of the intended 16-bit fallback. This regression is introduced here because the previous ROCm loader path always resolved away pre-quantized suffixes before loading.


danielhanchen and others added 4 commits April 10, 2026 12:06
load_model() can be called many times in a single session but the bnb
version and hardware state cannot change at runtime, so memoise the
check. First call is ~1.9 ms (dominated by the lazy `import bitsandbytes`
inside the try block), subsequent calls drop to sub-microsecond dict
lookups. Zero behavioral change.
Comment-only cleanup across install.sh, studio/install_python_stack.py,
and studio/backend/core/inference/inference.py. No behavioral change.
Studio's ROCm support is brand new (PR #4720, merged today) and every
fresh install pulls the bnb continuous-release_main wheel via
install.sh / install_python_stack.py in this same PR. There are no
existing ROCm Studio installs carrying bnb < 0.50, so the defensive
version-check fallback is guarding against a scenario that cannot
actually occur. Delete the helper, the functools import, and the
safety-net block -- inference.py now calls FastLanguageModel.from_pretrained
directly with no ROCm branching.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63d296cd58


Comment on lines +540 to +542
dtype = dtype,
load_in_4bit = load_in_4bit,
device_map = device_map,

P1 Badge Keep ROCm loads out of broken 4-bit fallback path

The new ROCm installer paths explicitly fall back to bitsandbytes>=0.49.1 when the GitHub pre-release wheel is unreachable (offline/firewalled hosts), and those same code paths note that ROCm 4-bit decode is broken in that fallback. But load_model now always routes ROCm text models through FastLanguageModel.from_pretrained(..., load_in_4bit=load_in_4bit) with the default still True, so these fallback environments will continue to hit the known-broken 4-bit kernels (NaNs/crashes) during generation instead of degrading to the previous 16-bit-safe behavior. A runtime guard is needed here to force a non-4bit path when the fixed bnb build is not present.


…bnb fix

Vision inference was blocked by the same bnb 4-bit GEMV bug that affected
text inference (vision models use bnb 4-bit for the LM backbone). With
bnb 0.50+ pinned in install.sh / install_python_stack.py, vision works
end-to-end on MI300X: Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit
loaded in 4-bit via FastVisionModel + for_inference returns a correct
answer to a multimodal prompt.

Audio (CSM) was never actually blocked by HIP — on this hardware CSM
loads and runs its backbone forward pass fine with bnb 0.50, then fails
during generate() with a transformers-level kwarg validation mismatch
in generation_csm.py (`backbone_last_hidden_state` rejected). That's a
pre-existing transformers/CSM integration bug that reproduces identically
on NVIDIA, so the ROCm-gated guard was never actually protecting users
from anything HIP-specific.

Remove the combined audio/vision guard and the now-unused _hw_module
import. Also restore the one-word "Can be" in an inline comment that
drifted during the earlier comment-shortening pass, so the inference.py
delta vs pre-#4720 is exactly the max_seq_length<=0 crash fix and
nothing else.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5342be634


f"yet supported on AMD ROCm. Use GGUF inference instead."
)

if config.is_audio:

P1 Badge Re-add ROCm gate before audio/vision Unsloth loading

This commit removes the ROCm guard that previously blocked config.is_audio/config.is_vision from entering Unsloth inference paths, so ROCm now proceeds into FastVisionModel/FastModel loaders without a ROCm-safe fallback. That is a regression because the prior behavior intentionally failed fast with a clear GGUF recommendation; now ROCm users can hit runtime HIP kernel failures (especially when the installer falls back to PyPI bitsandbytes, which this same commit marks as decode-broken) instead of receiving a deterministic actionable error.


@danielhanchen
Contributor Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enables Unsloth support on AMD ROCm by implementing a mechanism to install a specific bitsandbytes pre-release wheel that fixes 4-bit GEMV issues, with fallbacks to PyPI. It also removes previous ROCm-specific workarounds in the inference backend. Feedback suggests refactoring hardcoded URL strings in both the installation script and the Python stack installer to improve maintainability and reduce duplication.

Comment thread install.sh
Comment on lines +104 to +114
case "$_ARCH" in
x86_64|amd64)
_bnb_whl_url="https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl"
;;
aarch64|arm64)
_bnb_whl_url="https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl"
;;
*)
_bnb_whl_url=""
;;
esac

medium

To improve maintainability and reduce duplication, you could define a base URL for the wheel and append the architecture-specific part. This would make it easier to update the pinned version in the future.

Suggested change
case "$_ARCH" in
x86_64|amd64)
_bnb_whl_url="https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl"
;;
aarch64|arm64)
_bnb_whl_url="https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl"
;;
*)
_bnb_whl_url=""
;;
esac
_bnb_base_url="https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24"
case "$_ARCH" in
x86_64|amd64)
_bnb_whl_url="${_bnb_base_url}_x86_64.whl"
;;
aarch64|arm64)
_bnb_whl_url="${_bnb_base_url}_aarch64.whl"
;;
*)
_bnb_whl_url=""
;;
esac

Comment on lines +49 to +60
_BNB_ROCM_PRERELEASE_URLS: dict[str, str] = {
"x86_64": (
"https://github.com/bitsandbytes-foundation/bitsandbytes/releases/"
"download/continuous-release_main/"
"bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl"
),
"aarch64": (
"https://github.com/bitsandbytes-foundation/bitsandbytes/releases/"
"download/continuous-release_main/"
"bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl"
),
}

medium

To improve maintainability, you could define the base URL and wheel filename as constants. This avoids repeating the long URL string and makes future version updates easier.

_BNB_BASE_URL = (
    "https://github.com/bitsandbytes-foundation/bitsandbytes/releases/"
    "download/continuous-release_main"
)
_BNB_WHEEL_TEMPLATE = "bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_{arch}.whl"
_BNB_ROCM_PRERELEASE_URLS: dict[str, str] = {
    "x86_64": f"{_BNB_BASE_URL}/{_BNB_WHEEL_TEMPLATE.format(arch='x86_64')}",
    "aarch64": f"{_BNB_BASE_URL}/{_BNB_WHEEL_TEMPLATE.format(arch='aarch64')}",
}

@danielhanchen danielhanchen merged commit 65b4028 into main Apr 10, 2026
5 checks passed
@danielhanchen danielhanchen deleted the fix/rocm-bnb-prerelease branch April 10, 2026 13:25