
Error encountered when using SDK mode #465

@hailuoS

Description


I encountered this issue while running the simple math example in SDK mode. What's strange is that it ran successfully a few days ago without any modifications, but now it's throwing an error.

```
(TaskRunner pid=99857) ERROR:2026-03-30 02:33:06,181:[f4c7d89f-fa83-48c0-a370-324084196b05:9:3] Rollout failed: Traceback (most recent call last):
(TaskRunner pid=99857)   File "/workspace/rllm/rllm/engine/agent_sdk_engine.py", line 180, in _execute_with_exception_handling
(TaskRunner pid=99857)     output, session_uid = await loop.run_in_executor(self.executor, bound_func)
(TaskRunner pid=99857)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=99857)   File "/usr/lib/python3.12/concurrent/futures/thread.py", line 59, in run
(TaskRunner pid=99857)     result = self.fn(*self.args, **self.kwargs)
(TaskRunner pid=99857)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=99857)   File "/workspace/rllm/rllm/sdk/session/base.py", line 64, in wrapped_sync
(TaskRunner pid=99857)     output = agent_func(*args, **kwargs)
(TaskRunner pid=99857)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=99857)   File "/workspace/rllm/examples/sdk/simple_math/train_hendrycks_math.py", line 25, in rollout
(TaskRunner pid=99857)     response = client.chat.completions.create(
(TaskRunner pid=99857)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=99857)   File "/usr/local/lib/python3.12/dist-packages/openai/_utils/_utils.py", line 286, in wrapper
(TaskRunner pid=99857)     return func(*args, **kwargs)
(TaskRunner pid=99857)            ^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=99857)   File "/usr/local/lib/python3.12/dist-packages/openai/resources/chat/completions/completions.py", line 1211, in create
(TaskRunner pid=99857)     return self._post(
(TaskRunner pid=99857)            ^^^^^^^^^^^
(TaskRunner pid=99857)   File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1297, in post
(TaskRunner pid=99857)     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
(TaskRunner pid=99857)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=99857)   File "/workspace/rllm/rllm/sdk/chat/openai.py", line 398, in request
(TaskRunner pid=99857)     response = OpenAI.request(temp_client, cast_to, options, stream=stream, stream_cls=stream_cls)
(TaskRunner pid=99857)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(TaskRunner pid=99857)   File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1070, in request
(TaskRunner pid=99857)     raise self._make_status_error_from_response(err.response) from None
(TaskRunner pid=99857) openai.RateLimitError: Error code: 429 - {'error': {'message': "No deployments available for selected model, Try again in 5 seconds. Passed model=/home/shl/models/DeepSeek-R1-Distill-Qwen-1.5B. pre-call-checks=False, cooldown_list=[('verl-replica-0', {'exception_received': 'litellm.InternalServerError: InternalServerError: Hosted_vllmException - Connection error.', 'status_code': '500', 'timestamp': 1774863182.0375342, 'cooldown_time': 5}), ('verl-replica-1', {'exception_received': 'litellm.InternalServerError: InternalServerError: Hosted_vllmException - Connection error.', 'status_code': '500', 'timestamp': 1774863182.7071717, 'cooldown_time': 5}), ('verl-replica-2', {'exception_received': 'litellm.InternalServerError: InternalServerError: Hosted_vllmException - Connection error.', 'status_code': '500', 'timestamp': 1774863183.0246875, 'cooldown_time': 5}), ('verl-replica-3', {'exception_received': 'litellm.InternalServerError: InternalServerError: Hosted_vllmException - Connection error.', 'status_code': '500', 'timestamp': 1774863182.2435002, 'cooldown_time': 5}), ('verl-replica-4', {'exception_received': 'litellm.InternalServerError: InternalServerError: Hosted_vllmException - Connection error.', 'status_code': '500', 'timestamp': 1774863181.8245385, 'cooldown_time': 5}), ('verl-replica-5', {'exception_received': 'litellm.InternalServerError: InternalServerError: Hosted_vllmException - Connection error.', 'status_code': '500', 'timestamp': 1774863181.8098967, 'cooldown_time': 5}), ('verl-replica-6', {'exception_received': 'litellm.InternalServerError: InternalServerError: Hosted_vllmException - Connection error.', 'status_code': '500', 'timestamp': 1774863182.1531959, 'cooldown_time': 5}), ('verl-replica-7', {'exception_received': 'litellm.InternalServerError: InternalServerError: Hosted_vllmException - Connection error.', 'status_code': '500', 'timestamp': 1774863183.4203734, 'cooldown_time': 5})]", 'type': 'None', 'param': 'None', 'code': '429'}}
```
