
feat: add screen recording capability #159

Open
AlexAlves87 wants to merge 6 commits into openclaw:master from AlexAlves87:feat/screen-record

@AlexAlves87
Contributor

@AlexAlves87 AlexAlves87 commented Apr 8, 2026

Summary

Adds screen.record, screen.record.start, and screen.record.stop to ScreenCapability. screen.record does a fixed-duration capture and returns a base64 MP4. The start/stop pair allows session-based recording with a recordingId so the caller controls when to stop.

What changed

  • ScreenCapability: three new commands with events and arg parsing
  • ScreenRecordingService: WinRT capture (D3D11CaptureFramePool), BGRA→NV12 conversion, MediaTranscoder with hardware and software fallback
  • NodeService: wired to the new capability events

Testing

  • ./build.ps1
  • dotnet test ./tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj --no-restore
  • dotnet test ./tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj --no-restore
  • Manual: called screen.record from a connected agent, received valid MP4

Notes

OpenClaw.Shared.Tests currently has 8 pre-existing failures on this branch related to culture-sensitive number formatting. These are unrelated to this change and are covered by a separate fix.

AlexAlves87 and others added 4 commits April 8, 2026 19:19
New command in the shared capability layer:

- screen.record: fixed-duration capture; blocks until done and returns
  the video as base64 MP4.

Args: durationMs (def. 5000), fps (def. 10), screenIndex/monitor (def. 0).
The monitor→screenIndex alias keeps consistency with screen.capture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
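
The defaults and the monitor→screenIndex alias described in this commit message could be parsed roughly as follows. This is a minimal sketch, not the PR's code: `ScreenRecordArgsSketch`, `Parse`, and `ReadInt` are hypothetical names, and letting `screenIndex` win when both keys are present is an assumption.

```csharp
using System.Text.Json;

// Hypothetical stand-in for the PR's argument type; names are illustrative only.
public sealed record ScreenRecordArgsSketch(int DurationMs, int Fps, int ScreenIndex)
{
    public static ScreenRecordArgsSketch Parse(JsonElement args)
    {
        // Defaults taken from the commit message: 5000 ms, 10 fps, screen 0.
        int durationMs = ReadInt(args, "durationMs") ?? 5000;
        int fps = ReadInt(args, "fps") ?? 10;

        // Assumption: "screenIndex" takes precedence, "monitor" is the alias.
        int screenIndex = ReadInt(args, "screenIndex") ?? ReadInt(args, "monitor") ?? 0;

        return new ScreenRecordArgsSketch(durationMs, fps, screenIndex);
    }

    // Returns the integer value of a property, or null if it is absent or not an integer.
    private static int? ReadInt(JsonElement obj, string name)
    {
        if (obj.ValueKind == JsonValueKind.Object
            && obj.TryGetProperty(name, out JsonElement value)
            && value.ValueKind == JsonValueKind.Number
            && value.TryGetInt32(out int number))
        {
            return number;
        }
        return null;
    }
}
```

With this shape, an empty argument object yields the three defaults, and `{"monitor": 2}` yields `ScreenIndex == 2`.
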
WinRT-based implementation backing screen.record:

- D3D11 + Direct3D11CaptureFramePool for GPU-backed frame acquisition
- Software BGRA→NV12 conversion (BT.601 limited range) before encoding
- MediaTranscoder pipeline with hardware acceleration and SW fallback
- No external dependencies: pure P/Invoke (d3d11.dll, combase.dll)

Records the full monitor only. Per-window capture is not yet implemented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
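
For readers unfamiliar with the BGRA→NV12 step, the conversion can be sketched with the common BT.601 limited-range integer approximation. This is an illustration rather than the PR's implementation: `Nv12Sketch` is a hypothetical name, the input is assumed to be tightly packed (no row padding) with even width and height, and chroma is sampled from the top-left pixel of each 2×2 block instead of being averaged.

```csharp
using System;

public static class Nv12Sketch
{
    // Converts tightly packed BGRA (4 bytes per pixel) to NV12:
    // a full-resolution Y plane followed by one interleaved UV plane at half resolution.
    public static byte[] BgraToNv12(byte[] bgra, int width, int height)
    {
        if (width % 2 != 0 || height % 2 != 0)
            throw new ArgumentException("Width and height must be even for NV12.");
        if (bgra.Length < width * height * 4)
            throw new ArgumentException("BGRA buffer is smaller than width * height * 4.");

        var nv12 = new byte[width * height * 3 / 2];
        int uvOffset = width * height; // the UV plane starts right after the Y plane

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int i = (y * width + x) * 4;
                int b = bgra[i], g = bgra[i + 1], r = bgra[i + 2];

                // Luma, BT.601 limited range (16..235).
                nv12[y * width + x] = (byte)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);

                // One chroma pair per 2x2 block, taken from its top-left pixel.
                if ((x & 1) == 0 && (y & 1) == 0)
                {
                    int uv = uvOffset + (y / 2) * width + x;
                    nv12[uv] = (byte)(((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128);    // U
                    nv12[uv + 1] = (byte)(((112 * r - 94 * g - 18 * b + 128) >> 8) + 128); // V
                }
            }
        }
        return nv12;
    }
}
```

A 2×2 all-white frame produces four Y values of 235 followed by U = V = 128, and an all-black frame produces Y = 16 with the same neutral chroma.
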
NodeService instantiates ScreenRecordingService and subscribes OnScreenRecord
to ScreenCapability's RecordRequested event.

Tests cover the full surface of screen.record: missing handler error, correct
arg forwarding, defaults (durationMs=5000, fps=10, screenIndex=0), the
monitor→screenIndex alias, and exception handling in the handler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two new commands for session-based recording:

- screen.record.start: opens a recording session and returns a recordingId
- screen.record.stop: closes the session and returns the video

ActiveSession manages the capture loop with a CancellationToken and stores
frames safely under a lock. A ConcurrentDictionary keyed by recordingId
allows concurrent sessions.

9 new tests cover: start/stop without a handler, args and monitor alias,
recordingId in the start response, full stop payload, and exception paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
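
The session model described above (a cancellable capture loop per session, frames guarded by a lock, sessions held in a ConcurrentDictionary keyed by recordingId) could look roughly like this. It is a sketch under stated assumptions, not the PR's ActiveSession: `RecordingSessionsSketch` and the `grabFrame` delegate are hypothetical, and the real service captures through WinRT rather than a delegate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class RecordingSessionsSketch
{
    private sealed class Session
    {
        public readonly CancellationTokenSource Cts = new();
        public readonly List<byte[]> Frames = new();
        public readonly object Gate = new();
        public Task Loop = Task.CompletedTask;
    }

    private readonly ConcurrentDictionary<string, Session> _sessions = new();
    private readonly Func<byte[]> _grabFrame; // hypothetical frame source

    public RecordingSessionsSketch(Func<byte[]> grabFrame) => _grabFrame = grabFrame;

    // Starts a capture loop and returns the id the caller later passes to StopAsync.
    public string Start(int fps)
    {
        string id = Guid.NewGuid().ToString("N");
        var session = new Session();
        TimeSpan interval = TimeSpan.FromMilliseconds(1000.0 / fps);

        session.Loop = Task.Run(async () =>
        {
            try
            {
                while (!session.Cts.IsCancellationRequested)
                {
                    byte[] frame = _grabFrame();
                    lock (session.Gate) session.Frames.Add(frame);
                    await Task.Delay(interval, session.Cts.Token);
                }
            }
            catch (OperationCanceledException)
            {
                // Expected when StopAsync cancels the delay.
            }
        });

        _sessions[id] = session;
        return id;
    }

    // Cancels the loop, waits for it to finish, and returns the captured frames.
    public async Task<IReadOnlyList<byte[]>> StopAsync(string id)
    {
        if (!_sessions.TryRemove(id, out Session? session))
            throw new KeyNotFoundException($"Unknown recordingId '{id}'.");

        session.Cts.Cancel();
        await session.Loop;
        session.Cts.Dispose();

        lock (session.Gate) return session.Frames.ToArray();
    }
}
```

Because each session owns its own token source and frame list, two sessions started back to back can be stopped independently and in either order.
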
@github-actions
Contributor

github-actions bot commented Apr 9, 2026

🤖 This is an automated response from Repo Assist.

Welcome, @AlexAlves87! Screen recording is a great capability addition — the overall approach is solid, and the test coverage is excellent (16 new tests covering all three commands including edge cases). A few observations worth addressing before merge:


🚨 BuildRawVideoStream uses .Wait() on a WinRT async

writer.StoreAsync().AsTask().Wait();

DataWriter.StoreAsync() is a WinRT async operation. Even though EncodeToMp4Async is static and called with await (so it runs on a thread pool thread), this is a sync-over-async pattern that can be brittle. A cleaner approach:

private static async Task<InMemoryRandomAccessStream> BuildRawVideoStreamAsync(
    List<byte[]> frames, int width, int height)
{
    var stream = new InMemoryRandomAccessStream();
    var writer = new DataWriter(stream);
    foreach (var frame in frames)
        writer.WriteBytes(BgraToNv12(frame, width, height));
    await writer.StoreAsync();
    // Detach so the stream stays open after the writer goes out of scope.
    writer.DetachStream();
    stream.Seek(0);
    return stream;
}

⚠️ Memory footprint at max duration

At the default cap of 60 s × 10 fps × 1080p BGRA, each uncompressed frame is about 8.3 MB (1920 × 1080 × 4 bytes), so the in-memory frame buffer can reach roughly 5 GB before encoding. At 60 fps that is roughly 30 GB. Consider documenting recommended limits in the capability's command help/schema, lowering MaxDurationMs, or adding a per-frame size warning.
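
When choosing limits, the uncompressed buffer size can be estimated directly from resolution, pixel format, frame rate, and duration. The helper below is a hypothetical sketch (`FrameBufferEstimate` is not part of the PR) and assumes every frame is kept at full resolution with no downscaling or compression.

```csharp
public static class FrameBufferEstimate
{
    // Total bytes needed to hold every uncompressed frame of a recording in memory.
    public static long Bytes(int width, int height, int bytesPerPixel, int fps, int durationSeconds)
        => (long)width * height * bytesPerPixel * fps * durationSeconds;
}

// One 1920x1080 BGRA frame: 1920 * 1080 * 4 = 8,294,400 bytes.
// FrameBufferEstimate.Bytes(1920, 1080, 4, 10, 60) ->  4,976,640,000 bytes
// FrameBufferEstimate.Bytes(1920, 1080, 4, 60, 60) -> 29,859,840,000 bytes
```

The cast to `long` on the first operand matters: the product overflows a 32-bit `int` well before the 60-second mark.
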


🧹 Minor: ScreenRecordResult.Fps is float but always populated from int

All paths set Fps = fps where fps comes from Math.Clamp(args.Fps, MinFps, MaxFps) — which returns int. The float type is slightly misleading and the tests assert .GetInt32() on the serialized value, which would fail for fractional fps. Could just be int.
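
The serialization concern can be reproduced in isolation with System.Text.Json. In this sketch `FpsSerializationSketch` is a hypothetical helper, and it uses `TryGetInt32`, the non-throwing counterpart of the `GetInt32()` call the tests rely on.

```csharp
using System.Text.Json;

public static class FpsSerializationSketch
{
    // Serializes a float fps value and reports whether it can be read back as an Int32.
    public static bool ReadsAsInt32(float fps)
    {
        string json = JsonSerializer.Serialize(new { fps });
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("fps").TryGetInt32(out _);
    }
}
```

A whole-number value such as `10f` serializes as `10` and reads back as an integer, while `10.5f` does not, which is the case where an assertion built on `GetInt32()` would throw.
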


🔴 Pre-existing test failures on this branch

The PR description mentions 8 culture-sensitive number formatting failures. These should be resolved (or confirmed unrelated to this PR's changes with a green CI run) before merge, so maintainers have a clean signal.


✅ What looks great

  • Hardware → software MediaTranscoder fallback is smart and robust
  • Math.Clamp guards on fps and durationMs are good defensive programming
  • ActiveSession.StopAsync() cleanly cancels the frame-capture loop via CancellationTokenSource
  • The event-driven wiring into NodeService fits perfectly with the existing capability pattern
  • ScreenRecordingService.Dispose() properly iterates and disposes all live sessions

Looking forward to seeing this merged! Let me know if you have any questions.


AlexAlves87 and others added 2 commits April 9, 2026 16:46
- Fix InvalidCastException in CreateForMonitor: pass IID_IInspectable
  instead of typeof(GraphicsCaptureItem).GUID, which returns a C#/WinRT-
  generated GUID unrecognized by the native COM method (E_NOINTERFACE).
- Replace PrepareStreamTranscodeAsync with PrepareMediaStreamSourceTranscodeAsync
  + MediaStreamSource feeding NV12 samples on demand, fixing "Transcode
  failed: Unknown" on all three screen recording commands.
- Add 500 MB frame-buffer cap (MaxFrameBufferBytes) with early stop and
  warning log to prevent OOM on long or high-fps recordings.
- Save encoded MP4 to %TEMP%\openclaw\ and return filePath in the response.
- Change ScreenRecordResult.Fps from float to int.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
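
Writing the encoded MP4 under %TEMP%\openclaw\ and handing back its path could be sketched as below. The class name and the file-naming scheme are hypothetical, since the commit message only states the directory and that `filePath` is returned.

```csharp
using System;
using System.IO;

public static class TempRecordingFileSketch
{
    // Writes the encoded bytes to <temp>/openclaw/ and returns the full path.
    public static string Save(byte[] mp4Bytes)
    {
        string directory = Path.Combine(Path.GetTempPath(), "openclaw");
        Directory.CreateDirectory(directory); // does nothing if the directory already exists

        // Hypothetical naming scheme: a random suffix avoids collisions between recordings.
        string path = Path.Combine(directory, $"screen-record-{Guid.NewGuid():N}.mp4");
        File.WriteAllBytes(path, mp4Bytes);
        return path;
    }
}
```

Nothing in this sketch deletes old recordings, so a caller that records often would need its own cleanup policy.
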
@AlexAlves87
Contributor Author

Addressing the Repo Assist observations: the three code issues are resolved in commit f4dbc52, and the pre-existing test failures are resolved by syncing with upstream.

🚨 BuildRawVideoStream .Wait() — Fixed. BuildRawVideoStream no longer exists. The entire encoding pipeline was replaced with MediaStreamSource + PrepareMediaStreamSourceTranscodeAsync, which also fixed the root cause of the Transcode failed: Unknown error that was blocking screen recording entirely.

⚠️ Memory footprint — Fixed. Added MaxFrameBufferBytes = 500 MB constant with an early-stop guard in both capture loops (RecordAsync and ActiveSession.RunAsync). When the cap is hit, capture stops gracefully and a warning is logged. Limits documented in XML docs on ScreenRecordArgs and ScreenRecordStartArgs.
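
An early-stop guard of this kind can be sketched as a buffer that refuses frames once a byte budget is reached. `CappedFrameBufferSketch` and `TryAdd` are hypothetical names, and reading "500 MB" as 500 × 1024 × 1024 bytes is an assumption; only the `MaxFrameBufferBytes` constant name comes from the PR.

```csharp
using System.Collections.Generic;

public sealed class CappedFrameBufferSketch
{
    // Assumption: "500 MB" interpreted as 500 MiB.
    public const long MaxFrameBufferBytes = 500L * 1024 * 1024;

    private readonly List<byte[]> _frames = new();
    private readonly long _capBytes;

    public CappedFrameBufferSketch(long capBytes = MaxFrameBufferBytes) => _capBytes = capBytes;

    public long BufferedBytes { get; private set; }
    public int FrameCount => _frames.Count;

    // Returns false, without storing the frame, when adding it would exceed the cap.
    public bool TryAdd(byte[] frame)
    {
        if (BufferedBytes + frame.Length > _capBytes)
            return false;

        _frames.Add(frame);
        BufferedBytes += frame.Length;
        return true;
    }
}

// In a capture loop, a false return is the signal to log a warning and stop early:
//     if (!buffer.TryAdd(frame)) { /* log warning */ break; }
```

If frames were buffered as full 1080p BGRA (8,294,400 bytes each), a cap of this size would hold 63 frames, so at high resolutions the cap, not the requested duration, would usually decide when capture ends.
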

🧹 Fps float → int — Fixed. ScreenRecordResult.Fps is now int.

🔴 Pre-existing culture test failures — Resolved by syncing with upstream. Commit 6933239 (fix: use invariant culture in numeric display formatting, merged in #158) fixes those 8 failures. CI on this PR is now fully green.
