Skip to content

Fix 2-second systemd notify stalls in cloud-hypervisor#493

Open
rzhikharevich wants to merge 3 commits intomicrovm-nix:mainfrom
rzhikharevich:cloud-hv-notify-proxy-fix
Open

Fix 2-second systemd notify stalls in cloud-hypervisor#493
rzhikharevich wants to merge 3 commits intomicrovm-nix:mainfrom
rzhikharevich:cloud-hv-notify-proxy-fix

Conversation

@rzhikharevich
Copy link
Copy Markdown

@rzhikharevich rzhikharevich commented Mar 27, 2026

This is a fix for #474. See the code comment for explanation and caveats.

Relevant parts of:

@rzhikharevich rzhikharevich changed the title Fix 2-second systemd notify deadlocks Fix 2-second systemd notify stalls Mar 27, 2026
@rzhikharevich rzhikharevich changed the title Fix 2-second systemd notify stalls Fix 2-second systemd notify stalls in cloud-hypervisor Mar 27, 2026
@yorickvP
Copy link
Copy Markdown

I think reader.read(-1) should also work, it reads until EOF

@astro
Copy link
Copy Markdown
Member

astro commented Mar 30, 2026

@RaitoBezarius Please help review.

@rzhikharevich
Copy link
Copy Markdown
Author

Unfortunately, there is no EOF because cloud-hypervisor doesn't seem to propagate the socket half-close to the host socket, so read(-1) doesn't work.

@rzhikharevich
Copy link
Copy Markdown
Author

rzhikharevich commented Mar 31, 2026

read(-1) hangs until systemd decides to kill the unit because of the activation timeout:

roman@nixform ~/D/S/microvm.nix (cloud-hv-notify-proxy-fix) [1]> pgrep -af notify-proxy
322387 /nix/store/pzdalg368npikvpq4ncz2saxnz19v53k-python3-3.13.12/bin/python3.13 /nix/store/lqm440b0y5alm3nb6fj044y7hr5aykxs-vsock-notify-proxy/bin/vsock-notify-proxy
roman@nixform ~/D/S/microvm.nix (cloud-hv-notify-proxy-fix)> sudo strace -p 322387
strace: Process 322387 attached
epoll_wait(5

@rzhikharevich
Copy link
Copy Markdown
Author

By the way, should I use vmHostPackages or pkgs here? I see that some other runners use the former but cloud-hypervisor.nix doesn't seem to.

# -T2 is required because cloud-hypervisor does not handle partial
# shutdown of the stream, like systemd v256+ does.
${pkgs.socat}/bin/socat -T2 UNIX-LISTEN:${vsockPath}_8888,fork UNIX-SENDTO:$NOTIFY_SOCKET &
VSOCK_NOTIFY_PATH=${vsockPath}_8888 ${vsockNotifyProxy}/bin/vsock-notify-proxy &
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VSOCK_NOTIFY_PATH=${vsockPath}_8888 ${vsockNotifyProxy}/bin/vsock-notify-proxy &
${lib.getExe vsockNotifyProxy} ${vsockPath}_8888 &

Comment on lines +160 to +161
path = os.environ["VSOCK_NOTIFY_PATH"]
server = await asyncio.start_unix_server(handle, path=path)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
path = os.environ["VSOCK_NOTIFY_PATH"]
server = await asyncio.start_unix_server(handle, path=path)
server = await asyncio.start_unix_server(handle, path=sys.argv[0])

We can make this kinda dirty here, as we control all invocations and this fails loudly when no arg is there, os.environ does IIRC return empty string.

# An important caveat is that the message could theoretically be truncated but this should not
# happen in practice unless the message is large (systemd notifications are small).
#
# TODO(rzhikharevich): Ideally, cloud-hypervisor should propagate the half-close, then the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# TODO(rzhikharevich): Ideally, cloud-hypervisor should propagate the half-close, then the
# TODO: Ideally, cloud-hypervisor should propagate the half-close, then the

@SuperSandro2000 SuperSandro2000 requested a review from Copilot March 31, 2026 13:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses slow microvm boot times when running cloud-hypervisor under systemd with microvm.vsock.cid enabled, by replacing the previous socat-based notify forwarding (which introduced a 2s stall per notification) with a purpose-built proxy that immediately closes the vsock-stream connection after a single read.

Changes:

  • Add a Python-based vsock-notify-proxy that forwards guest notify messages from the cloud-hypervisor vsock socket to the host NOTIFY_SOCKET.
  • Replace the socat -T2 forwarding logic in preStart with the new proxy to avoid repeated systemd stalls.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


async def handle(reader, writer):
data = await reader.read(65536)
writer.close()
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After calling writer.close(), it’s generally best to await writer.wait_closed() (or equivalent) to ensure the transport is actually closed and file descriptors are released promptly. This helps avoid accumulating open connections under load and prevents noisy ResourceWarnings in some Python/asyncio versions.

Suggested change
writer.close()
writer.close()
await writer.wait_closed()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1st example in the Python asyncio documentation does this, too.

@RaitoBezarius
Copy link
Copy Markdown
Contributor

@RaitoBezarius Please help review.

I'm sorry, I don't review anything that involves GenAI in a project on my free time.

@SuperSandro2000
Copy link
Copy Markdown
Member

@rzhikharevich can you please incorporate the feedback?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants