Skip to content

Stream dmypy output instead of dumping everything at the end#16252

Merged
svalentin merged 4 commits intopython:masterfrom
svalentin:stream-output
Oct 16, 2023
Merged

Stream dmypy output instead of dumping everything at the end#16252
svalentin merged 4 commits intopython:masterfrom
svalentin:stream-output

Conversation

@svalentin
Copy link
Copy Markdown
Collaborator

@svalentin svalentin commented Oct 11, 2023

This does 2 things:

  1. It changes the IPC code to work with multiple messages.
  2. It changes the dmypy client/server communication so that it streams stdout/stderr instead of dumping everything at the end.

For 1, we have to provide a way to separate out different messages. I chose to frame messages as bytes separated by whitespace character. That means we have to encode the message in a scheme that escapes whitespace. The urllib.parse quote/unquote seems reasonable. It encodes more than needed but the application is not IPC IO limited so it should be fine. With this convention in place, all we have to do is read from the socket stream until we have a whitespace character.
The framing logic can be easily changed.

For 2, since we communicate with JSONs, it's easy to add a "finished" key that tells us it's the final response from dmypy. Anything else is just stdout/stderr output.

Note: dmypy server also returns out/err which is the output of actual mypy type checking. Right now this change does not stream that output. We can stream that in a followup change. We just have to decide on how to differenciate the 4 text streams (stdout/stderr/out/err) that will now be interleaved.

Played around with it on Linux quite a bit. Will also test it some more on Windows.

The WriteToConn class could use more love. I just put a bare minimum to test the rest.

This does 2 things:
1. It changes the IPC code to work with multiple messages.
2. It changes the dmypy client/server communication so that it streams
stdout/stderr instead of dumping everything at the end.

For 1, we have to provide a way to separate out different messages.
We can frame messages as bytes separated by whitespace character. That
means we have to encode the message in a scheme that escapes whitespace.
The urllib.parse quote/unquote seems reasonable. It encodes more than
needed but the application is not IPC IO limited so it should be fine.
With this convention in place, all we have to do is read from the socket
stream until we have a whitespace character.

For 2, since we communicate with JSONs, it's easy to add a "finished"
key that tells us it's the final response from dmypy. Anything else is
just stdout/stderr output.

Note: dmypy server also returns out/err which is the output of actual
mypy type checking. Right now this change does not stream that output.
We can stream that in a followup change. We just have to decide on how
to differenciate the 4 text streams (stdout/stderr/out/err) that will
now be interleaved.

Played around with it on Linux quite a bit. Will also test it some more
on Windows.

The WriteToConn class could use more love. I just put a bare minimum to
test the rest.
@svalentin svalentin requested a review from JukkaL October 11, 2023 22:27
Comment thread mypy/dmypy_server.py Outdated
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, overall looks good! This will make debugging daemon issues easier. Left a few comments (not a full review).

Comment thread mypy/ipc.py Outdated
import tempfile
from types import TracebackType
from typing import Callable, Final
from urllib.parse import quote, unquote
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a very minor optimization, what about using codecs.encode(<bytes_data>, 'hex') and codecs.decode(<encoded_data>, 'hex') instead? We wouldn't need to depend on urllib.parse, which is a Python package so importing it may take some time, and it would probably be a bit faster as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar minor optimization could be to use bytes.hex() and bytes.fromhex().

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up going with codecs.encode(<bytes_data>, 'base64'). I think it's a better encoding for this. It has a wider alphabet, but without space. Can easily change it to hex if you think it's better.

Base64 is a scheme for converting binary data to printable ASCII characters, namely the upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols, with the "=" symbol as a special suffix code.

Comment thread mypy/test/testipc.py Outdated
p.start()
connection_name = queue.get()
with IPCClient(connection_name, timeout=1) as client:
client.write("test1")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test writing whitespace and non-ascii characters? Maybe test all unicode code points within some range.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Also added a test for multiple consecutive messages to test when a buffer has multiple frames inside it.

Copy link
Copy Markdown
Collaborator Author

@svalentin svalentin Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I didn't test all unicode code points. I tested some. I would trust codecs.encode to base64 does the right thing and we don't have to worry about it.

@github-actions
Copy link
Copy Markdown
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants