Commit 879a3f6

Adaephon-GH authored and asvetlov committed
Workaround for missing str.isascii() in Python 3.6 (#389)
* Workaround for missing str.isascii() in Python 3.6

This allows checking whether `host` contains only ASCII characters on Python 3.6 and 3.5.

Performance tests with `%timeit` in `ipython` on Python 3.6 show that this check takes about 0.18 μs if the first character in `host` is non-ASCII, 0.87 μs if the 10th character is the first non-ASCII character, and 1.46 μs if the 20th character is the first non-ASCII character. The times are about the same if `host` is purely ASCII and 1, 10 or 20 characters long, respectively.

While this is quite a bit slower than `str.isascii()` on Python 3.8 on the same machine (about 0.038 μs, independent of the length or the position of the characters), it is about 25 times faster than running IDNA encoding needlessly: for 20 characters, `idna.encode(host, uts46=True).decode("ascii")` takes about 40 μs if `host` is ASCII.

If a non-ASCII character is found, the added time is negligible compared to the time needed for encoding: on 20 characters, encoding takes 64 μs if one character is non-ASCII and about 85 - 150 μs if all characters are non-ASCII (there seems to be quite a spread depending on the characters used). So the check adds about 0.1 - 2.3 % more time, depending on where the first non-ASCII character is placed and how many there are.

* Do lexical comparison

Lexical comparison of two single-letter strings ("characters") looks to be faster than first calling `ord()` on the character and doing a numerical comparison.
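The timing claims above could be re-checked with a small `timeit` script along the following lines. This is only a sketch: the function names, the sample hosts and the `benchmark` helper are made up for illustration and are not part of yarl, and absolute numbers will differ by machine and Python version.

```python
import timeit


def is_ascii_lexical(host):
    # Compare each one-character string directly against "\x7f",
    # as the second bullet of the commit message describes.
    for char in host:
        if char > "\x7f":
            return False
    return True


def is_ascii_ord(host):
    # Variant that calls ord() first and compares integers.
    for char in host:
        if ord(char) > 127:
            return False
    return True


# Hypothetical sample hosts: 20 characters each, with the first
# non-ASCII character at position 1, at position 20, or absent.
SAMPLES = {
    "non-ASCII first": "\u00e9" + "a" * 19,
    "non-ASCII last": "a" * 19 + "\u00e9",
    "pure ASCII": "a" * 20,
}


def benchmark(func, host, number=50000):
    # Average time per call in microseconds (includes lambda overhead).
    total = timeit.timeit(lambda: func(host), number=number)
    return total / number * 1e6


if __name__ == "__main__":
    for label, host in SAMPLES.items():
        lexical = benchmark(is_ascii_lexical, host)
        numeric = benchmark(is_ascii_ord, host)
        print("%-16s lexical: %.3f us  ord(): %.3f us" % (label, lexical, numeric))
```

Both variants return the same result for every input; only their speed differs, which is what the script is meant to compare.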
1 parent 9b18e9e commit 879a3f6

1 file changed: +6 -1 lines changed

yarl/__init__.py

Lines changed: 6 additions & 1 deletion
@@ -683,13 +683,18 @@ def _encode_host(cls, host):
             return host
 
     else:
-        # the same bug without isascii check
+        # work around for missing str.isascii() in Python <= 3.6
         @classmethod
         def _encode_host(cls, host):
             try:
                 ip, sep, zone = host.partition("%")
                 ip = ip_address(ip)
             except ValueError:
+                for char in host:
+                    if char > "\x7f":
+                        break
+                else:
+                    return host
                 try:
                     host = idna.encode(host, uts46=True).decode("ascii")
                 except UnicodeError:
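The added lines rely on Python's `for ... else`: the `else` branch runs only when the loop finishes without hitting `break`, which here means no character above `"\x7f"` was found. A minimal standalone sketch of that control flow follows. The function name `encode_host_sketch` is hypothetical, the IP-address handling of the real method is omitted, and the standard library `idna` codec stands in for the `idna.encode(host, uts46=True)` call shown in the diff.

```python
def encode_host_sketch(host):
    """Return host unchanged if it is pure ASCII, else IDNA-encode it."""
    for char in host:
        if char > "\x7f":
            # First non-ASCII character found: stop scanning.
            break
    else:
        # Loop ended without break: every character is ASCII
        # (this also covers the empty string), so skip encoding.
        return host
    # Assumption: stdlib codec used as a stand-in for the idna package.
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(encode_host_sketch("example.com"))
    print(encode_host_sketch("b\u00fccher.example"))
```

The early `return host` is what saves the encoding cost for ASCII-only hosts; non-ASCII hosts fall through to the encoder exactly as before the change.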

0 commit comments
