Workaround for missing str.isascii() in Python 3.6 (#389)

Adaephon-GH · asvetlov · commit 879a3f6f9034 · 2019-12-04T19:20:53.000+02:00
* Workaround for missing str.isascii() in Python 3.6

This allows checking whether `host` contains only ASCII characters on Python 3.6 and 3.5.

Performance tests with `%timeit` in `ipython` on Python 3.6 show that this check takes about 0.18 μs if the first character in `host` is non-ASCII, 0.87 μs if the 10th character is the first non-ASCII one, and 1.46 μs if the 20th character is non-ASCII. The times are about the same when `host` is purely ASCII and 1, 10 or 20 characters long, respectively.
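A minimal sketch of how such a measurement could be reproduced with the standard `timeit` module instead of IPython's `%timeit`. The function name `has_only_ascii` and the sample strings are hypothetical, and absolute timings will vary by machine and interpreter.

```python
import timeit


def has_only_ascii(host: str) -> bool:
    """Return True if every character of host is ASCII (code point <= 0x7F).

    Same check as the patch's for/else loop, wrapped in a function that
    returns a bool. The loop stops at the first non-ASCII character, so the
    cost grows with that character's position (or with len(host) if none).
    """
    for char in host:
        if char > "\x7f":
            return False
    return True


# Hypothetical 20-character inputs with the first non-ASCII character ("é")
# at positions 1, 10 and 20, plus a pure-ASCII host for comparison.
samples = {
    "non-ASCII at 1": "é" + "a" * 19,
    "non-ASCII at 10": "a" * 9 + "é" + "a" * 10,
    "non-ASCII at 20": "a" * 19 + "é",
    "pure ASCII, 20 chars": "a" * 20,
}

if __name__ == "__main__":
    n = 100_000
    for label, host in samples.items():
        seconds = timeit.timeit(
            "has_only_ascii(host)",
            globals={"has_only_ascii": has_only_ascii, "host": host},
            number=n,
        )
        print(f"{label}: {seconds / n * 1e6:.3f} μs per call")
```

Calling a function adds overhead that the inlined loop in `_encode_host` does not pay, so this sketch will likely report somewhat higher numbers than the ones quoted above.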

While this is quite a bit slower than `str.isascii()` on Python 3.8 on the same machine (about 0.038 μs, independent of the length of `host` or the position of the characters), it is about 25 times faster than running IDNA encoding needlessly: for 20 characters, `idna.encode(host, uts46=True).decode("ascii")` takes about 40 μs if `host` is ASCII.

If a Unicode character is found, the added time is negligible compared with the time needed for encoding: for 20 characters, encoding takes about 64 μs if one character is Unicode and about 85 to 150 μs if all of them are (the spread seems to depend on which characters are used). That is about 0.1 to 2.3 % more time, depending on where the first Unicode character is placed and how many there are.
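The percentage range can be checked directly from the figures quoted above. This sketch pairs the fastest check (0.18 μs, first character non-ASCII) with the slowest encoding (150 μs) for the lower bound, and the slowest check (1.46 μs, 20th character) with the fastest encoding (64 μs) for the upper bound; that pairing is an interpretation of the text, not a separate measurement.

```python
# Timings quoted in the commit message, in microseconds.
check_fastest_us = 0.18    # first character is non-ASCII
check_slowest_us = 1.46    # 20th character is the first non-ASCII one
encode_fastest_us = 64.0   # 20 characters, one of them Unicode
encode_slowest_us = 150.0  # 20 characters, all Unicode (upper end of the spread)
idna_ascii_us = 40.0       # needless IDNA encoding of a 20-character ASCII host

# Extra time added by the check, as a percentage of the encoding time.
overhead_low = check_fastest_us / encode_slowest_us * 100
overhead_high = check_slowest_us / encode_fastest_us * 100

# How much cheaper the check is than encoding an ASCII host for nothing.
speedup = idna_ascii_us / check_slowest_us

print(f"extra time: {overhead_low:.1f} % to {overhead_high:.1f} %")
print(f"check vs needless IDNA on ASCII: about {speedup:.0f}x faster")
```

The second figure comes out at roughly 27x, in line with the "about 25 times" estimate.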

* Do lexical comparison

Lexical comparison of two single-character strings appears to be faster than first calling `ord()` on the character and then doing a numerical comparison.
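For single-character strings, Python compares by code point, so `char > "\x7f"` and `ord(char) > 0x7f` give the same answer; the lexical form just skips the `ord()` call. A sketch that checks this equivalence over every code point and times both variants (the function names are hypothetical, and timings depend on the machine):

```python
import sys
import timeit


def non_ascii_lexical(char: str) -> bool:
    # Compare the one-character string directly with "\x7f".
    return char > "\x7f"


def non_ascii_ord(char: str) -> bool:
    # Convert to an integer code point first, then compare numerically.
    return ord(char) > 0x7F


def variants_agree() -> bool:
    # Check every code point, including surrogates, which chr() accepts.
    return all(
        non_ascii_lexical(chr(cp)) == non_ascii_ord(chr(cp))
        for cp in range(sys.maxunicode + 1)
    )


if __name__ == "__main__":
    print("equivalent for all code points:", variants_agree())
    # Time the bare expressions, without function-call overhead.
    for stmt in ('char > "\\x7f"', "ord(char) > 0x7f"):
        t = timeit.timeit(stmt, setup='char = "é"', number=1_000_000)
        print(f"{stmt}: {t * 1e3:.1f} ms per million")
```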
diff --git a/yarl/__init__.py b/yarl/__init__.py
@@ -683,13 +683,18 @@ def _encode_host(cls, host):
             return host
 
     else:
-        # the same bug without isascii check
+        # workaround for missing str.isascii() in Python <= 3.6
         @classmethod
         def _encode_host(cls, host):
             try:
                 ip, sep, zone = host.partition("%")
                 ip = ip_address(ip)
             except ValueError:
+                for char in host:
+                    if char > "\x7f":
+                        break
+                else:
+                    return host
                 try:
                     host = idna.encode(host, uts46=True).decode("ascii")
                 except UnicodeError: