Message 413231 - Python tracker

Author	christian.heimes
Recipients	christian.heimes, slingamn
Date	2022-02-14.16:32:14
Message-id	<1644856334.75.0.0937374375451.issue46750@roundup.psfhosted.org>
In-reply-to

Content

Please provide benchmarks and data for your claim that encodings.idna is a performance bottleneck.

encodings.idna is a simple, short, stateless module. On my system, importing it takes about 0.15 ms. Even when unicodedata and stringprep aren't loaded yet, it still takes less than 0.5 ms. The stringprep and unicodedata modules are also used by other modules, e.g. urllib.parse. Any non-trivial program with network access has likely imported both already.

$ python3 -m timeit -s "import sys" "import encodings.idna; sys.modules.pop('encodings.idna'); sys.modules.pop('stringprep'); sys.modules.pop('unicodedata')"
500 loops, best of 5: 488 usec per loop
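
For anyone who wants to reproduce the measurement from inside a script, here is a rough sketch of the same idea. The helper name cold_import_seconds and the repeat count are made up for illustration, and the numbers will vary by machine and Python version:

```python
import sys
import time

# Modules evicted before each run so the import is "cold" again,
# mirroring the sys.modules.pop() calls in the timeit command above.
MODULES = ("encodings.idna", "stringprep", "unicodedata")


def cold_import_seconds(repeat=5):
    """Return the best wall-clock time (seconds) of a cold `import encodings.idna`.

    Hypothetical helper for illustration only.
    """
    best = float("inf")
    for _ in range(repeat):
        for name in MODULES:
            sys.modules.pop(name, None)  # tolerate modules that are not loaded
        start = time.perf_counter()
        import encodings.idna  # re-executed because it was evicted above
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"cold import: {cold_import_seconds() * 1e6:.0f} usec")
```

Only the three modules named above are evicted; anything else they depend on (re, codecs) stays cached, which matches the scenario of a program that has already done some work.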


The IDNA codec performs additional verification of the input. You cannot replace it with a simple "try encode to ASCII" block:

>>> ("a"*65).encode('idna')
Traceback (most recent call last):
  File "/usr/lib64/python3.10/encodings/idna.py", line 167, in encode
    raise UnicodeError("label too long")
UnicodeError: label too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label too long)
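
To make the difference concrete, here is a small sketch of what an "ASCII first" shortcut would look like and where it diverges from the codec. The function names ascii_first_encode and idna_accepts are hypothetical, not anything in the stdlib:

```python
def ascii_first_encode(host):
    """Hypothetical shortcut: try plain ASCII, fall back to IDNA otherwise."""
    try:
        return host.encode("ascii")
    except UnicodeEncodeError:
        return host.encode("idna")


def idna_accepts(host):
    """Return True if the stdlib 'idna' codec encodes `host` without error."""
    try:
        host.encode("idna")
    except UnicodeError:  # also covers subclasses such as UnicodeEncodeError
        return False
    return True


too_long = "a" * 65  # a single label longer than the 63-character limit

# The shortcut silently lets the over-long label through ...
print(ascii_first_encode(too_long) == b"a" * 65)  # True
# ... while the IDNA codec rejects it, and still accepts a 63-character label.
print(idna_accepts(too_long))   # False
print(idna_accepts("a" * 63))   # True
```

So a pure-ASCII fast path would have to repeat at least the label-length check to keep the same behaviour.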

History
Date	User	Action	Args
2022-02-14 16:32:14	christian.heimes	set	recipients: + christian.heimes, slingamn
2022-02-14 16:32:14	christian.heimes	set	messageid: <1644856334.75.0.0937374375451.issue46750@roundup.psfhosted.org>
2022-02-14 16:32:14	christian.heimes	link	issue46750 messages
2022-02-14 16:32:14	christian.heimes	create