Message 390942 - Python tracker


Author	eryksun
Recipients	Andrew Ushakov, eryksun, serhiy.storchaka, terry.reedy
Date	2021-04-13.09:37:26
Message-id	<1618306646.43.0.0976888456209.issue38755@roundup.psfhosted.org>

Content
> P.S. No problems with Python 3.8.5 and Ubuntu 20.04.2 LTS.

The issue is that a single read of a line is limited to BUFSIZ bytes, so a longer line gets split, and here the split falls inside the UTF-8 sequence b'\xe2\x96\x91' (U+2591). BUFSIZ is only 512 bytes on Windows. It's 8192 bytes on Linux, in which case you need a line that's 16 times longer to reproduce the error. For example:

    $ stat -c "%s" test.py 
    8194
    $ python3.9 test.py
    SyntaxError: Non-UTF-8 code starting with '\xe2' in file 
    /home/someone/test.py on line 1, but no encoding declared; see 
    http://python.org/dev/peps/pep-0263/ for details
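
The failure mode can be imitated in pure Python. The following is only a sketch of the idea, not the tokenizer's actual code: `CHUNK` and `decode_in_chunks` are hypothetical names, and 512 merely stands in for the Windows BUFSIZ. It decodes each fixed-size chunk on its own, so a multi-byte sequence that straddles a chunk boundary cannot be decoded.

```python
CHUNK = 512  # stand-in for BUFSIZ on Windows (assumption for illustration)


def decode_in_chunks(data: bytes, chunk_size: int) -> str:
    """Decode every fixed-size chunk independently (hypothetical helper)."""
    parts = []
    for start in range(0, len(data), chunk_size):
        # Each chunk is decoded in isolation, so a UTF-8 sequence that is
        # cut by the chunk boundary is seen as truncated input.
        parts.append(data[start:start + chunk_size].decode("utf-8"))
    return "".join(parts)


# One long line: the 3-byte sequence b'\xe2\x96\x91' starts at the last
# byte of the first chunk and continues into the second chunk.
line = b"#" * (CHUNK - 1) + "\u2591".encode("utf-8") + b"\n"

try:
    decode_in_chunks(line, CHUNK)
    error = None
except UnicodeDecodeError as exc:
    error = exc  # the lead byte 0xe2 is left without its continuation bytes

# Decoding the whole line at once works fine.
whole = line.decode("utf-8")
```

The same bytes decode cleanly when the line is handled as a whole, which is why the error only shows up once a line grows past the read size.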

This has been fixed in a rewrite of the tokenizer (bpo-25643), for which the PR was recently merged into the main branch for 3.10a7+.

Maybe a minimal backport to keep reading up to "\n" can be applied to 3.8 and 3.9.
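
As a rough illustration of what "keep reading up to "\n"" means, here is a minimal Python sketch. It is not the proposed C patch; `read_full_line` and the chunk size of 512 are assumptions made for the example. The point is only that decoding is deferred until the complete line has been collected.

```python
import io


def read_full_line(stream, chunk_size: int) -> bytes:
    """Collect chunks until a newline or EOF is reached (hypothetical helper)."""
    pieces = []
    while True:
        # readline(size) returns at most `size` bytes and stops after "\n".
        piece = stream.readline(chunk_size)
        if not piece:
            break  # EOF
        pieces.append(piece)
        if piece.endswith(b"\n"):
            break  # the line is complete
    return b"".join(pieces)


# A first line longer than the chunk size, with U+2591 straddling the
# chunk boundary, followed by a short second line.
source = io.BytesIO(
    b"#" * 511 + "\u2591".encode("utf-8") + b"\n" + b"x = 1\n"
)

first = read_full_line(source, 512).decode("utf-8")
second = read_full_line(source, 512).decode("utf-8")
```

Because the bytes are joined before decoding, the position of the chunk boundary no longer matters.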
