Message390942
> P.S. No problems with Python 3.8.5 and Ubuntu 20.04.2 LTS.
The issue is that the line length is limited to BUFSIZ, which ends up splitting the UTF-8 sequence b'\xe2\x96\x91'. BUFSIZ is only 512 bytes in Windows. It's 8192 bytes in Linux, in which case you need a line that's 16 times longer in order to reproduce the error. For example:
$ stat -c "%s" test.py
8194
$ python3.9 test.py
SyntaxError: Non-UTF-8 code starting with '\xe2' in file
/home/someone/test.py on line 1, but no encoding declared; see
http://python.org/dev/peps/pep-0263/ for details
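A rough way to see the split without building a full reproducer is the sketch below. It assumes the reader hands the decoder fixed-size chunks of exactly `bufsiz` bytes; the real tokenizer's chunk boundaries may differ by a byte or so. `make_line` and `decodes_in_chunks` are hypothetical helper names used only for illustration.

```python
BLOCK = "\u2591"  # the character whose UTF-8 encoding is b'\xe2\x96\x91'


def make_line(bufsiz: int) -> bytes:
    """Build a one-line comment whose 3-byte character straddles offset bufsiz."""
    # ASCII padding of bufsiz - 1 bytes, so b'\xe2' is the last byte of chunk 0.
    prefix = b"#" + b"a" * (bufsiz - 2)
    return prefix + BLOCK.encode("utf-8") + b"\n"


def decodes_in_chunks(data: bytes, bufsiz: int) -> bool:
    """Return True if every bufsiz-sized chunk is valid UTF-8 on its own."""
    for start in range(0, len(data), bufsiz):
        try:
            data[start:start + bufsiz].decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True


line = make_line(512)
print(decodes_in_chunks(line, 512))   # False: chunk 0 ends with a lone b'\xe2'
print(decodes_in_chunks(line, 8192))  # True: the whole line fits in one chunk
```

The same line decodes cleanly as a whole (`line.decode("utf-8")`), so the failure comes only from where the chunk boundary falls.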
This has been fixed in a rewrite of the tokenizer (bpo-25643), for which the PR was recently merged into the main branch for 3.10a7+.
Maybe a minimal backport to keep reading up to "\n" can be applied to 3.8 and 3.9.
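As an illustration of the "keep reading up to "\n"" idea, here is a minimal Python sketch. It is an analogy only, not the C tokenizer code or an actual patch. `read_full_line` is a hypothetical helper, and `io.BytesIO.readline(size)` stands in for a size-limited buffered read.

```python
import io


def read_full_line(stream, bufsiz: int) -> bytes:
    """Accumulate size-limited reads until a newline or EOF is reached."""
    parts = []
    while True:
        chunk = stream.readline(bufsiz)  # returns at most bufsiz bytes
        parts.append(chunk)
        if not chunk or chunk.endswith(b"\n"):
            break
    return b"".join(parts)


# A 515-byte line in which b'\xe2' is the last byte of the first 512-byte read.
raw = b"#" + b"a" * 510 + "\u2591".encode("utf-8") + b"\n"
line = read_full_line(io.BytesIO(raw), 512)
text = line.decode("utf-8")  # decoding happens only once the line is complete
```

Decoding only after the full line has been collected means the multi-byte sequence is never presented to the decoder in pieces.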
Date | User | Action | Args
2021-04-13 09:37:26 | eryksun | set | recipients: + eryksun, terry.reedy, serhiy.storchaka, Andrew Ushakov
2021-04-13 09:37:26 | eryksun | set | messageid: <1618306646.43.0.0976888456209.issue38755@roundup.psfhosted.org>
2021-04-13 09:37:26 | eryksun | link | issue38755 messages
2021-04-13 09:37:26 | eryksun | create |