Issue43662
Created on 2021-03-29 20:32 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (12) | |||
---|---|---|---|
msg389739 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 20:32 | |
https://buildbot.python.org/all/#/builders/244/builds/931

At commit 9b999479c0022edfc9835a8a1f06e046f3881048

    (...)
    test_reindent_file_with_bad_encoding (test.test_tools.test_reindent.ReindentTests) ... FAIL
    (...)
    ======================================================================
    FAIL: test_reindent_file_with_bad_encoding (test.test_tools.test_reindent.ReindentTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto-pgo/build/Lib/test/test_tools/test_reindent.py", line 29, in test_reindent_file_with_bad_encoding
        rc, out, err = assert_python_ok(self.script, '-r', bad_coding_path)
      File "/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto-pgo/build/Lib/test/support/script_helper.py", line 160, in assert_python_ok
        return _assert_python(True, *args, **env_vars)
      File "/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto-pgo/build/Lib/test/support/script_helper.py", line 145, in _assert_python
        res.fail(cmd_line)
      File "/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto-pgo/build/Lib/test/support/script_helper.py", line 72, in fail
        raise AssertionError("Process return code is %d\n"
    AssertionError: Process return code is 1
    command line: ['/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto-pgo/build/python', '-X', 'faulthandler', '-I', '/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto-pgo/build/Tools/scripts/reindent.py', '-r', '/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto-pgo/build/Lib/test/bad_coding.py']

    stdout:
    ---

    ---

    stderr:
    ---
    SyntaxError: encoding problem: encoding
    ---

Can it be related to the following change?

    commit 261a452a1300eeeae1428ffd6e6623329c085e2c
    Author: Pablo Galindo <Pablogsal@gmail.com>
    Date: Sun Mar 28 23:48:05 2021 +0100

        bpo-25643: Refactor the C tokenizer into smaller, logical units (GH-25050) |
|||
msg389742 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 20:37 | |
Oh. Or maybe it's related to:

    commit 4827483f47906fecee6b5d9097df2a69a293a85c
    Author: Inada Naoki <songofacandy@gmail.com>
    Date: Mon Mar 29 12:28:14 2021 +0900

        bpo-43510: Implement PEP 597 opt-in EncodingWarning. (GH-19481)

        See [PEP 597](https://www.python.org/dev/peps/pep-0597/).

        * Add `-X warn_default_encoding` and `PYTHONWARNDEFAULTENCODING`.
        * Add EncodingWarning
        * Add io.text_encoding()
        * open(), TextIOWrapper() emits EncodingWarning when encoding is omitted and warn_default_encoding is enabled.
        * _pyio.TextIOWrapper() uses UTF-8 as fallback default encoding used when failed to import locale module. (used during building Python)
        * bz2, configparser, gzip, lzma, pathlib, tempfile modules use io.text_encoding().
        * What's new entry |
|||
msg389747 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 20:40 | |
> https://buildbot.python.org/all/#/builders/244/builds/931

test.pythoninfo:

    config[filesystem_encoding]: 'utf-8'
    config[filesystem_errors]: 'surrogateescape'
    config[stdio_encoding]: 'utf-8'
    config[stdio_errors]: 'strict'
    config[use_environment]: 1
    config[warn_default_encoding]: 0
    locale.encoding: UTF-8
    os.environ[LANG]: en_US.UTF-8
    os.uname: posix.uname_result(sysname='Linux', nodename='ztcpip3.pok.ibm.com', release='3.10.0-1160.11.1.el7.s390x', version='#1 SMP Mon Nov 30 13:07:00 EST 2020', machine='s390x')
    platform.libc_ver: glibc 2.17
    platform.platform: Linux-3.10.0-1160.11.1.el7.s390x-s390x-with-glibc2.17
    pre_config[coerce_c_locale]: 0
    pre_config[coerce_c_locale_warn]: 0
    pre_config[configure_locale]: 1
    pre_config[isolated]: 0
    pre_config[utf8_mode]: 0
    sys.filesystem_encoding: utf-8/surrogateescape
    sys.stderr.encoding: utf-8/backslashreplace
    sys.stdin.encoding: utf-8/strict
    sys.stdout.encoding: utf-8/strict
    sys.version: 3.10.0a6+ (heads/master:9b99947, Mar 29 2021, 08:53:44) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
    sysconfig[CONFIG_ARGS]: '--prefix' '/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto-pgo/build/target' '--with-lto' '--enable-optimizations'
    sysconfig[PY_CFLAGS]: -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall
    sysconfig[PY_CFLAGS_NODIST]: -flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -g -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Werror=implicit-function-declaration -fvisibility=hidden -fprofile-use -fprofile-correction -I./Include/internal
    sysconfig[PY_CORE_LDFLAGS]: -flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -g
    sysconfig[PY_LDFLAGS_NODIST]: -flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -g
    sysconfig[Py_DEBUG]: 0 |
|||
msg389751 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 20:56 | |
> SyntaxError: encoding problem: encoding

This "encoding problem: %s" error message comes from check_coding_spec() of Parser/tokenizer.c. The "%s" argument is the cs variable which is initialized by get_coding_spec().

test_tools.test_reindent_file_with_bad_encoding() uses Lib/test/bad_coding.py which contains a single line:

    # -*- coding: uft-8 -*-

The expected encoding name is "uft-8", not "encoding". |
|||
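For illustration, a minimal Python sketch of what extracting a coding declaration from a source line looks like. It uses the regular expression given in PEP 263, not the C logic of get_coding_spec(); the helper names `find_coding_spec` and `is_known_codec` are hypothetical. It only shows why "uft-8" is the name one would expect to see in the error message.

```python
import codecs
import re

# Pattern for a source-encoding declaration, as described in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", re.ASCII)


def find_coding_spec(line):
    """Return the declared encoding name, or None if there is no declaration."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


def is_known_codec(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# The single line of Lib/test/bad_coding.py quoted in msg389751.
spec = find_coding_spec("# -*- coding: uft-8 -*-")
print(spec, is_known_codec(spec))
```

The declared name is extracted verbatim from the line, so a message naming any other string suggests the tokenizer was reading bytes that do not belong to that line.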
msg389752 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 21:02 | |
test_tools.test_reindent_file_with_bad_encoding() runs Tools/scripts/reindent.py. The check() function of this script calls:

    with open(file, 'rb') as f:
        try:
            encoding, _ = tokenize.detect_encoding(f.readline)
        except SyntaxError as se:
            errprint("%s: SyntaxError: %s" % (file, str(se)))

But I don't think that the buildbot reached this line since the stderr message doesn't start with the input filename.

For example, locally, I get the expected error:

    $ ./python Tools/scripts/reindent.py -r Lib/test/bad_coding.py; echo $?
    Lib/test/bad_coding.py: SyntaxError: unknown encoding for 'Lib/test/bad_coding.py': uft-8
    0 |
|||
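The pure-Python path quoted in msg389752 can be exercised without reindent.py. The sketch below assumes an in-memory buffer in place of the real file, and `check_encoding` is a hypothetical helper, not part of the script. It calls tokenize.detect_encoding() the same way and reports the SyntaxError instead of letting it propagate.

```python
import io
import tokenize


def check_encoding(source):
    """Return a short status string for the given source bytes."""
    try:
        # detect_encoding() only needs a readline callable that yields bytes.
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError as exc:
        return "SyntaxError: %s" % exc
    return "ok: %s" % encoding


print(check_encoding(b"# -*- coding: uft-8 -*-\n"))
print(check_encoding(b"x = 1\n"))
```

When this path is taken, the error is caught in Python and the unknown encoding name appears in the message, which differs from the bare "encoding problem" text seen on the buildbot.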
msg389753 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 21:07 | |
Oh. The failure is random:

* 934 green
* 933 red: test_reindent_file_with_bad_encoding failed
* 932 green
* 931 red: test_reindent_file_with_bad_encoding failed
* 930 green
* 929 red: test_reindent_file_with_bad_encoding failed
* 928 green
* (... older builds are all green ...)
* 775 orange
* 774 green
* (... more green builds ...)

This buildbot uses PGO+LTO optimization on RHEL7 with "gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)". Can it be a compiler issue? Are other buildbots affected? |
|||
msg389756 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 21:17 | |
We have 4 buildbot workers running RHEL7 and using LTO+PGO optimizations: aarch64, amd64, ppc64le, s390x. I saw random failures on amd64 and s390x.

amd64 failed builds:

* 910: test_reindent_file_with_bad_encoding() failed
* 911: test_reindent_file_with_bad_encoding() failed
* 914: test_reindent_file_with_bad_encoding() failed |
|||
msg389757 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 21:20 | |
AMD64 RHEL7 LTO 3.x: builds 896 and 900 failed with test_reindent_file_with_bad_encoding(). This worker only uses LTO, it doesn't use PGO. |
|||
msg389758 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 21:24 | |
s390x RHEL7 LTO 3.x: builds 921, 924 and 925 failed with test_reindent_file_with_bad_encoding(). |
|||
msg389759 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 21:43 | |
    vstinner@python-builder-rhel7$ echo|PYTHONMALLOC=malloc valgrind ./python Tools/scripts/reindent.py
    ==26374== Memcheck, a memory error detector
    ==26374== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==26374== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
    ==26374== Command: ./python Tools/scripts/reindent.py
    ==26374==
    ==26374== Conditional jump or move depends on uninitialised value(s)
    ==26374==    at 0x4C305ED: __memcmp_sse4_1 (vg_replace_strmem.c:1112)
    ==26374==    by 0x5D0BCF: get_coding_spec.100964 (tokenizer.c:165)
    ==26374==    by 0x5D4C1B: check_coding_spec.part.6.100980 (tokenizer.c:214)
    ==26374==    by 0x5D213A: check_coding_spec (tokenizer.c:212)
    ==26374==    by 0x5D213A: tok_underflow_file.101007 (tokenizer.c:966)
    ==26374==    by 0x5D248D: tok_nextc.101010 (tokenizer.c:1031)
    ==26374==    by 0x5D2C80: tok_get.101023 (tokenizer.c:1213)
    ==26374==    by 0x5D4632: PyTokenizer_Get (tokenizer.c:1872)
    ==26374==    by 0x648E4C: _PyPegen_fill_token (pegen.c:633)
    ==26374==    by 0x6494D9: _PyPegen_expect_token (pegen.c:832)
    ==26374==    by 0x667497: _tmp_15_rule.137241 (parser.c:19552)
    ==26374==    by 0x649488: _PyPegen_lookahead (pegen.c:823)
    ==26374==    by 0x64EC04: compound_stmt_rule.138437 (parser.c:2008)
    ==26374==
    ==26374== Conditional jump or move depends on uninitialised value(s)
    ==26374==    at 0x5D0BD2: get_coding_spec.100964 (tokenizer.c:165)
    ==26374==    by 0x5D4C1B: check_coding_spec.part.6.100980 (tokenizer.c:214)
    ==26374==    by 0x5D213A: check_coding_spec (tokenizer.c:212)
    ==26374==    by 0x5D213A: tok_underflow_file.101007 (tokenizer.c:966)
    ==26374==    by 0x5D248D: tok_nextc.101010 (tokenizer.c:1031)
    ==26374==    by 0x5D2C80: tok_get.101023 (tokenizer.c:1213)
    ==26374==    by 0x5D4632: PyTokenizer_Get (tokenizer.c:1872)
    ==26374==    by 0x648E4C: _PyPegen_fill_token (pegen.c:633)
    ==26374==    by 0x6494D9: _PyPegen_expect_token (pegen.c:832)
    ==26374==    by 0x667497: _tmp_15_rule.137241 (parser.c:19552)
    ==26374==    by 0x649488: _PyPegen_lookahead (pegen.c:823)
    ==26374==    by 0x64EC04: compound_stmt_rule.138437 (parser.c:2008)
    ==26374==    by 0x64DE4A: statement_rule.138374 (parser.c:1365)
    ==26374==
    ==26374==
    ==26374== HEAP SUMMARY:
    ==26374==     in use at exit: 406,507 bytes in 4,293 blocks
    ==26374==   total heap usage: 63,558 allocs, 59,265 frees, 9,156,496 bytes allocated
    ==26374==
    ==26374== LEAK SUMMARY:
    ==26374==    definitely lost: 0 bytes in 0 blocks
    ==26374==    indirectly lost: 0 bytes in 0 blocks
    ==26374==      possibly lost: 390,522 bytes in 4,213 blocks
    ==26374==    still reachable: 15,985 bytes in 80 blocks
    ==26374==         suppressed: 0 bytes in 0 blocks
    ==26374== Rerun with --leak-check=full to see details of leaked memory
    ==26374==
    ==26374== Use --track-origins=yes to see where uninitialised values come from
    ==26374== For lists of detected and suppressed errors, rerun with: -s
    ==26374== ERROR SUMMARY: 16322 errors from 2 contexts (suppressed: 0 from 0) |
|||
msg389763 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-29 22:25 | |
It's a buffer overflow, or at least a crash related to uninitialized bytes. See: https://github.com/python/cpython/pull/25080#issuecomment-809752737 |
|||
msg389793 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-03-30 07:12 | |
This issue should be fixed by:

    commit 92a02c1f7e2dcdc62913a4236589e7e5d96172b9
    Author: Pablo Galindo <Pablogsal@gmail.com>
    Date: Tue Mar 30 00:24:49 2021 +0100

        Fix tokenizer error when raw decoding null bytes (GH-25080)

The fix is the usage of strlen() instead of "tok->end - tok->cur" to compute the line length.

> https://buildbot.python.org/all/#/builders/244/builds/931

The latest 6 builds are successful. I close the issue. |
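To make the difference between the two length computations concrete, here is a small Python simulation. It is not the tokenizer code: the buffer contents, the `strlen` helper and the variable names are assumptions chosen for illustration. The bytes after the NUL terminator stand in for memory that does not belong to the current line.

```python
def strlen(buf, start=0):
    """Mimic C strlen(): count bytes from start up to the first NUL byte."""
    return buf.index(b"\0", start) - start


# Hypothetical buffer: the current line, its NUL terminator, then leftover
# bytes that were never written for this line.
buf = b"# -*- coding: uft-8 -*-\n\0 stale: encoding"
cur, end = 0, len(buf)

# Length taken as "end - cur": the view runs past the terminator.
by_pointers = buf[cur:end]

# Length taken as strlen(cur): the view stops at the terminator.
by_strlen = buf[cur:cur + strlen(buf, cur)]

print(by_pointers)
print(by_strlen)
```

In this simulation only the strlen-based view is limited to the actual line; the pointer-difference view also covers whatever follows the terminator, which in C may be uninitialised memory, consistent with the Valgrind report in msg389759.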
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:43 | admin | set | github: 87828 |
2021-03-30 07:12:48 | vstinner | set | status: open -> closed resolution: fixed messages: + msg389793 stage: resolved |
2021-03-29 22:25:43 | vstinner | set | messages: + msg389763 |
2021-03-29 21:43:18 | vstinner | set | messages: + msg389759 |
2021-03-29 21:24:42 | vstinner | set | messages: + msg389758 title: test_tools: test_reindent_file_with_bad_encoding() fails RHEL7 with LTO -> test_tools: test_reindent_file_with_bad_encoding() fails RHEL7 on x86-64 and s390x with GCC 4.8.5 and LTO |
2021-03-29 21:20:23 | vstinner | set | messages: + msg389757 title: test_tools: test_reindent_file_with_bad_encoding() fails on s390x RHEL7 LTO + PGO 3.x -> test_tools: test_reindent_file_with_bad_encoding() fails RHEL7 with LTO |
2021-03-29 21:17:00 | vstinner | set | messages: + msg389756 |
2021-03-29 21:07:27 | vstinner | set | messages: + msg389753 |
2021-03-29 21:02:59 | vstinner | set | messages: + msg389752 |
2021-03-29 20:56:29 | vstinner | set | messages: + msg389751 |
2021-03-29 20:40:44 | vstinner | set | messages: + msg389747 |
2021-03-29 20:37:08 | vstinner | set | nosy: + methane messages: + msg389742 |
2021-03-29 20:32:53 | vstinner | create |