Issue45053
Created on 2021-08-30 18:22 by sobolevn, last changed 2022-04-11 14:59 by admin.
Messages (8) | |||
---|---|---|---|
msg400648 - (view) | Author: Nikita Sobolev (sobolevn) * | Date: 2021-08-30 18:22 | |
While working on https://github.com/python/cpython/pull/28060 we've noticed that `test.test_tools.test_md5sum.MD5SumTests.test_checksum_fodder` fails on Windows:

```
======================================================================
FAIL: test_checksum_fodder (test.test_tools.test_md5sum.MD5SumTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\lib\test\test_tools\test_md5sum.py", line 41, in test_checksum_fodder
    self.assertIn(part.encode(), out)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: b'@test_1772_tmp\xc3\xa6' not found in b'd38dae2eb1ab346a292ef6850f9e1a0d @test_1772_tmp\xe6\\md5sum.fodder\r\n'
```

For now it is ignored.

Related issue: https://bugs.python.org/issue45042 |
|||
msg400649 - (view) | Author: Nikita Sobolev (sobolevn) * | Date: 2021-08-30 18:23 | |
I would love to work on this issue :) |
|||
msg400656 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2021-08-30 18:49 | |
The test is failing because TESTFN now contains non-ASCII characters. The path is written to stdout using the default stdout encoding on Windows (such as cp1252), but the test searches for the path encoded as UTF-8. The test should also fail on other platforms with a non-UTF-8 locale. The simplest way to "fix" the test is to use TESTFN_ASCII instead of TESTFN. But there is also an issue in the script itself: it fails or produces mojibake when the filesystem encoding and the stdout encoding do not match. Other scripts that output file names have similar issues. |
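The mismatch described above can be reproduced without running the test. This is a minimal sketch, assuming the file name from the failure report and cp1252 as the Windows stdout encoding (the message names cp1252 only as an example); the variable names are illustrative.

```python
# File name taken from the AssertionError in msg400648; "\u00e6" is the
# non-ASCII character that TESTFN now ends with.
name = "@test_1772_tmp\u00e6"

# What the test searches for: part.encode() defaults to UTF-8.
expected_by_test = name.encode("utf-8")

# What the script emits when stdout uses cp1252 (assumed encoding).
written_by_script = name.encode("cp1252")

print(expected_by_test)
print(written_by_script)

# The UTF-8 form is not a substring of the cp1252 output, so assertIn fails.
found = expected_by_test in written_by_script + b"\\md5sum.fodder\r\n"
print(found)
```

The two byte strings differ only in the last character, which is why the test passed while TESTFN was pure ASCII.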
|||
msg400785 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-31 21:52 | |
> But there is also an issue in the script itself. It fails or produces a mojibake when the filesystem encoding and the stdout encoding do not match.

I don't know Tools/scripts/md5sum.py. Can you show an example which currently fails? |
|||
msg400812 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2021-09-01 06:28 | |
$ touch тест
$ ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e тест
$ LC_ALL=uk_UA.koi8u PYTHONIOENCODING=koi8-u ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e тест
$ LC_ALL=uk_UA.koi8u PYTHONIOENCODING=utf-8 ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e я┌п╣я│я┌
$ PYTHONIOENCODING=koi8-u ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e ����
$ PYTHONIOENCODING=latin-1 ./python Tools/scripts/md5sum.py тест
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 93, in <module>
    sys.exit(main(sys.argv[1:], sys.stdout))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 90, in main
    return sum(args, out)
           ^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 39, in sum
    sts = printsum(f, out) or sts
          ^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 53, in printsum
    sts = printsumfp(fp, filename, out)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 69, in printsumfp
    out.write('%s %s\n' % (m.hexdigest(), filename))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 33-36: ordinal not in range(256) |
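The three failure modes in the session above can be imitated in pure Python. This is a rough sketch: the helper `shown_on_terminal` is hypothetical and only models a terminal as "decode the bytes with some encoding, replacing invalid sequences", which is an assumption about how the terminal behaved.

```python
# File name used in the shell session above.
name = "тест"


def shown_on_terminal(text, stdout_encoding, terminal_encoding):
    """Encode text as stdout would, then decode the bytes as the terminal would."""
    data = text.encode(stdout_encoding)
    return data.decode(terminal_encoding, errors="replace")


# UTF-8 stdout read by a KOI8-U terminal: valid bytes, wrong characters.
utf8_on_koi8 = shown_on_terminal(name, "utf-8", "koi8-u")

# KOI8-U stdout read by a UTF-8 terminal: none of the bytes form a valid
# UTF-8 sequence, so each one becomes U+FFFD.
koi8_on_utf8 = shown_on_terminal(name, "koi8-u", "utf-8")

# Latin-1 stdout cannot represent Cyrillic at all, so encoding raises.
try:
    name.encode("latin-1")
    latin1_fails = False
except UnicodeEncodeError:
    latin1_fails = True

# ascii() keeps the printout safe on any stdout encoding.
print(ascii(utf8_on_koi8), ascii(koi8_on_utf8), latin1_fails)
```

The first two cases produce output silently, which is why they are easy to miss; only the Latin-1 case surfaces as a traceback.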
|||
msg401016 - (view) | Author: Nikita Sobolev (sobolevn) * | Date: 2021-09-03 19:51 | |
Yes, it was an encoding problem :) This line solved it (here: https://github.com/python/cpython/blob/6f8bc464e006f672d1aeafbfd7c774a40215dab2/Tools/scripts/md5sum.py#L69):

```python
out.write('%s %s\n' % (m.hexdigest(), filename.encode(
    sys.getfilesystemencoding(),
).decode(sys.stdout.encoding)))
```

> The simplest way to "fix" the test is using TESTFN_ASCII instead of TESTFN.

I haven't changed this, because right now it should work for non-ASCII symbols as well. I can even add an explicit ASCII test if needed.

Shouldn't https://github.com/python/cpython/pull/28060 be merged before I submit a new PR, so we can be sure that the test now works? In the current state it will just be ignored. |
|||
msg401038 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2021-09-04 07:08 | |
It will not work in all cases, for example if the stdio encoding is UTF-8 and the filesystem encoding is Latin1, or if the stdio encoding is CP1251 and the filesystem encoding is UTF-8. I am also not sure that it gives us the result we want even when it doesn't fail. It is a general and complex issue, and every program which writes file names to stdout is affected. For now I suggest just using TESTFN_ASCII instead of TESTFN. We will find a better solution in the future. I hesitate to merge PR 28060 because it can also fail on some non-Windows buildbots with uncommon locale settings. |
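Both counterexamples can be checked directly. The sketch below wraps the encode/decode round trip from msg401016 in a hypothetical helper, `proposed_transform`, that takes the two encodings as parameters instead of reading them from `sys`; the sample file names are illustrative.

```python
def proposed_transform(filename, fs_encoding, stdout_encoding):
    """Round trip from msg401016, with both encodings passed explicitly."""
    return filename.encode(fs_encoding).decode(stdout_encoding)


# Case 1: stdio encoding UTF-8, filesystem encoding Latin-1.
# The single Latin-1 byte 0xE6 is not valid UTF-8, so decoding raises.
try:
    proposed_transform("\u00e6", "latin-1", "utf-8")
    case1 = "ok"
except UnicodeDecodeError:
    case1 = "UnicodeDecodeError"

# Case 2: stdio encoding CP1251, filesystem encoding UTF-8.
# Decoding succeeds, but the result is a different, longer string.
case2 = proposed_transform("тест", "utf-8", "cp1251")

# ascii() keeps the printout safe on any stdout encoding.
print(case1, ascii(case2))
```

Case 1 turns a printable name into a crash, and case 2 returns text that no longer matches the file name, which illustrates why the round trip is not a general fix.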
|||
msg406129 - (view) | Author: Andrei Kulakov (andrei.avk) * | Date: 2021-11-10 19:51 | |
This was fixed in https://github.com/python/cpython/commit/dd7b816ac87. Perhaps this should be closed as fixed? It sounds like the general solution is beyond the scope of this issue and doesn't need to be tracked here. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:49 | admin | set | github: 89216 |
2021-11-10 19:51:57 | andrei.avk | set | nosy: + andrei.avk messages: + msg406129 |
2021-09-04 07:08:21 | serhiy.storchaka | set | messages: + msg401038 |
2021-09-03 19:51:55 | sobolevn | set | messages: + msg401016 |
2021-09-01 06:28:25 | serhiy.storchaka | set | messages: + msg400812 |
2021-08-31 21:52:56 | vstinner | set | messages: + msg400785 |
2021-08-30 18:49:54 | serhiy.storchaka | set | nosy: + vstinner, serhiy.storchaka, ezio.melotti messages: + msg400656 components: + Unicode |
2021-08-30 18:23:16 | sobolevn | set | messages: + msg400649 |
2021-08-30 18:22:59 | sobolevn | create |