Issue36998
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2019-05-21 17:33 by a.badger, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
test-case.tar.gz | a.badger, 2019-05-21 23:32 |
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 13467 | closed | a.badger, 2019-05-21 17:41 |
Messages (13) | |||
---|---|---|---|
msg343074 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2019-05-21 17:33 | |
An sdist may contain files whose names are undecodable in the current locale. For instance, the sdist might include some files for testing whose filenames are undecodable because that's the format of the input for that application. Currently, trying to create the sdist fails with output similar to this: Traceback (most recent call last): File "setup.py", line 330, in <module> main() File "setup.py", line 325, in main setup(**setup_params) File "/home/badger/.local/lib/python3.5/site-packages/setuptools/__init__.py", line 145, in setup return distutils.core.setup(**attrs) File "/usr/lib/python3.5/distutils/core.py", line 148, in setup dist.run_commands() File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands self.run_command(cmd) File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command cmd_obj.run() File "setup.py", line 137, in run SDist.run(self) File "/usr/lib/python3.5/distutils/command/sdist.py", line 158, in run self.get_file_list() File "/usr/lib/python3.5/distutils/command/sdist.py", line 214, in get_file_list self.write_manifest() File "/usr/lib/python3.5/distutils/command/sdist.py", line 362, in write_manifest "writing manifest file '%s'" % self.manifest) File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute util.execute(func, args, msg, dry_run=self.dry_run) File "/usr/lib/python3.5/distutils/util.py", line 301, in execute func(*args) File "/usr/lib/python3.5/distutils/file_util.py", line 236, in write_file f.write(line + "\n") UnicodeEncodeError: 'ascii' codec can't encode characters in position 45-46: ordinal not in range(128) (I replicated the failure case by setting my locale to POSIX and using a standard utf-8 filename but this also applies to having a filename that is not actually text in any locale... as I said, filenames used for testing can run the gamut of odd choices). This traceback is interesting as it occurs during writing of the MANIFEST. That shows that the undecodable file is read in correctly. It's only when writing the file that it fails. Some further debugging showed me that the filename is read in using the surrogateescape error handler. So we can round trip the filename by using the surrogateescase error handler when writing it out. I tested making the following change: - f = open(filename, "w") + f = open(filename, "w", errors="surrogateescape") and sure enough, the sdist is now created correctly. I'll submit a PR to fix this. |
|||
msg343101 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-05-21 21:40 | |
The traceback comes from Python 3.5. Python 3.7 now uses UTF-8 when the LC_CTYPE locale is "C" or "POSIX", thanks to C locale coercion (PEP 538) and UTF-8 Mode (PEP 540): https://vstinner.github.io/posix-locale.html https://vstinner.github.io/python37-new-utf8-mode.html Can you please retry your test using Python 3.7? I would prefer to not start to use errors="surrogateescape" everywhere, especially if the issue is already fixed. errors="surrogateescape" basically means: "I want to produce a file that other people will unable to read... --- def write_file (filename, contents): """Create a file with the specified name and write 'contents' (a sequence of strings without line terminators) to it. """ f = open(filename, "w") ... For me, the first obvious bug is that the encoding is not specified. I don't see how an application on a different computer is supposed to read a text file if the encoding is not well defined... |
|||
msg343102 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-05-21 21:41 | |
IMHO MANIFEST should always be encoded to UTF-8 on all platforms. It means always *decode* it from UTF-8. For backward compatibility, we might first try to decode it from UTF-8, but falls back on the locale encoding if the UTF-8 decoder raises a UnicodeDecodeError. |
|||
msg343107 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2019-05-21 22:21 | |
I like the idea of defaulting to UTF-8 (although I think you'll have to build consensus as to whether that's the right thing to do here) but it won't handle the use case here. There's a need to handle files which are undecodable and encoding to utf-8 won't fix that. |
|||
msg343115 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-05-21 22:38 | |
> There's a need to handle files which are undecodable and encoding to utf-8 won't fix that. Are you talking about the initial issue, get_file_list() which fails on writing into the MANIFEST file? For me, a package should be "portable": it should work on most if not all platforms, and so use common characters which are encodable by most locale encodings, no? What is your use case? Package something on Linux and then install it on Linux? What is your locale? Did you try Python 3.7 or newer which speaks UTF-8 when the locale is not set? |
|||
msg343123 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2019-05-21 22:58 | |
From my initial description: "An sdist may contain files whose names are undecodable in the current locale. For instance, the sdist might include some files for testing whose filenames are undecodable because that's the format of the input for that application." |
|||
msg343126 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-05-21 23:18 | |
Éric Araujo asked me to review https://github.com/python/cpython/pull/13467. I gave my feedback. I now remove myself from this issue, since I'm not sure that I would like to be involved in a packaging module (distutils) :-) I let others to take a decision on these topics ;-) |
|||
msg343127 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2019-05-21 23:32 | |
Uploading a minimal test case. $ tar -xzvf test-case.tar.gz $ python3.7 setup.py sdist running sdist running check warning: sdist: standard file not found: should have one of README, README.txt, README.rst reading manifest template 'MANIFEST.in' writing manifest file 'MANIFEST' Traceback (most recent call last): File "setup.py", line 27, in <module> packages=['hello'], File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup dist.run_commands() File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands self.run_command(cmd) File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/usr/lib64/python3.7/distutils/command/sdist.py", line 152, in run self.get_file_list() File "/usr/lib64/python3.7/distutils/command/sdist.py", line 208, in get_file_list self.write_manifest() File "/usr/lib64/python3.7/distutils/command/sdist.py", line 390, in write_manifest "writing manifest file '%s'" % self.manifest) File "/usr/lib64/python3.7/distutils/cmd.py", line 335, in execute util.execute(func, args, msg, dry_run=self.dry_run) File "/usr/lib64/python3.7/distutils/util.py", line 291, in execute func(*args) File "/usr/lib64/python3.7/distutils/file_util.py", line 236, in write_file f.write(line + "\n") UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 11: surrogates not allowed With PR applied: $ ../../python setup.py sdist running sdist running check warning: sdist: standard file not found: should have one of README, README.txt, README.rst reading manifest template 'MANIFEST.in' writing manifest file 'MANIFEST' creating hello-1.0 creating hello-1.0/hello creating hello-1.0/tests creating hello-1.0/tests/data making hard links in hello-1.0... hard linking setup.py -> hello-1.0 hard linking hello/__init__.py -> hello-1.0/hello hard linking tests/test_cases.py -> hello-1.0/tests hard linking tests/data/1.bin -> hello-1.0/tests/data hard linking tests/data/\udcff.bin -> hello-1.0/tests/data Creating tar archive removing 'hello-1.0' (and everything under it) Making this minimal test case, though, I found that there's another error somewhere when MANIFEST has already been created (ie: the patched version works for the initial generation of MANIFEST but it doesn't work to *regenerate* the MANIFEST). Looking into that now. |
|||
msg343129 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2019-05-21 23:39 | |
Okay, pushed a fix for regenerating the MANIFEST as well. |
|||
msg343132 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2019-05-21 23:54 | |
Are the distutils unittests disabled? And if so is there a reason? I was looking to add test cases to my PR and found that I couldn't get them (or indeed any distutils unittests) to run when trying to only target the distutils unittests. From looking at the code my guess is that the Python test suite was ported to use the load_test protocol sometime after Python-3.2 but distutils was missed. I can submit a PR to change that unless there's a reason it is the way it is. |
|||
msg343135 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2019-05-22 00:14 | |
Figured out the answer to my last question while looking into fixing it. The devguide documents both running tests via regrtest and running them via unittest test discovery. regrtest works: ./python -m test -v distutils.test But unittest doesn't: ./python -m unittest -v test.test_distutils ./python -m unittest -v distutils.test.test_file_utils # etc I'll submit a separate PR to get that working. |
|||
msg351920 - (view) | Author: Jason R. Coombs (jaraco) * | Date: 2019-09-11 15:03 | |
Thanks all for investigating and discussing this issue. I'm reviewing the PR, and I can't help but wonder, does this same issue exist when building with setuptools? Do you care about older Python versions? Would addressing these issues in setuptools address your needs? Additionally, the main use-case you describe is someone wishes to create an sdist referencing invalid filenames. That seems like an undesirable case. Could such a project not instead generate these invalid filenames on demand rather than materializing them in a distribution package? I'm also concerned that by altering `write_file`, we're applying a change for a specific use-case to a utility probably shared by several use-cases. Overall, I'm leaning -1 on this right now. Can you convince me this change is the best thing for the system? |
|||
msg386273 - (view) | Author: Steve Dower (steve.dower) * | Date: 2021-02-03 18:10 | |
Distutils is now deprecated (see PEP 632) and all tagged issues are being closed. From now until removal, only release blocking issues will be considered for distutils. If this issue does not relate to distutils, please remove the component and reopen it. If you believe it still requires a fix, most likely the issue should be re-reported at https://github.com/pypa/setuptools |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:15 | admin | set | github: 81179 |
2021-02-03 18:10:28 | steve.dower | set | status: open -> closed nosy: + steve.dower messages: + msg386273 resolution: out of date stage: patch review -> resolved |
2019-09-11 15:03:36 | jaraco | set | versions:
- Python 3.5, Python 3.6, Python 3.7 nosy: + jaraco messages: + msg351920 type: enhancement |
2019-05-22 00:14:24 | a.badger | set | messages: + msg343135 |
2019-05-22 00:03:03 | vstinner | set | nosy:
- vstinner |
2019-05-21 23:54:58 | a.badger | set | nosy:
+ vstinner messages: + msg343132 |
2019-05-21 23:43:21 | vstinner | set | nosy:
- vstinner |
2019-05-21 23:39:22 | a.badger | set | messages: + msg343129 |
2019-05-21 23:32:03 | a.badger | set | files:
+ test-case.tar.gz messages: + msg343127 |
2019-05-21 23:18:16 | vstinner | set | nosy:
vstinner, eric.araujo, a.badger, dstufft messages: + msg343126 |
2019-05-21 22:58:48 | a.badger | set | messages: + msg343123 |
2019-05-21 22:38:13 | vstinner | set | messages: + msg343115 |
2019-05-21 22:21:13 | a.badger | set | messages: + msg343107 |
2019-05-21 21:41:25 | vstinner | set | messages: + msg343102 |
2019-05-21 21:40:03 | vstinner | set | nosy:
+ vstinner messages: + msg343101 |
2019-05-21 17:41:00 | a.badger | set | keywords:
+ patch stage: patch review pull_requests: + pull_request13378 |
2019-05-21 17:33:06 | a.badger | create |