python setup.py sdist --formats tar* crashes if version is unicode #55847
Comments
I passed in a unicode value as the version by accident:
Traceback (most recent call last):
File "/home/ronny/.local/venvs/clean/bin/pysetup", line 7, in <module>
execfile(__file__)
File "/home/ronny/Projects/distutils2/distutils2/pysetup", line 5, in <module>
main()
File "/home/ronny/Projects/distutils2/distutils2/run.py", line 486, in main
return dispatcher()
File "/home/ronny/Projects/distutils2/distutils2/run.py", line 477, in __call__
return func(self, self.args)
File "/home/ronny/Projects/distutils2/distutils2/run.py", line 166, in _run
dist.run_command(cmd, dispatcher.command_options[cmd])
File "/home/ronny/Projects/distutils2/distutils2/dist.py", line 781, in run_command
cmd_obj.run()
File "/home/ronny/Projects/distutils2/distutils2/command/sdist.py", line 183, in run
self.make_distribution()
File "/home/ronny/Projects/distutils2/distutils2/command/sdist.py", line 327, in make_distribution
owner=self.owner, group=self.group)
File "/home/ronny/Projects/distutils2/distutils2/command/cmd.py", line 426, in make_archive
owner=owner, group=group)
File "/home/ronny/Projects/distutils2/distutils2/_backport/shutil.py", line 588, in make_archive
filename = func(base_name, base_dir, **kwargs)
File "/home/ronny/Projects/distutils2/distutils2/_backport/shutil.py", line 426, in _make_tarball
tar = tarfile.open(archive_name, 'w|%s' % tar_compression[compress])
File "/home/ronny/Projects/distutils2/distutils2/_backport/tarfile.py", line 1693, in open
_Stream(name, filemode, comptype, fileobj, bufsize),
File "/home/ronny/Projects/distutils2/distutils2/_backport/tarfile.py", line 434, in __init__
self._init_write_gz()
File "/home/ronny/Projects/distutils2/distutils2/_backport/tarfile.py", line 462, in _init_write_gz
self.__write(self.name + NUL)
File "/home/ronny/Projects/distutils2/distutils2/_backport/tarfile.py", line 478, in __write
self.buf += s
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)
Note that I have no idea where the 0x8b is from. If I just pass the version through str, it works (which means something is wrong somewhere else; unicode just triggers it). |
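For what it's worth, 0x8b is the second byte of the gzip magic number (0x1f 0x8b), which _init_write_gz writes into the buffer before the name. On Python 2, appending a unicode name to that byte string forces an implicit ASCII decode of the buffer, and that decode fails at position 1. A minimal sketch that emulates this on Python 3, where the decode has to be spelled out explicitly (the names below are illustrative and not taken from tarfile):

```python
# Start of a gzip member: magic bytes 0x1f 0x8b, then the deflate method byte.
GZIP_HEADER_START = b"\x1f\x8b\x08"


def implicit_ascii_decode(buf):
    """Emulate the coercion Python 2 applies when evaluating `str + unicode`."""
    return buf.decode("ascii")


error = None
try:
    implicit_ascii_decode(GZIP_HEADER_START)
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The failing offset reported by the exception points at the 0x8b byte, which matches the message in the traceback.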
What is the version? Can you also include the setup.cfg file? |
Here is the file that passed in the unicode string via a hook. |
Actually, it's enough to have the version_hook set the version to u'0.0'. |
Python 3.3 works with unicode ;), so we’ll try reproducing this later, when we have the 2.x backport. |
I have the same problem, using distutils (and not distutils2):
Traceback (most recent call last):
File "./setup.py", line 60, in <module>
test_suite="creole.tests.run_all_tests",
File "/usr/lib/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/home/jens/python2creole_env/local/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg/setuptools/command/sdist.py", line 147, in run
File "/usr/lib/python2.7/distutils/command/sdist.py", line 448, in make_distribution
owner=self.owner, group=self.group)
File "/usr/lib/python2.7/distutils/cmd.py", line 392, in make_archive
owner=owner, group=group)
File "/usr/lib/python2.7/distutils/archive_util.py", line 237, in make_archive
filename = func(base_name, base_dir, **kwargs)
File "/usr/lib/python2.7/distutils/archive_util.py", line 101, in make_tarball
tar = tarfile.open(archive_name, 'w|%s' % tar_compression[compress])
File "/usr/lib/python2.7/tarfile.py", line 1687, in open
_Stream(name, filemode, comptype, fileobj, bufsize),
File "/usr/lib/python2.7/tarfile.py", line 431, in __init__
self._init_write_gz()
File "/usr/lib/python2.7/tarfile.py", line 459, in _init_write_gz
self.__write(self.name + NUL)
File "/usr/lib/python2.7/tarfile.py", line 475, in __write
self.buf += s
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)
The problem seems to be that tarfile._Stream() can't handle 'name' as unicode. With this change, it works:
class _Stream:
    ...
    def __init__(self, name, mode, comptype, fileobj, bufsize):
        ...
        self.name = str(name) or ""
I don't know if this is related to the usage of: from __future__ import unicode_literals ? |
Does someone want to write a test for this? We have examples of creating tarball sdists in Lib/distutils/tests/test_sdist.py; one would just need to copy one example and use a unicode version. |
I can’t reproduce with pysetup or distutils 3.x. |
Jens:
|
I'm getting this exact error when I run "python setup.py sdist", no matter what I do. Even if I just create a new project, type "1.0.0" for version, type "a" in all the other fields, and say "no" to every question; then run "pysetup generate-setup" and "python setup.py sdist". |
David: As I said before, I agree this is a bug. I’m working on many things right now, so if someone volunteers to write a test and possibly a fix for this, it would help. We have examples of creating tarball sdists in Lib/distutils/tests/test_sdist.py, one would just need to copy an example and pass version=u'1.0'. |
Here's a test for the bug. |
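A regression check along these lines can be sketched as follows. This is not the attached test: it assumes Python 3 (where every str is unicode) and uses shutil.make_archive in place of distutils' make_tarball, and build_gztar is a hypothetical helper name.

```python
import os
import shutil
import tarfile
import tempfile


def build_gztar(project_name, version):
    """Create <name>-<version>.tar.gz from a throwaway source tree.

    Returns the archive's file name and the sorted member names.
    """
    workdir = tempfile.mkdtemp()
    try:
        base = "%s-%s" % (project_name, version)
        src = os.path.join(workdir, "src")
        os.makedirs(os.path.join(src, base))
        with open(os.path.join(src, base, "README"), "w") as fh:
            fh.write("placeholder\n")
        # "gztar" is the same format name that sdist --formats gztar uses.
        archive = shutil.make_archive(
            os.path.join(workdir, base), "gztar", root_dir=src, base_dir=base)
        with tarfile.open(archive, "r:gz") as tar:
            return os.path.basename(archive), sorted(tar.getnames())
    finally:
        shutil.rmtree(workdir)


archive_name, members = build_gztar(u"foo", u"1.0")
print(archive_name, members)
```

The name and version are deliberately ASCII-only, like the u'1.0' case in this ticket, so the check does not depend on whether the file system can store non-ASCII names.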
One way to fix the symptom (maybe not the correct way) would be to edit tarfile._Stream._init_write_gz and change the line that writes the name (self.__write(self.name + NUL) in the tracebacks above) so that the name is encoded first. tarfile is building up an encoded stream of bytes, and whatever self.name is, it needs to be encoded before being inserted into the stream. I'm not positive UTF-8 is right, and maybe it should only convert if isinstance(self.name, unicode). |
|
First, the term 'gztar' doesn't appear in this ticket, so I mention it here, since the issue only applies when running sdist --formats gztar. Also, bpo-8396 suggests encoding using sys.getfilesystemencoding(). |
|
This error is also encountered if the package name is unicode. The error can be simply reproduced with this command: python -c "from setuptools import setup; setup(name=u'foo')" sdist --formats gztar The error also occurs with the bdist command, and probably others. |
I meant to paste the repro with distutils.core: python -c "from distutils.core import setup; setup(name=u'foo')" sdist --formats gztar |
I believe the underlying cause of this issue is bpo-13639. |
I've created a repo to continue this work. I've integrated David's patch (thanks). It's not obvious to me what the encoding should be. Python and the tarfile module can accept unicode filenames. It seems that only the gzip part of tarfile fails if a unicode name is passed. Encoding to 'utf-8' or the default file system encoding doesn't seem right (as the characters end up getting stored in the gzip archive itself). Additionally, encoding as 'utf-8' would cause the file to be created with a utf-8 filename, which would be undesirable. So in the current repo, I've added a check that tries to convert the filename to ASCII. If it can be converted to ASCII, it is converted and passed through to tarfile. This should address the issue for the majority of users who have encountered it so far. Those who wish to use non-ASCII characters in project names or versions will have to use Python 3 or wait until bpo-13639 is fixed. Please review the enclosed patch. Since one test fails (and is known to fail), should it be omitted? Can it remain but be marked as "expected to fail"? |
Is there a good reason why the tarfile mode that is used is "w|gz"? It seems to me that this is not necessary, "w:gz" should be enough. "w|gz" is for special operations only (see the tarfile docs). |
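For reference, both modes produce a gzip-compressed tarball. The difference is that 'w|gz' writes through tarfile's own non-seekable stream wrapper (_Stream, which is where the tracebacks above end), while 'w:gz' hands compression to gzip.GzipFile. A small in-memory sketch on Python 3, with a made-up member name, showing that the resulting archives list the same contents:

```python
import io
import tarfile


def make_tarball_bytes(mode):
    """Build a one-file gzip tarball in memory using the given tarfile mode."""
    payload = b"hello\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo(name="project-1.0/README")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_members(data):
    """Read a gzip tarball back from bytes and return its member names."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return tar.getnames()


stream_names = list_members(make_tarball_bytes("w|gz"))
seekable_names = list_members(make_tarball_bytes("w:gz"))
print(stream_names, seekable_names)
```

Since sdist writes to an ordinary file on disk, nothing here appears to require the stream variant.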
Lars: I will check the history to see if there is a reason (there is probably none) and apply your patch, thank you. Jason: Thanks for the input.
|
The characters are being stored in the gzip archive as part of the gzip header. The comment in the Python 3 trunk indicates the encoding should be iso-8859-1: https://bitbucket.org/mirror/cpython/src/f3041e7f535d/Lib/tarfile.py#cl-475 My point is that the file system encoding is not relevant here. Because the name is being stored in a gzip blob, it should be encoded according to gzip specs.
My concern here was that if we're encoding the string as utf-8 before passing to the __builtins__.open() call, Python might encode _that_ utf-8 string using the file system encoding and save the file that way (where the file is named with a utf-8 encoded string, not the unicode string intended). After further investigation, and based on the work that's been proposed, this is not a risk. |
Just for the record: The gzip format (defined in RFC 1952) allows storing the original filename (without the .gz suffix) in an additional field in the header (the FNAME field). Latin-1 (iso-8859-1) is required. It is ironic that this causes so much trouble, because it is never used. A gzip file without that field is perfectly valid. The gzip program, for example, stores the original filename by default but does not use it when decompressing unless it is explicitly told to do so with the -N/--name option. If no FNAME field is present in a gzipped file, the gzip program just falls back on stripping the .gz suffix. |
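To make the FNAME field concrete, here is a small sketch that writes an empty gzip member with Python 3's gzip module and reads the field back out of the header. stored_fname is a hypothetical helper, and the parsing assumes there is no FEXTRA field before the name (the gzip module does not write one, as far as I know).

```python
import gzip
import io

FNAME = 0x08  # FLG bit that marks an original-file-name field (RFC 1952)


def stored_fname(filename):
    """Write an empty gzip member named `filename` and return its FNAME field.

    Returns None when the header carries no FNAME field.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf):
        pass
    data = buf.getvalue()
    if data[:2] != b"\x1f\x8b":        # gzip magic number
        raise ValueError("not a gzip stream")
    if not data[3] & FNAME:            # byte 3 is the FLG byte
        return None
    end = data.index(b"\x00", 10)      # name follows the 10-byte fixed header
    return data[10:end].decode("latin-1")


ascii_name = stored_fname("archive.tar.gz")
latin1_name = stored_fname(u"årchiv.tar.gz")
other_name = stored_fname(u"のアーカイブ.tar.gz")
```

On Python 3 the gzip module encodes the name as Latin-1, strips the .gz suffix, and leaves the field out entirely when the name cannot be represented in Latin-1, so the three calls should give 'archive.tar', 'årchiv.tar' and None respectively.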
Thanks to Lars for suggesting the fix, replacing 'w|gz' with 'w:gz'. I attempted this change in the latest revision of my fork (774933cf7775.diff). This change does address the issue if a unicode string is passed which can be encoded using the default encoding. However, if a latin-1 string or a string containing multi-byte unicode characters is passed, a UnicodeEncodeError occurs instead (now in gzip.py). Here are the test results and tracebacks:
PS C:\Users\jaraco\projects\public\cpython> python .\lib\distutils\tests\test_archive_util.py
======================================================================
Traceback (most recent call last):
File ".\lib\distutils\tests\test_archive_util.py", line 305, in test_make_tarball_unicode_extended
self._make_tarball(u'のアーカイブ') # japanese for archive
File ".\lib\distutils\tests\test_archive_util.py", line 64, in _make_tarball
make_tarball(splitdrive(base_name)[1], '.')
File "C:\Users\jaraco\projects\public\cpython\Lib\distutils\archive_util.py", line 101, in make_tarball
tar = tarfile.open(archive_name, 'w:%s' % tar_compression[compress])
File "C:\Users\jaraco\projects\public\cpython\Lib\tarfile.py", line 1676, in open
return func(name, filemode, fileobj, **kwargs)
File "C:\Users\jaraco\projects\public\cpython\Lib\tarfile.py", line 1724, in gzopen
gzip.GzipFile(name, mode, compresslevel, fileobj),
File "C:\Users\jaraco\projects\public\cpython\Lib\gzip.py", line 127, in __init__
self._write_gzip_header()
File "C:\Users\jaraco\projects\public\cpython\Lib\gzip.py", line 172, in _write_gzip_header
self.fileobj.write(fname + '\000')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
======================================================================
Traceback (most recent call last):
File ".\lib\distutils\tests\test_archive_util.py", line 297, in test_make_tarball_unicode_latin1
self._make_tarball(u'årchiv') # note this isn't a real word
File ".\lib\distutils\tests\test_archive_util.py", line 64, in _make_tarball
make_tarball(splitdrive(base_name)[1], '.')
File "C:\Users\jaraco\projects\public\cpython\Lib\distutils\archive_util.py", line 101, in make_tarball
tar = tarfile.open(archive_name, 'w:%s' % tar_compression[compress])
File "C:\Users\jaraco\projects\public\cpython\Lib\tarfile.py", line 1676, in open
return func(name, filemode, fileobj, **kwargs)
File "C:\Users\jaraco\projects\public\cpython\Lib\tarfile.py", line 1724, in gzopen
gzip.GzipFile(name, mode, compresslevel, fileobj),
File "C:\Users\jaraco\projects\public\cpython\Lib\gzip.py", line 127, in __init__
self._write_gzip_header()
File "C:\Users\jaraco\projects\public\cpython\Lib\gzip.py", line 172, in _write_gzip_header
self.fileobj.write(fname + '\000')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)
Ran 12 tests in 0.058s
FAILED (errors=2, skipped=3)
Traceback (most recent call last):
File ".\lib\distutils\tests\test_archive_util.py", line 311, in <module>
run_unittest(test_suite())
File "C:\Users\jaraco\projects\public\cpython\Lib\test\test_support.py", line 1094, in run_unittest
_run_suite(suite)
File "C:\Users\jaraco\projects\public\cpython\Lib\test\test_support.py", line 1077, in _run_suite
raise TestFailed(err)
test.test_support.TestFailed: multiple errors occurred |
I've captured the cause of the UnicodeEncodeErrors as bpo-13664. After rebasing the changes to include the fix for bpo-13639, I found that the tests were still failing until I also reverted the patch to call tarfile.open with 'w:gz'. Now all the new tests pass (with no other changes to the code). This latest patch only contains tests to capture the errors encountered. I plan to push this changeset and also port the test changes to the default (Python 3.3) branch. |
New changeset dc1045d08bd8 by Jason R. Coombs in branch '2.7': |
New changeset f0fcb82a88e9 by Jason R. Coombs in branch 'default': |
Since the tests now pass, and the only changes were to the tests, I've pushed them to the master. And with that I'm marking this ticket as closed. |
f0fcb82a88e9 broke bots. See http://www.python.org/dev/buildbot/all/builders/x86%20Gentoo%203.x/builds/1374/steps/test/logs/stdio |
|
New changeset a7744f778646 by Jason R. Coombs in branch 'default': |
I've limited the scope of the patch to attempt to only test on those platforms that can actually create unicode-named files. I'll watch the buildbots to see if that corrects the failures (since I don't have the failing platforms available to me). |
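The pushed changeset is not quoted in this thread, so the following is only a sketch of how such a guard might look: a helper checks whether a name can be encoded with the file system encoding, and the test is skipped otherwise. The helper name, the test class and the placeholder test body are illustrative assumptions.

```python
import sys
import unittest


def can_fs_encode(filename):
    """Return True if `filename` can be encoded with the file system encoding."""
    try:
        filename.encode(sys.getfilesystemencoding())
    except UnicodeEncodeError:
        return False
    return True


class UnicodeNameTests(unittest.TestCase):
    @unittest.skipUnless(can_fs_encode(u"årchiv"),
                         "file system cannot encode this name")
    def test_latin1_name(self):
        # Placeholder body: a real test would build an archive with this name.
        self.assertTrue(can_fs_encode(u"årchiv"))
```

Checking the encoding up front keeps the skip decision at import time, so platforms with an ASCII-only file system encoding report the test as skipped rather than failed.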
New changeset 9b681e0c04ed by Jason R. Coombs in branch '2.7': |
The changes to the default branch seem to have cleaned up the test failures on most platforms (still waiting on the ARM results). So I've backported the test skips to the Python 2.7 branch as well. |