Author a.badger
Recipients a.badger, dstufft, eric.araujo
Date 2019-05-21.17:33:06
Message-id <1558459986.87.0.362886106344.issue36998@roundup.psfhosted.org>
Content
An sdist may contain files whose names are undecodable in the current locale.  For instance, the sdist might include some files for testing whose filenames are undecodable because that's the format of the input for that application.

Currently, trying to create the sdist fails with output similar to this:

Traceback (most recent call last):
  File "setup.py", line 330, in <module>
    main()
  File "setup.py", line 325, in main
    setup(**setup_params)
  File "/home/badger/.local/lib/python3.5/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.5/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 137, in run
    SDist.run(self)
  File "/usr/lib/python3.5/distutils/command/sdist.py", line 158, in run
    self.get_file_list()
  File "/usr/lib/python3.5/distutils/command/sdist.py", line 214, in get_file_list
    self.write_manifest()
  File "/usr/lib/python3.5/distutils/command/sdist.py", line 362, in write_manifest
    "writing manifest file '%s'" % self.manifest)
  File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute
    util.execute(func, args, msg, dry_run=self.dry_run)
  File "/usr/lib/python3.5/distutils/util.py", line 301, in execute
    func(*args)
  File "/usr/lib/python3.5/distutils/file_util.py", line 236, in write_file
    f.write(line + "\n")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 45-46: ordinal not in range(128)

(I reproduced the failure by setting my locale to POSIX and using an ordinary UTF-8 filename, but the same failure applies to a filename that is not valid text in any locale; as I said, filenames used for testing can run the gamut of odd choices.)
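The failure can be reproduced without changing the system locale. This is a minimal sketch that assumes a POSIX locale on Python 3.5 (the version in the traceback), where both the filesystem encoding and the default file encoding are ASCII. It imitates that setup by passing `encoding="ascii"` explicitly instead of relying on the locale. (On Python 3.7 and later, PEP 538 and PEP 540 change how the C/POSIX locale is handled, so switching the locale alone may not reproduce it there.) The helper `reproduce_failure` and the sample filename are illustrative only.

```python
import os
import tempfile


def reproduce_failure(directory):
    """Write one surrogate-escaped filename the way the unpatched write_file does."""
    # An ordinary UTF-8 filename as stored on disk: b"caf\xc3\xa9.txt"
    raw = "caf\u00e9.txt".encode("utf-8")
    # With an ASCII filesystem encoding, Python decodes the name with
    # surrogateescape, so each non-ASCII byte becomes a lone surrogate.
    name = raw.decode("ascii", "surrogateescape")  # "caf\udcc3\udca9.txt"
    manifest = os.path.join(directory, "MANIFEST")
    try:
        # Stand-in for open(filename, "w") under an ASCII locale (strict errors).
        with open(manifest, "w", encoding="ascii") as f:
            f.write(name + "\n")
    except UnicodeEncodeError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    err = reproduce_failure(tmp)

print(err)  # 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
```

The two "characters" the codec rejects are the two surrogates produced from the two UTF-8 bytes of a single accented letter, which is the same kind of error as the one at the end of the traceback.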

This traceback is interesting because it occurs while the MANIFEST is being written.  That shows that the undecodable filename was read in successfully; it's only when writing the MANIFEST that it fails.  Some further debugging showed me that the filename is read in using the surrogateescape error handler, so we can round-trip the filename by also using the surrogateescape error handler when writing it out.
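The round trip can be checked in memory. This is a minimal sketch, assuming the filename is encoded with the same codec that was used to decode it (PEP 383 describes how Python uses surrogateescape for undecodable filenames on POSIX). The helper `round_trip` and the sample bytes are illustrative.

```python
def round_trip(raw: bytes, encoding: str) -> bytes:
    """Decode a filename as it is read in, then encode it as the patched writer would."""
    # Reading: undecodable bytes become lone surrogates (U+DC80 to U+DCFF).
    name = raw.decode(encoding, "surrogateescape")
    # Writing: surrogateescape turns those surrogates back into the original bytes.
    return name.encode(encoding, "surrogateescape")


# 0xFF and 0xFE never occur in valid UTF-8 (or ASCII).
raw = b"data_\xff\xfe.bin"
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'data_\udcff\udcfe.bin'

try:
    name.encode("utf-8")  # strict encoding cannot represent the lone surrogates
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed

print(round_trip(raw, "utf-8") == raw)  # True
print(round_trip(raw, "ascii") == raw)  # True
```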

I tested making the following change to write_file() in distutils/file_util.py:

-    f = open(filename, "w")
+    f = open(filename, "w", errors="surrogateescape")

and sure enough, the sdist is now created correctly.
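With the change applied, a manifest containing surrogate-escaped names is written back byte for byte. The sketch below is modeled on `distutils.file_util.write_file` with the one-line change from the diff; it is not the code from the PR. The optional `encoding` argument is a hypothetical addition so the example behaves the same on any machine (the real function takes only `filename` and `contents` and uses the locale default), and the sample filenames are illustrative.

```python
import os
import tempfile


def write_file(filename, contents, encoding=None):
    """Write 'contents' (strings without line terminators) to 'filename'.

    Sketch of distutils.file_util.write_file with errors="surrogateescape".
    'encoding' is an illustrative extra parameter (None = locale default).
    """
    with open(filename, "w", encoding=encoding, errors="surrogateescape") as f:
        for line in contents:
            f.write(line + "\n")


# Filenames as stored on disk, including one that is not valid UTF-8.
raw_names = [b"setup.py", b"tests/data_\xff\xfe.bin", "caf\u00e9.txt".encode("utf-8")]
# Decode them the way they would be read under an ASCII filesystem encoding.
names = [n.decode("ascii", "surrogateescape") for n in raw_names]

with tempfile.TemporaryDirectory() as tmp:
    manifest = os.path.join(tmp, "MANIFEST")
    write_file(manifest, names, encoding="ascii")
    with open(manifest, "rb") as f:
        written = f.read().splitlines()

print(written == raw_names)  # True
```

Only lone surrogates in the range U+DC80 to U+DCFF are turned back into bytes; text the codec can already encode is written exactly as before, and any other unencodable character still raises UnicodeEncodeError.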

I'll submit a PR to fix this.
History
Date                 User      Action  Args
2019-05-21 17:33:06  a.badger  set     recipients: + a.badger, eric.araujo, dstufft
2019-05-21 17:33:06  a.badger  set     messageid: <1558459986.87.0.362886106344.issue36998@roundup.psfhosted.org>
2019-05-21 17:33:06  a.badger  link    issue36998 messages
2019-05-21 17:33:06  a.badger  create