Author vstinner
Recipients eric.araujo, tarek, vstinner
Date 2010-08-10.17:51:36
Message-id <1281462699.26.0.0348702049008.issue9561@psf.upfronthosting.co.za>
In-reply-to
Content
While working on #9425 (support non-ascii characters in the python directory name with an ascii locale), I wrote a patch for distutils.file_util: it sets the encoding to utf-8 and the error handler to surrogateescape. See the patch with comments at:
http://codereview.appspot.com/1874048/patch/1/9

(The patch is not enough: *all* functions that read files should be patched as well.)
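
To illustrate the idea (this is only a sketch, not the code from the patch; the helpers `write_lines`, `read_lines` and `roundtrip_demo` are made up for the example), bytes that are not valid utf-8 survive a read/write round trip when both sides use utf-8 with surrogateescape:

```python
import os
import tempfile


def write_lines(path, lines):
    # Hypothetical helper: write text with an explicit encoding.
    # surrogateescape turns lone surrogates (U+DC80..U+DCFF) back into
    # the original raw bytes instead of raising UnicodeEncodeError.
    with open(path, "w", encoding="utf-8", errors="surrogateescape",
              newline="\n") as f:
        for line in lines:
            f.write(line + "\n")


def read_lines(path):
    # Hypothetical helper: bytes that are not valid utf-8 are decoded
    # to lone surrogates instead of raising UnicodeDecodeError.
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        return f.read().splitlines()


def roundtrip_demo():
    raw = b"caf\xe9.txt\n"  # latin-1 bytes, invalid as utf-8
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(raw)
        lines = read_lines(path)   # decode, keeping the odd byte
        write_lines(path, lines)   # encode it back unchanged
        with open(path, "rb") as f:
            return lines, f.read()
    finally:
        os.remove(path)
```

With the default strict error handler, the same read would raise UnicodeDecodeError on the 0xe9 byte.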

I discussed this with tarek, who told me that it is documented that distutils files have to be utf-8. I couldn't find that documentation. I checked read_manifest() in the sdist command: in both Python2 and Python3, it uses the open(name) syntax. This means that Python2 uses the binary API (bytes), whereas Python3 uses the text API (unicode characters) and relies on the open() (TextIOWrapper) heuristic to *guess* the file encoding (the locale's preferred encoding).

I think it would be better to specify the encoding explicitly in Python3, and maybe to use the text API in Python2 as well.
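
A minimal sketch of what I mean, assuming utf-8 is the expected encoding. `read_manifest_lines` is a hypothetical stand-in, not the real sdist code. io.open() exists in Python 2.6+ and in Python3, and returns unicode text in both:

```python
import io


def read_manifest_lines(path, encoding="utf-8"):
    # Hypothetical stand-in for sdist's read_manifest(): the encoding is
    # stated explicitly instead of depending on the current locale.
    with io.open(path, encoding=encoding) as f:
        # Strip surrounding whitespace and skip blank lines.
        return [line.strip() for line in f if line.strip()]
```

With an explicit encoding, the result no longer changes when the same file is read under an ascii locale.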

Anyway, before going further (working on patches), I would like the approval of the distutils maintainer(s).
History
Date                 User      Action  Args
2010-08-10 17:51:39  vstinner  set     recipients: + vstinner, tarek, eric.araujo
2010-08-10 17:51:39  vstinner  set     messageid: <1281462699.26.0.0348702049008.issue9561@psf.upfronthosting.co.za>
2010-08-10 17:51:37  vstinner  link    issue9561 messages
2010-08-10 17:51:36  vstinner  create