Author vstinner
Recipients eric.araujo, tarek, vstinner
Date 2010-08-13.00:18:52
Message-id <201008130218.45060.victor.stinner@haypocalc.com>
In-reply-to <1281496888.52.0.395441307665.issue9561@psf.upfronthosting.co.za>
Content
> - PKG-INFO (METADATA in distutils2), that already uses a trick to support
> Unicode, but your change would replace it in a better way;

Which "trick"?

> - MANIFEST, which with your fix would gain the ability to handle non-ASCII
> paths, which is a feature or a bugfix depending on your point of view;

Wait. Non-encodable bytes are a separate issue. I would like to work on the 
first problem: distutils in Python 3 uses open() without an encoding argument, 
so the encoding depends on the user's locale. Said differently: if you produce 
a file with distutils on one computer, you cannot be sure that the file can be 
read with the same version of Python on another computer (if the locale 
encoding is different). E.g. Windows uses the mbcs encoding, whereas UTF-8 is 
the preferred encoding on Linux.
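
A minimal sketch of the difference, not taken from distutils itself: the 
file name and sample text are made up for illustration. It shows the 
locale-dependent default next to an explicit encoding, which should produce 
the same bytes on any machine.

```python
import locale
import os
import tempfile

# Encoding that open() falls back to when no encoding argument is given.
# This value varies between machines (and Python configurations).
default_encoding = locale.getpreferredencoding(False)

text = "caf\u00e9"  # sample text with one non-ASCII character

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name

    # Explicit encoding: the bytes on disk no longer depend on the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    with open(path, "rb") as f:
        raw = f.read()

    # Reading back with the same explicit encoding is safe everywhere.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print("locale default:", default_encoding)
print("bytes on disk:", raw)
```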

What is the encoding of the MANIFEST file?

> - .def files, used by the compilers for the C linking step; I don’t know if
> it’s appropriate to allow UTF-8 there.

I don't know these files.

> - RPM spec files, which use ASCII or UTF-8 according to
> http://en.opensuse.org/openSUSE:Specfile_guidelines#Specfile_Encoding but
> it’s not confirmed in
> http://www.rpm.org/max-rpm/s1-rpm-build-creating-spec-file.html (linked
> from the LSB site), so there’s no guarantee this works for all RPM
> platforms. This sort of platform-specific thing is the reason why RPM
> support has been removed in distutils2.

UTF-8 is a superset of ASCII. If you use UTF-8 but only write ASCII 
characters, your output file will be valid UTF-8... but it will also be valid 
ASCII. It's magical :-)
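
A quick way to check this claim, as a sketch with made-up sample text: 
encoding a pure-ASCII string with either codec should give identical bytes.

```python
# Sample content restricted to ASCII characters (illustrative only).
ascii_text = "Name: example\nVersion: 1.0\n"

as_utf8 = ascii_text.encode("utf-8")
as_ascii = ascii_text.encode("ascii")

# For ASCII-only text the two encodings produce the same byte sequence.
same_bytes = as_utf8 == as_ascii
print(same_bytes)
```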

> - record and .pth files created by the install command.

.pth files contain directory names, which can be non-ASCII.

> I agree that there is something to be fixed, but I don’t know if they can
> be fixed in distutils. Unicode in PKG-INFO is unrelated to files, whereas
> there are files or directories in MANIFEST, spec, record and .pth.

You can use non-ASCII characters for things other than filenames, e.g. in the 
description of a package :-)

> If this is going to be fixed, write_file should not use UTF-8 unconditionally
> but grow a keyword argument IMO, so that use cases requiring ASCII 
> continue to work.

As written before, UTF-8 is a superset of ASCII. If you read a file using the 
UTF-8 encoding, you will be able to read ASCII files. But if you use UTF-8 and 
write non-ASCII characters, older versions of distutils using ASCII or another 
encoding will not be able to read these files.
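
The asymmetry can be sketched as follows. The sample strings are invented, 
and the "old reader" is simply assumed to decode with the ascii codec.

```python
# An ASCII-only file and a UTF-8 file containing one non-ASCII character.
ascii_bytes = "description: plain text".encode("ascii")
utf8_bytes = "description: na\u00efve".encode("utf-8")

# A UTF-8 reader handles the ASCII file without trouble.
read_ok = ascii_bytes.decode("utf-8")

# An ASCII-only reader (assumed here) fails on the non-ASCII UTF-8 file.
try:
    utf8_bytes.decode("ascii")
    old_reader_failed = False
except UnicodeDecodeError:
    old_reader_failed = True

print(read_ok, old_reader_failed)
```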

Anyway, I think that in most cases, all files only contain ASCII text. So it 
doesn't really matter.

About the keyword solution: yes, it would be a smooth way to fix this issue.
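
One possible shape for such a keyword, as a sketch only: this write_file is 
a hypothetical standalone function, not the actual distutils code, and the 
"utf-8" default is an assumption. Callers that need ASCII could pass 
encoding="ascii" and would get an error instead of silently writing 
non-ASCII bytes.

```python
import os
import tempfile


def write_file(filename, contents, encoding="utf-8"):
    """Write a sequence of strings to filename, one per line.

    Hypothetical variant for illustration; the default encoding is an
    assumption, not what distutils ships.
    """
    with open(filename, "w", encoding=encoding) as f:
        for line in contents:
            f.write(line + "\n")


with tempfile.TemporaryDirectory() as tmp:
    manifest = os.path.join(tmp, "MANIFEST")

    # A caller that requires ASCII output states so explicitly.
    write_file(manifest, ["setup.py", "README"], encoding="ascii")

    with open(manifest, encoding="ascii") as f:
        lines = f.read().splitlines()

print(lines)
```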

> When you say “patch *all* functions reading files”, I guess you mean all
> functions that read distutils files, i.e. MANIFEST and PKG-INFO.

I don't know distutils well enough to answer my own question.