
Author a.badger
Recipients a.badger, eric.araujo, tarek, vstinner
Date 2010-09-13.15:37:04
Message-id <1284392227.01.0.266221361888.issue9561@psf.upfronthosting.co.za>
Content
>>> - RPM spec files, which use ASCII or UTF-8 according to
>>> http://en.opensuse.org/openSUSE:Specfile_guidelines#Specfile_Encoding but
>>> it’s not confirmed in
>>> http://www.rpm.org/max-rpm/s1-rpm-build-creating-spec-file.html (linked
>>> from the LSB site)
>> UTF-8 is a superset of ASCII. If you use utf-8 but only write ascii
>> characters, your output file will be written to utf-8... but it will be also
>> encoded to ascii. It's magical :-)
>
> I know that, but it does not answer the question:  Is it okay for these files
> to use UTF-8?

rpm spec files are encoding-agnostic, similar to POSIX filesystems. This of course causes no end of trouble for people writing Python code that deals with spec files, as they cannot rely on the bytes they get from one package having the same encoding as the bytes from another (remember that things like dependency solvers have to compare the information from multiple packages to make their decisions).
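To make the comparison problem concrete, here is a minimal sketch, assuming a hypothetical helper `decode_spec_bytes` (not part of rpm or distutils) that guesses UTF-8 first and falls back to latin-1. The same name written in two encodings gives different bytes, so a byte-level comparison fails even though the text is identical. The guess is only a heuristic: some latin-1 byte sequences also happen to be valid UTF-8, and nothing in the file says which one the packager meant.

```python
from typing import Tuple


def decode_spec_bytes(data: bytes) -> Tuple[str, str]:
    """Decode raw spec-file bytes and report which codec was used.

    Hypothetical helper: rpm does not record the encoding, so this tries
    UTF-8 and falls back to latin-1, which maps every byte to a code point
    and therefore never fails (though it may produce mojibake).
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("latin-1"), "latin-1"


# The same package name, written by two packagers using different encodings.
utf8_name = "python-café".encode("utf-8")      # b'python-caf\xc3\xa9'
latin1_name = "python-café".encode("latin-1")  # b'python-caf\xe9'

# Comparing the raw bytes says the names differ...
print(utf8_name == latin1_name)  # False
# ...while decoding each with its guessed codec makes them match.
print(decode_spec_bytes(utf8_name)[0] == decode_spec_bytes(latin1_name)[0])  # True
```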

Individual distributions have different policies about encoding and the use of Unicode in spec files to try to mitigate the problems. For instance, Fedora specifies UTF-8 for spec files and additionally requires package names to be ASCII. (So if there's a package named python-café, we would likely transcribe it as python-cafe when we made a package for it.)
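One way to approximate the python-café to python-cafe transcription automatically is sketched below. `ascii_package_name` is a hypothetical helper, not Fedora tooling: it uses Unicode NFKD normalization to split accented letters into a base letter plus a combining mark, then drops everything outside ASCII. Characters with no ASCII decomposition (for example CJK ideographs) are simply removed, so a real packager would still need to transcribe those by hand.

```python
import unicodedata


def ascii_package_name(name: str) -> str:
    """Strip accents from a package name, keeping only ASCII characters.

    Hypothetical helper for illustration only.
    """
    # NFKD turns "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding with errors="ignore" drops the combining marks and any other
    # non-ASCII code points.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_package_name("python-café"))  # python-cafe
```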

UTF-8 is a good default for locales on POSIX systems, so it's a good default encoding for spec files, but I know there are some people out there who make their own packages whose spec files aren't UTF-8. I haven't checked, but I also wouldn't be surprised if some Asian countries (where UTF-8 needs more bytes per character) have local distributions that use a non-UTF-8 encoding as well. Whether either of these use cases needs to be catered for in distutils (when the support is going away in distutils2) I'll leave to someone else to decide. My personal gut instinct is no, but I'm not one of the people using a non-UTF-8 locale.
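If a tool did want to cover both cases, one option is a UTF-8 default with an explicit override. The sketch below assumes a hypothetical `write_spec_file` function (not an existing distutils or rpm API). It uses `errors="strict"` so that characters the chosen codec cannot represent raise `UnicodeEncodeError` rather than being silently replaced, and `newline=""` so that no newline translation happens on any platform.

```python
def write_spec_file(path, text: str, encoding: str = "utf-8") -> None:
    """Write spec-file text, defaulting to UTF-8.

    Hypothetical helper: `encoding` lets a packager on a non-UTF-8 locale
    override the default; unencodable characters raise UnicodeEncodeError.
    """
    # newline="" writes "\n" as-is, so the bytes on disk are exactly the
    # encoded text regardless of platform.
    with open(path, "w", encoding=encoding, errors="strict", newline="") as f:
        f.write(text)
```

Calling `write_spec_file("python-cafe.spec", "Summary: café\n")` would store `é` as the two bytes `\xc3\xa9`, while passing `encoding="latin-1"` would store it as the single byte `\xe9`.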