
Author loewis
Recipients lemburg, loewis, tarek
Date 2008-04-07.20:07:40
Message-id <47FA7F0A.3060903@v.loewis.de>
In-reply-to <1207597778.66.0.180924076032.issue2562@psf.upfronthosting.co.za>
Content
> Agreed, but any change will target the package authors who can easily
> upgrade their packages to use Unicode for e.g. names.

They can't: that would break their 2.5-and-earlier compatibility.

> If the change were to address distutils users, we'd have to be a lot
> more careful.

We are addressing distutils users: who else would this affect? Why should we be more careful?

> In any case, if UTF-8 is the de facto standard used in older packages,
> then we should probably use that as fallback solution if the ASCII
> assumption doesn't work out:
> 
> try:
>     value = unicode(value)
> except UnicodeDecodeError:
>     value = unicode(value, 'utf-8')
> value = value.encode('utf-8')

For writing the metadata, we don't need to make any assumptions. We
can just write the bytes as-is. This is how distutils has behaved
for many releases now, and this is how users have been using it.
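
As a rough illustration of "write the bytes as-is", here is a minimal
sketch in Python 3 syntax, where byte strings are explicit. The helper
name write_pkg_info_field and the sample values are hypothetical, not
distutils API; the point is only that nothing is decoded or re-encoded
on the way out.

```python
import io

def write_pkg_info_field(stream, name, value):
    """Write one metadata header line without touching the value's bytes.

    `value` is a byte string exactly as the package author supplied it;
    whatever encoding it happens to be in is passed through unchanged.
    """
    stream.write(name.encode("ascii") + b": " + value + b"\n")

# Hypothetical author values: one UTF-8 encoded, one Latin-1 encoded.
utf8_author = "Martin v. Löwis".encode("utf-8")
latin1_author = "Martin v. Löwis".encode("latin-1")

buf = io.BytesIO()
write_pkg_info_field(buf, "Author", utf8_author)
write_pkg_info_field(buf, "Author", latin1_author)
data = buf.getvalue()
```

Both lines end up in the output, each in its author's original encoding,
which is exactly why a reader of the file cannot know the encoding.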

Of course, we (probably) agree that this is conceptually wrong, as
we won't be able to know what the encoding of the metadata file is,
and we (probably) also agree that the metadata should have the
fixed encoding of UTF-8. However, I don't think we should deliberately
break packages before 3.0 (even if they chose to use some other
encoding); instead, such packages will silently start doing the
right thing with 3.0, when their strings become Unicode strings.
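
To make the breakage concrete, here is a sketch of the quoted fallback
rendered in Python 3 syntax (the function name normalize_to_utf8 and the
sample name are assumptions for illustration). A package that chose
Latin-1 fails both decoding attempts, whereas a value that is already a
Unicode string needs no guessing at all.

```python
def normalize_to_utf8(value):
    """Quoted fallback, in Python 3 terms: try ASCII, then UTF-8."""
    try:
        text = value.decode("ascii")
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError again if the bytes are not valid UTF-8.
        text = value.decode("utf-8")
    return text.encode("utf-8")

# A hypothetical package that used Latin-1 for its author name.
latin1_name = "Löwis".encode("latin-1")
try:
    normalize_to_utf8(latin1_name)
    broke = False
except UnicodeDecodeError:
    broke = True

# With Unicode strings, encoding to UTF-8 requires no assumption.
utf8_name = "Löwis".encode("utf-8")
```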