Message 129255 - Python tracker



Author	lemburg
Recipients	belopolsky, ezio.melotti, georg.brandl, lemburg, mrabarnett, pitrou
Date	2011-02-24.09:20:37
Message-id	<4D6622E4.7010003@egenix.com>
In-reply-to	<1298513419.85.0.00455767376339.issue5902@psf.upfronthosting.co.za>

Content

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
>> Accepting all common forms for
>> encoding names means that you can usually give Python an encoding name
>> from, e.g. a HTML page, or any other file or system that specifies an
>> encoding.
> 
> I don't buy this argument.  Running attached script on http://www.iana.org/assignments/character-sets shows that there are hundreds of registered charsets that are not accepted by python:
> 
> $ ./python.exe iana.py| wc -l
>      413
> 
> Any serious HTML or XML processing software should be based on the IANA character-sets file rather than on the ad-hoc list of aliases that made it into encodings/aliases.py.

Let's do a reality check:

How often do you see requests for additions to the aliases we
have in Python? Perhaps once a year, if at all.

We take great care not to add aliases that are not in common
use or that do not have a proven track record of really being
compatible with the codec in question.

If you think we are missing some aliases, please open tickets
for them, indicating why these should be added.

If you really want complete IANA coverage, I suggest you create
a normalization module which maps the IANA names to our names
and upload it to PyPI.
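
A minimal sketch of what such a normalization module might look like, assuming a hand-maintained mapping table. The names `IANA_TO_PYTHON` and `normalize_charset` are hypothetical, and the three entries are only an illustrative sample, not a list derived from the IANA character-sets file. The only standard library call used is `codecs.lookup()`:

```python
import codecs

# Hypothetical sample mapping: lowercased IANA charset name -> Python codec name.
# A real module would generate this table from the IANA character-sets file.
IANA_TO_PYTHON = {
    "ansi_x3.4-1968": "ascii",
    "iso_8859-1:1987": "latin_1",
    "extended_unix_code_packed_format_for_japanese": "euc_jp",
}


def normalize_charset(name):
    """Return the Python codec name for an IANA charset name, or None.

    Names missing from the table are passed to codecs.lookup() unchanged,
    so the aliases Python already knows keep working.
    """
    key = name.strip().lower()
    candidate = IANA_TO_PYTHON.get(key, key)
    try:
        return codecs.lookup(candidate).name
    except LookupError:
        # Neither the table nor Python's own aliases know this charset.
        return None
```

Falling back to `codecs.lookup()` keeps the table small: it only needs the IANA names that Python does not already accept, and callers get `None` rather than an exception for charsets that have no matching codec.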

History
Date	User	Action	Args
2011-02-24 09:20:45	lemburg	set	recipients: + lemburg, georg.brandl, belopolsky, pitrou, ezio.melotti, mrabarnett
2011-02-24 09:20:37	lemburg	link	issue5902 messages
2011-02-24 09:20:37	lemburg	create