Message 86312 - Python tracker


Author	rg3
Recipients	rg3
Date	2009-04-22.18:20:42
Message-id	<1240424446.58.0.284100987299.issue5815@psf.upfronthosting.co.za>
In-reply-to

Content

A recent issue with one of my programs has shown that
locale.getdefaultlocale() does not correctly handle a corner case. The
issue URL is this one:

http://bitbucket.org/rg3/youtube-dl/issue/7/

Essentially, some users have LANG set to something like
es_CA.UTF-8@valencia. In that case, locale.getdefaultlocale() returns,
as the encoding, the string "utf_8_valencia", which cannot be used as an
argument to the string encode() function. The obviously correct encoding
in this case is UTF-8.
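
To make the failure concrete, here is a minimal sketch that asks Python's
codec registry about both names. The helper name is_known_codec is made up
for this illustration; only codecs.lookup() is a real standard-library call.

```python
import codecs


def is_known_codec(name):
    """Return True if the codec registry can resolve `name`.

    Hypothetical helper for illustration; it is not part of the locale module.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        # Raised when no registered codec matches the given name.
        return False
    return True


print(is_known_codec("UTF-8"))           # the encoding the user actually has
print(is_known_codec("utf_8_valencia"))  # the string reported in this issue
```

Any call such as u"text".encode("utf_8_valencia") fails for the same
reason: the name does not resolve to a codec.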

I have traced the problem and it seems that it could be fixed by the
attached patch. It checks if the encoding, at that point, contains the
'@' symbol and, in that case, removes everything starting at that point,
leaving only "UTF-8".
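
The idea can be sketched outside the locale module as follows. This is not
the attached patch itself; split_lang is a hypothetical helper that only
illustrates cutting the value at the '@' symbol.

```python
def split_lang(value):
    """Split a LANG-style value into (language, encoding).

    Hypothetical helper, not code from the locale module. Anything from
    the '@' symbol onwards (the locale modifier) is discarded.
    """
    language, _, encoding = value.partition(".")
    # Drop the modifier whether it follows the encoding or the language.
    encoding = encoding.partition("@")[0]
    language = language.partition("@")[0]
    # Report a missing encoding as None rather than an empty string.
    return language, encoding or None


print(split_lang("es_CA.UTF-8@valencia"))
print(split_lang("es_CA.UTF-8"))
```

With this approach the modifier case and the simple case yield the same
encoding string.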

I am not sure if this patch or a similar one should be applied to other
Python versions. My system has Python 2.5.2 and that's what I have patched.

Explanation as to why I put the code there:

* The simple case, es_CA.UTF-8, goes through that point too and enters
the "if".
* I wanted to remove what goes after the '@' symbol at that point, so it
either needed to be removed before the call to the normalizing function
or inside the normalization.
* As this is not what I would consider a normalization, I put the code
before the function call.
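
Why the order matters can be shown with a small sketch. The function
stand_in_normalize below is an assumption: it is not the real normalizing
function from the locale module, only a stand-in that folds punctuation
into underscores, which is consistent with the "utf_8_valencia" result
described above. strip_modifier is likewise a hypothetical name.

```python
import codecs
import re


def strip_modifier(encoding):
    """Remove everything from the first '@' onwards (hypothetical helper)."""
    return encoding.split("@", 1)[0]


def stand_in_normalize(encoding):
    """Stand-in normalizer, NOT the real locale code.

    Lowercases the name and folds every run of other characters into '_'.
    """
    return re.sub(r"[^0-9a-z]+", "_", encoding.lower())


raw = "UTF-8@valencia"

# Stripping before normalizing: the modifier is gone before '@' is folded.
strip_first = stand_in_normalize(strip_modifier(raw))

# Normalizing first: '@' has already become '_', so nothing is left to strip.
strip_last = strip_modifier(stand_in_normalize(raw))

print(strip_first)
print(strip_last)
print(codecs.lookup(strip_first).name)
```

Once the '@' has been folded away, the modifier can no longer be told
apart from the encoding name, which is why the removal has to happen
before the normalizing call or inside it.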

Thanks for your hard work. I hope my patch is valid.

Regards.

History
Date	User	Action	Args
2009-04-22 18:20:46	rg3	set	recipients: + rg3
2009-04-22 18:20:46	rg3	set	messageid: <1240424446.58.0.284100987299.issue5815@psf.upfronthosting.co.za>
2009-04-22 18:20:44	rg3	link	issue5815 messages
2009-04-22 18:20:43	rg3	create