Message 187537 - Python tracker

Author	Tomoki.Imai
Recipients	Tomoki.Imai, ezio.melotti, pradyunsg, r.david.murray, terry.reedy
Date	2013-04-21.23:19:42
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1366586382.36.0.964994952434.issue17348@psf.upfronthosting.co.za>
In-reply-to

Content

Thanks.

I noticed that Terry used Python 3 to confirm this problem...

I am Japanese, but I use an English-language environment.
Here are my locale settings. I'm using Linux.
konomi:tomoki% locale                                    
LANG=en_US.utf8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
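
To see which encodings Python itself derives from locale settings like these, a quick Python 3 check can help (a sketch added for illustration, not part of the original report; `report_encodings` is a hypothetical helper name). With a UTF-8 locale such as the one above, one would expect UTF-8 to be reported, but the exact spelling of the name varies by platform and Python version.

```python
import locale
import sys


def report_encodings():
    """Hypothetical helper: return the encodings Python derives from the environment."""
    return {
        # Encoding open() uses when no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names and command-line arguments.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output (None if stdout has been replaced).
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, enc in report_encodings().items():
        print(f"{name}: {enc}")
```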

All strings used internally should be of the unicode type.
Many different charsets are used in Japan (cp932, euc-jp, ...).
In Python 2 they cause problems unless the text is converted to the unicode type.
Remember that the unicode type and "utf-8" are not the same thing.
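
The difference between text and its encodings can be illustrated in Python 3 terms, where str plays the role of Python 2's unicode and bytes the role of Python 2's str. This sketch (added for illustration; `decode_or_none` is a hypothetical helper) encodes the same Japanese text with cp932, euc-jp and utf-8, shows that the byte strings differ, and shows that decoding only recovers the text when the right charset is known.

```python
# str is the Python 3 text type (Python 2's unicode); .encode() produces bytes.
text = "日本語"

# Encode the same text with three charsets that are common in Japan.
encoded = {codec: text.encode(codec) for codec in ("cp932", "euc-jp", "utf-8")}
for codec, raw in encoded.items():
    print(codec, raw)


def decode_or_none(raw, codec):
    """Hypothetical helper: decode raw bytes, or return None if the codec doesn't fit."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


# Decoding each byte string with the codec that produced it recovers the same text.
for codec, raw in encoded.items():
    print(codec, decode_or_none(raw, codec) == text)

# Decoding cp932 bytes as UTF-8 fails: the charset must be known to get text back.
print(decode_or_none(encoded["cp932"], "utf-8"))

# Three characters of text, but nine bytes once encoded as UTF-8.
print(len(text), len(encoded["utf-8"]))
```

The last line is the point of "unicode and utf-8 are not the same": the text is a sequence of characters, and UTF-8 is just one of several byte representations of it.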

When I type into a Tkinter Entry widget and get the Entry's value, it returns unicode.
The deleted code converts that unicode value to the str type.
Python 3 changed this: unicode became str, and str became bytes.
So these lines do not exist in the Python 3 code.
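
The "keep text as text, decode only at the boundary" approach can be sketched in Python 3 as follows. `as_text` is a hypothetical helper (not IDLE's code), and it assumes the caller knows the encoding of any bytes it passes in.

```python
def as_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as str, decoding bytes at the boundary.

    In Python 3, str is the text type (Python 2's unicode) and bytes is the
    raw-data type (roughly Python 2's str).
    """
    if isinstance(value, str):
        return value  # already text: no conversion needed
    if isinstance(value, (bytes, bytearray)):
        # Raw bytes need a known encoding to become text.
        return bytes(value).decode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


print(as_text("検索"))
print(as_text("検索".encode("euc-jp"), "euc-jp"))
```

In Python 3, a value read from a Tkinter Entry is already a str, so it takes the first branch unchanged. That is why no unicode-to-str conversion is needed there.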

I typed these strings using an input method (I am using uim):
https://code.google.com/p/uim/
However, I don't know how uim generates these characters.

History
Date	User	Action	Args
2013-04-21 23:19:42	Tomoki.Imai	set	recipients: + Tomoki.Imai, terry.reedy, ezio.melotti, r.david.murray, pradyunsg
2013-04-21 23:19:42	Tomoki.Imai	set	messageid: <1366586382.36.0.964994952434.issue17348@psf.upfronthosting.co.za>
2013-04-21 23:19:42	Tomoki.Imai	link	issue17348 messages
2013-04-21 23:19:42	Tomoki.Imai	create