Message 31342 - Python tracker

Author	sgala
Date	2007-02-25.22:27:40

Content

re: consistent, my experience is that python unicode handling is consistently stupid, almost always doing the wrong thing. It reminds me of the defaults of WordPerfect, which were always exactly the opposite of what the user wanted 99% of the time. I hope python 3000 comes fast and stops that real pain.

I love the language, but the way it handles unicode provokes hundreds of bugs.

>Python could correctly find out your terminal
>encoding, the Unicode string is automatically encoded in that encoding.
>
>If you output to a file, Python does not know which encoding you want to
>have, so all Unicode strings are converted to ascii only.

>>> sys.getfilesystemencoding()
'UTF-8'

so python is really dumb if print does not know my filesystemencoding, but knows my terminal encoding.

I thought breaking the principle of least surprise was not considered pythonic, and now you tell me that having a program run on the console but raise an exception when redirected is intended. I would prefer an exception in both cases. Or, even better, using sys.getfilesystemencoding(), or allowing me to set the default encoding.
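A small diagnostic sketch may help separate the two cases described above. The helper name `describe_stream` is hypothetical, not a Python API; it only reads attributes that file-like objects normally expose. On the Python 2 releases discussed in this thread, `sys.stdout.encoding` is expected to be `None` once output is redirected, which is when the ascii default applies; treat that as an assumption about those releases, not something this snippet proves.

```python
import sys


def describe_stream(stream=None):
    """Return (is_tty, stream_encoding, filesystem_encoding).

    Hypothetical helper for illustration only. `stream` defaults to
    sys.stdout; any file-like object can be passed instead.
    """
    if stream is None:
        stream = sys.stdout
    # Not every file-like object implements isatty(), so guard the call.
    is_tty = bool(hasattr(stream, "isatty") and stream.isatty())
    # Streams without an `encoding` attribute are reported as None.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding, sys.getfilesystemencoding()
```

Writing `repr(describe_stream())` to `sys.stderr`, once on the console and once with stdout redirected to a file, shows what print has to work with in each case.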

>Please direct further questions to the Python mailing list or newsgroup.

I would if I didn't consider this behaviour a bug, and a serious one. 

>The basic rule when handling Unicode is: use Unicode everywhere inside the
>program, and byte strings for input and output.
>So, your code is exactly the other way round: it takes a byte string,
>decodes it to unicode and *then* prints it.
>
>You should do it the other way: use Unicode literals in your code, and
>when you write something to a file, *encode* them in utf-8.

Do you mean that I need to say print unicode(whatever).encode('utf8'), like:

>>> a = unicode('\xc3\xa1','utf8') # instead of 'á', easy to read and understand, even in files encoded as utf8. Assume this is a literal or input
...
>>> print unicode(a).encode('utf8') # because a could be a number, or a different object

every time, instead of "a='á'; print a"

Cool, I'm starting to really love it. Concise and pythonic

Do you seriously mean that there is no way to tell print to use a default encoding, and that it will magically try to find one and fail for everything that is not a terminal?


Are you seriously telling me that this is not a bug? Even worse, that it is "intended behaviour"? BTW, jython acts differently about this, in all the versions I tried.

And with -S I am allowed to change the encoding, which is crippled in site for no known good reason. 

python -S -c "import sys; sys.setdefaultencoding('utf8'); print unicode('\xc3\xa1','utf8')" >test
(works; test contains an accented a, as intended)


>use Unicode everywhere inside the
>program, and byte strings for input and output.

Have you considered that, to use unicode everywhere inside the program, one needs to decode literals (or input) to unicode (which is what you complain about in the next sentence)?

>So, your code is exactly the other way round: it takes a byte string,
>decodes it to unicode and *then* prints it.

I have followed this principle in my programming for about 6 years, so I'm not a novice. I'm playing by the rules:
a) "decodes it to unicode" is the first step to get it into processing. This is just a test case, so processing is zero.
b) I refuse to believe that the only way to ensure something is printed right is wrapping every item in unicode(var).encode('utf8') [the redundant unicode call is there because var could be a number, or a different object]
c) or making my code non-portable by patching site.py to get a real encoding instead of ascii.
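
For readers looking for a middle ground between options (b) and (c), one possibility is to wrap the output byte stream once with a writer from the standard `codecs` module, so individual writes need no explicit `.encode()` call. This is a minimal sketch, not something proposed in this thread: the helper name `utf8_writer` is hypothetical, and an in-memory `io.BytesIO` buffer stands in for a redirected file.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so unicode text written to it is encoded as UTF-8.

    Hypothetical helper; codecs.getwriter() returns a StreamWriter class,
    which is instantiated here around the given stream.
    """
    return codecs.getwriter("utf-8")(byte_stream)


# An in-memory byte buffer stands in for a file that stdout was redirected to.
buf = io.BytesIO()
out = utf8_writer(buf)

out.write(u"\xe1\n")      # u'\xe1' is the accented a; no per-call .encode()
out.write(u"%s\n" % 42)   # non-string objects are formatted into text first

data = buf.getvalue()     # the UTF-8 encoded bytes that reached the stream
```

The same wrapper can in principle be applied to the real output stream at program start. Whether that is acceptable depends on the application, since anything that later writes already-encoded byte strings to the wrapped stream may misbehave.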

History
Date	User	Action	Args
2007-08-23 14:52:06	admin	link	issue1668295 messages
2007-08-23 14:52:06	admin	create