classification
Title: Outputting unicode crashes when printing to file on Linux
Type: crash Stage:
Components: Unicode Versions: Python 2.6
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: Orlowski, benjamin.peterson, georg.brandl, loewis
Priority: normal Keywords:

Created on 2009-09-03 09:29 by Orlowski, last changed 2009-09-04 05:01 by Orlowski. This issue is now closed.

Messages (9)
msg92196 - Author: Jerzy (Orlowski) Date: 2009-09-03 09:29
Hi

When I output unicode strings to the terminal, my script works OK, but
when I redirect the output to a file, I get a crash:

$ python mailing/message_sender.py -l Bia
Białystok

$ python mailing/message_sender.py -l Bia > ~/tmp/aaa.txt 
Traceback (most recent call last):
  File "mailing/message_sender.py", line 71, in <module>
    list_groups(unicode(args[0],'utf-8'))
  File "mailing/message_sender.py", line 53, in list_groups
    print group[1].name
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 3: ordinal not in range(128)
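To see where the error comes from: in Python 2, `print` of a `unicode` object to a stream with no known encoding converts it with the default codec, which is ASCII. The sketch below is written for Python 3, where that conversion has to be spelled out with `str.encode`. It reproduces the failure for the name shown in the terminal output above. The helper `try_encode` is hypothetical, introduced only for illustration.

```python
# Python 3 sketch: spell out the str -> bytes conversion that Python 2's
# `print` did implicitly with the default ASCII codec when sys.stdout had
# no known encoding (as happens when output is redirected to a file).
name = "Bia\u0142ystok"  # "Białystok"; \u0142 is LATIN SMALL LETTER L WITH STROKE


def try_encode(text, codec):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


print(try_encode(name, "ascii"))  # same message as the traceback, minus Python 2's u prefix
print(try_encode(name, "utf-8"))  # b'Bia\xc5\x82ystok'
```

The character at position 3 is the `ł`, which has no ASCII representation. UTF-8 encodes it as the two bytes `\xc5\x82`.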
msg92202 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-09-03 11:51
You have to use an encoding that's not ascii then.
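In practice, "using an encoding that's not ASCII" usually means wrapping the byte stream in an encoding writer. The following is a minimal sketch that assumes UTF-8 is the desired output encoding. It uses an in-memory `io.BytesIO` to stand in for the redirected file, and `utf8_writer` is a hypothetical helper name. In Python 2 the equivalent idiom is `sys.stdout = codecs.getwriter('utf-8')(sys.stdout)`. In Python 3 the underlying byte stream is `sys.stdout.buffer`.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    # codecs.getwriter returns a StreamWriter class; calling it wraps the stream.
    return codecs.getwriter("utf-8")(byte_stream)


buf = io.BytesIO()  # stands in for the redirected file (~/tmp/aaa.txt)
out = utf8_writer(buf)
out.write("Bia\u0142ystok\n")  # encoded to UTF-8 bytes before reaching buf
out.flush()
print(buf.getvalue())
```

Since Python 2.6, the PYTHONIOENCODING environment variable can also set the encoding of stdin/stdout/stderr from outside the script, for example `PYTHONIOENCODING=utf-8 python mailing/message_sender.py -l Bia > ~/tmp/aaa.txt`.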
msg92203 - Author: Jerzy (Orlowski) Date: 2009-09-03 12:01
I know how to make it work. The question is why outputting to a file
makes it crash when outputting to the terminal does not.

I have never seen "$program > file" behave differently from "$program"
in any other language.

Jerzy Orlowski

msg92205 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-09-03 12:22
When output goes to a terminal, Python can determine its encoding. For a
file, it cannot, therefore it refuses to guess.

Also, many programs behave differently when used with redirection;
namely, all those that use `isatty()` to determine if stdout is a terminal.
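A program that wants to follow the distinction described above can check `isatty()` itself. Below is a minimal sketch that assumes the policy "trust the reported encoding only on a terminal, otherwise use UTF-8". The function name `output_encoding` and the UTF-8 fallback are assumptions for illustration, not what Python does on its own.

```python
import io


def output_encoding(stream, fallback="utf-8"):
    """Pick an encoding for text written to `stream` (hypothetical helper).

    On a terminal, use the encoding the stream reports. When output is
    redirected (isatty() is False), Python 2 reported no encoding, so use
    an explicit fallback instead of the implicit ASCII default.
    """
    if stream.isatty():
        reported = getattr(stream, "encoding", None)
        if reported:
            return reported
    return fallback


print(output_encoding(io.BytesIO()))                # not a terminal -> fallback
print(output_encoding(io.StringIO(), "iso-8859-2"))  # custom fallback
```

With the real output stream you would call `output_encoding(sys.stdout)`. As the next messages point out, a process may have no terminal at all, which is why the fallback has to be chosen explicitly.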
msg92207 - Author: Jerzy (Orlowski) Date: 2009-09-03 12:27
Well, I would suggest using the terminal encoding as the default when
redirecting. In my opinion, sys.stdin and sys.stdout should always use
the terminal encoding.

Alternatively, you could make the function sys.setdefaultencoding()
visible so that the default can be changed in a reasonable way.

Jerzy

msg92214 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-09-03 18:27
Using the terminal encoding for sys.stdout does not work in the general
case, as a (background) process may not *have* a controlling terminal
(such as a CGI script, a cron job, or a Windows service). That Python
recognizes the terminal encoding is primarily a convenience feature for
the interactive mode.

Exposing sys.setdefaultencoding is not implementable in a reasonable way.
msg92216 - Author: Jerzy (Orlowski) Date: 2009-09-03 19:38
OK, I give up.

The problem is that one might test a program on a terminal, conclude that
everything is running OK, and then spend a considerable amount of time
trying to find the problem later.

Another approach: couldn't UTF-8 be set as the default encoding for all
inputs and outputs?

I know that some of my questions are caused by the fact that I do not
understand how Python works. But you have to bear in mind that most
people don't. Such behaviour of Python (see also
http://bugs.python.org/issue5092) is illogical by the "common sense" of
ordinary people. If an interpreter does something illogical for me, I am
more inclined to switch to another language.

Jerzy

msg92217 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-09-03 20:12
If you want to switch to a different language, consider switching to
Python 3. There, all strings are Unicode strings, and files opened in
text mode always use the locale encoding.
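As noted above, Python 3 text-mode files use the locale encoding by default (typically what `locale.getpreferredencoding(False)` reports). The minimal sketch below passes `encoding` explicitly instead, so the bytes on disk do not depend on the locale of whoever runs the script. The helpers `write_text` and `read_bytes` and the file name `aaa.txt` are illustrative assumptions.

```python
import locale
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    """Write `text` to `path`, naming the encoding instead of relying on the locale."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def read_bytes(path):
    """Return the raw bytes stored in `path`."""
    with open(path, "rb") as f:
        return f.read()


# The encoding text mode would typically use if `encoding` were omitted.
print("locale default:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "aaa.txt")
    write_text(path, "Bia\u0142ystok")
    data = read_bytes(path)

print(data)  # b'Bia\xc5\x82ystok' regardless of the locale
```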
msg92230 - Author: Jerzy (Orlowski) Date: 2009-09-04 05:01
Good point!

I will give it a try.

Jerzy

History
Date                 User               Action  Args
2009-09-04 05:01:24  Orlowski           set     messages: + msg92230
2009-09-03 20:12:59  loewis             set     messages: + msg92217
2009-09-03 19:38:52  Orlowski           set     messages: + msg92216
2009-09-03 18:27:35  loewis             set     nosy: + loewis
                                                messages: + msg92214
2009-09-03 12:27:19  Orlowski           set     messages: + msg92207
2009-09-03 12:22:07  georg.brandl       set     nosy: + georg.brandl
                                                messages: + msg92205
2009-09-03 12:01:31  Orlowski           set     messages: + msg92203
2009-09-03 11:51:02  benjamin.peterson  set     status: open -> closed
                                                nosy: + benjamin.peterson
                                                messages: + msg92202
                                                resolution: works for me
2009-09-03 09:29:54  Orlowski           create