This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: print() fails on latin1 characters on OSX
Stage: resolved
Components: macOS
Versions: Python 3.2
process
Status: closed
Resolution: not a bug
Nosy List: hynek, javahaxxor, ned.deily, r.david.murray, ronaldoussoren
Priority: normal

Created on 2012-06-02 13:18 by javahaxxor, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (12)
msg162134 - Author: Adrian Bastholm (javahaxxor) Date: 2012-06-02 13:18
print(listentry) fails on a folder name with Swedish (latin1) characters.
Error:

 File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/mac_roman.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u030a' in position 33: character maps to <undefined>
msg162135 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-02 14:12
A Mac expert can confirm, but I think that just means that the default mac_roman encoding (which is made the default by the OS, if I understand correctly) can't handle that character. I believe it will work if you use utf-8. And no, I don't know how to do that, not being a Mac person.
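A minimal sketch of the point above, using the `mac_roman` codec that ships with Python (it is the one in the traceback): encoding the combining ring U+030A fails, while UTF-8 handles it. The helper name `try_encode` is hypothetical.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if `encoding` cannot represent `text`."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

ring = "\u030a"  # COMBINING RING ABOVE, the character from the traceback
print(try_encode(ring, "mac_roman"))  # None: Mac Roman has no byte for U+030A
print(try_encode(ring, "utf-8"))      # b'\xcc\x8a'
```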
msg162143 - Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-02 15:28
'\u030a' can’t be latin1 as 0x030a = 778, which is waaay beyond 255. :) That's a Unicode code point, and indeed it maps to " ̊" (U+030A COMBINING RING ABOVE).
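The arithmetic can be checked directly. This sketch uses the standard `unicodedata` module to show the code point value and the Latin-1 codec (which only covers code points 0 to 255) rejecting it.

```python
import unicodedata

ch = "\u030a"
print(ord(ch), hex(ord(ch)))  # 778 0x30a
print(unicodedata.name(ch))   # COMBINING RING ABOVE
print(ord(ch) > 255)          # True: outside the Latin-1 range

try:
    ch.encode("latin-1")
except UnicodeEncodeError as exc:
    print(exc.reason)         # ordinal not in range(256)
```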

My best guess is that your LC_CTYPE is set to Mac Roman. You can check it using "import os;os.environ.get('LC_CTYPE')".

Try running python as "LC_CTYPE=sv_SE.UTF-8 python3" and do a "print('\u030a')" to try if it helps.
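To compare environments without editing your shell setup, you can launch a child interpreter with modified variables and ask it which encoding its stdout received. This is only a sketch. The helper `child_stdout_encoding` is hypothetical, and it needs Python 3.7 or later for `capture_output`. Newer Python releases also treat the C/POSIX locale differently from the 3.2 interpreter in this thread (locale coercion, PEP 538), so results can vary by version and platform.

```python
import os
import subprocess
import sys

def child_stdout_encoding(extra_env):
    """Run a fresh interpreter with `extra_env` added to the current
    environment and return the name of its sys.stdout encoding."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# The suggestion above, expressed as an environment change for the child.
print(child_stdout_encoding({"LC_CTYPE": "sv_SE.UTF-8"}))
```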

Otherwise a more complete (but minimal) example demonstrating the problem would be helpful.
msg162150 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-06-02 17:05
mac_roman is an obsolete encoding from Mac OS 9 days; it is seldom seen on modern OS X systems. But it is often the fallback encoding set in ~/.CFUserTextEncoding if neither LANG nor any LC_* environment variable is set (see, for example, http://superuser.com/questions/82123/mac-whats-cfusertextencoding-for). If you run a terminal session using Terminal.app, the LANG environment variable is usually set for you to an appropriate modern value, like 'en_US.UTF-8' in the US locale; this is controlled by a Terminal.app preference, and other terminal apps like iTerm2 have something similar. But if you are using xterm with X11, xterm does not inject a LANG env variable. So, something like:

   python3.2 -c 'print("\u030a")'

may fail running under xterm with UnicodeEncodeError but will print the expected character when run under Terminal.app.  I avoid those kinds of issues by explicitly setting LANG in my shell profile.
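Whether LANG or an LC_* variable is present depends on how the terminal was launched, so it can help to see which variable actually governs character encoding. This sketch assumes the usual POSIX precedence for the LC_CTYPE category: a non-empty LC_ALL wins, then LC_CTYPE, then LANG. The function name `ctype_source` is hypothetical.

```python
import os

def ctype_source(environ=os.environ):
    """Return (name, value) of the variable that sets LC_CTYPE, or None
    if none is set (a platform default then applies)."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:  # POSIX treats an empty value as unset
            return name, value
    return None

print(ctype_source())
```

For instance, with `LC_CTYPE=UTF-8` and `LANG=en_US.UTF-8` both set, LC_CTYPE is the one that counts.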

Let us know if that helps or, if not, how to reproduce your issue.
msg162156 - Author: Adrian Bastholm (javahaxxor) Date: 2012-06-02 17:41
The char in question: 'å'. It is a folder with this character in the name. My encoding is UTF-8. Running print("\u030a") gives a blank line

U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
General Character Properties
In Unicode since: 1.1
Unicode category: Letter, Uppercase
Canonical decomposition: U+0041 LATIN CAPITAL LETTER A + U+030A COMBINING RING ABOVE
Various Useful Representations
UTF-8: 0xC3 0x85
UTF-16: 0x00C5
C octal escaped UTF-8: \303\205
XML decimal entity: &#197;
Annotations and Cross References
See also:
 • U+212B ANGSTROM SIGN
Equivalents:
 • U+0041 LATIN CAPITAL LETTER A U+030A COMBINING RING ABOVE
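The "Canonical decomposition" line above explains how a folder name containing å can yield a lone U+030A. The same letter can be stored either precomposed or as a base letter plus a combining ring. The sketch below uses the standard `unicodedata` module. As background knowledge not established in this thread, HFS+, the macOS file system of that era, stores file names in a decomposed form.

```python
import unicodedata

composed = "\u00c5"                             # Å as a single code point
decomposed = unicodedata.normalize("NFD", composed)

print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0041', 'U+030A']
print(composed == decomposed)                   # False: different code points
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```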

The code:

import os

def traverse(targetDir):
    currentDir = targetDir
    # os.listdir() returns bare names, so join each one with the
    # directory being walked before testing it or recursing into it.
    for entry in os.listdir(targetDir):
        path = os.path.join(targetDir, entry)
        if os.path.isdir(path):
            print("Traversing " + path)
            traverse(path)
        else:
            print("Not dir: " + entry)
            if os.path.isfile(path):
                print("Processing " + currentDir + " " + entry)
            else:
                print("Not file: " + entry)
    print("\n")
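For comparison, the standard library's `os.walk()` handles the path joining and recursion itself. This minimal sketch collects file paths instead of printing them. It is a swapped-in technique rather than the code from the report, and the name `list_files` is hypothetical.

```python
import os
import tempfile

def list_files(top):
    """Return the full path of every file below `top`, sorted."""
    found = []
    # os.walk() yields (directory, subdirectory names, file names) for each
    # directory, so joining dirpath with each name gives a usable path.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)

# Usage on a throwaway tree so that the result is predictable.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(top, rel), "w").close()
    files = [os.path.relpath(p, top) for p in list_files(top)]

print(files)
```

Printing `"Processing " + path` for each result of `list_files(".")` gives output in the same spirit as the traversal log later in this thread.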
msg162158 - Author: Adrian Bastholm (javahaxxor) Date: 2012-06-02 17:52
The last post is the CAPITAL Å. The following is the small letter "å":

U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
General Character Properties
In Unicode since: 1.1
Unicode category: Letter, Lowercase
Canonical decomposition: U+0061 LATIN SMALL LETTER A + U+030A COMBINING RING ABOVE
Various Useful Representations
UTF-8: 0xC3 0xA5
UTF-16: 0x00E5
C octal escaped UTF-8: \303\245
XML decimal entity: &#229;
Annotations and Cross References
Notes:
 • Danish, Norwegian, Swedish, Walloon
Equivalents:
 • U+0061 LATIN SMALL LETTER A U+030A COMBINING RING ABOVE
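The UTF-8 and octal values listed above can be reproduced in Python. Doing so also shows that the precomposed and decomposed spellings of å are different byte sequences, even though they look the same on screen. A small sketch:

```python
precomposed = "\u00e5"       # U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "\u0061\u030a"  # U+0061 followed by U+030A COMBINING RING ABOVE

print(precomposed.encode("utf-8"))  # b'\xc3\xa5'
print(decomposed.encode("utf-8"))   # b'a\xcc\x8a'
# C-style octal escapes of the UTF-8 bytes.
print("".join(f"\\{b:03o}" for b in precomposed.encode("utf-8")))  # \303\245
```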
msg162164 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-06-02 18:58
The character in question is not the problem, and the code snippet you provided looks fine. The problem is almost certainly that you are running the code in an execution environment where the LANG environment variable is either not set or is set to an encoding that cannot represent all Unicode characters. The fallback 'mac_roman' is such an encoding. The default encodings used by the Python 3 interpreter are influenced by the values of these environment variables. So the questions are: how are you running your code, what are the values of the environment variables that your Python program inherits, and, by any chance, is your program using the 'locale' module? If so, exactly which functions from it?

Please try adding the following in the environment you are seeing the problem:

import sys
print(sys.stdout)
import os
print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
import locale
print(locale.getlocale())
print('\u00e5')
print('\u0061\u030a')

If I paste the above into a Python 3.2 interactive terminal session using the python.org 64-/32-bit Python 3.2.3, I see the following:

$ python3.2
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[('LANG', 'en_US.UTF-8')]
>>> import locale
>>> print(locale.getlocale())
('en_US', 'UTF-8')
>>> print('\u00e5')
å
>>> print('\u0061\u030a')
å

But, if I explicitly remove the LANG environment variable:

$ unset LANG
$ python3.2
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='US-ASCII'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[]
>>> import locale
>>> print(locale.getlocale())
(None, None)
>>> print('\u00e5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xe5' in position 0: ordinal not in range(128)
>>> print('\u0061\u030a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u030a' in position 1: ordinal not in range(128)
>>>
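The two sessions above differ in the stream encoding Python picked from the environment (UTF-8 versus US-ASCII). The sketch below shows a check a script could run before printing, using only `str.encode`. The helper name `printable_with` is hypothetical. Output uses `ascii()` so the sketch itself never trips over the encoding it is testing.

```python
import sys

def printable_with(text, encoding):
    """True if `text` can be written to a stream using `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

for text in ("\u00e5", "\u0061\u030a"):
    print(ascii(text), {enc: printable_with(text, enc)
                        for enc in ("utf-8", "us-ascii", "mac_roman")})

print("this stdout uses", sys.stdout.encoding)
```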
msg162173 - Author: Adrian Bastholm (javahaxxor) Date: 2012-06-02 20:34
Output in console:

Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[('LC_CTYPE', 'UTF-8')]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[]
>>> import locale
>>> print(locale.getlocale())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/locale.py", line 524, in getlocale
    return _parse_localename(localename)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/locale.py", line 433, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: UTF-8
>>> print('\u00e5')
å
>>> print('\u0061\u030a')
å
**********************

Output from Eclipse:

<_io.TextIOWrapper name='<stdout>' mode='w' encoding='MacRoman'>
[]
[]
(None, None)
å
Traceback (most recent call last):
  File "/Users/adyhasch/Documents/PythonWorkspace/PatternRenamer/src/prenamer.py", line 70, in <module>
    print('\u0061\u030a')
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/mac_roman.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u030a' in position 1: character maps to <undefined>
************************************

I'm running PyDev ..
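In the console run, `locale.getlocale()` raised `ValueError` because `LC_CTYPE` was set to a bare `UTF-8`, which the locale module could not parse as a locale name. That stopped the diagnostic partway. The variant below records the error and keeps going, so the same script can run unchanged in both the console and PyDev. It is a sketch, and `encoding_report` is a hypothetical name.

```python
import locale
import os
import sys

def encoding_report():
    """Collect the settings that decide how print() encodes text."""
    report = {
        "stdout": getattr(sys.stdout, "encoding", None),
        "env": {k: v for k, v in os.environ.items()
                if k.startswith(("LC_", "LANG"))},
        "preferred": locale.getpreferredencoding(False),
    }
    try:
        report["getlocale"] = locale.getlocale()
    except ValueError as exc:  # e.g. "unknown locale: UTF-8"
        report["getlocale"] = "ValueError: %s" % exc
    return report

for key, value in encoding_report().items():
    print(key, "=", ascii(value))
```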
msg162174 - Author: Adrian Bastholm (javahaxxor) Date: 2012-06-02 20:42
My code runs fine in a console window, so it's some kind of configuration error. Sorry for wasting your time, guys. It would be nice to know why PyDev is not setting the right environment variables, though.

>>> traverse(".")
Processing ./.DS_Store
Traversing ./2011-10-03--Sebi_o_costi_ny_frisyr
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/.DS_Store
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/.picasa.ini
Traversing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/.DS_Store
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/.picasa.ini
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/DSC_5467.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/DSC_5468.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/DSC_5472.JPG

Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/DSC_5440.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/DSC_5441.JPG

Processing ./__init__.py
Processing ./DSC_5440.JPG
Processing ./DSC_5453.JPG
Processing ./prenamer.py
msg162177 - Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-06-02 21:30
Glad we could help. I suspected it was running under "special circumstances".
msg162178 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-06-02 21:31
I'm neither a PyDev nor an Eclipse user, but there should be some way to set environment variables in it. Undoubtedly, Eclipse is launched as an app, so a shell is not involved and shell profile files are not processed. However, the "Environment" section of this tutorial may help:

http://pydev.org/manual_101_interpreter.html

Try adding a definition for LANG or LC_CTYPE, as you prefer.  And you should use a valid localized definition, like LANG=en_US.UTF-8 for US English UTF-8.  The list of definitions is in Lib/locale.py.  Good luck!
msg162202 - Author: Adrian Bastholm (javahaxxor) Date: 2012-06-03 09:25
Thanks a lot for the help, guys !
History
Date                 User            Action  Args
2022-04-11 14:57:31  admin           set     github: 59191
2012-06-03 09:25:26  javahaxxor      set     messages: + msg162202
2012-06-02 21:31:11  ned.deily       set     messages: + msg162178
2012-06-02 21:30:07  hynek           set     status: open -> closed
                                             resolution: not a bug
                                             messages: + msg162177
                                             stage: resolved
2012-06-02 20:42:38  javahaxxor      set     messages: + msg162174
2012-06-02 20:34:36  javahaxxor      set     messages: + msg162173
2012-06-02 18:58:57  ned.deily       set     messages: + msg162164
2012-06-02 17:52:15  javahaxxor      set     messages: + msg162158
2012-06-02 17:41:42  javahaxxor      set     messages: + msg162156
2012-06-02 17:05:46  ned.deily       set     messages: + msg162150
2012-06-02 15:28:15  hynek           set     messages: + msg162143
2012-06-02 14:12:26  r.david.murray  set     assignee: ronaldoussoren ->
                                             messages: + msg162135
                                             nosy: + hynek, r.david.murray, ned.deily
2012-06-02 13:18:23  javahaxxor      create