Issue14986
Created on 2012-06-02 13:18 by javahaxxor, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (12)
msg162134 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 13:18
print(listentry) fails on a folder name with Swedish (latin1) characters. Error:

  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/mac_roman.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u030a' in position 33: character maps to <undefined>

msg162135 | Author: R. David Murray (r.david.murray) * | Date: 2012-06-02 14:12
A Mac expert can confirm, but I think that just means that the default mac_roman encoding (which is made the default by the OS, if I understand correctly) can't handle that character. I believe it will work if you use UTF-8. And no, I don't know how to do that, not being a Mac person.

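A minimal sketch of the point above, assuming a modern Python 3: the decomposed form of "å" ('a' followed by U+030A) encodes fine as UTF-8 but not as mac_roman, because Mac Roman has a slot for the precomposed 'å' and none for the combining ring. The helper name `can_encode` is hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` encodes with `encoding` under strict error handling."""
    try:
        text.encode(encoding)  # errors="strict" is the default
    except UnicodeEncodeError:
        return False
    return True


precomposed = "\u00e5"  # 'å' as one code point
decomposed = "a\u030a"  # 'a' followed by COMBINING RING ABOVE

for enc in ("mac_roman", "utf-8"):
    print(enc, can_encode(precomposed, enc), can_encode(decomposed, enc))
# prints:
# mac_roman True False
# utf-8 True True
```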
msg162143 | Author: Hynek Schlawack (hynek) * | Date: 2012-06-02 15:28
'\u030a' can't be latin1, as 0x030a = 778, which is waaay beyond 255. :) That's gonna be Unicode (U+030A COMBINING RING ABOVE, which is 0xCC 0x8A in UTF-8), and indeed that maps to " ̊". My best guess is that your LC_CTYPE is set to Mac Roman. You can check it using "import os; os.environ.get('LC_CTYPE')". Try running Python as "LC_CTYPE=sv_SE.UTF-8 python3" and do a "print('\u030a')" to see if it helps. Otherwise, a more complete (but minimal) example demonstrating the problem would be helpful.

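The arithmetic above can be checked directly with the standard `unicodedata` module. This sketch only inspects the code point and the current `LC_CTYPE` value; it does not change any settings.

```python
import os
import unicodedata

ch = "\u030a"
print(ord(ch), hex(ord(ch)))  # 778 0x30a: far above Latin-1's 0..255 range
print(unicodedata.name(ch))   # COMBINING RING ABOVE
print(ch.encode("utf-8"))     # b'\xcc\x8a': the two UTF-8 bytes for this code point

try:
    ch.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode it:", exc.reason)  # ordinal not in range(256)

# The check suggested above; None means LC_CTYPE is not set in this environment.
print("LC_CTYPE =", os.environ.get("LC_CTYPE"))
```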
msg162150 | Author: Ned Deily (ned.deily) * | Date: 2012-06-02 17:05
mac_roman is an obsolete encoding from Mac OS 9 days; it is seldom seen on modern OS X systems. But it is often the fallback encoding set in ~/.CFUserTextEncoding if neither LANG nor any LC_* environment variable is set (see, for example, http://superuser.com/questions/82123/mac-whats-cfusertextencoding-for).

If you run a terminal session using Terminal.app, the LANG environment variable is usually set for you to an appropriate modern value, like 'en_US.UTF-8' in the US locale; this is controlled by a Terminal.app preference, and other terminal apps like iTerm2 have something similar. But if you are using xterm with X11, xterm does not inject a LANG environment variable. So something like:

  python3.2 -c 'print("\u030a")'

may fail under xterm with UnicodeEncodeError but will print the expected character when run under Terminal.app. I avoid those kinds of issues by explicitly setting LANG in my shell profile.

Let us know if that helps or, if not, how to reproduce your issue.

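Recent Python 3 releases may coerce an unset or "C" locale to UTF-8 (PEP 538 and PEP 540), so simply unsetting LANG may no longer reproduce the failure described above. This sketch therefore swaps in a different mechanism: it forces the child interpreter's stdio encoding with the PYTHONIOENCODING environment variable, to show that an inherited environment variable alone decides whether the same print() succeeds. It assumes Python 3.7+ (for `capture_output`); `run_child` is a hypothetical helper.

```python
import os
import subprocess
import sys

# Child program; the doubled backslash keeps the command line itself pure ASCII,
# and the child interprets \u030a inside its own string literal.
CHILD = 'print("a\\u030a")'


def run_child(io_encoding: str) -> subprocess.CompletedProcess:
    """Run CHILD in a fresh interpreter whose stdio encoding is set via PYTHONIOENCODING."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run([sys.executable, "-c", CHILD], env=env, capture_output=True)


for enc in ("mac_roman:strict", "utf-8"):
    result = run_child(enc)
    print(enc, "exit code", result.returncode)
```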
msg162156 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 17:41
The char in question: 'å'. It is a folder with this character in the name. My encoding is UTF-8. Running print("\u030a") gives a blank line.

U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
General Character Properties
  In Unicode since: 1.1
  Unicode category: Letter, Uppercase
  Canonical decomposition: U+0041 LATIN CAPITAL LETTER A + U+030A COMBINING RING ABOVE
Various Useful Representations
  UTF-8: 0xC3 0x85
  UTF-16: 0x00C5
  C octal escaped UTF-8: \303\205
  XML decimal entity: &#197;
Annotations and Cross References
  See also: U+212B ANGSTROM SIGN
  Equivalents: U+0041 LATIN CAPITAL LETTER A, U+030A COMBINING RING ABOVE

The code:

def traverse (targetDir):
    currentDir = targetDir
    dirs = os.listdir(targetDir)
    for entry in dirs:
        if os.path.isdir(entry):
            print("Traversing " + entry)
            traverse(entry)
        else:
            print("Not dir: " + entry)
            if os.path.isfile(entry):
                print("Processing " + " " + currentDir + " " + entry)
            else:
                print("Not file: " + entry)
    print("\n")

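Separate from the encoding error, the posted traverse() passes the bare names returned by os.listdir() to os.path.isdir(), os.path.isfile(), and the recursive call, so those checks are resolved against the current working directory rather than targetDir; it goes wrong as soon as the recursion descends into a subdirectory. The later console run prints paths like ./2011-10-03--..., which suggests the working version joined paths. A minimal sketch of a corrected traversal, assuming the goal is only to log directories and files; it returns lines instead of printing so the caller decides where they go.

```python
import os


def traverse(target_dir: str) -> list:
    """Recursively list target_dir and return log lines.

    Each entry is joined onto its parent before it is tested, because
    os.listdir() returns bare names, not paths.
    """
    lines = []
    for entry in sorted(os.listdir(target_dir)):
        path = os.path.join(target_dir, entry)
        if os.path.isdir(path):
            lines.append("Traversing " + path)
            lines.extend(traverse(path))  # recurse with the full path
        elif os.path.isfile(path):
            lines.append("Processing " + path)
        else:
            lines.append("Not file: " + path)  # e.g. a broken symlink
    return lines
```

Typical use is `for line in traverse("."): print(line)`. os.walk() is the standard-library alternative when the recursion does not need to be explicit.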
msg162158 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 17:52
The last post is the CAPITAL Å. The following is the small letter "å":

U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
General Character Properties
  In Unicode since: 1.1
  Unicode category: Letter, Lowercase
  Canonical decomposition: U+0061 LATIN SMALL LETTER A + U+030A COMBINING RING ABOVE
Various Useful Representations
  UTF-8: 0xC3 0xA5
  UTF-16: 0x00E5
  C octal escaped UTF-8: \303\245
  XML decimal entity: &#229;
Annotations and Cross References
  Notes: Danish, Norwegian, Swedish, Walloon
  Equivalents: U+0061 LATIN SMALL LETTER A, U+030A COMBINING RING ABOVE

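The canonical decomposition listed above likely explains why the traceback names '\u030a' rather than 'å': macOS's HFS+ file system (the default at the time) stores file names in a decomposed form close to Unicode NFD, so a folder created as 'å' plausibly came back from os.listdir() as 'a' followed by U+030A. That is an inference, not something confirmed in this thread. The sketch uses only `unicodedata.normalize`; the file name is a hypothetical listdir() result modeled on the folder names shown later.

```python
import unicodedata

composed = "\u00e5"                                  # 'å' as one code point (NFC)
decomposed = unicodedata.normalize("NFD", composed)  # 'a' + COMBINING RING ABOVE

print([hex(ord(c)) for c in composed])    # ['0xe5']
print([hex(ord(c)) for c in decomposed])  # ['0x61', '0x30a']
print(composed == decomposed)             # False: different code point sequences
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Hypothetical decomposed name as a file system might return it.
name = "Sebastian_2a\u030ar"
# After NFC normalization the name fits in mac_roman ('å' is byte 0x8C there).
print(unicodedata.normalize("NFC", name).encode("mac_roman"))  # b'Sebastian_2\x8cr'
```

Normalizing only sidesteps this particular character; the fix discussed in the thread is giving the interpreter a UTF-8 environment.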
msg162164 | Author: Ned Deily (ned.deily) * | Date: 2012-06-02 18:58
The character in question is not the problem, and the code snippet you provide looks fine. The problem is almost certainly that you are running the code in an execution environment where the LANG environment variable is either not set or is set to an encoding that doesn't support higher-order Unicode characters. The fallback 'mac_roman' is such an encoding. The default encodings used by the Python 3 interpreter are influenced by the values of these environment variables. So the questions are: how are you running your code, what are the values of the environment variables that your Python program inherits, and, by any chance, is your program using the 'locale' module, and if so, exactly which functions from it?

Please try adding the following in the environment where you are seeing the problem:

import sys
print(sys.stdout)
import os
print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
import locale
print(locale.getlocale())
print('\u00e5')
print('\u0061\u030a')

If I paste the above into a Python 3.2 interactive terminal session using the python.org 64-/32-bit Python 3.2.3, I see the following:

$ python3.2
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[('LANG', 'en_US.UTF-8')]
>>> import locale
>>> print(locale.getlocale())
('en_US', 'UTF-8')
>>> print('\u00e5')
å
>>> print('\u0061\u030a')
å

But if I explicitly remove the LANG environment variable:

$ unset LANG
$ python3.2
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='US-ASCII'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[]
>>> import locale
>>> print(locale.getlocale())
(None, None)
>>> print('\u00e5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xe5' in position 0: ordinal not in range(128)
>>> print('\u0061\u030a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u030a' in position 1: ordinal not in range(128)
>>>

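A hedged variant of the diagnostic snippet above that collects the settings into a dict instead of printing them one by one. It leaves out the two print() calls that test actual output, and it catches ValueError from locale.getlocale(), which, as the reply below shows, Python 3.2 raised for an incomplete value like LC_CTYPE=UTF-8. `encoding_report` is a hypothetical name; `ascii()` keeps the report itself printable under any stdout encoding.

```python
import locale
import os
import sys


def encoding_report() -> dict:
    """Gather the encoding-related settings the interpreter inherited."""
    try:
        current_locale = locale.getlocale()
    except ValueError as exc:  # e.g. 'unknown locale: UTF-8'
        current_locale = "error: %s" % exc
    return {
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(False),
        "env": {k: v for k, v in os.environ.items() if k.startswith(("LC", "LANG"))},
        "locale": current_locale,
    }


for key, value in encoding_report().items():
    print(key, "=", ascii(value))
```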
msg162173 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 20:34
Output in console:

Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[('LC_CTYPE', 'UTF-8')]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[]
>>> import locale
>>> print(locale.getlocale())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/locale.py", line 524, in getlocale
    return _parse_localename(localename)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/locale.py", line 433, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: UTF-8
>>> print('\u00e5')
å
>>> print('\u0061\u030a')
å

**********************
Output from Eclipse:

<_io.TextIOWrapper name='<stdout>' mode='w' encoding='MacRoman'>
[]
[]
(None, None)
å
Traceback (most recent call last):
  File "/Users/adyhasch/Documents/PythonWorkspace/PatternRenamer/src/prenamer.py", line 70, in <module>
    print('\u0061\u030a')
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/mac_roman.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u030a' in position 1: character maps to <undefined>

************************************
I'm running PyDev.

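The two runs above differ only in the encoding of the text layer wrapped around stdout ('UTF-8' in the console, 'MacRoman' under Eclipse). This sketch makes that visible with io.TextIOWrapper over an in-memory buffer standing in for stdout; `encode_via_text_layer` is a hypothetical helper. On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") can switch a live stream, but that method does not exist in the 3.2 release used here, and fixing the environment remains the cleaner route.

```python
import io


def encode_via_text_layer(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Write `text` through a TextIOWrapper and return the bytes that reach the buffer."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors)
    stream.write(text)  # encoding errors surface here, as in print()
    stream.flush()
    data = buffer.getvalue()
    stream.detach()  # release the buffer without closing it
    return data


decomposed = "a\u030a"
print(encode_via_text_layer(decomposed, "utf-8"))                        # b'a\xcc\x8a'
print(encode_via_text_layer(decomposed, "mac_roman", errors="replace"))  # b'a?'
try:
    encode_via_text_layer(decomposed, "mac_roman")  # strict, like the Eclipse run
except UnicodeEncodeError as exc:
    print("mac_roman:", exc.reason)  # character maps to <undefined>
```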
msg162174 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 20:42
My code runs fine in a console window, so it's some kind of configuration error. Sorry for wasting your time, guys. It would be nice to know why PyDev is not setting the right environment vars, though.

>>> traverse(".")
Processing ./.DS_Store
Traversing ./2011-10-03--Sebi_o_costi_ny_frisyr
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/.DS_Store
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/.picasa.ini
Traversing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/.DS_Store
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/.picasa.ini
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/DSC_5467.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/DSC_5468.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/DSC_5472.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/DSC_5440.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/DSC_5441.JPG
Processing ./__init__.py
Processing ./DSC_5440.JPG
Processing ./DSC_5453.JPG
Processing ./prenamer.py

msg162177 | Author: Hynek Schlawack (hynek) * | Date: 2012-06-02 21:30
Glad we could help. I suspected it was running under "special circumstances".

msg162178 | Author: Ned Deily (ned.deily) * | Date: 2012-06-02 21:31
I'm neither a PyDev nor an Eclipse user, but there should be some way to set environment variables in it. Undoubtedly, Eclipse is launched as an app, so a shell is not involved and shell profile files are not processed. However, the "Environment" section of this tutorial may help:
http://pydev.org/manual_101_interpreter.html
Try adding a definition for LANG or LC_CTYPE, as you prefer, and use a valid localized definition, like LANG=en_US.UTF-8 for US English UTF-8. The list of definitions is in Lib/locale.py (the locale_alias table). Good luck!

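The advice above amounts to handing the interpreter a valid locale through the environment it inherits; in PyDev that is the interpreter's "Environment" setting. As a rough programmatic analogue, this sketch launches a child interpreter with a copied environment in which LANG is set to a full locale name such as en_US.UTF-8 (not a bare 'UTF-8', which made locale.getlocale() fail earlier in this thread), after dropping LC_ALL, LC_CTYPE, and PYTHONIOENCODING so they cannot override it. It assumes Python 3.7+ on a Unix-like system; `run_with_lang` is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_with_lang(code: str, lang: str = "en_US.UTF-8") -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter whose locale comes from LANG alone."""
    env = dict(os.environ)
    for var in ("LC_ALL", "LC_CTYPE", "PYTHONIOENCODING"):
        env.pop(var, None)  # each of these would take precedence over LANG
    env["LANG"] = lang
    return subprocess.run([sys.executable, "-c", code], env=env, capture_output=True)


result = run_with_lang("import sys; print(sys.stdout.encoding)")
print(result.returncode, result.stdout.decode("ascii", "replace").strip())
```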
msg162202 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-03 09:25
Thanks a lot for the help, guys!

History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:57:31 | admin | set | github: 59191 |
2012-06-03 09:25:26 | javahaxxor | set | messages: + msg162202 |
2012-06-02 21:31:11 | ned.deily | set | messages: + msg162178 |
2012-06-02 21:30:07 | hynek | set | status: open -> closed resolution: not a bug messages: + msg162177 stage: resolved |
2012-06-02 20:42:38 | javahaxxor | set | messages: + msg162174 |
2012-06-02 20:34:36 | javahaxxor | set | messages: + msg162173 |
2012-06-02 18:58:57 | ned.deily | set | messages: + msg162164 |
2012-06-02 17:52:15 | javahaxxor | set | messages: + msg162158 |
2012-06-02 17:41:42 | javahaxxor | set | messages: + msg162156 |
2012-06-02 17:05:46 | ned.deily | set | messages: + msg162150 |
2012-06-02 15:28:15 | hynek | set | messages: + msg162143 |
2012-06-02 14:12:26 | r.david.murray | set | assignee: ronaldoussoren -> messages: + msg162135 nosy: + hynek, r.david.murray, ned.deily |
2012-06-02 13:18:23 | javahaxxor | create |