Author ned.deily
Recipients benjamin.peterson, ned.deily, ronaldoussoren
Date 2009-06-05.10:37:06
Message-id <1244198231.29.0.333691708249.issue6202@psf.upfronthosting.co.za>
In-reply-to
Content
Potential Release Blocker

The default file encoding for 3.x file objects is the value of 
locale.getpreferredencoding(). Currently, the locale module's behavior on 
OS X deviates from Python on other POSIX platforms in two unexpected and 
harmful ways:

1. On OS X, locale.getpreferredencoding() returns "mac-roman", an 
obsolete encoding from the old "Classic" MacOS days.

2. Unlike on other POSIX platforms (at least Debian Linux), the values 
returned by locale.getdefaultlocale() and locale.getpreferredencoding() 
on OS X are not influenced by the POSIX locale environment variables, 
e.g. LANG.  So, unlike on the other POSIX platforms, the only way to 
override the (obsolete) encoding is to pass the encoding argument to 
open() explicitly.
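
For reference, the explicit workaround mentioned in item 2 might look like the sketch below. The scratch file path and the sample text are made up for illustration; only locale.getpreferredencoding() and the encoding argument to open() come from the report.

```python
import locale
import os
import tempfile

# The encoding a bare open() would pick for text files on this platform.
default_encoding = locale.getpreferredencoding()

text = "caf\u00e9"  # sample non-ASCII text (illustrative)
path = os.path.join(tempfile.mkdtemp(), "blah")  # hypothetical scratch file

# Passing encoding= explicitly makes the result independent of the
# platform default, whatever locale.getpreferredencoding() returns.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "r", encoding="utf-8") as f:
    file_encoding = f.encoding
    round_tripped = f.read()
```

This works, but it has to be repeated at every open() call, which is why the default itself matters.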

Compare the results from Debian Linux:

$ unset LANG
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'
>>> open('blah','r').encoding
'ANSI_X3.4-1968'
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, None)
>>> 
$ export LANG=en_US.UTF-8
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> open('blah','r').encoding
'UTF-8'
>>> locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>> 

... to OS X:

$ unset LANG
$ python3.1
Python 3.1rc1+ (py3k, Jun  3 2009, 14:31:41) 
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'mac-roman'
>>> open('blah','r').encoding
'mac-roman'
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, 'mac-roman')
>>> 
$ export LANG=en_US.UTF-8
$ python3.1
Python 3.1rc1+ (py3k, Jun  3 2009, 14:31:41) 
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'mac-roman'
>>> open('blah','r').encoding
'mac-roman'
>>> locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
(None, 'mac-roman')
>>> 

A quick look at the code shows that part of the problem is in 
Modules/_localemodule.c, which has a #if defined(__APPLE__) version 
of PyLocale_getdefaultlocale.  That version appears to date from classic 
MacOS; it should probably just be removed, and locale.py modified to 
eliminate or minimize the special-case mac/darwin code.  When no 
locale is set, "UTF-8" would seem to be a reasonable default.  In any case, 
"mac-roman" is not.
History
Date                 User       Action  Args
2009-06-05 10:37:12  ned.deily  set     recipients: + ned.deily, ronaldoussoren, benjamin.peterson
2009-06-05 10:37:11  ned.deily  set     messageid: <1244198231.29.0.333691708249.issue6202@psf.upfronthosting.co.za>
2009-06-05 10:37:09  ned.deily  link    issue6202 messages
2009-06-05 10:37:06  ned.deily  create