Issue6202
Created on 2009-06-05 10:37 by ned.deily, last changed 2009-06-07 17:43 by ned.deily.

msg88929
Author: Ned Deily (ned.deily)
Date: 2009-06-05 10:37

Potential Release Blocker
The default file encoding for 3.x file objects is the value of
locale.getpreferredencoding(). Currently, the locale module behavior on
OS X deviates from other Python POSIX platforms in a few unexpected and
bad ways:
1. On OS X, locale.getpreferredencoding() returns "mac-roman", an
obsolete encoding from the old "Classic" Mac OS days.
2. Unlike other POSIX platforms (at least Debian Linux), the values
returned by locale.getdefaultlocale() and locale.getpreferredencoding()
on OS X are not influenced by the settings of the POSIX locale
environment variables, e.g. LANG. So, unlike on the other POSIX
platforms, one can't override the (obsolete) encoding without explicitly
setting the encoding argument to open().
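The workaround mentioned in point 2 can be sketched roughly as follows. This is only an illustration: the scratch file name and sample text are made up for the example, and the point is simply that passing encoding explicitly makes the result independent of whatever locale.getpreferredencoding() reports.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given (platform dependent).
print("preferred encoding:", locale.getpreferredencoding())

# Hypothetical scratch file and sample text, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "blah.txt")
text = "na\u00efve caf\u00e9"

# Passing encoding explicitly bypasses the locale-derived default.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "r", encoding="utf-8") as f:
    explicit_encoding = f.encoding
    round_trip = f.read()

print("file encoding:", explicit_encoding)
```

With the explicit argument, the non-ASCII text should round-trip whether the locale default is "mac-roman", ASCII, or UTF-8.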
Compare the results from Debian Linux:
$ unset LANG
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'
>>> open('blah','r').encoding
'ANSI_X3.4-1968'
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, None)
>>>
$ export LANG=en_US.UTF-8
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> open('blah','r').encoding
'UTF-8'
>>> locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>>
... to OS X:
$ unset LANG
$ python3.1
Python 3.1rc1+ (py3k, Jun 3 2009, 14:31:41)
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'mac-roman'
>>> open('blah','r').encoding
'mac-roman'
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, 'mac-roman')
>>>
$ export LANG=en_US.UTF-8
$ python3.1
Python 3.1rc1+ (py3k, Jun 3 2009, 14:31:41)
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'mac-roman'
>>> open('blah','r').encoding
'mac-roman'
>>> locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
(None, 'mac-roman')
>>>
A quick look at the code shows that part of the problem is in
Modules/_localemodule.c, where there is a #if defined(__APPLE__) version
of PyLocale_getdefaultlocale which appears to have its origins in
Classic Mac OS. It should probably just be removed, and locale.py
modified to eliminate or minimize the special-case mac/darwin code. For
the case of no locale, "UTF-8" would seem to be a reasonable default. In
any case, "mac-roman" is not.
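The behaviour being asked for (honour the POSIX locale variables, fall back to UTF-8 when no locale is set) could look roughly like the sketch below. guess_encoding is a hypothetical helper written for this illustration; it is not the code in locale.py or in any patch, and it deliberately simplifies by also returning the default for locales that carry no codeset, such as "C".

```python
import os


def guess_encoding(environ=os.environ, default="UTF-8"):
    """Hypothetical helper: pick an encoding from POSIX locale variables."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if not value:
            continue
        # Locale names look like language_TERRITORY.codeset@modifier.
        value = value.split("@", 1)[0]
        if "." in value:
            return value.split(".", 1)[1]
        # Variable is set but names no codeset (e.g. "C"): stop looking
        # and use the default. This is a simplification for the sketch.
        break
    return default


print(guess_encoding({"LANG": "en_US.UTF-8"}))
print(guess_encoding({}))
```

Taking the environment as a parameter keeps the helper easy to exercise with different LANG settings without touching the real process environment.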

msg88938
Author: Ronald Oussoren (ronaldoussoren)
Date: 2009-06-05 11:47

I'm setting the priority to "release blocker" because the current
behaviour is completely unwanted: the "mac-roman" encoding is no longer
used by default on OS X. All system tools write UTF-8 encoded files by
default, and the LANG variable is set to a UTF-8 encoding as well.
I won't be able to look into this before Sunday, and possibly only after
next week (that is, June 15th or later), because I'll be at a conference
and don't know if I'll have spare time to spend on this after Sunday.

msg88957
Author: Benjamin Peterson (benjamin.peterson)
Date: 2009-06-05 17:53

Here's a patch (for the trunk, as it is also afflicted). It simply
removes the Mac-specific cases and uses POSIX detection.
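For readers unfamiliar with what "POSIX detection" usually means here, the sketch below shows one common approach: adopt the user's LC_CTYPE locale and ask nl_langinfo(CODESET) for its codeset. posix_preferred_encoding is a hypothetical name for this illustration, the UTF-8 fallback is an assumption, and the code is not taken from fix_mac_encoding.patch.

```python
import locale


def posix_preferred_encoding(default="UTF-8"):
    """Hypothetical sketch of POSIX-style detection via nl_langinfo(CODESET)."""
    # nl_langinfo and CODESET are not available on every platform.
    if not hasattr(locale, "nl_langinfo") or not hasattr(locale, "CODESET"):
        return default
    # Calling setlocale without a locale argument only queries the setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        # Adopt the user's locale from LC_ALL / LC_CTYPE / LANG.
        locale.setlocale(locale.LC_CTYPE, "")
        return locale.nl_langinfo(locale.CODESET) or default
    except locale.Error:
        # The environment names a locale this system does not provide.
        return default
    finally:
        # Restore the previous LC_CTYPE so callers are not affected.
        locale.setlocale(locale.LC_CTYPE, saved)


print(posix_preferred_encoding())
```

The value printed depends on the platform and on the locale variables in effect when the script runs.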

msg88978
Author: Ned Deily (ned.deily)
Date: 2009-06-05 22:22

A very quick test of the patch on trunk for 10.4 and 10.5 looks good,
though it should be re-tested once the unrelated current breakage of
test__locale is fixed.

msg89043
Author: Ronald Oussoren (ronaldoussoren)
Date: 2009-06-07 15:29

The patch looks good, and tests pass on 10.5.7.
I've committed this as r73268.

msg89048
Author: Ned Deily (ned.deily)
Date: 2009-06-07 17:43

(and committed to trunk in r73270 by Benjamin)

| Date | User | Action | Args |
| 2009-06-25 13:48:42 | r.david.murray | link | issue6315 superseder |
| 2009-06-07 17:43:56 | ned.deily | set | messages: + msg89048 |
| 2009-06-07 15:29:59 | ronaldoussoren | set | status: open -> closed; resolution: fixed; messages: + msg89043 |
| 2009-06-05 22:22:56 | ned.deily | set | messages: + msg88978 |
| 2009-06-05 17:53:13 | benjamin.peterson | set | files: + fix_mac_encoding.patch; keywords: + patch; messages: + msg88957; versions: + Python 2.7 |
| 2009-06-05 11:47:06 | ronaldoussoren | set | priority: release blocker; messages: + msg88938 |
| 2009-06-05 10:37:09 | ned.deily | create | |