classification
Title: test_dbm_dumb fails due to character encoding issue on Mac OS X
Type: behavior Stage:
Components: Library (Lib), macOS, Tests Versions: Python 3.0
process
Status: closed Resolution: fixed
Dependencies: 3799 Superseder:
Assigned To: brett.cannon Nosy List: brett.cannon, oefe
Priority: normal Keywords:

Created on 2008-11-21 22:15 by oefe, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg76208 - Author: Martina Oefelein (oefe) Date: 2008-11-21 22:15
test_dbm_dumb fails due to what appears to be a character encoding issue 
on Mac OS X:

Majestix:Python-3.0rc3 martina$ DYLD_FRAMEWORK_PATH=/Users/martina/Downloads/Python-3.0rc3: ./python.exe -E -bb ./Lib/test/regrtest.py -l test_dbm_dumb
test_dbm_dumb
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2550>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2550>> ignored
test test_dbm_dumb failed -- errors occurred; run in verbose mode for details
1 test failed:
    test_dbm_dumb
msg76209 - Author: Martina Oefelein (oefe) Date: 2008-11-21 22:16
Example of verbose output (other testcases are similar):

======================================================================
ERROR: test_dumbdbm_creation (test.test_dbm_dumb.DumbDBMTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/martina/Downloads/Python-3.0rc3/Lib/test/test_dbm_dumb.py", line 41, in test_dumbdbm_creation
    f.close()
  File "/Users/martina/Downloads/Python-3.0rc3/Lib/dbm/dumb.py", line 228, in close
    self._commit()
  File "/Users/martina/Downloads/Python-3.0rc3/Lib/dbm/dumb.py", line 116, in _commit
    f.write("%r, %r\n" % (key.decode('Latin-1'), pos_and_siz_pair))
  File "./Lib/io.py", line 1491, in write
    b = encoder.encode(s)
  File "./Lib/encodings/mac_roman.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xbc' in position 2: character maps to <undefined>
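The failing encode can be reproduced without dbm at all. This is a minimal sketch, assuming a key such as the UTF-8 bytes of 'ü' (b'\xc3\xbc'); the key is hypothetical and the (3072, 1) pair is simply copied from the output above. Decoding those bytes as Latin-1, as _commit does, yields 'Ã¼'. Mac Roman has a slot for 'Ã' but none for U+00BC, so encoding the formatted line fails at position 2, matching the traceback.

```python
# Hypothetical reproduction: the key bytes and (pos, size) pair are illustrative.
key = "ü".encode("utf-8")          # b'\xc3\xbc'
pos_and_siz_pair = (3072, 1)

# Same formatting as the f.write() call in dbm/dumb.py's _commit() above.
line = "%r, %r\n" % (key.decode("Latin-1"), pos_and_siz_pair)


def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


mac_result = try_encode(line, "mac_roman")    # what the locale default picked
latin1_result = try_encode(line, "latin-1")   # matches the decode('Latin-1') call

print(repr(line))
print(mac_result)
print(latin1_result)
```

The same line encodes cleanly as Latin-1, since every character produced by decoding bytes as Latin-1 (and by repr of such a string) is at or below U+00FF.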
msg76212 - Author: Martina Oefelein (oefe) Date: 2008-11-21 22:28
The Mac Roman encoding comes into play, because _commit opens _dirfile 
without explicitly specifying an encoding. io.open then gets the 
encoding via locale.getpreferredencoding, which returns mac-roman:

Majestix:Python-3.0rc3 martina$ DYLD_FRAMEWORK_PATH=/Users/martina/Downloads/Python-3.0rc3: ./python.exe -c "import locale;print(locale.getpreferredencoding())"
mac-roman

Two issues:
- Since dumb.py handles encoding explicitly, shouldn't it specify the encoding for _dirfile as well? (Or use a binary file, though that could cause new line-ending troubles.)
- Is mac-roman really the appropriate choice for locale.getpreferredencoding? This is on Mac OS X 10.5, not Mac OS 9. The preferred encoding for Mac OS X should be UTF-8, not a legacy encoding.
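The first suggestion, passing an explicit encoding when the directory file is opened, might look roughly like this. It is a hypothetical sketch, not the specify_open_encoding.diff patch: write_dirfile and read_dirfile are made-up helpers, and Latin-1 is assumed because _commit already decodes keys with it, so every key byte survives the round trip regardless of what locale.getpreferredencoding returns.

```python
import ast
import os
import tempfile


def write_dirfile(path, index):
    """Write one "%r, %r" line per entry, mirroring dbm.dumb's _commit().

    Hypothetical helper: the explicit encoding means the locale's preferred
    encoding (mac-roman in this report) is never consulted.
    """
    with open(path, "w", encoding="Latin-1") as f:
        for key, pos_and_siz_pair in index.items():
            f.write("%r, %r\n" % (key.decode("Latin-1"), pos_and_siz_pair))


def read_dirfile(path):
    """Parse a file written by write_dirfile back into {bytes: (pos, size)}."""
    index = {}
    with open(path, "r", encoding="Latin-1") as f:
        for line in f:
            line = line.rstrip()
            if line:
                key, pos_and_siz_pair = ast.literal_eval(line)
                index[key.encode("Latin-1")] = pos_and_siz_pair
    return index


# Usage: round-trip a key whose Latin-1 decoding contains '\xbc'.
index = {"ü".encode("utf-8"): (3072, 1), b"plain": (0, 5)}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dir")
    write_dirfile(path, index)
    with open(path, "rb") as raw:
        raw_bytes = raw.read()
    roundtrip = read_dirfile(path)

print(roundtrip == index)
```

Staying in text mode keeps Python's usual newline translation, which sidesteps the line-ending concern raised about the binary-file alternative.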

Seems to be related to r67310, which was intended to fix issue #3799:
http://svn.python.org/view/python/branches/py3k/Lib/dbm/dumb.py?rev=67310&r1=63662&r2=67310
msg76214 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2008-11-21 22:45
Issue 3799 already has a patch that specifies the encoding when opening
the file, so this should be fixed by the final release. Can you test the
patch (specify_open_encoding.diff) and let me know whether it solves your
problem, Martina?
msg76274 - Author: Martina Oefelein (oefe) Date: 2008-11-23 19:40
Yes, the patch fixes the issue for me.
msg76362 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2008-11-24 21:10
Fixed in r67369.
History
Date                 User               Action  Args
2022-04-11 14:56:41  admin              set     github: 48632
2008-11-24 21:10:25  brett.cannon       set     status: open -> closed
                                                resolution: fixed
                                                messages: + msg76362
2008-11-24 18:14:37  brett.cannon       set     assignee: brett.cannon
2008-11-23 19:40:37  oefe               set     messages: + msg76274
2008-11-21 22:45:33  brett.cannon       set     dependencies: + Byte/string inconsistencies between different dbm modules
                                                messages: + msg76214
2008-11-21 22:28:44  oefe               set     messages: + msg76212
2008-11-21 22:17:49  benjamin.peterson  set     nosy: + brett.cannon
2008-11-21 22:16:58  oefe               set     messages: + msg76209
2008-11-21 22:15:17  oefe               create