classification
Title: os.getcwd() should raise UnicodeDecodeError for arbitrary bytes
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: flox, pjenvey
Priority: normal Keywords:

Created on 2010-01-13 22:35 by flox, last changed 2010-01-13 22:46 by pjenvey. This issue is now closed.

Messages (3)
msg97740 - Author: Florent Xicluna (flox) * (Python committer) Date: 2010-01-13 22:35
When the current working directory is not decodable, the os.getcwd() function should raise an error.

>>> import os, sys
>>> sys.getfilesystemencoding()
'utf-8'
>>> cwd=b'/tmp/\xe7'
>>> os.mkdir(cwd); os.chdir(cwd)
>>> os.getcwdb()
b'/tmp/\xe7'
>>> os.getcwd()  # Should raise UnicodeDecodeError
'/tmp/\udce7'

Python 2 raises the error.
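The session above can be replayed as a self-contained script. This is a sketch under assumptions the report does not state: a POSIX system with a UTF-8 filesystem encoding and a filesystem that accepts arbitrary bytes in names (for example ext4 or tmpfs on Linux; macOS may refuse such names). It uses a temporary directory instead of /tmp/\xe7 and relies on os.fsencode(), which is available from Python 3.2 on. The function name reproduce_getcwd_report is made up for illustration.

```python
import os
import sys
import tempfile


def reproduce_getcwd_report():
    """Hypothetical helper: recreate the reported session in a throwaway dir.

    Assumes POSIX, a UTF-8 filesystem encoding, and a filesystem that
    accepts arbitrary bytes in names. Returns (os.getcwdb(), os.getcwd())
    as observed inside a directory whose name is not valid UTF-8.
    """
    original = os.getcwdb()
    with tempfile.TemporaryDirectory() as tmp:
        # b'\xe7' on its own is not a valid UTF-8 sequence.
        target = os.path.join(os.fsencode(tmp), b'\xe7')
        os.mkdir(target)
        os.chdir(target)
        try:
            return os.getcwdb(), os.getcwd()
        finally:
            # Leave the directory before TemporaryDirectory removes it.
            os.chdir(original)


if __name__ == "__main__":
    print(sys.getfilesystemencoding())
    raw, text = reproduce_getcwd_report()
    print(raw)          # bytes ending in b'/\xe7'
    print(ascii(text))  # str ending in '/\udce7'; no exception is raised
```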
msg97742 - Author: Florent Xicluna (flox) * (Python committer) Date: 2010-01-13 22:44
Actually, it is the documented behaviour.
http://docs.python.org/py3k/library/os.html#file-names-command-line-arguments-and-environment-variables

>>> b'\xe7'.decode('utf-8', 'surrogateescape')
'\udce7'
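The decoding rule the documentation describes can be checked directly with the codec. This minimal sketch hard-codes 'utf-8' to match the reporter's filesystem encoding rather than querying it, and contrasts strict decoding with the surrogateescape round trip.

```python
raw = b'/tmp/\xe7'

# Strict decoding (the Python 2-style outcome the report expected) fails.
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    strict_error = exc
else:
    strict_error = None

# surrogateescape maps each undecodable byte 0xXX to the lone surrogate U+DCXX.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode('utf-8', 'surrogateescape')

print(type(strict_error).__name__)  # UnicodeDecodeError
print(ascii(text))                  # '/tmp/\udce7'
print(restored == raw)              # True
```

Encoding text without the handler (text.encode('utf-8')) would raise UnicodeEncodeError, because lone surrogates are not allowed in strict UTF-8.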
msg97743 - Author: Philip Jenvey (pjenvey) * (Python committer) Date: 2010-01-13 22:46
Right, this is an intentional change in behavior in Python 3.1: non-decodable bytes are now decoded to utf8b (via the surrogateescape error handler). Furthermore, the unicode string returned from getcwd can be passed on to other fs functions, which simply encode it back to the original bytes via surrogateescape on POSIX.

See PEP 383.
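A caller who still wants the Python 2-style error can opt in by decoding the bytes from os.getcwdb() strictly. The helpers below (strict_fsdecode, strict_getcwd, has_escaped_bytes) are hypothetical and not part of the os module. has_escaped_bytes assumes an ASCII-compatible filesystem encoding such as UTF-8, for which surrogateescape only produces the code points U+DC80 to U+DCFF.

```python
import os
import sys


def strict_fsdecode(raw: bytes) -> str:
    """Hypothetical: decode a path with the filesystem encoding, strictly.

    Unlike the surrogateescape decoding used by os.getcwd() on POSIX,
    this raises UnicodeDecodeError for bytes that are not valid in the
    filesystem encoding.
    """
    return raw.decode(sys.getfilesystemencoding(), 'strict')


def strict_getcwd() -> str:
    """Hypothetical opt-in for the behaviour the report asked for."""
    return strict_fsdecode(os.getcwdb())


def has_escaped_bytes(path: str) -> bool:
    """True if `path` contains lone surrogates U+DC80..U+DCFF.

    Assumes an ASCII-compatible filesystem encoding, where only bytes
    0x80..0xFF can be undecodable and surrogateescape maps them there.
    """
    return any('\udc80' <= ch <= '\udcff' for ch in path)
```

Checking the result of os.getcwd() with has_escaped_bytes() is an alternative to re-decoding when the string is still going to be passed to other filesystem functions, since those accept the surrogate-escaped form unchanged.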
History
Date                 User     Action  Args
2010-01-13 22:46:26  pjenvey  set     nosy: + pjenvey
                                      messages: + msg97743
2010-01-13 22:44:30  flox     set     status: open -> closed
                                      resolution: not a bug
                                      messages: + msg97742
                                      stage: resolved
2010-01-13 22:35:07  flox     create