classification
Title: UnicodeDecodeError in os.path.expandvars
Type: behavior
Stage: resolved
Components: Library (Lib), Unicode
Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: brian.curtin, ezio.melotti, haypo, loewis, python-dev, serhiy.storchaka, shura_zam
Priority: normal
Keywords: needs review, patch

Created on 2009-09-01 07:51 by shura_zam, last changed 2014-02-23 16:58 by serhiy.storchaka. This issue is now closed.

Files
File name                      Uploaded                              Description
issue6815.diff                 brian.curtin, 2010-03-06 17:49        patch for trunk
expandvars_nonascii.patch      serhiy.storchaka, 2014-01-27 19:31
expandvars_nonascii-2.7.patch  serhiy.storchaka, 2014-02-13 13:58
Messages (17)
msg92124 - (view) Author: Alexandr Zamaraev (shura_zam) Date: 2009-09-01 07:51
OS Windows Vista Home Basic Ru + sp2
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit
(Intel)] on win32

Simple code to reproduce the crash (file expandvars_bug.py):
[code]
# -*- coding: cp1251 -*-
import os.path

var = r'C:\Вася\Microsoft'
print os.path.expandvars(var)
print os.path.expandvars(unicode(var))
[/code]
Console session:
[code]
C:\┬рё \Microsoft
Traceback (most recent call last):
  File "C:\Lang\test\python\expandvars_bug.py", line 6, in <module>
    print os.path.expandvars(unicode(var))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3: ordinal not in range(128)
[/code]
msg92127 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-09-01 07:55
If you don't specify the encoding, unicode() will decode 'var' using the
default ascii codec and it will fail if 'var' contains non-ascii
characters. You should use var.decode(encoding) instead.

The fact that the first print is not displayed correctly is probably due
to the limitation of the Windows terminal.
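
A minimal sketch of both points, written for Python 3, where the byte string has to be built explicitly. The names `original`, `raw`, `text` and `garbled` are illustrative, and code page 866 for the console is an assumption, not something stated in the report.

```python
# Python 3 sketch: a bytes object stands in for the Python 2 str in the report.
original = 'C:\\Вася\\Microsoft'
raw = original.encode('cp1251')      # the bytes a cp1251 source file stores

# Decoding with the ASCII codec fails on the first non-ASCII byte.
try:
    raw.decode('ascii')
    failure = None
except UnicodeDecodeError as exc:
    failure = exc
print(failure.start, hex(raw[failure.start]))

# Naming the real source encoding round-trips cleanly.
text = raw.decode('cp1251')

# Assumption: the console interpreted the same bytes as code page 866,
# which would explain the box-drawing character in the first print.
garbled = raw.decode('cp866')
```

The failing offset and byte value line up with the traceback in the report (byte 0xc2 in position 3).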
msg92133 - (view) Author: Alexandr Zamaraev (shura_zam) Date: 2009-09-01 08:28
Sorry, the first code should be:
var = r'${APPDATA}\Microsoft\Windows\Start Menu'

On my Windows machine:
F:\Lang\sf.net\svn\appupdater>echo %APPDATA%
C:\Users\Леново\AppData\Roaming
msg92134 - (view) Author: Alexandr Zamaraev (shura_zam) Date: 2009-09-01 08:33
All code:
[code]
# -*- coding: cp1251 -*-
import os.path

var = r'${APPDATA}\Microsoft\Windows\Start Menu'
print os.path.expandvars(var)
print os.path.expandvars(unicode(var, 'cp1251'))
[/code]
Console session:
[code]
C:\Users\Леново\AppData\Roaming\Microsoft\Windows\Start Menu
Traceback (most recent call last):
  File "C:\Lang\test\python\expandvars_bug.py", line 6, in <module>
    print os.path.expandvars(unicode(var, 'cp1251'))
  File "C:\Lang\Python\26\lib\ntpath.py", line 388, in expandvars
    res = res + c
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 9:
ordinal not in range(128)
[/code]
msg100535 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-03-06 15:54
This is a shorter snippet to reproduce the issue:
import os
os.environ['TEST'] = u'äö'.encode('iso-8859-1')
os.path.expandvars(u'%TEST%a')

If the var is a non-ASCII byte string, and the string passed to expandvars() is Unicode, the var is decoded implicitly using the ASCII codec and the decoding fails.

On Python 3 the situation looks even worse:
import os
os.environ['TEST'] = 'äö'.encode('iso-8859-1')
os.path.expandvars('%TEST%a')

This snippet returns "b'\\xe4\\xf6'a".
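
The Python 3 result is consistent with the bytes value being turned into its text representation instead of being decoded. That is an inference from the output shown, not something confirmed in this thread. A small sketch, with illustrative variable names:

```python
value = 'äö'.encode('iso-8859-1')    # two bytes: 0xe4 0xf6

# The text form of a bytes object is its literal, not a decoded string.
as_text = repr(value)
result = as_text + 'a'

# Decoding with the codec that produced the bytes gives the intended text.
decoded = value.decode('iso-8859-1') + 'a'
```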
msg100536 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-03-06 16:26
FWIW, _winreg.ExpandEnvironmentStrings does the right thing.

D:\python-dev\trunk>PCbuild\amd64\python.exe
Python 2.7a3+ (trunk, Feb 23 2010, 20:22:24) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys, _winreg
>>> os.environ["TEST"] = u"jalape\xf1o".encode(sys.getfilesystemencoding())
>>> print(os.path.expandvars(u"C:\\%TEST%"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\python-dev\trunk\lib\ntpath.py", line 354, in expandvars
    res = res + os.environ[var]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf1 in position 6: ordinal
not in range(128)
>>> print(_winreg.ExpandEnvironmentStrings(u"C:\\%TEST%"))
C:\jalapeño
msg100539 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-03-06 17:49
Here is a patch for trunk.
It works for me, but my Unicode knowledge isn't the strongest. I believe sys.getfilesystemencoding() is what we want here. Can anyone confirm?
msg100540 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-03-06 18:12
I think the patch is incorrect. It shouldn't do any encoding conversion, but perform the expanding completely in Unicode strings.

For 2.x, I recommend closing this as "won't fix". Expanding a Unicode string is just not supported. If it were supported, it should be supported correctly, i.e. allowing both environment variable names and environment variable values to have non-ASCII characters in them, and, on Windows, even non-MBCS characters.
msg209460 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-27 18:49
Here is a patch (for 3.3+) which adds support for environment variables with non-ASCII values and names in posixpath and ntpath.
msg209467 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-27 19:31
Ah, posixpath can be used on Windows. Here is a corrected patch. Please test it on Windows.
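
A quick check one could run on a Python 3 build that includes the fix. It is only a sketch: the variable name `GR_DEMO_NONASCII` is hypothetical, and it assumes the filesystem encoding can represent the value (UTF-8 on most current systems).

```python
import os
import os.path

NAME = 'GR_DEMO_NONASCII'            # hypothetical variable name
os.environ[NAME] = 'jalape\xf1o'     # non-ASCII value, set as text

# The ${NAME} form is accepted by both posixpath and ntpath.
template = 'C:/${' + NAME + '}/bin'
expanded = os.path.expandvars(template)
print(ascii(expanded))

del os.environ[NAME]                 # leave the environment as it was
```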
msg211063 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-12 08:31
Brian, could you please test it on Windows?
msg211101 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2014-02-12 16:42
Sorry, I don't have a Windows environment setup right now.
msg211131 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-13 08:16
New changeset 08746e015c64 by Serhiy Storchaka in branch '3.3':
Issue #6815: os.path.expandvars() now supports non-ASCII environment
http://hg.python.org/cpython/rev/08746e015c64

New changeset f3c146036e7c by Serhiy Storchaka in branch 'default':
Issue #6815: os.path.expandvars() now supports non-ASCII environment
http://hg.python.org/cpython/rev/f3c146036e7c
msg211133 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-13 08:49
New changeset b5ad525076eb by Serhiy Storchaka in branch '3.3':
Fixed typo in previous commit (issue #6815).
http://hg.python.org/cpython/rev/b5ad525076eb

New changeset 6825395e6107 by Serhiy Storchaka in branch 'default':
Fixed typo in previous commit (issue #6815).
http://hg.python.org/cpython/rev/6825395e6107
msg211145 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-02-13 11:05
The changes to ntpath.py are a little bit weird: they use os.environb, whereas Windows is the only OS where os.environb does not exist. It looks like ntpath is available and tested on UNIX, so the change looks valid, even though you are not supposed to have Windows variables in your UNIX environment :-)

As with bytes filenames, we should maybe deprecate bytes environment variables on Windows in Python 3.5. On Windows, the environment is Unicode. The bytes API is just provided for backward compatibility, but it should not be used.

By the way, the initial bug report was on Python 2.

Using os.fsencode/fsdecode instead of the ASCII encoding looks correct.
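
A sketch of the bytes case on Python 3, assuming a build that includes the fix. The variable name `GR_DEMO_BYTES` is hypothetical, and the filesystem encoding is assumed to be able to represent the value.

```python
import os
import os.path

NAME = 'GR_DEMO_BYTES'               # hypothetical variable name
os.environ[NAME] = 'jalape\xf1o'

# os.environb is only defined where the native environment is bytes,
# which os.supports_bytes_environ reports.
has_environb = hasattr(os, 'environb')

# Bytes in, bytes out: the value comes back in the filesystem encoding.
template = os.fsencode('${' + NAME + '}/bin')
expanded = os.path.expandvars(template)

# os.fsdecode reverses os.fsencode, so the text form is recoverable.
round_tripped = os.fsdecode(expanded)

del os.environ[NAME]                 # leave the environment as it was
```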
msg211157 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-13 13:58
Here is a patch for 2.7.
msg211658 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-19 21:28
New changeset d11ca14c9a61 by Serhiy Storchaka in branch '2.7':
Issue #6815: os.path.expandvars() now supports non-ASCII Unicode environment
http://hg.python.org/cpython/rev/d11ca14c9a61
History
Date User Action Args
2014-02-23 16:58:47 serhiy.storchaka set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2014-02-19 21:28:03 python-dev set messages: + msg211658
2014-02-13 13:58:42 serhiy.storchaka set files: + expandvars_nonascii-2.7.patch
    messages: + msg211157
    versions: + Python 2.7
2014-02-13 11:05:04 haypo set messages: + msg211145
2014-02-13 08:49:51 python-dev set messages: + msg211133
2014-02-13 08:16:44 python-dev set nosy: + python-dev
    messages: + msg211131
2014-02-12 16:42:26 brian.curtin set messages: + msg211101
2014-02-12 08:31:20 serhiy.storchaka set messages: + msg211063
2014-02-10 17:38:17 serhiy.storchaka set keywords: + needs review
2014-02-09 20:58:16 serhiy.storchaka set assignee: serhiy.storchaka
2014-01-27 19:31:45 serhiy.storchaka set files: + expandvars_nonascii.patch
    messages: + msg209467
2014-01-27 19:30:30 serhiy.storchaka set files: - expandvars_nonascii.patch
2014-01-27 18:49:26 serhiy.storchaka set files: + expandvars_nonascii.patch
    versions: + Python 3.3, Python 3.4, - Python 3.1
    nosy: + serhiy.storchaka
    messages: + msg209460
    stage: test needed -> patch review
2012-04-25 22:59:44 haypo set nosy: + haypo
    versions: - Python 2.6, Python 2.7, Python 3.2
2010-03-06 18:13:09 loewis set priority: high -> normal
2010-03-06 18:12:52 loewis set messages: + msg100540
2010-03-06 17:49:45 brian.curtin set files: + issue6815.diff
    nosy: + loewis
    messages: + msg100539
    keywords: + patch
2010-03-06 16:26:28 brian.curtin set nosy: + brian.curtin
    messages: + msg100536
2010-03-06 15:54:03 ezio.melotti set priority: normal -> high
    messages: + msg100535
    versions: + Python 3.1, Python 2.7, Python 3.2
2010-02-27 09:56:41 ezio.melotti set nosy: shura_zam, ezio.melotti
    components: + Library (Lib)
2010-02-09 16:43:07 brian.curtin set priority: normal
    type: crash -> behavior
    stage: test needed
2009-09-01 08:34:00 shura_zam set messages: + msg92134
2009-09-01 08:28:12 shura_zam set messages: + msg92133
2009-09-01 07:55:17 ezio.melotti set nosy: + ezio.melotti
    messages: + msg92127
2009-09-01 07:51:15 shura_zam create