classification
Title: tempfile.mkdtemp fails with non-ascii paths on Python 2
Type: behavior Stage:
Components: Unicode Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: akira, ezio.melotti, gregory.p.smith, risto3, scoder, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2015-01-25 11:02 by akira, last changed 2016-01-02 10:28 by risto3.

Messages (7)
msg234662 - Author: Akira Li (akira) * Date: 2015-01-25 11:02
Python 2.7.9 (default, Jan 25 2015, 13:41:30) 
  [GCC 4.9.2] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import os, sys, tempfile
  >>> d = u'\u20ac'.encode(sys.getfilesystemencoding()) # non-ascii
  >>> if not os.path.isdir(d): os.makedirs(d)
  ... 
  >>> os.environ['TEMP'] = d
  >>> tempfile.mkdtemp(prefix=u'')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File ".../python2.7/tempfile.py", line 331, in mkdtemp
      file = _os.path.join(dir, prefix + name + suffix)
    File ".../python2.7/posixpath.py", line 80, in join
      path += '/' + b
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)

Related: https://bugs.python.org/issue1681974
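
For readers following along: the traceback above comes from Python 2 implicitly decoding the byte-string directory with the ASCII codec when it is concatenated with the unicode prefix. A minimal sketch that spells out that decode step, so it behaves the same on Python 2 and Python 3. The function name join_like_py2 is made up for illustration and is not part of tempfile or posixpath.

```python
# Byte form of the euro sign, as a UTF-8 filesystem would store it.
euro_dir = u'\u20ac'.encode('utf-8')

def join_like_py2(dir_bytes, name_text):
    # Python 2 performs this ASCII decode implicitly when a str (bytes)
    # is concatenated with a unicode object; here it is written out.
    return dir_bytes.decode('ascii') + u'/' + name_text

try:
    join_like_py2(euro_dir, u'tmpabc123')
    error = None
except UnicodeDecodeError as exc:
    # Same exception class as in the report: 0xe2 is not valid ASCII.
    error = exc

print(error)
```

Keeping both operands the same type (both bytes, or both unicode decoded with the real filesystem encoding) avoids the implicit ASCII step.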
msg234664 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-25 12:05
Why do you use a unicode prefix? Does it work with a bytes prefix?

You should use Python 3 if you want the best Unicode support.
msg257333 - Author: Richard PALO (risto3) Date: 2016-01-02 07:42
I see similar problems when running the test suite for lxml 3.5.0 on Python 2.7:

======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ETreeIOTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/pkgsrc/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/opt/local/lib/python2.7/tempfile.py", line 339, in mkdtemp
    _os.mkdir(file, 0700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-53: ordinal not in range(128)

======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ElementTreeIOTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/pkgsrc/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/opt/local/lib/python2.7/tempfile.py", line 339, in mkdtemp
    _os.mkdir(file, 0700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-53: ordinal not in range(128)


The code snippet is in test_io.py, line 276:

   266	    def test_etree_parse_io_error(self):
   267		# this is a directory name that contains characters beyond latin-1
   268		dirnameEN = _str('Directory')
   269		dirnameRU = _str('Каталог')
   270		filename = _str('nosuchfile.xml')
   271		dn = tempfile.mkdtemp(prefix=dirnameEN)
   272		try:
   273		    self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
   274		finally:
   275		    os.rmdir(dn)
   276		dn = tempfile.mkdtemp(prefix=dirnameRU)
   277		try:
   278		    self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
   279		finally:
   280		    os.rmdir(dn)

Even if I change dirnameRU to a simple French 'Répertoire', I still get errors.

It is not an option to upgrade to 3.0, sorry.

BTW, I tried passing dirnameRU.encode('utf-8') but that just generates
a different error:

ERROR: test_etree_parse_io_error (lxml.tests.test_io.ETreeIOTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/tmp/pkgsrc/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 278, in test_etree_parse_io_error
    self.assertRaises(IOError, self.etree.parse, os.path.join(dn, filename))
  File "/opt/local/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 40: ordinal not in range(128)
msg257334 - Author: Richard PALO (risto3) Date: 2016-01-02 07:58
If I also add .encode('utf-8') to filename on line 278, that seems to get over the pathname problem.

I guess it comes down to this: if sys.getfilesystemencoding() is utf-8, which in my case it is (on SunOS), I believe these conversions should be automatic.
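
The workaround described above (encoding both the prefix and the file name) can be sketched as a small wrapper that hands tempfile.mkdtemp arguments of a single type. This is only an illustration: to_fs_bytes and mkdtemp_bytes are hypothetical helper names, not part of tempfile, and on Python 3 bytes arguments are only accepted from 3.5 onward.

```python
import os
import sys
import tempfile

def to_fs_bytes(value, encoding=None):
    """Return value as bytes; text is encoded, bytes and None pass through."""
    if value is None or isinstance(value, bytes):
        return value
    # Assumption: the filesystem encoding can represent the name. Under an
    # ASCII encoding this raises UnicodeEncodeError unless the caller
    # passes an explicit encoding such as 'utf-8'.
    return value.encode(encoding or sys.getfilesystemencoding() or 'utf-8')

def mkdtemp_bytes(suffix=u'', prefix=u'tmp', dir=None, encoding=None):
    # Hypothetical wrapper: every argument reaches mkdtemp as bytes (or None),
    # so no implicit ASCII conversion is needed when the parts are joined.
    return tempfile.mkdtemp(
        suffix=to_fs_bytes(suffix, encoding),
        prefix=to_fs_bytes(prefix, encoding),
        dir=to_fs_bytes(dir, encoding),
    )

dn = mkdtemp_bytes(prefix=u'R\xe9pertoire', encoding='utf-8')
try:
    # The file name has to be bytes as well, as noted for line 278.
    path = os.path.join(dn, to_fs_bytes(u'nosuchfile.xml', 'utf-8'))
finally:
    os.rmdir(dn)
```

The returned directory name is a byte string, so anything joined to it later has to be bytes too.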
msg257338 - Author: Richard PALO (risto3) Date: 2016-01-02 08:59
Curiously enough, I was able to test with Python 3.5.
The same errors result, and the same workaround seems to get over it.
msg257340 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-02 09:37
A similar problem in Python 3 was addressed in issue24230, but that was a new feature.

As for the lxml tests, I suggest using bytes names compatible with all Windows OEM encodings (consisting of ASCII + b'\xa9\xb0\xb2\xb3\xb4\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc8\xc9\xe6\xf0\xf1\xf3\xf4\xf5\xf6\xf7') and with UTF-8.
msg257342 - Author: Richard PALO (risto3) Date: 2016-01-02 10:28
This turns out to be related to the locale environment being set to 'C'.

A UTF-8 locale seems to get over the issue.

A pkgsrc colleague has already filed an issue with lxml about this for the test suite (https://bugs.launchpad.net/lxml/+bug/1522052).

cheers
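
A quick way to check whether the current locale is the culprit is to ask which filesystem encoding Python picked and whether it can represent the name. The sketch below is illustrative only. fs_can_encode is a hypothetical helper, and it assumes that under the 'C' locale Python 2 typically reports an ASCII-only filesystem encoding.

```python
import sys

def fs_can_encode(name, encoding=None):
    """Return True if name can be encoded with the filesystem encoding."""
    encoding = encoding or sys.getfilesystemencoding()
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Show what the interpreter chose for this locale, then test a name.
print(sys.getfilesystemencoding())
print(fs_can_encode(u'R\xe9pertoire'))
```

If the second line prints False, setting a UTF-8 locale before starting Python (as described above) is the likely fix.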
History
Date                 User              Action  Args
2016-01-02 10:28:34  risto3            set     messages: + msg257342
2016-01-02 09:37:48  serhiy.storchaka  set     nosy: + scoder, gregory.p.smith, serhiy.storchaka; messages: + msg257340
2016-01-02 08:59:45  risto3            set     messages: + msg257338
2016-01-02 07:58:22  risto3            set     messages: + msg257334
2016-01-02 07:42:24  risto3            set     nosy: + risto3; messages: + msg257333
2015-01-25 12:05:10  vstinner          set     messages: + msg234664
2015-01-25 11:02:16  akira             create