classification
Title: os.statvfs() not working well with unicode paths
Type: behavior
Stage: resolved
Components: Unicode
Versions: Python 2.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To: giampaolo.rodola
Nosy List: Arfrever, benjamin.peterson, ezio.melotti, giampaolo.rodola, serhiy.storchaka, vstinner
Priority: normal
Keywords: needs review, patch

Created on 2013-08-09 12:06 by giampaolo.rodola, last changed 2020-05-31 14:42 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
statvfs.patch giampaolo.rodola, 2013-08-09 12:06
issue18695-2.patch giampaolo.rodola, 2014-01-22 19:47
issue18695-3.patch giampaolo.rodola, 2014-01-22 20:48
Messages (16)
msg194726 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2013-08-09 12:06
From: https://code.google.com/p/psutil/issues/detail?id=416


# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import os, errno
name = "ƒőő"
try:
    os.mkdir(name)
except OSError as err:
    if err.errno != errno.EEXIST:
        raise
os.statvfs(name)


The script above works fine on Python 3.3 but on 2.7 you'll get:

Traceback (most recent call last):
  File "foo.py", line 10, in <module>
    os.statvfs(name)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

The attached patch fixes the issue.
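
For callers stuck on an unpatched 2.7, one possible workaround is to encode the path to a byte string before calling os.statvfs(), so that the implicit 'ascii' encoding seen in the traceback never happens. This is only a sketch and not part of the attached patch: the helper name statvfs_compat is hypothetical, the UTF-8 fallback is an assumption, and it presumes a POSIX platform where os.statvfs() exists.

```python
import os
import sys


def statvfs_compat(path):
    """Call os.statvfs() with a byte-string path.

    Hypothetical helper, not part of the os module or of the attached patch.
    """
    if not isinstance(path, bytes):
        # On Python 2, `bytes` is `str`, so only unicode objects get here.
        # Assumption: fall back to UTF-8 if the filesystem encoding is unknown.
        encoding = sys.getfilesystemencoding() or "utf-8"
        path = path.encode(encoding)
    return os.statvfs(path)
```

With this helper, the last line of the script above would become statvfs_compat(name).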
msg194764 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-08-09 20:17
Functions such as rename(), popen(), mkfifo(), mknod(), etc. have the same issue.
msg194774 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-08-09 20:47
> The script above works fine on Python 3.3 but on 2.7 you'll get: ...

Cool, you now have a good reason to upgrade to Python 3 ;-)

I'm not sure that it's a good idea to invest time in fixing Unicode issues in Python 2, especially in a minor version (Python 2.7.x).
msg194783 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2013-08-10 02:44
> I'm not sure that it's a good idea to invest time in fixing Unicode
> issues in Python 2, especially in a minor version (Python 2.7.x).

I admit I sort of share the same doubts, but considering 2.7 a "minor Python version", especially at this point, would be a mistake.
msg194788 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-08-10 09:47
2.7.x is a minor version, not 2.7. We have fixed Unicode issues in Python 2 bugfix releases many times.
msg208114 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-01-14 21:17
Giampaolo, do you want to provide a test?
msg208844 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2014-01-22 19:47
The attached patch includes tests. I used test_sax.py as a model.
msg208847 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2014-01-22 20:19
While I'm at it, I'm also going to fix mkfifo(), mknod() and others. Hold on a bit longer.
msg208849 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2014-01-22 20:48
OK, the attached patch fixes mkfifo(), mknod() and statvfs(), and also includes Unicode tests for all of the os module's path-related functions.
msg208853 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-01-22 21:38
You have eaten "return NULL;" in posix_mkfifo.

"from test.test_support import TESTFN_UNICODE, TESTFN_ENCODING" can fail. The simplest solution is just to initialize them to None by default in test_support.

If TESTFN_UNICODE.encode(TESTFN_ENCODING) fails (in the POSIX locale), it would be better to run the tests with unicode(TESTFN, 'ascii') than to skip them.

Tests should check that the results for a unicode filename are the same as for a str filename.

Since Victor has doubts, we should ask Benjamin.
msg208857 - Author: STINNER Victor (vstinner) (Python committer) Date: 2014-01-22 22:24
> Since Victor has doubts, we should ask Benjamin.

Well, if you begin to patch some os functions, we will find many functions that don't support Unicode paths.

I prefer to consider that Python 2 doesn't support Unicode filenames, to avoid bugs.

If you want to support Unicode filenames, we will have to modify a lot of code. What's the point, since Python 3 has very good Unicode support, much better than Python 2?
msg208959 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2014-01-23 15:25
Either way it's fine with me.

Regardless, I think these tests have some value because those os functions are not currently tested, so it might make sense to apply Serhiy's suggestions and port the tests to Python 3.4.
msg219038 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-05-24 12:28
BDFL says (http://permalink.gmane.org/gmane.comp.python.devel/146074):

"""Given that the claim "Python 2 doesn't support Unicode filenames" is factually incorrect (in Python 2.7, most filesystem calls in fact do support Unicode, at least on some platforms), I think individual functions in the os module that are found lacking should be considered bugs, and if someone goes through the effort to supply an otherwise acceptable fix, we shouldn't reject it on the basis that we don't want to consider supporting Unicode filenames."""
msg219048 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2014-05-24 17:00
Ok, I will go on then.

> You have eaten "return NULL;" in posix_mkfifo.

What do you mean?

> If TESTFN_UNICODE.encode(TESTFN_ENCODING) fails (in the POSIX locale), it would be better to run the tests with
> unicode(TESTFN, 'ascii') than to skip them.

Agreed.

> Tests should check that the results for a unicode filename are the same as for a str filename.

What do you mean? Can you provide an example?
msg219054 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-05-24 20:50
> > You have eaten "return NULL;" in posix_mkfifo.
> What do you mean?

You deleted "return NULL;" after "if (!PyArg_ParseTuple(...))" in the 
posix_mkfifo() function.

> > Tests should check that the results for a unicode filename are the same as
> > for a str filename.
> What do you mean? Can you provide an example?

For example, test_statvfs should check that os.statvfs(TESTFN_UNICODE) ==
os.statvfs(TESTFN_UNICODE_ENCODED) (where TESTFN_UNICODE_ENCODED is the
corresponding 8-bit str).
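
The check described above could be sketched roughly as follows. This is a self-contained illustration, not the code from any of the attached patches: it creates its own temporary directory instead of using TESTFN_UNICODE, the names pick_dirname() and STABLE_FIELDS are hypothetical, and it compares only fields that should not change between two calls rather than the whole result (an assumption made so that concurrent disk activity cannot cause spurious failures). It also folds in the ASCII fallback suggested earlier for locales that cannot encode the non-ASCII name.

```python
import os
import shutil
import sys
import tempfile
import unittest

# Assumption: fall back to UTF-8 if the filesystem encoding is unknown.
FS_ENCODING = sys.getfilesystemencoding() or "utf-8"

# Fields expected to stay constant between two calls on the same filesystem;
# the free-space counters may change while other processes use the disk.
STABLE_FIELDS = ("f_bsize", "f_frsize", "f_namemax")


def pick_dirname():
    """Return a non-ASCII name if the filesystem encoding can represent it."""
    name = u"\u0192\u0151\u0151"  # the directory name from the original report
    try:
        name.encode(FS_ENCODING)
    except UnicodeEncodeError:
        # Still a unicode object, so the unicode code path is exercised.
        name = u"foo"
    return name


class StatvfsUnicodeTest(unittest.TestCase):

    def setUp(self):
        base = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, base)
        if isinstance(base, bytes):  # Python 2 returns a byte string here
            base = base.decode(FS_ENCODING)
        self.text_path = os.path.join(base, pick_dirname())
        self.bytes_path = self.text_path.encode(FS_ENCODING)
        os.mkdir(self.text_path)

    def test_statvfs(self):
        # The unicode path and its encoded form must describe the same
        # filesystem.
        from_text = os.statvfs(self.text_path)
        from_bytes = os.statvfs(self.bytes_path)
        for field in STABLE_FIELDS:
            self.assertEqual(getattr(from_text, field),
                             getattr(from_bytes, field))
```

On an unpatched 2.7 with a non-ASCII name, the os.statvfs(self.text_path) call is expected to raise the UnicodeEncodeError from the original report, so the test would fail there, which is the point.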
msg370467 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2020-05-31 14:42
Python 2.7 is no longer supported.
History
Date User Action Args
2020-05-31 14:42:33 serhiy.storchaka set status: open -> closed; resolution: out of date; messages: + msg370467; stage: test needed -> resolved
2014-05-24 20:50:17 serhiy.storchaka set messages: + msg219054
2014-05-24 17:00:02 giampaolo.rodola set messages: + msg219048
2014-05-24 12:28:19 serhiy.storchaka set messages: + msg219038
2014-01-23 15:25:11 giampaolo.rodola set messages: + msg208959
2014-01-22 22:24:22 vstinner set messages: + msg208857
2014-01-22 21:38:38 serhiy.storchaka set assignee: giampaolo.rodola; messages: + msg208853; nosy: + benjamin.peterson
2014-01-22 20:48:10 giampaolo.rodola set files: + issue18695-3.patch; messages: + msg208849
2014-01-22 20:19:14 giampaolo.rodola set messages: + msg208847
2014-01-22 19:47:42 giampaolo.rodola set files: + issue18695-2.patch; messages: + msg208844
2014-01-15 06:25:14 Arfrever set nosy: + Arfrever
2014-01-14 21:17:58 serhiy.storchaka set type: behavior; messages: + msg208114; stage: test needed
2013-08-10 09:47:34 serhiy.storchaka set messages: + msg194788
2013-08-10 02:44:24 giampaolo.rodola set messages: + msg194783
2013-08-09 20:47:49 vstinner set messages: + msg194774
2013-08-09 20:17:22 serhiy.storchaka set nosy: + vstinner, serhiy.storchaka; messages: + msg194764
2013-08-09 12:06:43 giampaolo.rodola create