classification
Title: shutil.disk_usage() on Windows can't properly handle unicode
Type:
Stage: resolved
Components: Documentation, Windows
Versions: Python 3.7, Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Mariatta, cheryl.sabella, docs@python, eryksun, giampaolo.rodola, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords: patch

Created on 2016-02-10 15:11 by giampaolo.rodola, last changed 2018-01-15 14:33 by Mariatta. This issue is now closed.

Pull Requests
URL Status Linked
PR 5184 merged cheryl.sabella, 2018-01-14 21:53
PR 5188 merged python-dev, 2018-01-15 05:08
Messages (11)
msg260017 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2016-02-10 15:11
On Python 3.4, Windows 7:

>>> import shutil, os
>>> path = 'psuugxik1s0è'
>>> os.stat(path)
os.stat_result(st_mode=33206, st_ino=6755399441249628, st_dev=3158553679, st_nlink=1, st_uid=0, st_gid=0, st_size=27136, st_atime=1455116789, st_mtime=1455116789, st_ctime=1455116789)
>>>
>>> shutil.disk_usage(path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python34\lib\shutil.py", line 989, in disk_usage
    total, free = nt._getdiskusage(path)
NotADirectoryError: [WinError 267] The directory name is invalid
>>>
msg260018 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-10 15:15
>    total, free = nt._getdiskusage(path)
> NotADirectoryError: [WinError 267] The directory name is invalid

The underlying C function is GetDiskFreeSpaceEx():
https://msdn.microsoft.com/fr-fr/library/windows/desktop/aa364937%28v=vs.85%29.aspx

It takes a lpDirectoryName parameter: "A directory on the disk."

Is psuugxik1s0è a directory?

It looks more like a shutil.disk_usage() documentation issue than a Unicode issue.
msg260020 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2016-02-10 15:35
You are right, my bad. I'll fix the doc to mention that on Windows "path" can only be a directory (on UNIX it can also be a file).
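For callers who hit this before the documentation change, one caller-side option is to pick a directory before calling shutil.disk_usage(). The following is only a minimal sketch: directory_for_usage() is a hypothetical helper name, not part of shutil, and it assumes that the containing directory sits on the same volume as the file.

```python
import os
import shutil
import tempfile


def directory_for_usage(path):
    """Return a directory suitable for shutil.disk_usage().

    Hypothetical helper (not part of shutil): if `path` is not a
    directory, fall back to the directory that contains it.
    """
    path = os.path.abspath(path)
    if os.path.isdir(path):
        return path
    return os.path.dirname(path)


# Usage: create a regular file and query the disk through its parent directory.
with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "example.txt")
    with open(file_path, "w") as f:
        f.write("data")
    print(shutil.disk_usage(directory_for_usage(file_path)))
```

Checking with os.path.isdir() first avoids relying on the exception type, at the cost of an extra filesystem call.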
msg260021 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-10 15:36
> You are right, my bad.

No problem. I read the doc before replying, and it does not say that the path must exist or must be a directory:
https://docs.python.org/dev/library/shutil.html#shutil.disk_usage

> I'll fix the doc to mention that on Windows "path" can only be a directory (on UNIX it can also be a file).

Great!
msg260022 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2016-02-10 15:41
Different but kind of related, disk_usage() is not able to accept bytes:

>>> shutil.disk_usage(b'.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python34\lib\shutil.py", line 989, in disk
    total, free = nt._getdiskusage(path)
TypeError: must be str, not bytes
>>>
msg260023 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-10 15:44
> Different but kind of related, disk_usage() is not able to accept bytes:

On Python 3, I don't think that it's a big issue: bytes filenames are
deprecated.

See the current thread on python-dev:
https://mail.python.org/pipermail/python-dev/2016-February/143150.html

It's really much better to use Unicode on Windows, and I also suggest
that you use Unicode on UNIX/BSD.
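If a caller only has a bytes path, one way to follow this advice is to decode it with os.fsdecode() before the call. This is a minimal sketch under that assumption; disk_usage_any() is a hypothetical wrapper name, not part of shutil.

```python
import os
import shutil


def disk_usage_any(path):
    """Hypothetical wrapper: accept str, bytes, or os.PathLike.

    os.fsdecode() converts the argument to str using the filesystem
    encoding, so shutil.disk_usage() always receives a str path.
    """
    return shutil.disk_usage(os.fsdecode(path))


# Usage: the bytes path from the report above.
print(disk_usage_any(b"."))
```

Paths that are already str pass through os.fsdecode() unchanged, so the wrapper can be used for both kinds of argument.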
msg309936 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2018-01-14 21:55
I've submitted a PR for the documentation change.
msg309941 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-01-15 01:27
This is the high-level shutil module, so why not try to use the resolved parent directory? For example:

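    # Proposed Windows implementation inside shutil.py (nt is Windows-only).
    def disk_usage(path):
        try:
            total, free = nt._getdiskusage(path)
        except NotADirectoryError:
            # path is not a directory: retry with its resolved parent directory.
            path = os.path.dirname(nt._getfinalpathname(path))
            total, free = nt._getdiskusage(path)
        used = total - free
        return _ntuple_diskusage(total, used, free)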

Also, as noted in msg260022, nt._getdiskusage was overlooked when implementing PEP 529. The same applies to nt._getfinalpathname and nt._getvolumepathname. nt._getfullpathname works with bytes because it takes an argument-clinic `path_t` instead of `unicode` or `Py_UNICODE`. I think the other 3 should be rewritten to use path_t, but it's out of scope for this issue.
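The same fallback can be tried from user code without touching the private nt functions. This is a rough sketch, not the proposed patch: disk_usage_with_fallback() is a hypothetical name, and os.path.realpath() is an assumed stand-in for nt._getfinalpathname().

```python
import os
import shutil


def disk_usage_with_fallback(path):
    """Hypothetical wrapper sketching the parent-directory fallback.

    Assumption: os.path.realpath() stands in for the private
    nt._getfinalpathname() used in the proposal above.
    """
    try:
        return shutil.disk_usage(path)
    except NotADirectoryError:
        # The path is not a directory: retry with its resolved parent.
        parent = os.path.dirname(os.path.realpath(path))
        return shutil.disk_usage(parent)
```

On platforms where shutil.disk_usage() already accepts a file, the except branch is never taken and the wrapper behaves like the plain call.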
msg309947 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2018-01-15 05:08
New changeset ee3b83547c6b0cac1da2cb44aaaea533a1d1bbc8 by Mariatta (Cheryl Sabella) in branch 'master':
bpo-26330: Update shutil.disk_usage() documentation (GH-5184)
https://github.com/python/cpython/commit/ee3b83547c6b0cac1da2cb44aaaea533a1d1bbc8
msg309984 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2018-01-15 14:32
New changeset fb8569e36f2629654d5bc9c7ba05978edce408f4 by Mariatta (Miss Islington (bot)) in branch '3.6':
bpo-26330: Update shutil.disk_usage() documentation (GH-5184) (GH-5188)
https://github.com/python/cpython/commit/fb8569e36f2629654d5bc9c7ba05978edce408f4
msg309985 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2018-01-15 14:33
Thanks!
History
Date User Action Args
2018-01-15 14:33:33 Mariatta set status: open -> closed; versions: + Python 3.6; messages: + msg309985; resolution: fixed; stage: patch review -> resolved
2018-01-15 14:32:19 Mariatta set messages: + msg309984
2018-01-15 05:08:47 python-dev set pull_requests: + pull_request5042
2018-01-15 05:08:39 Mariatta set nosy: + Mariatta; messages: + msg309947
2018-01-15 01:27:27 eryksun set nosy: + eryksun; messages: + msg309941
2018-01-14 21:55:03 cheryl.sabella set nosy: + cheryl.sabella; messages: + msg309936
2018-01-14 21:54:47 cheryl.sabella set versions: + Python 3.7, - Python 3.3, Python 3.4, Python 3.5, Python 3.6
2018-01-14 21:53:49 cheryl.sabella set keywords: + patch; stage: patch review; pull_requests: + pull_request5038
2016-02-10 15:44:32 vstinner set messages: + msg260023
2016-02-10 15:41:48 giampaolo.rodola set messages: + msg260022
2016-02-10 15:36:47 vstinner set messages: + msg260021
2016-02-10 15:35:42 giampaolo.rodola set messages: + msg260020
2016-02-10 15:16:03 vstinner set assignee: docs@python; components: + Documentation, Windows; nosy: + docs@python, paul.moore, tim.golden, zach.ware, steve.dower
2016-02-10 15:15:54 vstinner set nosy: + vstinner; messages: + msg260018
2016-02-10 15:11:17 giampaolo.rodola create