classification
Title: Bytes paths now are supported in os.scandir() on Windows
Type: behavior Stage: resolved
Components: Documentation, Windows Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: 27781 Superseder:
Assigned To: serhiy.storchaka Nosy List: Elvis.Pranskevichus, benhoyt, docs@python, eryksun, gvanrossum, ned.deily, paul.moore, python-dev, serhiy.storchaka, steve.dower, tim.golden, vstinner, yselivanov, zach.ware
Priority: normal Keywords: patch

Created on 2016-09-07 08:21 by serhiy.storchaka, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
cleanup_scandir_bytes.patch serhiy.storchaka, 2016-09-25 10:57 review
direntry_bytes_path.patch serhiy.storchaka, 2016-10-08 07:53 review
issue_27998_01.patch eryksun, 2016-10-08 16:19 review
doc-scandir-bytes.patch serhiy.storchaka, 2016-11-13 09:04 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (26)
msg274778 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 08:21
Bytes paths are deprecated on Windows and their support was not added in os.scandir() for purpose. But this makes harder converting filesystem walking functions to use os.scandir(). We have to add a special case for bytes paths on Windows in every such function (os.walk(), glob.iglob()). This is cumbersome inefficient and errorprone code that slows down common case due to additional checks for special case. It would be better to add a support of bytes paths directly in os.scandir().
msg274779 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2016-09-07 08:27
In the light of Steve Dower's work to "un-deprecate" bytes paths, I agree this should be added.
msg274782 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-09-07 09:07
This is implemented in Steve's latest patch in issue 27781. For example:

    >>> sys.platform
    'win32'
    >>> os.mkdir('test')
    >>> f = open('test/\U00010000.txt', 'w')
    >>> next(os.scandir(b'test')).name
    b'\xf0\x90\x80\x80.txt'
    >>> next(os.scandir(b'test')).path
    b'test\\\xf0\x90\x80\x80.txt'
msg274812 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-07 14:08
Heh, totally by accident as well.
msg274846 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-07 17:31
Guido just said that we should make sure that os.scandir() continues to not support bytes, even after the patch for issue27781 goes in, so I'm reopening this.
msg274848 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 17:40
But why? The code of os and glob modules could be simpler if add the support.
msg274850 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-07 17:45
I'm ok to support os.scandir(bytes) on Windows.

My long term goal was to drop bytes support on Windows, but it seems like the new trend is more to keep this support and even enhance it.

It's quite easy to support os.scandir(bytes) on Windows, whereas it seems tricky and annoying to have to emulate the feature in pure Python.

Note: I was the BDFL-delegate for the PEP 471 ;-)
msg274851 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-07 17:48
There are only two references to bytes in the pep:

> Like the other functions in the os module, scandir() accepts either a 
> bytes or str object for the path parameter, and returns the 
> DirEntry.name and DirEntry.path attributes with the same type as
> path . However, it is strongly recommended to use the str type, as
> this ensures cross-platform support for Unicode filenames. (On
> Windows, bytes filenames have been deprecated since Python 3.3). 

I'm not concerned about supporting bytes here (with PEP 529, obviously), though the strong recommendation to use str should certainly stand.
msg277363 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-25 10:57
I don't know what is the benefit of removing support of bytes paths in os.scandir(), but supporting bytes paths allows to remove ugly workarounds from os and glob modules.

 Lib/glob.py |   16 --------!!!!!!
 Lib/os.py   |   67 ---------------------------------------------------------!!!
 2 files changed, 73 deletions(-), 10 modifications(!)
msg277397 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-09-25 20:41
Nice!
msg278071 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-04 17:55
I'm confused by the current title of this issue. Should the support of bytes paths in os.scandir() be removed or preserved? In latter case we can remove workaround code from os and glob.
msg278077 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-10-04 18:55
It went back and forth. The current feeling is to *keep* that support,
especially since their deprecation on Windows has been *undone* in
3.6. Feel free to update the title (though the initial comments may be
harder to understand without the context of the original title).
msg278085 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-10-04 20:06
I suggest to modify posixmodule.c to support bytes on Windows at the C
level.

Since Windows uses utf8, there is no more real reason to drop bytes support
only on Windows.
msg278147 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-05 20:19
New changeset 7c36e6fd0232 by Serhiy Storchaka in branch '3.6':
Issue #27998: Removed workarounds for supporting bytes paths on Windows in
https://hg.python.org/cpython/rev/7c36e6fd0232

New changeset bcee710c42fe by Serhiy Storchaka in branch 'default':
Issue #27998: Removed workarounds for supporting bytes paths on Windows in
https://hg.python.org/cpython/rev/bcee710c42fe
msg278150 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-05 21:06
Tests are passed on Windows.

Now we need to document that bytes paths are supported in os.scandir() on Windows since 3.6.
msg278285 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-08 07:23
Hmm, tests are passed on some Windows buildbots, but failed on others.

http://buildbot.python.org/all/builders/AMD64%20Windows10%203.x/builds/1631/steps/test/logs/stdio
======================================================================
ERROR: test_walk_topdown (test.test_os.BytesWalkTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\buildarea\3.x.bolen-windows10\build\lib\test\test_os.py", line 890, in test_walk_topdown
    all = list(self.walk(self.walk_path))
  File "D:\buildarea\3.x.bolen-windows10\build\lib\test\test_os.py", line 1055, in walk
    for broot, bdirs, bfiles in os.walk(os.fsencode(top), **kwargs):
  File "D:\buildarea\3.x.bolen-windows10\build\lib\os.py", line 409, in walk
    yield from walk(new_path, topdown, onerror, followlinks)
  File "D:\buildarea\3.x.bolen-windows10\build\lib\os.py", line 367, in walk
    is_dir = entry.is_dir()
TypeError: bad argument type for built-in operation

----------------------------------------------------------------------

The error message is not very informative. Seems like some C API function takes an argument of wrong type (e.g. bytes instead of unicode).
msg278286 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-08 07:53
I suppose the following patch should fix the issue. Could anybody please test it on Windows?
msg278306 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-10-08 16:19
Here's an alternative patch, using PyUnicode_FSDecoder. It also adds path_object_error and path_object_error2 helper functions.
msg278308 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-08 16:31
Thank you Eryk. The patch LGTM. Did you tested that it fixes tests?
msg278310 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-10-08 16:48
With the patch I uploaded, test_glob and test_os BytesWalkTests both pass, in both Windows 10 and Linux. Without it those tests fail for me in Windows 10.
msg278311 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-08 17:17
New changeset 4ed634870a9a by Serhiy Storchaka in branch '3.6':
Issue #27998: Fixed bytes path support in os.scandir() on Windows.
https://hg.python.org/cpython/rev/4ed634870a9a

New changeset 837114dea493 by Serhiy Storchaka in branch 'default':
Issue #27998: Fixed bytes path support in os.scandir() on Windows.
https://hg.python.org/cpython/rev/837114dea493
msg278312 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-08 17:20
Now it is documentation issue again.
msg280689 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-13 09:04
Proposed patch documents bytes paths support on Windows.
msg281244 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-11-20 03:22
Doc change looks good to me.
msg281249 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-11-20 06:25
New changeset ac63c70635db by Serhiy Storchaka in branch '3.6':
Issue #27998: Documented bytes paths support on Windows.
https://hg.python.org/cpython/rev/ac63c70635db

New changeset 26195e07fcc5 by Serhiy Storchaka in branch 'default':
Issue #27998: Documented bytes paths support on Windows.
https://hg.python.org/cpython/rev/26195e07fcc5
msg281250 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-20 06:26
Thanks Steve.
History
Date User Action Args
2017-03-31 16:36:24dstufftsetpull_requests: + pull_request977
2016-11-20 06:26:46serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg281250

stage: patch review -> resolved
2016-11-20 06:25:31python-devsetmessages: + msg281249
2016-11-20 03:22:19steve.dowersetmessages: + msg281244
2016-11-13 09:04:52serhiy.storchakasetfiles: + doc-scandir-bytes.patch

nosy: + Elvis.Pranskevichus, yselivanov
messages: + msg280689

keywords: + patch
stage: needs patch -> patch review
2016-10-08 17:20:04serhiy.storchakasetkeywords: - patch
priority: release blocker -> normal
messages: + msg278312

stage: patch review -> needs patch
2016-10-08 17:17:51python-devsetmessages: + msg278311
2016-10-08 16:48:30eryksunsetmessages: + msg278310
2016-10-08 16:31:27serhiy.storchakasetmessages: + msg278308
2016-10-08 16:19:26eryksunsetfiles: + issue_27998_01.patch

messages: + msg278306
2016-10-08 07:53:17serhiy.storchakasetfiles: + direntry_bytes_path.patch
assignee: docs@python -> serhiy.storchaka
messages: + msg278286

stage: needs patch -> patch review
2016-10-08 07:23:24serhiy.storchakasetpriority: normal -> release blocker
nosy: + ned.deily
messages: + msg278285

2016-10-05 21:06:15serhiy.storchakasetassignee: serhiy.storchaka -> docs@python
components: + Documentation, - Extension Modules
versions: + Python 3.7
nosy: + docs@python

messages: + msg278150
stage: patch review -> needs patch
2016-10-05 20:19:18python-devsetnosy: + python-dev
messages: + msg278147
2016-10-05 20:08:24serhiy.storchakasetassignee: serhiy.storchaka
title: Remove support of bytes paths in os.scandir() -> Bytes paths now are supported in os.scandir() on Windows
stage: needs patch -> patch review
2016-10-04 20:06:56vstinnersetmessages: + msg278085
2016-10-04 18:55:38gvanrossumsetmessages: + msg278077
2016-10-04 17:55:25serhiy.storchakasetmessages: + msg278071
2016-09-25 20:41:36gvanrossumsetmessages: + msg277397
2016-09-25 10:57:12serhiy.storchakasetfiles: + cleanup_scandir_bytes.patch
keywords: + patch
messages: + msg277363
2016-09-07 17:48:01steve.dowersetmessages: + msg274851
2016-09-07 17:45:55vstinnersetmessages: + msg274850
2016-09-07 17:40:32serhiy.storchakasetmessages: + msg274848
2016-09-07 17:31:25steve.dowersettitle: Add support of bytes paths in os.scandir() -> Remove support of bytes paths in os.scandir()
2016-09-07 17:31:03steve.dowersetstatus: closed -> open


dependencies: + Change sys.getfilesystemencoding() on Windows to UTF-8
type: enhancement -> behavior
stage: resolved -> needs patch
versions: + Python 3.6
nosy: + gvanrossum
messages: + msg274846
superseder: Change sys.getfilesystemencoding() on Windows to UTF-8 ->
resolution: duplicate -> (no value)
2016-09-07 14:08:02steve.dowersetmessages: + msg274812
2016-09-07 09:07:41eryksunsetstatus: open -> closed

superseder: Change sys.getfilesystemencoding() on Windows to UTF-8

nosy: + eryksun
messages: + msg274782
resolution: duplicate
stage: resolved
2016-09-07 08:27:22paul.mooresetmessages: + msg274779
2016-09-07 08:21:47serhiy.storchakasettitle: Add support of butes paths in os.scandir() -> Add support of bytes paths in os.scandir()
2016-09-07 08:21:10serhiy.storchakacreate