Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(55645)

#22524: PEP 471 implementation: os.scandir() directory scanning function

Can't Edit
Can't Publish+Mail
Start Review
Created:
4 years, 8 months ago by benhoyt
Modified:
4 years, 3 months ago
Reviewers:
berker.peksag, victor.stinner, storchaka, jimjjewett
CC:
tebeka, AntoinePitrou, haypo, giampaolo.rodola, tim.golden, benhoyt, abacabadabacaba_gmail.com, akira, socketpair, devnull_psf.upfronthosting.co.za, Josh.R, alexei.romanov
Visibility:
Public.

Patch Set 1 #

Total comments: 64

Patch Set 2 #

Total comments: 3

Patch Set 3 #

Total comments: 49

Patch Set 4 #

Patch Set 5 #

Total comments: 24

Patch Set 6 #

Patch Set 7 #

Total comments: 65

Patch Set 8 #

Patch Set 9 #

Unified diffs Side-by-side diffs Delta from patch set Stats Patch
Doc/whatsnew/3.5.rst View 1 2 3 4 5 6 7 8 4 chunks +18 lines, -8 lines 0 comments Download

Messages

Total messages: 15
berkerpeksag
I only looked at the tests, so you can ignore my comments for now :) ...
4 years, 8 months ago #1
victor.stinner_gmail.com
http://bugs.python.org/review/22524/diff/13005/Modules/posixmodule.c File Modules/posixmodule.c (right): http://bugs.python.org/review/22524/diff/13005/Modules/posixmodule.c#newcode16365 Modules/posixmodule.c:16365: static char *_follow_symlinks_keywords[] = {"follow_symlinks", NULL}; no need to ...
4 years, 8 months ago #2
benhoyt
I've updated the patch per Victor's code review at http://bugs.python.org/issue22524 (see scandir-2.patch) http://bugs.python.org/review/22524/diff/13005/Modules/posixmodule.c File Modules/posixmodule.c ...
4 years, 8 months ago #3
victor.stinner_gmail.com
http://bugs.python.org/review/22524/diff/13913/Lib/os.py File Lib/os.py (right): http://bugs.python.org/review/22524/diff/13913/Lib/os.py#newcode1058 Lib/os.py:1058: dir_it = _scandir(path) Oh, this variable is useless. http://bugs.python.org/review/22524/diff/13913/Lib/os.py#newcode1138 ...
4 years, 4 months ago #4
storchaka_gmail.com
http://bugs.python.org/review/22524/diff/13913/Lib/os.py File Lib/os.py (right): http://bugs.python.org/review/22524/diff/13913/Lib/os.py#newcode993 Lib/os.py:993: if name == 'nt': Some methods are common in ...
4 years, 4 months ago #5
victor.stinner_gmail.com
http://bugs.python.org/review/22524/diff/13913/Lib/os.py File Lib/os.py (right): http://bugs.python.org/review/22524/diff/13913/Lib/os.py#newcode993 Lib/os.py:993: if name == 'nt': On 2015/02/12 14:05:17, storchaka wrote: ...
4 years, 4 months ago #6
storchaka_gmail.com
http://bugs.python.org/review/22524/diff/13913/Lib/os.py File Lib/os.py (right): http://bugs.python.org/review/22524/diff/13913/Lib/os.py#newcode1008 Lib/os.py:1008: lstat = stat(self.path, follow_symlinks=False) On 2015/02/12 15:41:48, haypo wrote: ...
4 years, 4 months ago #7
victor.stinner_gmail.com
http://bugs.python.org/review/22524/diff/13913/Lib/os.py File Lib/os.py (right): http://bugs.python.org/review/22524/diff/13913/Lib/os.py#newcode1008 Lib/os.py:1008: lstat = stat(self.path, follow_symlinks=False) On 2015/02/12 16:11:39, storchaka wrote: ...
4 years, 4 months ago #8
storchaka_gmail.com
https://bugs.python.org/review/22524/diff/13917/Lib/os.py File Lib/os.py (right): https://bugs.python.org/review/22524/diff/13917/Lib/os.py#newcode1025 Lib/os.py:1025: if name == 'nt' and self._inode is None: self._inode ...
4 years, 4 months ago #9
victor.stinner_gmail.com
https://bugs.python.org/review/22524/diff/13917/Lib/os.py File Lib/os.py (right): https://bugs.python.org/review/22524/diff/13917/Lib/os.py#newcode1025 Lib/os.py:1025: if name == 'nt' and self._inode is None: On ...
4 years, 4 months ago #10
Jim.J.Jewett
I was NOT reviewing for correctness; I was thinking of Paul Moore's comment that he ...
4 years, 4 months ago #11
storchaka_gmail.com
https://bugs.python.org/review/22524/diff/13917/Lib/test/test_os.py File Lib/test/test_os.py (right): https://bugs.python.org/review/22524/diff/13917/Lib/test/test_os.py#newcode2862 Lib/test/test_os.py:2862: entry.stat(follow_symlinks=False) On 2015/02/13 15:14:14, haypo wrote: > On 2015/02/12 ...
4 years, 4 months ago #12
storchaka_gmail.com
http://bugs.python.org/review/22524/diff/13104/Modules/posixmodule.c File Modules/posixmodule.c (right): http://bugs.python.org/review/22524/diff/13104/Modules/posixmodule.c#newcode16373 Modules/posixmodule.c:16373: #ifdef MS_WINDOWS On 2015/02/13 17:15:47, Jim.J.Jewett wrote: > Does ...
4 years, 4 months ago #13
victor.stinner_gmail.com
First review of the C code. I didn't read the doc or tests. http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c File ...
4 years, 3 months ago #14
benhoyt
4 years, 3 months ago #15
Thanks -- comments below. I've made these changes and will upload
scandir-8.patch in the next few minutes.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c
File Modules/posixmodule.c (right):

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16450: __int64 win32_file_index;
On 2015/02/27 16:33:00, haypo wrote:
> I'm surprised that the type is signed: nFileIndexHigh and nFileIndexLow fields
> of BY_HANDLE_FILE_INFORMATION are unsigned (DWORD). But since
> Py_stat_struct.st_ino already uses the type __int64, it's better to discuss
this
> in a separate issue.

Yes, this surprised me too, but I just copied what Py_stat_struct.st_ino did, so
agreed if it's an issue it should be a separate issue.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16451: int got_file_index;
On 2015/02/27 16:33:00, haypo wrote:
> If I remember correctly, file index cannot be zero on Windows. You may use 0
to
> check if the file index is known or not. But I'm not sure, so you may keep
> got_file_index.

I thought about that, and it's *probably* okay, but I'm pretty hesitant to do
that, as there's nothing at all about nFileIndex not being zero in the MS
documentation:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363788(v=vs.85).aspx

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16468: } ScandirIterator;
On 2015/02/27 16:33:00, haypo wrote:
> Maybe you should move ScandirIterator structor below, after DirEntry code, to
> move it closer to the code using it.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16480: /* Forward reference to make it easier to implement
DirEntry_is_symlink etc */
On 2015/02/27 16:33:00, haypo wrote:
> Just say "Forward reference", it's enough ;-)

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16490: if (self->d_type != DT_UNKNOWN) {
On 2015/02/27 16:33:00, haypo wrote:
> If the body has only one line, you can avoid { and } characters.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16532: if (follow_symlinks) {
On 2015/02/27 16:33:00, haypo wrote:
> To not put almost the whole function in an if block, you can invert the test:
> 
> if (!follow_symlinks)
>     return DirEntry_get_lstat(self);
> 
> ...

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16550: if (!po_is_symlink) {
On 2015/02/27 16:33:00, haypo wrote:
> Having to handle NULL and to call PyObject_IsTrue() is annoying. You should
> split is_symlink() in two parts: a C version which returns an int (0 or 1),
and
> a Python wrapper which simply calls PyBool_FromLong().
> 
> So you can reuse it for the Windows path a few line above, and it will
probably
> break the dependency cycle.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16592: int result = 0;
On 2015/02/27 16:33:00, haypo wrote:
> It's strange that you have to initialize result here. It's always set, no?

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16595: unsigned long dir_bits;
On 2015/02/27 16:33:00, haypo wrote:
> Not needed on POSIX. Only declare dir_bits on Windows: put it into #ifdef
> MS_WINDOWS.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16625: Py_DECREF(stat);
On 2015/02/27 16:33:00, haypo wrote:
> Even if the code is currently ok, it's safer to use Py_CLEAR(), so it's still
> safe to call "goto error".

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16705: path_cleanup(&path);
On 2015/02/27 16:33:00, haypo wrote:
> You don't need the path_t type here:
> 
> wchar_t *path = PyUnicode_AsUnicode(self->path);
> if (path == NULL)
>     return NULL;
> 
> if (win32_lstat_w(path.wide, &stat) != 0) {
>     Py_DECREF(path);
>     return PyErr_SetExcFromWindowsErrWithFilenameObject(PyExc_OSError,
>                                                         0, self->path);
> }
> Py_DECREF(path);

Thanks -- good call. Done. Though I don't think you want Py_DECREF called on
path, because PyUnicode_AsUnicode returns a read-only pointer, doesn't create a
new object. It's also a wchar_t, not a PyObject.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16722: return PyUnicode_FromFormat("<" MODNAME ".DirEntry
%R>", self->name);
On 2015/02/27 16:33:00, haypo wrote:
> Hum, you should just use "<DirEntry %r>", drop the module name. And it can be
> surprising to see "nt" or "posix" module name.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16803: result = PyMem_Malloc((path_len + 1 +
wcslen(filename) + 1) * sizeof(wchar_t));
On 2015/02/27 16:33:00, haypo wrote:
> You should use PyMem_New() to detect integer overflow. Example:
> 
> size = path_len + 1 + wcslen(filename) + 1;
> result = PyMem_New(wchar_t, size);
> 
> The new size variable is required, because PyMem_New() is a macro which
> duplicates parameters.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16812: result[path_len++] = SEP;
On 2015/02/27 16:33:00, haypo wrote:
> ntpath.join('c:', 'name) returns 'c:name'. I'm surprised, I expected
"c:\name".
> I didn't know that "c:name" is valid. I should try scandir("c:") on Windows.

Hmmm, that's kinda weird. However, I copied this from os.listdir(), which does
exactly the same thing, so I'm loathe to change it in case there's a good
reason. Can open a separate issue if needed.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16820: DirEntry_new(path_t *path, WIN32_FIND_DATAW *dataW)
On 2015/02/27 16:33:00, haypo wrote:
> I would prefer a "From" name: DirEntry_FromFindData() for example. "new" is
> usually used for regular constructor taking variable parameters.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16837: entry->name =
PyUnicode_FromWideChar(dataW->cFileName, wcslen(dataW->cFileName));
On 2015/02/27 16:33:00, haypo wrote:
> You can just pass -1 for the length. (FromWideChar calls wcslen() for you.)

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16858: Py_XDECREF(entry);
On 2015/02/27 16:33:00, haypo wrote:
> Py_DECREF() is ok here.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16867: 
On 2015/02/27 16:33:00, haypo wrote:
> You shouldn't call FindClose() with INVALID_HANDLE_VALUE. Add:
> 
> if (iterator->handle == INVALID_HANDLE_VALUE)
>    return 1;

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16874: iterator->handle = INVALID_HANDLE_VALUE;
On 2015/02/27 16:33:00, haypo wrote:
> Hum, you should set "iterator->handle = INVALID_HANDLE_VALUE;" even if
> FindClose() failed.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16897: ScandirIterator_close(iterator);
On 2015/02/27 16:33:00, haypo wrote:
> I'm not sure that it's ok call FindClose() here. You may retry calling next()
> later, no?
> 
> By the way, ScandirIterator_close(iterator) can replace the Windows last error
> (SetLastError).

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16912: iterator->first_time = 0;
On 2015/02/27 16:33:00, haypo wrote:
> It the first file is not "." or "..", next() always return the same file. You
> should move this line before the previous if.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16926: join_path_filenameA(char *path_narrow, char*
filename, Py_ssize_t filename_len)
On 2015/02/27 16:33:00, haypo wrote:
> I hate these W/A suffix. Please avoid them on POSIX! Rename the function to
> join_path_filename(), or just join_path().

I hate them too, but I think this is helpful to distinguish, given there's a
join_path_filenameW for Windows.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16944: result = PyMem_Malloc(path_len + 1 + filename_len +
1);
On 2015/02/27 16:33:00, haypo wrote:
> Ditto: use PyMem_New().

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:16978: if (!path->narrow || !PyBytes_Check(path->object))
{
On 2015/02/27 16:33:00, haypo wrote:
> This is just wrong: you must always decode the name from the filesystem
> encoding. I don't understand this test.

I think this is actually correct. I mimicked the behaviour of os.listdir() here.
What's happening is there are 3 ways to call scandir: no args (which is
equivalent to '.'), unicode arg, bytes arg:

1) scandir(): path.narrow=NULL -> unicode
2) scandir('unicode'): path.narrow=char*, path.object=u'unicode' -> unicode
3) scandir(b'bytes'): path.narrow=char*, path.object=b'bytes' -> bytes

So this sets name/path to unicode in cases 1 and 2, but to bytes in case 3.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:17004: int result;
On 2015/02/27 16:33:00, haypo wrote:
> You should not call closedir() with NULL. Add:
> 
> if (iteratator->dirp == NULL)
>     return 1;

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:17009: if (result != 0) {
On 2015/02/27 16:33:00, haypo wrote:
> I'm not sure that you do something useful if FindClose/closedir failed. You
may
> just ignore errors :-/ Especially when ScandirIterator_close() is called from
> dealloc.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:17038: ScandirIterator_close(iterator);
On 2015/02/27 16:33:00, haypo wrote:
> Ditto: don't close the iterator on error. You may retry on error.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:17070: #endif
On 2015/02/27 16:33:00, haypo wrote:
> Please mention the related #ifdef, I guess: #endif   /* MS_WINDOWS */

That feels weird to me, as it's actually ending the POSIX #else block, or not?

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:17076: path_cleanup(&iterator->path);
On 2015/02/27 16:33:00, haypo wrote:
> IMO it's very important to close the iterator here! 

Done, thanks!

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:17142: "os.scandir() doesn't support bytes path on
Windows, use Unicode instead");
On 2015/02/27 16:33:00, haypo wrote:
> You should use a more generic message:
> 
> "str expected, got %s" % Py_TYPE(iterator->path.object)->tp_name
> 
> Hum, I don't remember if I wrote a test with other types like int and float.

I don't quite understand why. My error message is much more specific and
informative; I'd expect the interpreter to raise a generic message if it didn't
know any better, but I think specific code can be more helpful.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:17156: PyMem_Free(path_strW);  /* We're done with
path_strW now */
On 2015/02/27 16:33:00, haypo wrote:
> IMO the comment is useless.

Done.

http://bugs.python.org/review/22524/diff/14036/Modules/posixmodule.c#newcode1...
Modules/posixmodule.c:17165: iterator->dirp = opendir(iterator->path.narrow ?
iterator->path.narrow : ".");
On 2015/02/27 16:33:00, haypo wrote:
> I would prefer to store the name in a variable, it would ease debug. Example:
> 
> if (iterator->path.narrow)
>    name = iterator->path.narrow;
> else
>    name = ".";
> 
> Py_BEGIN_ALLOW_THREADS
> terator->dirp = opendir(name);
> Py_END_ALLOW_THREADS
> 
> I regulary debug such C code with gdb.

Done.
Sign in to reply to this message.

RSS Feeds Recent Issues | This issue
This is Rietveld 894c83f36cb7+