This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: replace readdir to readdir_r in function posix_listdir
Type: enhancement Stage:
Components: Extension Modules Versions: Python 3.4
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Rock, r.david.murray, rosslagerwall
Priority: normal Keywords: patch

Created on 2013-03-15 09:01 by Rock, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
mywork.patch Rock, 2013-03-15 09:01 the proposal patch review
Messages (6)
msg184224 - (view) Author: Rock Li (Rock) Date: 2013-03-15 09:01
When I'm how glob module is implemented. I found in file posixmodule.c, the function posix_listdir is using readdir to get all the entries under one directory and the context is setted to allow multi threads. But the function readdir is not thread-safe, so I changed this call to use readdir_r instead. 

I'm using the lastest codes in the repo.
msg184250 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2013-03-15 19:56
Hi,

There shouldn't be a problem with the existing implementation since, according to posix:
"""
The pointer returned by readdir() points to data which may be overwritten by another call to readdir() on the same directory stream. This data is not overwritten by another call to readdir() on a different directory stream.
"""

Since each call to listdir() opens a different directory stream, this shouldn't be a problem.

Also, when using readdir_r(), struct dirent should not be allocated on the stack since the last field may be an unspecified size.
See the following for further reading about this:
http://elliotth.blogspot.co.uk/2012/10/how-not-to-use-readdirr3.html
http://womble.decadent.org.uk/readdir_r-advisory.html

Regards
msg184283 - (view) Author: Rock Li (Rock) Date: 2013-03-16 01:52
Hi Ross, 

What about if one implementation of posix use the global variable to store something. From `man readdir`, I didn't see the sentence "This data is not overwritten by another call to readdir() on a different directory stream.".

And from the link you gived, 
""
Background

The POSIX readdir_r function is a thread-safe version of the readdir function used to read directory entries. Whereas readdir returns a pointer to a system-allocated buffer and may use global state without mutual exclusion, readdir_r uses a user-supplied buffer and is guaranteed to be reentrant. Its use is therefore preferable or even essential in portable multithreaded programs.
""

Actually readdir_r is guaranteed to be reentrant. By using readdir, I think just like to use errno in a multi thread programs, we can't always think it will work as the right way. But readdir_r, we can. 

And another question, you say 
"struct dirent should not be allocated on the stack since the last field may be an unspecified size."

What does this mean? Can you explain detail to me?
Thx.
msg184286 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-16 02:30
My man page says "The  data  returned by readdir() may be overwritten by subsequent calls to readdir() for the same directory stream."  Ross I believe was quoting from the posix *spec*, so that is a more complete explanation.  If an implementation is doing what you said, it would be out of compliance with posix.

Python opens a new directory stream inside listdir, and that directory stream is not exposed to the Python code, so there is no way for another Python thread to cause a problem here.
msg184296 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2013-03-16 06:37
That text was from the POSIX 2008 spec:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html

The following text from my copy of the readdir manpage gives an indication of how you *should* allocate struct dirent when using readdir_r:
"""
Since  POSIX.1 does not specify the size of the d_name field, and other nonstandard fields may precede that field within the dirent structure, portable applications that use readdir_r() should allocate the buffer whose address is passed in entry as follows:

           name_max = pathconf(dirpath, _PC_NAME_MAX);
           if (name_max == -1)         /* Limit not defined, or error */
               name_max = 255;         /* Take a guess */
           len = offsetof(struct dirent, d_name) + name_max + 1;
           entryp = malloc(len);

(POSIX.1 requires that d_name is the last field in a struct dirent.)
"""

I hope that helps
msg184317 - (view) Author: Rock Li (Rock) Date: 2013-03-16 14:37
You are right. I checked the GLibc implementation of readdir and readdir_r, there's no global variale used in the function readdir. Just as the POSIX standards says, "This data is not overwritten by another call to readdir() on a different directory stream". 

To the second question, now I understood. POSIX does not specify the size of the d_name field. This will cause our codes a bit inconvenient.

Next time, I will check the POSIX specifications and related several implementations first, not just the documents.

Thx All.
History
Date User Action Args
2022-04-11 14:57:42adminsetgithub: 61630
2013-03-16 14:37:39Rocksetstatus: open -> closed

messages: + msg184317
2013-03-16 06:37:22rosslagerwallsetmessages: + msg184296
2013-03-16 02:30:54r.david.murraysetnosy: + r.david.murray
messages: + msg184286
2013-03-16 01:52:02Rocksetmessages: + msg184283
2013-03-15 19:56:45rosslagerwallsetnosy: + rosslagerwall
messages: + msg184250
2013-03-15 09:01:41Rockcreate