
Author izbyshev
Recipients eryksun, izbyshev, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date 2021-01-26.11:20:52
Message-id <1611660053.85.0.382827635524.issue42606@roundup.psfhosted.org>
Content
> I think truncation via TRUNCATE_EXISTING (O_TRUNC, with O_WRONLY or O_RDWR) or overwriting with CREATE_ALWAYS (O_CREAT | O_TRUNC) is at least tolerable because the caller doesn't care about the existing data. 

Yes, I had a thought that creating or truncating a file when asked is in some sense "less wrong" than deleting an existing file on open() failure, but I'm still not comfortable with it. It would be nice if, for example, a failed open() with O_CREAT | O_EXCL guaranteed in all cases that the file had not been created.
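
To make the desired invariant concrete, here is a minimal Python sketch. The helper name try_exclusive_create and the paths are made up for illustration. It only exercises a failure that happens before any file could be created (a missing parent directory), so it shows the expected outcome rather than the problematic case where the handle is created and wrapping it in an fd fails afterwards.

```python
import os
import tempfile


def try_exclusive_create(path):
    """Attempt O_CREAT | O_EXCL; return (fd_or_None, file_exists_afterwards)."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except OSError:
        fd = None
    return fd, os.path.exists(path)


with tempfile.TemporaryDirectory() as tmp:
    # Failure case: the parent directory does not exist, so the open fails
    # and nothing should be left behind.
    missing = os.path.join(tmp, "no-such-dir", "new.txt")
    failed_fd, exists_after_failure = try_exclusive_create(missing)

    # Success case, for contrast.
    fresh = os.path.join(tmp, "new.txt")
    ok_fd, exists_after_success = try_exclusive_create(fresh)
    if ok_fd is not None:
        os.close(ok_fd)
```

The property I'd like to keep is that exists_after_failure is false whenever failed_fd is None and the file did not exist beforehand.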

>> Truncation can simply be deferred until we have the fd and then performed manually.

> If the file itself has to be overwritten (i.e. the default, anonymous data stream), as opposed to a named data stream, it would have to delete all named data streams and extended attributes in the file. Normally that's all implemented atomically in the filesystem. 

> In contrast, TRUNCATE_EXISTING (O_TRUNC) is simple to emulate, since CreateFileW implements it non-atomically with a subsequent NtSetInformationFile: FileAllocationInformation system call.

Oh. So CREATE_ALWAYS for an existing file has very different semantics from TRUNCATE_EXISTING, which means we can't easily use OPEN_ALWAYS with a deferred manual truncation, I see.
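
For the plain O_TRUNC case, the deferred truncation I had in mind could look roughly like the sketch below. It is written in portable Python rather than against the Windows API, and open_then_truncate is a hypothetical helper, not existing CPython code. It covers only the TRUNCATE_EXISTING-style behaviour; it does not remove named data streams or extended attributes the way CREATE_ALWAYS does.

```python
import os
import tempfile


def open_then_truncate(path, flags, mode=0o666):
    """Open without O_TRUNC, then truncate through the fd once we own it."""
    fd = os.open(path, flags & ~os.O_TRUNC, mode)
    try:
        if flags & os.O_TRUNC:
            # The truncation happens only after the fd exists, so a failure
            # to obtain the fd cannot destroy the file's data.
            os.ftruncate(fd, 0)
    except BaseException:
        os.close(fd)
        raise
    return fd


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    with open(path, "wb") as f:
        f.write(b"old contents")

    fd = open_then_truncate(path, os.O_WRONLY | os.O_TRUNC)
    size_after = os.fstat(fd).st_size
    os.close(fd)
```

The open and the truncation are two separate steps here, so this is no more atomic than what CreateFileW does for TRUNCATE_EXISTING according to the quote above.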

>> But I still don't know how to deal with O_TEMPORARY, unless there is a 
>> way to unset FILE_DELETE_ON_CLOSE on a handle.

> For now, that's possible with NTFS and the Windows API in all supported versions of Windows by using a second kernel File with DELETE access, which is opened before the last handle to the first kernel File is closed. After you close the first open, use the second one to call SetFileInformationByHandle: FileDispositionInfo to undelete the file.

So the idea is to delete the file for a brief period, but then undelete it. As I understand it, any attempt to open the file while it's in the deleted state (but still has a directory entry) will fail. This is probably not critical since it could happen only on an unlikely failure of _open_osfhandle(), but it is still less than perfect.

> Windows 10 supports additional flags with FileDispositionInfoEx (21), or NTAPI FileDispositionInformationEx [1]. This provides a better way to disable or modify the delete-on-close state per kernel File object, if the filesystem supports it.

This is nice and would reduce our non-atomicity to just the following: if _wopen() would have failed even before CreateFile() (e.g. due to EMFILE), our reimplementation could still create a handle with FILE_DELETE_ON_CLOSE, so if our process is terminated before we unset it, we'll still lose the file. But it seems like the best we can get in any situation where we need to wrap a handle with an fd.

===

Regarding using the WriteFile()-with-OVERLAPPED approach in FileIO, I've looked at edge cases of error remapping in MSVCRT write() again. If the underlying WriteFile() fails, it only remaps ERROR_ACCESS_DENIED to EBADF, presumably to deal with write() on an O_RDONLY fd. But if WriteFile() writes zero bytes and does not fail, it gets more interesting:

* If the file type is FILE_TYPE_CHAR and the first byte to write was 26 (Ctrl-Z), it's treated as success. I don't think I understand: it *seems* like it's handling something like writing to the *input* of a console, but I'm not sure it's even possible in this manner.

* Anything else is a failure with ENOSPC. This is mysterious too.
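
To summarize the cases above, here is a small Python model of the behaviour as I read it. It is my own restatement, not the CRT source. The function crt_write_result and its parameters are invented for this sketch, and errors other than ERROR_ACCESS_DENIED are deliberately left unmodeled.

```python
import errno

ERROR_ACCESS_DENIED = 5  # Win32 error code
CTRL_Z = 26


def crt_write_result(ok, win_error, written, is_char_device, data):
    """Return (result, errno_or_None) for one WriteFile() outcome.

    ok             -- whether WriteFile() reported success
    win_error      -- GetLastError() value when ok is false
    written        -- byte count reported by WriteFile()
    is_char_device -- True if the file type is FILE_TYPE_CHAR
    data           -- the non-empty buffer that was passed in
    """
    if not ok:
        if win_error == ERROR_ACCESS_DENIED:
            return -1, errno.EBADF
        # Any other error: not modeled here, so no errno is reported.
        return -1, None
    if written == 0 and data:
        if is_char_device and data[0] == CTRL_Z:
            return 0, None           # treated as success
        return -1, errno.ENOSPC      # "anything else"
    return written, None
```

For instance, crt_write_result(True, 0, 0, False, b"abc") yields (-1, errno.ENOSPC) under this model, which is the case I find mysterious.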

I've checked how java.io.FileOutputStream deals with WriteFile() succeeding with zero size (just to compare with another language runtime) and haven't found any special handling[1]. Moreover, it seems like FileOutputStream.write(byte[]) will enter an infinite loop if WriteFile() always returns 0 for a non-empty byte array[2]. So it's unclear to me what MSVCRT is up to here and whether FileIO should do the same.
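
For illustration, a write-until-done loop that refuses to spin on zero-byte writes could look like the Python sketch below. The names write_all, short_writer and stuck_writer are hypothetical, and raising ENOSPC is just one possible choice that mirrors what MSVCRT appears to do. I'm not claiming this is what FileIO should adopt.

```python
import errno


def write_all(write, data):
    """Call write(chunk) -> int until all of data has been written.

    Raises OSError if a call reports no progress for a non-empty chunk,
    instead of retrying forever.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        n = write(view[total:])
        if n is None or n <= 0:
            raise OSError(errno.ENOSPC, "write made no progress")
        total += n
    return total


sink = bytearray()


def short_writer(chunk):
    """Stub that accepts at most 3 bytes per call."""
    part = bytes(chunk[:3])
    sink.extend(part)
    return len(part)


def stuck_writer(chunk):
    """Stub that always reports zero bytes written."""
    return 0


written = write_all(short_writer, b"hello world")
```

With stuck_writer in place of short_writer, the loop raises OSError on the first call rather than looping.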

[1] https://github.com/openjdk/jdk/blob/b4ace3e9799dab5970d84094a0fd1b2d64c7f923/src/java.base/windows/native/libjava/io_util_md.c#L522
[2] https://github.com/openjdk/jdk/blob/b4ace3e9799dab5970d84094a0fd1b2d64c7f923/src/java.base/share/native/libjava/io_util.c#L195