classification
Title: Make REG_MULTI_SZ support zero-length strings
Type: behavior Stage: resolved
Components: Windows Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: steve.dower Nosy List: ZackerySpytz, eryksun, jaraco, miss-islington, nanonyme, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2018-01-17 18:09 by nanonyme, last changed 2019-09-09 14:25 by miss-islington. This issue is now closed.

Files
File name Uploaded Description Edit
rtlqueryreg.py eryksun, 2018-01-18 00:38
Pull Requests
URL Status Linked Edit
PR 13239 merged ZackerySpytz, 2019-05-10 21:32
PR 15745 merged miss-islington, 2019-09-09 09:26
PR 15746 merged miss-islington, 2019-09-09 09:28
PR 15766 merged steve.dower, 2019-09-09 12:16
PR 15777 merged miss-islington, 2019-09-09 13:24
PR 15778 merged miss-islington, 2019-09-09 13:24
Messages (16)
msg310202 - (view) Author: Seppo Yli-Olli (nanonyme) Date: 2018-01-17 18:09
MSDN documents that REG_MULTI_SZ is not supposed to have \0\0 anywhere else than in the end. The comment in  
https://github.com/python/cpython/blob/a5293b4ff2c1b5446947b4986f98ecf5d52432d4/PC/winreg.c#L504
clearly shows that Python has the assumption that this specification is actually correct. However, Microsoft is violating it eg in https://technet.microsoft.com/en-us/library/cc960241.aspx which prevents you from reading that registry key in Python at all. This is a list which is treated as pairs:
"""
foo

meep
bar
"""
so you can have empty items which in Windows semantics means "remove this file". This results in a string like "foo\0\0meep\0\bar\0\0". I'm proposing relaxing Python registry handling semantics because it's clearly Microsoft isn't following this either
msg310216 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-01-18 00:38
PyWin32's win32api.RegQueryValueEx and win32api.RegEnumValue also use the same documented interpretation of REG_MULTI_SZ [1] (i.e. a "sequence of null-terminated strings, terminated by an empty string").

[1]: https://msdn.microsoft.com/en-us/library/ms724884

However, in practice Windows doesn't follow this literal interpretation. For example, the Session Manager (smss.exe) reads the "PendingFileRenameOperations" REG_MULTI_SZ value using the documented runtime library function RtlQueryRegistryValues [2]. By default, this function calls the specified QueryRoutine for each null-terminated string when reading a REG_MULTI_SZ value. 

[2]: https://msdn.microsoft.com/en-us/library/ff562046

I've verified that RtlQueryRegistryValues does not terminate the read on the first empty string, but instead continues to iterate through the entire value. See the attached test script, rtlqueryreg.py. The following output was obtained after two calls to MoveFileEx with the flag MOVEFILE_DELAY_UNTIL_REBOOT that pended operations to delete "foo" and rename "meep" to "bar":

    PendingFileRenameOperations, REG_SZ, \??\C:\Temp\foo
    PendingFileRenameOperations, REG_SZ, <empty>
    PendingFileRenameOperations, REG_SZ, \??\C:\Temp\meep
    PendingFileRenameOperations, REG_SZ, \??\C:\Temp\bar

Currently, winreg terminates reading this value at the first empty string. For example:

    >>> hkey = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
    ...     r'SYSTEM\CurrentControlSet\Control\Session Manager')
    >>> winreg.QueryValueEx(hkey, 'PendingFileRenameOperations')
    (['\\??\\C:\\Temp\\foo'], 7)

This behavior is inconsistent with Windows, which I think presents a strong case for the suggested change. As always, there's an issue with introducing breaking changes. It may be too late in the development cycle to introduce this change in 3.7.
msg310217 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-01-18 01:03
I don't think it's too late (and I don't think this counts as a feature either, so the b1 deadline doesn't apply), but the patch would need to include an update to What's New as well as the usual stuff.
msg310254 - (view) Author: Seppo Yli-Olli (nanonyme) Date: 2018-01-18 19:28
I don't personally know enough of CPython (I barely know enough C) to do the patches but I'd just comment that as this might increase complexity of implementation and wide char API's are used, it might be prudent to make sure the new implementation works with some more exotic input than fits into ASCII. PendingFileRenameOperations can contain any valid Windows path. Microsoft also warns that the end null byte can be redacted in some cases by the party storing the data.
msg310266 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-01-19 00:50
Currently countStrings and fixupMultiSZ halt the outer loop as soon as they hit an empty string. Basically, we'd be getting rid of that check. Then we have to prevent including an empty string for the final NUL in the normal case. We can do this by decrementing the length by 1 if the data ends on a NUL. We also need a check in the inner loop of fixupMultiSZ to ensure it doesn't get out of bounds. 

Example:

    static void
    fixupMultiSZ(wchar_t **str, wchar_t *data, int len)
    {
        int i;
        wchar_t *P, *Q;

        if (len > 0 && data[len - 1] == '\0') {
            Q = data + len - 1;
        } else {
            Q = data + len;
        }

        for (P = data, i = 0; P < Q; P++, i++) {
            str[i] = P;
            for(; P < Q && *P != '\0'; P++)
                ;
        }
    }

    static int
    countStrings(wchar_t *data, int len)
    {
        int strings;
        wchar_t *P, *Q;

        if (len > 0 && data[len - 1] == '\0') {
            Q = data + len - 1;
        } else {
            Q = data + len;
        }

        for (P = data, strings = 0; P < Q; P++, strings++) {
            for (; P < Q && *P != '\0'; P++)
                ;
        }
        return strings;
    }

Also, the REG_MULTI_SZ case in Reg2Py should use wcsnlen instead of wcslen, in case of malformed data that doesn't end on at least one NUL character. The upper limit would be the remaining length.
msg342349 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-05-13 15:54
Updating the title to actually reflect the issue - Zackery, could you update your PR and NEWS entry?

I just spent way too long looking for a MoveFileEx call that doesn't exist, because this isn't actually about doing things on reboot :)

For tests, you should be able to create your own REG_MULTI_SZ key with zero-length strings and read it back. If the winreg module won't let you do it, you may need ctypes (but that's okay in Windows-only tests).

Alternatively, if you want to dive deeper into the C API code, you could expose the fixupMultiSz function as a private function (winreg._split_multi_sz) and test it directly. That's my preferred way of unit testing things like this, but it's a decent amount of boilerplate just for a test.
msg342385 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-05-13 21:15
> For tests, you should be able to create your own REG_MULTI_SZ key with 
> zero-length strings and read it back. If the winreg module won't let 
> you do it, you may need ctypes (but that's okay in Windows-only 
> tests).

winreg.SetValueEx can create the value. We just need to add a case to test_case in Lib/test/test_winreg.py, such as the following:

    ("Multi-nul", ["", "", "", ""], REG_MULTI_SZ)

> I just spent way too long looking for a MoveFileEx call that doesn't 
> exist, because this isn't actually about doing things on reboot :)

Do you mean a variation of MOVEFILE_DELAY_UNTIL_REBOOT that doesn't require administrator access? That would be useful if it existed. I suppose a MOVEFILE_DELAY_UNTIL_LOGOFF flag could be added to have the logon process (winlogon.exe, instead of smss.exe) perform the delete/rename operation under impersonation at the end of the session, or (given a system crash or power failure) the next time the user logs on.
msg342431 - (view) Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2019-05-14 04:41
Thank you for the comments. I've updated the PR.

> winreg.SetValueEx can create the value. We just need to add a case to test_case in Lib/test/test_winreg.py, such as the following:

test_case? I think you mean test_data?
msg351376 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-09 09:26
New changeset e223ba13d8d871ee58570dfca4e82a591189cc2f by Steve Dower (Zackery Spytz) in branch 'master':
bpo-32587: Make winreg.REG_MULTI_SZ support zero-length strings (#13239)
https://github.com/python/cpython/commit/e223ba13d8d871ee58570dfca4e82a591189cc2f
msg351401 - (view) Author: miss-islington (miss-islington) Date: 2019-09-09 10:04
New changeset bbf02da42e2368ae6b40015d6e92eaac4120f2dc by Miss Islington (bot) in branch '3.7':
bpo-32587: Make winreg.REG_MULTI_SZ support zero-length strings (GH-13239)
https://github.com/python/cpython/commit/bbf02da42e2368ae6b40015d6e92eaac4120f2dc
msg351402 - (view) Author: miss-islington (miss-islington) Date: 2019-09-09 10:11
New changeset ebca7eb093f31052ff9f245b306d38941c28a1ad by Miss Islington (bot) in branch '3.8':
bpo-32587: Make winreg.REG_MULTI_SZ support zero-length strings (GH-13239)
https://github.com/python/cpython/commit/ebca7eb093f31052ff9f245b306d38941c28a1ad
msg351411 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-09 10:33
This introduced a new warning:

c:\projects\cpython\pc\winreg.c(775): warning C4267: '-=': conversion from 'size_t' to 'int', possible loss of data [C:\Projects\cpython\PCbuild\pythoncore.vcxproj]

I'll fix it (after I'm done with my current task) unless someone else gets there first.
msg351456 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-09-09 13:24
New changeset ef66f31ce21cd759cc0c618c5c42ba6da0a06834 by Jason R. Coombs (Steve Dower) in branch 'master':
bpo-32587: Fixes unsafe downcast in PC/winreg.c (GH-15766)
https://github.com/python/cpython/commit/ef66f31ce21cd759cc0c618c5c42ba6da0a06834
msg351459 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-09 13:33
Fixed, assuming the backports merge okay.
msg351477 - (view) Author: miss-islington (miss-islington) Date: 2019-09-09 14:23
New changeset 44729c9f5198211faf533da49fa0fa26693a1993 by Miss Islington (bot) in branch '3.8':
bpo-32587: Fixes unsafe downcast in PC/winreg.c (GH-15766)
https://github.com/python/cpython/commit/44729c9f5198211faf533da49fa0fa26693a1993
msg351479 - (view) Author: miss-islington (miss-islington) Date: 2019-09-09 14:25
New changeset 55a6f73b49625ebff575521c3a0b919880f5f0ec by Miss Islington (bot) in branch '3.7':
bpo-32587: Fixes unsafe downcast in PC/winreg.c (GH-15766)
https://github.com/python/cpython/commit/55a6f73b49625ebff575521c3a0b919880f5f0ec
History
Date User Action Args
2019-09-09 14:25:23miss-islingtonsetmessages: + msg351479
2019-09-09 14:23:39miss-islingtonsetmessages: + msg351477
2019-09-09 13:33:21steve.dowersetstatus: open -> closed
messages: + msg351459

assignee: steve.dower
resolution: fixed
stage: patch review -> resolved
2019-09-09 13:24:37miss-islingtonsetpull_requests: + pull_request15431
2019-09-09 13:24:30miss-islingtonsetpull_requests: + pull_request15430
2019-09-09 13:24:18jaracosetnosy: + jaraco
messages: + msg351456
2019-09-09 12:16:03steve.dowersetpull_requests: + pull_request15419
2019-09-09 10:33:57steve.dowersetmessages: + msg351411
2019-09-09 10:11:04miss-islingtonsetmessages: + msg351402
2019-09-09 10:04:57miss-islingtonsetnosy: + miss-islington
messages: + msg351401
2019-09-09 09:28:37miss-islingtonsetpull_requests: + pull_request15400
2019-09-09 09:26:27miss-islingtonsetpull_requests: + pull_request15399
2019-09-09 09:26:18steve.dowersetmessages: + msg351376
2019-05-14 04:41:22ZackerySpytzsetmessages: + msg342431
2019-05-13 21:15:18eryksunsetmessages: + msg342385
2019-05-13 15:54:04steve.dowersetmessages: + msg342349
title: Make REG_MULTI_SZ support PendingFileRenameOperations -> Make REG_MULTI_SZ support zero-length strings
2019-05-10 22:11:08ZackerySpytzsetnosy: + ZackerySpytz
2019-05-10 21:32:03ZackerySpytzsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request13149
2018-01-19 00:50:33eryksunsetmessages: + msg310266
2018-01-18 19:28:39nanonymesetmessages: + msg310254
2018-01-18 01:03:38steve.dowersetmessages: + msg310217
2018-01-18 00:38:23eryksunsetfiles: + rtlqueryreg.py

type: behavior
versions: + Python 3.7, Python 3.8
nosy: + eryksun

messages: + msg310216
stage: needs patch
2018-01-17 20:20:19ned.deilysetnosy: + paul.moore, tim.golden, zach.ware, steve.dower
components: + Windows
2018-01-17 18:09:12nanonymecreate