classification
Title: _winreg.EnumValue sometimes raises WindowsError ("More data is available")
Type: behavior
Stage: resolved
Components: Windows
Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: brian.curtin
Nosy List: brian.curtin, enolte, loewis, r.david.murray, stutzbach, techtonik
Priority: normal
Keywords: needs review, patch

Created on 2008-05-10 17:37 by stutzbach, last changed 2010-05-26 18:11 by brian.curtin. This issue is now closed.

Files
File name Uploaded Description
issue_2810.reg enolte, 2008-10-02 17:20 Windows registry file to create two keys containing unicode characters
issue2810_registry.png techtonik, 2010-04-01 17:33 regedit screenshot
winreg.patch stutzbach, 2010-04-07 16:43 Patch to fix this bug
winreg_test.patch stutzbach, 2010-04-28 21:32 Patch to add test cases for this bug
issue2810_py3k.diff brian.curtin, 2010-05-26 14:46
Messages (27)
msg66542 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2008-05-10 17:37
_winreg.EnumValue raises a WindowsError ("More data is available") if
the registry data includes multibyte unicode characters.

Inspecting PyEnumValue in _winreg.c, I believe I see the problem.  The
function uses RegQueryInfoKey to determine the maximum data and key name
sizes to pass to RegEnumValue.

Unfortunately, RegQueryInfoKey returns the size in number of unicode
characters, while RegEnumValue expects a size in bytes.  This is OK if
all the values are ASCII, but it fails if there are any multibyte
unicode characters.

I believe it would be sufficient to multiply the sizes by 4, since
that's the maximum width of a unicode character.

The bug exists in at least Python 2.5 and Python 3.0 (based on source
code inspection).

References:

RegEnumValue: http://msdn.microsoft.com/en-us/library/ms724865(VS.85).aspx

RegQueryInfoKey:
http://msdn.microsoft.com/en-us/library/ms724902(VS.85).aspx
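
For illustration, the gap between a character count and a byte count can be checked directly in Python 3. This is a minimal sketch that assumes the strings are encoded as UTF-16, as the wide-character Windows API does; `sizes` is a hypothetical helper and nothing here touches the registry.

```python
def sizes(text):
    """Return (code points, UTF-16 code units, UTF-16 bytes) for text."""
    encoded = text.encode("utf-16-le")  # little-endian, no byte order mark
    return len(text), len(encoded) // 2, len(encoded)


ascii_sizes = sizes("abc")
accented_sizes = sizes("caf\u00e9")
astral_sizes = sizes("\U0001F600")  # outside the Basic Multilingual Plane

print(ascii_sizes, accented_sizes, astral_sizes)
```

A buffer sized from the first number would be too small for an API that counts bytes, which is the failure mode hypothesized above.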
msg66547 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-10 17:52
Is that for Python 2.5 or 3.0?
msg66553 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2008-05-10 18:14
The bug is in both.

msg66563 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-10 18:57
Can you please provide a test case then? The 3.0 code doesn't use
RegQueryInfoKey, but RegQueryInfoKeyW.
msg66649 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2008-05-11 18:39
After several failed attempts at making a test case, and stepping
through C code with a debugger, I see that my initial diagnosis was
quite wrong.  RegQueryInfoKey *does* return the sizes in units of
bytes (even though the Microsoft documentation says otherwise).  My
apologies.

I do still have a stack trace from an end-user of my python2.5-based
product, showing that _winreg.EnumValue raises:
WindowsError: [Error 234] More data is available

The application reliably crashes on start-up for this user, when
trying to read some registry entries written by another program and
hitting the above exception.

Unfortunately, I have been unable to reproduce the problem locally.  I
tried a variety of Unicode characters (including some that encode to 4
bytes), and that didn't raise an exception.  I also tried putting some
very long data strings (more than 64kb) into the registry, and that
worked fine too (even though the Microsoft documentation says the ANSI
version *should* return the above exception!).

I'm going to try building a custom PyEnumValue that will dynamically
grow the buffer size when that error occurs.  I'll report back on how
that works out for the end user.

In the meantime, I'm open to other theories on what might cause
RegEnumValue to fail with that error.

The end user is running Vista, if it matters.
msg66653 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-11 19:21
I suggest using regedit /e to dump the failing key into a file. That
should make it possible to reproduce the problem on a different system.
msg74181 - Author: Erik Nolte (enolte) Date: 2008-10-02 17:20
To reproduce on Windows XP with Python 2.4, 2.5, and 2.6:

1. Import issue_2810.reg (it creates two keys under
HKCU\PythonTestUnicodeKeys: one with a one byte unicode char in the key,
the other with multiple 2-byte unicode chars)

2. Run the following:
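import _winreg

# Open the test key created by issue_2810.reg.
key = _winreg.OpenKey(_winreg.HKEY_CURRENT_USER, "PythonTestUnicodeKeys")

# Read the names of both subkeys, then release the handle.
one_byte_key = _winreg.EnumKey(key, 0)
two_byte_key = _winreg.EnumKey(key, 1)

_winreg.CloseKey(key)

# Converting the first name to unicode should succeed; report it if it does not.
try:
    unicode(one_byte_key)
except Exception, ex:
    print "EnumKey didn't return a valid string:", ex

# Show the type of the second name and the code of each of its characters.
print "should be unicode, not str:", two_byte_key.__class__
for ch in two_byte_key:
    print ord(ch),
print ""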
msg77492 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-10 08:28
IIUC, no fix for this bug has been proposed, so the issue is not a
candidate for 2.5.3.
msg102113 - Author: anatoly techtonik (techtonik) Date: 2010-04-01 17:29
I propose closing this as "invalid", because the bug with _winreg.EnumValue cannot be confirmed.

However, it seems to be impossible to return unicode data from _winreg.EnumKey, and this deserves a new bug.
msg102124 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-04-01 19:24
Attached is a file that reliably produces the "More data is available" error from _winreg.EnumValue in Python 2.6.

The script triggers the error via a race condition, by modifying the value after PyEnumValue() calls RegQueryInfoKey() but before it calls RegEnumValue().  I don't know if that's what caused the original problem for my end-user, but it certainly is one way to trigger the problem.

I can work on a patch next week for a proper test case and to fix PyEnumValue().
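
The interleaving described above can be sketched without the registry. This is a minimal, deterministic simulation under stated assumptions: `FakeValue` and `MoreDataError` are hypothetical stand-ins for a registry value and for WindowsError 234, and the "other writer" is simply an assignment placed between the two steps.

```python
ERROR_MORE_DATA = 234  # Windows error code reported as "More data is available"


class MoreDataError(Exception):
    """Stand-in for WindowsError 234 (hypothetical, for this sketch only)."""


class FakeValue:
    """In-memory stand-in for one registry value (not the real API)."""

    def __init__(self, data):
        self.data = data

    def query_size(self):
        # Plays the role of RegQueryInfoKey: report the current size.
        return len(self.data)

    def read(self, bufsize):
        # Plays the role of RegEnumValue: fail if the buffer is too small.
        if len(self.data) > bufsize:
            raise MoreDataError(ERROR_MORE_DATA, "More data is available")
        return self.data


value = FakeValue("x")
size = value.query_size()   # the reader measures the value
value.data = "x" * 2000     # another writer grows it in between
try:
    value.read(size)        # the reader now uses a stale size
    outcome = "ok"
except MoreDataError as exc:
    outcome = exc.args[0]

print(outcome)
```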
msg102125 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-04-01 19:26
Here's another script that causes a "More data is available" error.  This one creates a key with a name that's exactly 256 characters long.  (The Windows Registry Editor can't display the key either.)

I'm testing this on XP.  Newer versions of Windows might handle long key names better, or have a different magic length that causes failures.
msg102163 - Author: anatoly techtonik (techtonik) Date: 2010-04-02 12:44
If this no longer relates to multibyte strings, but just to long strings, then I'd open a new bug.

If even regedit fails to query the key, then maybe it's a WinAPI flaw? It may be worth trying ctypes.
msg102170 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-04-02 14:05
It was never 100% clear that it ever related to multi-byte strings, so (as the original reporter) I'd prefer to continue using this bug and just update the title to: _winreg.EnumValue sometimes raises WindowsError ("More data is available").

The patch I have planned will fix the problem regardless of the cause (except for impossible-to-fix causes, such as if there really is an API flaw).
msg102171 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-04-02 14:28
I just wrote a C program that can read the long key name just fine, so it's not a Windows API bug.
msg102537 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-04-07 13:11
I just found a one line example of the problem:

>>> EnumValue(HKEY_PERFORMANCE_DATA, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
WindowsError: [Error 234] More data is available

Other functions are also affected:

>>> QueryValueEx(HKEY_PERFORMANCE_DATA, None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
WindowsError: [Error 234] More data is available

In a nutshell, the Python implementation of these functions works like this:
  1) Query the API for the length of the value
  2) Allocate a buffer of that size
  3) Query the API for the value

However, the API functions called in step #1 are not guaranteed to be accurate.  It works *most* of the time, but if step #3 returns ERROR_MORE_DATA then we need to dynamically grow the buffer and try again until the data fits.

I'll have a patch (with test cases) ready later today.
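
The grow-and-retry approach described above can be sketched in Python against a fake API. This is an illustration only, not the C patch: `read_with_retry`, `MoreDataError`, `fake_query_size` and `fake_read` are hypothetical names, and doubling the buffer is an assumption, since the thread does not say how the buffer should grow.

```python
ERROR_MORE_DATA = 234  # Windows error code reported as "More data is available"


class MoreDataError(Exception):
    """Stand-in for WindowsError 234 (hypothetical, for this sketch only)."""


def read_with_retry(query_size, read, max_attempts=32):
    """Read a value, growing the buffer until the data fits.

    query_size() returns a size hint; read(bufsize) returns the data or
    raises MoreDataError when bufsize is too small.
    """
    bufsize = max(query_size(), 1)   # steps 1 and 2: size hint, then "allocate"
    for _ in range(max_attempts):
        try:
            return read(bufsize)     # step 3: fetch the value
        except MoreDataError:
            bufsize *= 2             # the hint was too small: grow and retry
    raise MoreDataError(ERROR_MORE_DATA, "value kept growing")


# Fake API whose size hint (16) is smaller than the real data (1000 bytes).
data = b"x" * 1000
attempts = []


def fake_query_size():
    return 16


def fake_read(bufsize):
    attempts.append(bufsize)
    if len(data) > bufsize:
        raise MoreDataError(ERROR_MORE_DATA, "More data is available")
    return data


result = read_with_retry(fake_query_size, fake_read)
print(len(result), attempts)
```

The attempt cap is a design choice for the sketch: it keeps a value that never stops growing from looping forever.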
msg102538 - Author: anatoly techtonik (techtonik) Date: 2010-04-07 13:18
Great analysis!
msg102547 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-04-07 16:44
I've uploaded two patches (against trunk): one to add test cases that demonstrate this bug and another to fix the bug.
msg104403 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-04-28 03:29
test_changing_value is giving inconsistent results when the _winreg.c patch is not applied. It mostly fails on QueryValue, sometimes on EnumValue, and about 1/10 times the test does not fail at all. Ideally the tests should not use threads -- can the same thing be tested without a thread? It seems like the issue isn't related to concurrency, but maybe I missed something.

With _winreg.c patched, the tests seem to pass, but I haven't run that as much as I have the unpatched version.

winreg_test.pach
- test_changing_value (assuming we need to do this as a thread)
-- I'd just use HKEY_CURRENT_USER directly instead of storing it locally
-- done doesn't need to be a list, it could just be the True/False
-- the if/else could be shortened to `s = "x" if short else "x"*2000`

- test_long_key
-- After SetValue I'd just call EnumKey(key, 0) since you can only ever have that value to use. The loop isn't really used.
msg104406 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-04-28 04:00
After a quick glance, the _winreg.c changes look ok. I'll try to fit in a review shortly.
msg104408 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-04-28 04:17
Thanks for the feedback.  I'll revise the patch tomorrow based on your code-cleanup suggestions.

To answer your question about the thread and explain why the test sometimes passes:

The goal of test_changing_value is to try to trigger a race condition that exists inside many of these functions.  So, yes, the thread is necessary. :) Thankfully the functions call Py_BEGIN_ALLOW_THREADS; otherwise, I'd have to launch another process to try to trigger it.

Sometimes we never manage to hit the race condition and the test passes even though the bug is present.  I can increase the number of attempts to trigger it (currently "for _ in range(100)") to make it more likely to occur every time the test is run.
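
Why such a test can pass even when the bug is present can be seen in a small threaded simulation. This is a sketch under stated assumptions, not the test from the patch: `FakeValue`, `MoreDataError` and `stress` are hypothetical, an in-memory object stands in for the registry value, and the reader deliberately uses the unfixed query-then-read sequence.

```python
import threading


class MoreDataError(Exception):
    """Stand-in for WindowsError 234 (hypothetical, for this sketch only)."""


class FakeValue:
    """In-memory stand-in for a registry value that another thread rewrites."""

    def __init__(self):
        self.data = "x"

    def query_size(self):
        return len(self.data)

    def read(self, bufsize):
        data = self.data  # snapshot, so the check and the return agree
        if len(data) > bufsize:
            raise MoreDataError("More data is available")
        return data


def stress(attempts=1000):
    """Run the unfixed two-step read while a writer thread resizes the value."""
    value = FakeValue()
    done = threading.Event()

    def writer():
        short = True
        while not done.is_set():
            value.data = "x" if short else "x" * 2000
            short = not short

    thread = threading.Thread(target=writer)
    thread.start()
    failures = 0
    try:
        for _ in range(attempts):
            size = value.query_size()   # may already be stale below
            try:
                value.read(size)
            except MoreDataError:
                failures += 1
    finally:
        done.set()
        thread.join()
    return failures


failures = stress()
print("stale-size failures:", failures)
```

The count varies from run to run and can be zero, depending on how the threads happen to be scheduled.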
msg104464 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-04-28 21:32
Attached is a new test-case patch.
msg104488 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-29 12:08
How much would increasing the loop count slow down the test?  Since if the bug isn't present the loop will run to the end every time, if it is non-trivial it would probably be better to keep it shorter, since Brian said it failed one way or another 9 out of 10 times.  Either way, a note in the test that it may occasionally pass even if the bug is present, but should fail most of the time, would be a good idea.
msg104499 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-04-29 13:02
In the updated patch I uploaded yesterday, I increased the loop count from 100 to 1000 and it still runs virtually instantly.  I also added a comment stating that the test is trying to trigger a race condition.
msg106525 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-05-26 13:26
Committed to trunk in r81517 and release26-maint in r81540.

I'll cover the 3.x stuff today and then close it out.
msg106528 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-05-26 14:46
test_dynamic_key fails on py3k, but not because of these changes.

RegQueryValueExW doesn't appear to work with NULL for the second parameter (valueName), although it is documented to and the ANSI version on 2.x works fine. The empty string is also documented as an acceptable parameter, so while testing it out, I hardcoded "" in PyQueryValueEx and the calls worked. With NULL, it raises "WindowsError: [Error 87] The parameter is incorrect".

Thoughts? The current py3k patch is attached.
msg106530 - Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-26 15:12
I wrote a short C program to test a few different variations.  It looks to me like a bug in the operating system's implementation of HKEY_PERFORMANCE_DATA (which is a virtual registry key).  If I pass NULL as the second parameter to a more conventional key that has a default value, everything works fine.

I suggest changing the test to pass "" instead of None.
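
On the caller's side, the suggestion amounts to substituting the empty string for None before querying. A minimal Python 3 sketch follows; `default_value_name` and `query_value` are hypothetical helpers, and the import is guarded so the file still loads where `winreg` is unavailable (the module is named `_winreg` on Python 2).

```python
try:
    import winreg  # Windows only
except ImportError:
    winreg = None


def default_value_name(name):
    """Map None to "", which is also documented as naming the default value."""
    return "" if name is None else name


def query_value(key, name=None):
    """Call QueryValueEx with None replaced by the empty string."""
    if winreg is None:
        raise RuntimeError("winreg is only available on Windows")
    return winreg.QueryValueEx(key, default_value_name(name))


print(repr(default_value_name(None)))
```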
msg106564 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-05-26 18:11
Committed to py3k in r81547 and release31-maint in r81546.

Thanks for the patch!
History
Date User Action Args
2010-05-26 18:11:27 brian.curtin set status: open -> closed; messages: + msg106564
2010-05-26 15:12:24 stutzbach set messages: + msg106530
2010-05-26 14:46:46 brian.curtin set files: + issue2810_py3k.diff; messages: + msg106528
2010-05-26 13:26:59 brian.curtin set assignee: stutzbach -> brian.curtin; resolution: fixed; messages: + msg106525; stage: patch review -> resolved
2010-04-29 13:02:10 stutzbach set messages: + msg104499
2010-04-29 12:08:28 r.david.murray set nosy: + r.david.murray; messages: + msg104488
2010-04-28 21:34:27 stutzbach set files: - winreg_test.pach
2010-04-28 21:32:45 stutzbach set files: + winreg_test.patch; messages: + msg104464
2010-04-28 04:17:01 stutzbach set messages: + msg104408
2010-04-28 04:00:32 brian.curtin set messages: + msg104406
2010-04-28 03:29:31 brian.curtin set messages: + msg104403
2010-04-27 15:19:44 stutzbach set keywords: + needs review; assignee: stutzbach; stage: needs patch -> patch review
2010-04-27 13:59:19 stutzbach set title: _winreg.EnumValue fails when the registry data includes multibyte unicode characters -> _winreg.EnumValue sometimes raises WindowsError ("More data is available")
2010-04-07 16:44:38 stutzbach set messages: + msg102547
2010-04-07 16:43:33 stutzbach set files: - more_data_is_available2.py
2010-04-07 16:43:29 stutzbach set files: - more_data_is_available.py
2010-04-07 16:43:07 stutzbach set files: + winreg.patch; keywords: + patch
2010-04-07 16:42:50 stutzbach set files: + winreg_test.pach
2010-04-07 14:05:23 brian.curtin set assignee: brian.curtin -> (no value)
2010-04-07 14:01:21 brian.curtin set priority: normal; assignee: brian.curtin; stage: needs patch; versions: - Python 2.5, Python 2.4, Python 3.3
2010-04-07 13:18:34 techtonik set messages: + msg102538
2010-04-07 13:12:00 stutzbach set messages: + msg102537; versions: + Python 3.1, Python 3.2, Python 3.3
2010-04-02 14:28:29 stutzbach set messages: + msg102171
2010-04-02 14:05:09 stutzbach set messages: + msg102170
2010-04-02 12:44:48 techtonik set messages: + msg102163
2010-04-01 19:26:55 stutzbach set files: + more_data_is_available2.py; messages: + msg102125
2010-04-01 19:24:01 stutzbach set files: + more_data_is_available.py; messages: + msg102124
2010-04-01 17:33:45 techtonik set files: + issue2810_registry.png
2010-04-01 17:32:01 brian.curtin set nosy: + brian.curtin
2010-04-01 17:29:35 techtonik set nosy: + techtonik; messages: + msg102113; versions: + Python 2.7
2008-12-10 08:28:44 loewis set messages: + msg77492; versions: - Python 2.5.3
2008-10-07 15:07:08 stutzbach set versions: + Python 2.5.3
2008-10-02 17:20:27 enolte set files: + issue_2810.reg; nosy: + enolte; messages: + msg74181; versions: + Python 2.6, Python 2.4, - Python 3.0
2008-05-11 19:21:48 loewis set messages: + msg66653
2008-05-11 18:39:25 stutzbach set messages: + msg66649
2008-05-10 18:57:01 loewis set messages: + msg66563
2008-05-10 18:14:53 stutzbach set messages: + msg66553
2008-05-10 17:52:44 loewis set nosy: + loewis; messages: + msg66547
2008-05-10 17:37:33 stutzbach create