classification
Title: In 3.x, os.confstr() returns garbage if value is longer than 255 bytes
Type: behavior Stage:
Components: Extension Modules Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: baikie, vstinner
Priority: normal Keywords: patch

Created on 2010-08-12 19:12 by baikie, last changed 2010-09-10 23:50 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
confstr-reduce-bufsize.diff baikie, 2010-08-12 19:12 Shrink stack buffer to one byte for testing
confstr-long-result.diff baikie, 2010-08-12 19:13 Allocate separate buffer to receive result
confstr-minimal.diff baikie, 2010-08-19 18:47 Fix 255-byte issue without attempting to handle changing length
Messages (8)
msg113699 - (view) Author: David Watson (baikie) Date: 2010-08-12 19:12
It may be hard to find a configuration string this long, but you
can see the problem if you apply the attached
confstr-reduce-bufsize.diff to reduce the size of the local array
buffer that posix_confstr() uses.  With it applied:

>>> import os
>>> print(ascii(os.confstr("CS_PATH")))
'\x00\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb'

The problem arises because the code first tries to receive the
configuration string into the local buffer (char buffer[256],
reduced to char buffer[1] above), but then tries to receive it
directly into a string object if it doesn't fit.  You can see
what's gone wrong by comparing the working code in 2.x:

    if ((unsigned int)len >= sizeof(buffer)) {
        result = PyString_FromStringAndSize(NULL, len-1);
        if (result != NULL)
            confstr(name, PyString_AS_STRING(result), len);
    }
    else
        result = PyString_FromStringAndSize(buffer, len-1);

with the code in 3.x:

    if ((unsigned int)len >= sizeof(buffer)) {
        result = PyUnicode_FromStringAndSize(NULL, len-1);
        if (result != NULL)
            confstr(name, _PyUnicode_AsString(result), len);
    }
    else
        result = PyUnicode_FromStringAndSize(buffer, len-1);

Namely, that in 3.x it tries to receive the string into the bytes
object returned by _PyUnicode_AsString(), not the str object it
has just allocated (which has the wrong format anyway -
Py_UNICODE as opposed to char).

The attached confstr-long-result.diff fixes this by allocating a
separate buffer when necessary to receive the result, before
creating the string object from it.  By putting the confstr()
call and allocation in a loop, it also handles the possibility
that the value's length might change between calls.
msg113701 - (view) Author: David Watson (baikie) Date: 2010-08-12 19:18
The returned string should also be decoded with the file system
encoding and surrogateescape error handler, as per PEP 383 -
there's a patch at issue #9580 to do this.
msg113724 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 00:02
About confstr-long-result.diff: why do you use a loop to reallocate the buffer? confstr() result may change?
msg113808 - (view) Author: David Watson (baikie) Date: 2010-08-13 18:41
I don't see why confstr() values shouldn't change.  sysconf() values can change between calls, IIRC.  Implementations can also define their own confstr variables - they don't have to stick to the POSIX stuff.

And using a loop means the confstr() call only appears once in the source, which is more elegant, right? :)
msg113840 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 22:07
I just fear that the loop is "endless". Imagine the worst case: confstr() returns a counter (n, n+1, n+2, ...). In 64 bits, it can be long.

I would prefer to see a condition to stop after 2 steps. It should maybe stop when an error at the 3rd step.
msg113924 - (view) Author: David Watson (baikie) Date: 2010-08-14 19:18
> I just fear that the loop is "endless". Imagine the worst case: confstr() returns a counter (n, n+1, n+2, ...). In 64 bits, it can be long.

The returned length is supposed to be determined by the length of
the variable, not the length of the buffer passed by the caller,
so I don't see why the OS would have a bug like that, and it
would probably be exposed by the test suite anyway (there's
currently a simple test using CS_PATH).

> I would prefer to see a condition to stop after 2 steps. It should maybe stop when an error at the 3rd step.

That is, raise an exception?  Yeah, possibly, but I think it's
better to just believe what the OS tells you rather than have an
exception that's only raised once in a blue moon for something
that may just be a low-probability event, and not an error.
msg114400 - (view) Author: David Watson (baikie) Date: 2010-08-19 18:47
I've opened a separate issue for the changing-length problem
(issue #9647; it affects 2.x as well).  Here is a patch that
fixes the 255-byte issue only, and has similar results to the 2.x
code if the value changes length between calls (except that it
could raise a UnicodeError if the string is truncated inside a
multibyte character encoding).
msg116062 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-10 23:50
Fixed in r84696+r84697: confstr-minimal.diff + PyUnicode_DecodeFSDefaultAndSize().
History
Date User Action Args
2010-09-10 23:50:57vstinnersetstatus: open -> closed
resolution: fixed
2010-09-10 23:50:48vstinnersetmessages: + msg116062
2010-08-19 18:47:08baikiesetfiles: + confstr-minimal.diff

messages: + msg114400
2010-08-14 19:18:25baikiesetmessages: + msg113924
2010-08-13 22:07:12vstinnersetmessages: + msg113840
2010-08-13 18:41:23baikiesetmessages: + msg113808
2010-08-13 00:02:08vstinnersetmessages: + msg113724
2010-08-12 20:38:34pitrousetnosy: + vstinner
2010-08-12 19:18:10baikiesetmessages: + msg113701
2010-08-12 19:13:29baikiesetfiles: + confstr-long-result.diff
2010-08-12 19:12:12baikiecreate