Title: bugs in scanstring_str() and scanstring_unicode() of _json module
Components: Library (Lib) Versions: Python 2.6
Nosy List: bob.ippolito, georg.brandl, vstinner
Created on 2008-07-08 22:58 by vstinner

_json.patch vstinner, 2008-07-19 13:34 A patch to see the problem and maybe fix the crash
Author: STINNER Victor (vstinner) Date: 2008-07-08 22:58
scanstring_str() and scanstring_unicode() functions don't check the end 
value, even though it can be outside the input string range. A check like 
this is needed:
    if (end < 0 || len <= end) {
        PyErr_SetString(PyExc_ValueError, "xxx");
        return NULL;
    }

Also, next is set to begin, but a few lines later (before the first use 
of next) it's set to end: for (next = end; ...). 

In the error messages, e.g. "Invalid control character at (...)", begin 
is used as the character position, but I think the right position is in 
the variable "end" (or maybe "next"?).

I'm unable to fix these functions because I don't understand the code.
Author: STINNER Victor (vstinner) Date: 2008-07-19 11:16
To reproduce the crash, pass a very large negative integer as the second 
argument. Example:

>>> _json.scanstring("test", -23492394)
Segmentation fault (core dumped)

>>> _json.scanstring(u"test", -1239239)
Segmentation fault (core dumped)
Author: Georg Brandl (georg.brandl) Date: 2008-07-19 13:01
Bob, do you know how to fix this?
Author: STINNER Victor (vstinner) Date: 2008-07-19 13:34
I wrote that I'm unable to fix the bug correctly, but I wrote a patch 
to avoid the crash:
- replace begin with end in error messages: is it correct?
- use the "end < 0 || len <= end" test to check the second argument of 
scanstring() => raise a ValueError if the end value is invalid
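
To make the proposed range check concrete, here is a minimal pure-Python 
sketch of the same test. It is only an illustration, not the code in 
_json.c: the function name checked_scan, the error message text and the 
simplified scanning (no escape handling) are assumptions made for this 
example.

```python
def checked_scan(s, end):
    """Hypothetical stand-in for scanstring(): validate `end`, then scan.

    `end` is the index of the first character after the opening quote.
    Escape sequences are deliberately not handled in this sketch.
    """
    # Same condition as the proposed C test: end < 0 || len <= end.
    if end < 0 or len(s) <= end:
        raise ValueError("end is out of bounds")

    # Look for the closing quote starting at `end`.
    close = s.find('"', end)
    if close == -1:
        raise ValueError("Unterminated string starting at index %d" % (end - 1))

    # Return the string contents and the index just past the closing quote.
    return s[end:close], close + 1
```

With this check in place, a call such as checked_scan("test", -23492394) 
raises ValueError instead of reading outside the string, while a valid 
call such as checked_scan('"abc"', 1) still scans normally.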
Author: Bob Ippolito (bob.ippolito) Date: 2008-07-19 21:24
Am I to understand that the bug here is that the C extension doesn't
validate input properly if you call into it directly? Without a test I'm
not entirely sure exactly how you could possibly get negative values
into those functions using the json module as-is.
Author: Bob Ippolito (bob.ippolito) Date: 2008-07-19 21:48
I've audited the patch. While it does fix the input range, it looks like
it regresses other things (at least the error messages). "begin" was
intentionally used. The patch is not suitable for use; I'll create a
minimal patch that just fixes input validation.
Author: Bob Ippolito (bob.ippolito) Date: 2008-07-19 22:00
I just committed a fix to trunk in r65147, needs port to py3k?
Author: Georg Brandl (georg.brandl) Date: 2008-07-20 07:26
Was merged in r65148.
