Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submit the re, json, csv, & struct modules to oss-fuzz testing #73691

Closed
gpshead opened this issue Feb 8, 2017 · 29 comments
Closed

Submit the re, json, csv, & struct modules to oss-fuzz testing #73691

gpshead opened this issue Feb 8, 2017 · 29 comments
Assignees
Labels
3.7 (EOL) end of life type-security A security issue

Comments

@gpshead
Copy link
Member

gpshead commented Feb 8, 2017

BPO 29505
Nosy @brettcannon, @terryjreedy, @gpshead, @tiran, @ssbr, @alex, @bitdancer, @ammaraskar, @miss-islington
PRs
  • bpo-29505: Add fuzz tests for float(str), int(str), unicode(str) #2878
  • bpo-29505: Avoid UB in test selection macro. #3407
  • bpo-29505: Fuzz ast.parse() / compile() with PyCF_ONLY_AST #3437
  • bpo-29505: Fix interpreter in fuzzing targets to be relocatable #13907
  • [3.8] bpo-29505: Fix interpreter in fuzzing targets to be relocatable (GH-13907) #13914
  • [3.7] bpo-29505: Fix interpreter in fuzzing targets to be relocatable (GH-13907) #13915
  • bpo-29505: Fuzz json module, enforce size limit on int(x) fuzz #13991
  • [3.8] bpo-29505: Fuzz json module, enforce size limit on int(x) fuzz (GH-13991) #14005
  • [3.7] bpo-29505: Fuzz json module, enforce size limit on int(x) fuzz (GH-13991) #14006
  • bpo-29505: Add fuzzing for re.compile, re.load and csv.reader #14255
  • [3.8] bpo-29505: Add more fuzzing for re.compile, re.load and csv.reader (GH-14255) #14478
  • [3.7] bpo-29505: Add more fuzzing for re.compile, re.load and csv.reader (GH-14255) #14479
  • bpo-29505: Fuzz struct.unpack and catch RecursionError in re.compile #18679
  • bpo-29505: Add fuzzer for ast.literal_eval #28777
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2021-04-02.16:05:39.827>
    created_at = <Date 2017-02-08.20:54:21.924>
    labels = ['type-security', '3.7']
    title = 'Submit the re, json, csv, & struct modules to oss-fuzz testing'
    updated_at = <Date 2021-10-06.23:22:19.809>
    user = 'https://github.com/gpshead'

    bugs.python.org fields:

    activity = <Date 2021-10-06.23:22:19.809>
    actor = 'miss-islington'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2021-04-02.16:05:39.827>
    closer = 'ammar2'
    components = []
    creation = <Date 2017-02-08.20:54:21.924>
    creator = 'gregory.p.smith'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 29505
    keywords = ['patch']
    message_count = 29.0
    messages = ['287363', '287568', '292821', '293263', '293340', '299183', '299207', '301494', '301495', '301530', '301535', '301732', '301759', '301897', '301901', '301918', '301922', '301932', '345039', '345044', '345045', '345304', '345306', '345307', '346916', '346918', '347496', '390076', '403340']
    nosy_count = 10.0
    nosy_names = ['brett.cannon', 'terry.reedy', 'gregory.p.smith', 'christian.heimes', 'Devin Jeanpierre', 'alex', 'r.david.murray', 'Guido', 'ammar2', 'miss-islington']
    pr_nums = ['2878', '3407', '3437', '13907', '13914', '13915', '13991', '14005', '14006', '14255', '14478', '14479', '18679', '28777']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue29505'
    versions = ['Python 2.7', 'Python 3.6', 'Python 3.7']

    @gpshead
    Copy link
    Member Author

    gpshead commented Feb 8, 2017

    For reference, read https://github.com/google/oss-fuzz.

    We should investigate creating fuzz targets for the Python re module (_sre.c) at a minimum. There are probably other good targets as well such as _json.c and _csv.c.

    pickle and marshal are not intended to be secure so ignore those.

    @gpshead gpshead added 3.7 (EOL) end of life type-security A security issue labels Feb 8, 2017
    @terryjreedy
    Copy link
    Member

    It does not appear to me that targets have to be security critical, though that is certainly a good place to start. The Chrome tests found 100s of "security vulnerabilities and stability bugs".

    The important thing is that there be someone willing to receive and act on reports. Would 'make public after 90 days' ever be a problem? AFAIK, most Python security issues are already public here on the tracker from day 1.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented May 2, 2017

    Aha, I found an existing issue!

    For adding to oss-fuzz, is there a contact email we can use that is connected to a google account? I am tempted to just put gregory.p.smith on there if not. :)

    I can volunteer to fuzz some interesting subset of the stdlib. The list I've come up with (by counting uses in my code) is:

    the XML parser (which seems to be written in C)
    struct (unpack)
    the various builtins that parse strings (like int())
    hashlib
    binascii
    datetime's parsing
    json

    I'd also suggest the ast module, since people do use ast.literal_eval on untrusted strings, but I probably won't do that one myself.

    I wrote a fuzz test for json via upstream simplejson, but the bug on github is getting stale: simplejson/simplejson#163

    Should I add it to CPython instead?

    We should investigate creating fuzz targets for the Python re module (_sre.c) at a minimum.

    If we prioritize based on security risk, I'd argue that this is lower priority than things like json's speedup extension module, because people should generally not pass untrusted strings to the re module: it's very easy to DOS a service with regexes unless you're using RE2 or similar -- which is fuzzed. In contrast, json is supposed to accept untrusted input and people do that very often.

    (OTOH, I would be willing to bet that fuzzing re will yield more bugs than fuzzing json.)

    @gpshead
    Copy link
    Member Author

    gpshead commented May 9, 2017

    you can list me as a oss-fuzz contact. use my work email address.

    simplejson is worthy but as both it and the python standard library ship separately people use both so they both ultimately deserve fuzzing and fixing on their own so I'd add it to CPython as well.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented May 9, 2017

    google/oss-fuzz#583 is the PR to oss-fuzz to add the project. I'm working on actual tests to be submitted here.

    @terryjreedy
    Copy link
    Member

    As I read 583, they are planning to fuzz 3.6. Why not branch master? I think it more likely that we accidentally add a vulnerability to master then that we accidentally close one.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Jul 26, 2017

    I think they misspoke, it's normal with fuzzing to test against master. The current draft of the code runs this git pull before building/launching any tests:

    git clone --depth 1 https://github.com/python/cpython.git cpython
    

    Speaking of which, I forgot to update this bug thread with the followup PR to actually run CPython's fuzz tests (when they exist): google/oss-fuzz#731. That's where I grabbed the git clone statement from. I think that will be merged after some version of PR 2878 lands in CPython (still in code review / broken).

    For Python 2 I guess it's different, and we will test against the 2.7 branch, right?

    @gpshead
    Copy link
    Member Author

    gpshead commented Sep 6, 2017

    New changeset c5bace2 by Gregory P. Smith (Devin Jeanpierre) in branch 'master':
    bpo-29505: Add fuzz tests for float(str), int(str), unicode(str) (bpo-2878)
    c5bace2

    @gpshead
    Copy link
    Member Author

    gpshead commented Sep 6, 2017

    alright, with that in, feel free to figure out the oss-fuzz configuration side and fire things up Devin. :)

    @gpshead gpshead self-assigned this Sep 6, 2017
    @tiran
    Copy link
    Member

    tiran commented Sep 6, 2017

    GCC complains about the patch:

    /home/heimes/dev/python/cpython/Modules/_xxtestfuzz/fuzzer.c: In functionLLVMFuzzerTestOneInput’:
    /home/heimes/dev/python/cpython/Modules/_xxtestfuzz/fuzzer.c:109:1: warning: this use of "defined" may not be portable [-Wexpansion-to-defined]
     #if _Py_FUZZ_YES(fuzz_builtin_float)
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    /home/heimes/dev/python/cpython/Modules/_xxtestfuzz/fuzzer.c:109:1: warning: this use of "defined" may not be portable [-Wexpansion-to-defined]
    /home/heimes/dev/python/cpython/Modules/_xxtestfuzz/fuzzer.c:112:1: warning: this use of "defined" may not be portable [-Wexpansion-to-defined]
     #if _Py_FUZZ_YES(fuzz_builtin_int)
     ^~~~~~~~~~~~~~~~~~~~~~~~~
    /home/heimes/dev/python/cpython/Modules/_xxtestfuzz/fuzzer.c:112:1: warning: this use of "defined" may not be portable [-Wexpansion-to-defined]
    /home/heimes/dev/python/cpython/Modules/_xxtestfuzz/fuzzer.c:115:1: warning: this use of "defined" may not be portable [-Wexpansion-to-defined]
     #if _Py_FUZZ_YES(fuzz_builtin_unicode)
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /home/heimes/dev/python/cpython/Modules/_xxtestfuzz/fuzzer.c:115:1: warning: this use of "defined" may not be portable [-Wexpansion-to-defined]

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Sep 6, 2017

    Huh. I would not have predicted that.

    https://gcc.gnu.org/onlinedocs/cpp/Defined.html

    I'll send a fix.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Sep 8, 2017

    So here's an interesting issue: oss-fuzz requires that the built location be movable. IOW, we build Python into $OUT, and then the $OUT directory gets moved somewhere else and the fuzz test gets run from there. This causes problems because Python can no longer find where the modules it needs are (encodings for example).

    First thought: wouldn't it be nice if we could make a prepackaged and hermetic executable that we can move around freely?

    Second thought: isn't that "Hermetic Python", as used within Google?

    Third thought: doesn't Google have an internal fuzz testing environment we can use, instead of oss-fuzz?

    So unless someone says this is a bad idea, I'd propose we not run these in oss-fuzz and instead run them in Google proper. The alternative is if there's a way to make it easy to move Python around -- is there a way to build it s.t. the import path is relative and so on?

    @gpshead
    Copy link
    Member Author

    gpshead commented Sep 9, 2017

    i'd rather make this work in oss-fuzz on cpython. can you point me to how oss-fuzz works and what it wants to do so i can better understand what it needs?

    it it has an expectation that the thing being fuzzed is a single binary with no data or directory tree layout dependencies that can just be plopped somewhere else is not a great assumption to make.

    but environment variables _should_ be able to be set to point the python binary at what it needs if it must work that way.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Sep 11, 2017

    i'd rather make this work in oss-fuzz on cpython. can you point me to how oss-fuzz works and what it wants to do so i can better understand what it needs?

    I don't have any details except for what's in the PR to oss-fuzz (google/oss-fuzz#731) My understanding is matches what you've said so far:

    Python is built to one directory (/out/), but then needs to be run from another directory (/out/ is renamed to /foo/bar/baz/out/). We need python to still work. I have no idea how to do this.

    The only suggestion on #python-dev IRC was to statically link a libpython.a, but this doesn't avoid needing to import libraries like "encodings" dynamically, so they still need to be locatable on disk.

    Is there a way to build python so that it doesn't use absolute paths to everything, and so that the install can be moved at will? Or is there a way to tell it that it was moved at runtime? (I am unconvinced PYTHONPATH is a maintainable solution, if it works at all...)

    oss-fuzz is not going to change away from its model (I asked if they could, they said no), so we're stuck with making Python compatible with it one way or another. This is why I am so drawn to running the test internally on Google's infrastructure anyway: we already _did_ all this work already, via hermetic python. Doing it a second time, but worse, seems annoying.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Sep 11, 2017

    kcc strongly disagrees though. Copying latest comment:

    """
    fwiw - I object to us running any of this internally at Google. We need to be part of the main oss-fuzz project pulling from upstream revisions. Doing this testing within our blackhole of internal stuff adds more work for us internally (read: which we're not going to do) and wouldn't provide results feedback to the upstream CPython project in a useful timely manner.

    We must figure out how to get this to build and run on the external oss-fuzz infrastructure
    """

    @gpshead
    Copy link
    Member Author

    gpshead commented Sep 11, 2017

    misquote. that was me objecting to running it internally. :)

    i believe this is solvable, i haven't had time to spend on this part yet.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Sep 11, 2017

    Oops, so it is. I can't read apparently.

    I'll spend my time on making more fuzz tests in the meantime.

    @bitdancer
    Copy link
    Member

    Seems like it ought to be possible to use the same hooks that venv uses to make this work, but I haven't looked at the details of how those work.

    @gpshead
    Copy link
    Member Author

    gpshead commented Jun 8, 2019

    New changeset a15a7bc by Gregory P. Smith (Ammar Askar) in branch 'master':
    bpo-29505: Fix interpreter in fuzzing targets to be relocatable (GH-13907)
    a15a7bc

    @miss-islington
    Copy link
    Contributor

    New changeset 22b69da by Miss Islington (bot) in branch '3.8':
    bpo-29505: Fix interpreter in fuzzing targets to be relocatable (GH-13907)
    22b69da

    @miss-islington
    Copy link
    Contributor

    New changeset 6692d35 by Miss Islington (bot) in branch '3.7':
    bpo-29505: Fix interpreter in fuzzing targets to be relocatable (GH-13907)
    6692d35

    @gpshead
    Copy link
    Member Author

    gpshead commented Jun 12, 2019

    New changeset a6e190e by Gregory P. Smith (Ammar Askar) in branch 'master':
    bpo-29505: Fuzz json module, enforce size limit on int(x) fuzz (GH-13991)
    a6e190e

    @miss-islington
    Copy link
    Contributor

    New changeset 534136a by Miss Islington (bot) in branch '3.7':
    bpo-29505: Fuzz json module, enforce size limit on int(x) fuzz (GH-13991)
    534136a

    @miss-islington
    Copy link
    Contributor

    New changeset 878227e by Miss Islington (bot) in branch '3.8':
    bpo-29505: Fuzz json module, enforce size limit on int(x) fuzz (GH-13991)
    878227e

    @gpshead
    Copy link
    Member Author

    gpshead commented Jun 30, 2019

    New changeset 5cbbbd7 by Gregory P. Smith (Ammar Askar) in branch 'master':
    bpo-29505: Add more fuzzing for re.compile, re.load and csv.reader (GH-14255)
    5cbbbd7

    @miss-islington
    Copy link
    Contributor

    New changeset ffcc161 by Miss Islington (bot) in branch '3.8':
    bpo-29505: Add more fuzzing for re.compile, re.load and csv.reader (GH-14255)
    ffcc161

    @Guido
    Copy link
    Mannequin

    Guido mannequin commented Jul 8, 2019

    Hi,

    I've built a generic Python fuzzer and submitted it to OSS-Fuzz.

    It works by implementing a "def FuzzerRunOne(FuzzerInput):" function in Python in which some arbitrary code is run based on FuzzerInput, which is a bytes object.

    This is a more versatile solution than the current re, json, csv fuzzers as it requires no custom C code and adding more fuzzing targets is as easy as writing a new harness in Python and adding a build rule.

    Code coverage is measured at both the CPython level (.c) and the Python level (.py). CPython is compiled with AddressSanitizer. What this means is that both CPython memory bugs and Python library bugs (excessive memory consumption, hangs, slowdowns, unexpected exceptions) are expected to transpire.

    You can see my current set of fuzzers here: https://github.com/guidovranken/python-library-fuzzers

    The PR to OSS-Fuzz is google/oss-fuzz#2567

    Currently, the only Python maintainer who will be receiving automated bug reports is gpshead. Are there any other developers who normally process Python security bug reports and would like to receive notifications?

    Feel free to respond directly in the OSS-Fuzz PR thread.

    @gpshead gpshead changed the title Submit the re, json, & csv modules to oss-fuzz testing Submit the re, json, csv, & struct modules to oss-fuzz testing Feb 28, 2020
    @ammaraskar
    Copy link
    Member

    All the modules prescribed in the original bug are now actively being fuzzed, thank you to Devin for the original work on this and Gregory for reviewing all these changes.

    I'm closing this now, feel free to make a new bug and nosy me in if there are any other C-based modules commonly exposed to user input that should be fuzzed.

    @miss-islington
    Copy link
    Contributor

    New changeset db72e58 by Ammar Askar in branch 'main':
    bpo-29505: Add fuzzer for ast.literal_eval (GH-28777)
    db72e58

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    gpshead added a commit that referenced this issue Nov 3, 2022
    Now that our int<->str conversions are size limited and we have the
    _pylong module handling larger integers, we don't need to limit
    everything just to avoid wasting time in the quadratic time DoS-like
    case while fuzzing.
    
    We can tweak these further after seeing how this goes.
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants