Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyUnicode_FromFormatV() bugs with "%" and "%%" format strings #55038

Closed
vstinner opened this issue Jan 5, 2011 · 10 comments
Closed

PyUnicode_FromFormatV() bugs with "%" and "%%" format strings #55038

vstinner opened this issue Jan 5, 2011 · 10 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Copy link
Member

vstinner commented Jan 5, 2011

BPO 10829
Nosy @amauryfa, @vstinner
Files
  • pyunicode_fromformatv-2.patch
  • parse_format_flags.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-03-02.00:12:38.858>
    created_at = <Date 2011-01-05.00:46:25.948>
    labels = ['interpreter-core', 'type-bug', 'expert-unicode']
    title = 'PyUnicode_FromFormatV() bugs with "%" and "%%" format strings'
    updated_at = <Date 2011-03-03.08:34:27.362>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2011-03-03.08:34:27.362>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-03-02.00:12:38.858>
    closer = 'vstinner'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2011-01-05.00:46:25.948>
    creator = 'vstinner'
    dependencies = []
    files = ['20264', '20828']
    hgrepos = []
    issue_num = 10829
    keywords = ['patch']
    message_count = 10.0
    messages = ['125387', '125392', '125397', '125722', '128961', '129013', '129038', '129836', '129931', '129938']
    nosy_count = 3.0
    nosy_names = ['amaury.forgeotdarc', 'vstinner', 'ysj.ray']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue10829'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 5, 2011

    Steps 1 and 3 of PyUnicode_FromFormatV() doesn't handle the format string "%%" correctly. The loop responsible to skip the precision moves outside the format string, and the function will then read uninitialized memory. The loop:

             while (*++f && *f != '%' && !Py_ISALPHA((unsigned)*f))
                 ;
    

    This is another issue:

        for (f = format; *f; f++) {
             if (*f == '%') {
                 if (*(f+1)=='%')
                     continue;
        ...

    continue only skips the first %: with "%%", the second % will be interpreted (and not escaped).

    Attached patch fixes the issue, but I don't feal confortable with this ugly function, and I would appreciate a review :-) The patch adds unit tests.

    I found the bug when trying to add new tests before trying to implement "%zi" format. I was first surprised that "%zi" (and %li and %lli) is not supported, but now I am surprised because I found bugs :-)

    @vstinner vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode labels Jan 5, 2011
    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 5, 2011

    Oh, my first patch was wrong: it failed on %c format (in zipimport with "%U%c%U"). The second version of the patch should be better.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 5, 2011

    PyBytes_FromFormatV() has the same issue.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 7, 2011

    PyBytes_FromFormatV() has the same issue.

    Oh no, it's ok, the function works as expected.

    @ysjray
    Copy link
    Mannequin

    ysjray mannequin commented Feb 21, 2011

    Hi, haypo, Your patch seems cannot be applied cleanly on current py3k trunk. And after modified your patch, test_unicode.py runs into Segmentation fault. Is there something wrong or some changes which could influence this bug had been already made since the patch is worked out?

    On the current trunk, I guess the bug could be fixed in a simpler way:

    In step 1, before check '%%', check '%'(a string ends with '%') first. Then check '%%' and skip it.

    The whole patch:

    Index: Objects/unicodeobject.c
    ===================================================================

    --- Objects/unicodeobject.c	(revision 88453)
    +++ Objects/unicodeobject.c	(working copy)
    @@ -750,8 +750,12 @@
          * result in an array) */
         for (f = format; *f; f++) {
              if (*f == '%') {
    -             if (*(f+1)=='%')
    +             if (*(f+1)=='\0')
    +                continue;
    +             if (*(f+1)=='%') {
    +                 f++;
                      continue;
    +             }
                  if (*(f+1)=='S' || *(f+1)=='R' || *(f+1)=='A')
                      ++callcount;
                  while (Py_ISDIGIT((unsigned)*f))

    After applying this small patch and tests in your patch, test_unicode.py can passed.

    @ysjray ysjray mannequin added the type-bug An unexpected behavior, bug, or error label Feb 21, 2011
    @vstinner
    Copy link
    Member Author

    Well, the main problem is that there are 3 different codes to parse the format string, and each code is different... Attached patch factorizes the code: create one subfunction parse_format_flags(). It fixes also this issue and prepares the work to fix bpo-10831.

    @ysjray
    Copy link
    Mannequin

    ysjray mannequin commented Feb 22, 2011

    Well, the main problem is that there are 3 different codes to parse the format string, and each code is different... Attached patch factorizes the code: create one subfunction parse_format_flags(). It fixes also this issue and prepares the work to fix bpo-10831.

    Sounds nice! Maybe several related issues can also use this, like bpo-7330.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 2, 2011

    Fixed in Python 3.3 (r88702+r88703).

    I don't really want to backport it to 3.2, 3.1 or 2.7, because the patch is complex, changes a very important function, and the fixed bug (don't crash on invalid format strings) is minor. If you would like the fix into in old release, please reopen the issue and tell me why.

    @vstinner vstinner closed this as completed Mar 2, 2011
    @ysjray
    Copy link
    Mannequin

    ysjray mannequin commented Mar 3, 2011

    Hi, haypo, would you mind modify your newly added parse_format_flags() function so that it can diff the precision value of '%.0s' and '%s'(Currently both of them return precision as 0)? Because if used with string formatters(%s, %R, %S, %A, ...), they should be very different. And so I can work on bpo-7330 with it.

    Just return precision as -1 to indicate no precision is designated could work.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 3, 2011

    Hi, haypo, would you mind modify your newly added parse_format_flags()
    function so that it can diff the precision value of '%.0s' and
    '%s'(Currently both of them return precision as 0)?

    You should update your patch attached to bpo-7330. I hope that your patch will be simpler with parse_format_flags() than before.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant