classification
Title: punycode codec raises IndexError in decode_generalized_number()
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Vikram Hegde, berker.peksag, miss-islington, r.david.murray, vikhegde
Priority: normal Keywords: patch

Created on 2017-06-04 15:49 by Vikram Hegde, last changed 2020-02-25 04:10 by berker.peksag. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1986 closed vikhegde, 2017-06-07 16:18
PR 18632 merged berker.peksag, 2020-02-24 01:51
PR 18651 merged miss-islington, 2020-02-25 03:19
PR 18652 merged miss-islington, 2020-02-25 03:19
Messages (9)
msg295127 - (view) Author: Vikram Hegde (Vikram Hegde) * Date: 2017-06-04 15:49
Here is the relevant code snippet from decode_generalized_number() in punycode.py:

    try:
        char = ord(extended[extpos])
    except IndexError:
        if errors == "strict":
            raise UnicodeError("incomplete punicode string")
        return extpos + 1, None
    extpos += 1
    if 0x41 <= char <= 0x5A: # A-Z
        digit = char - 0x41
    elif 0x30 <= char <= 0x39:
        digit = char - 22 # 0x30-26
    elif errors == "strict":
        raise UnicodeError("Invalid extended code point '%s'"
                           % extended[extpos])

   When the UnicodeError() in the last line above is raised, the message is built from extended[extpos]. However, extpos was already incremented by 1 a few lines earlier, so it now points one past the offending character. This causes two errors:
   1) The UnicodeError() message shows the wrong character (the one after the character that is actually invalid).
   2) If the invalid character is the last one in the string, extended[extpos] is out of range and an IndexError is raised instead of the UnicodeError.
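
To make the two failure modes concrete, here is a minimal standalone sketch of a single decoding step. It is not the CPython code: `digit_at` and its `fixed` flag are hypothetical names used only for illustration, and indexing at `extpos - 1` is shown as one plausible way to report the offending character.

```python
def digit_at(extended, extpos, fixed):
    """Decode one punycode digit (simplified, hypothetical helper).

    With fixed=False the error message indexes extended[extpos] after the
    increment, mirroring the behaviour described in this report.
    """
    char = ord(extended[extpos])
    extpos += 1
    if 0x41 <= char <= 0x5A:  # A-Z
        return extpos, char - 0x41
    if 0x30 <= char <= 0x39:  # 0-9
        return extpos, char - 22
    # Assumption: the intended character is the one just consumed.
    bad_index = extpos - 1 if fixed else extpos
    raise UnicodeError("Invalid extended code point '%s'" % extended[bad_index])


def outcome(extended, extpos, fixed):
    """Return (exception class name, message) for a failing step."""
    try:
        digit_at(extended, extpos, fixed)
    except (UnicodeError, IndexError) as exc:
        return type(exc).__name__, str(exc)
    return None, None


# Invalid character in the middle: the unfixed variant names the wrong one.
print(outcome("&W", 0, fixed=False))
# Invalid character at the end: the unfixed variant raises IndexError.
print(outcome("W&", 1, fixed=False))
# With the adjusted index, both cases report '&' in a UnicodeError.
print(outcome("&W", 0, fixed=True))
print(outcome("W&", 1, fixed=True))
```

The string "W&" corresponds to the extended part of the input "xn--w&" after upper-casing, which is the case that reaches the end of the string.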
msg295149 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-04 23:22
Can you provide a reproducer, please?
msg295270 - (view) Author: Vikram Hegde (Vikram Hegde) * Date: 2017-06-06 15:46
I have a patch for this problem but my contributor agreement has not been accepted yet, so I can't do a pull request.

Use the Python package tldextract to trigger the bug. Here is a sample session:

Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tldextract
>>> tldextract.extract("xn--w&")
Traceback (most recent call last):
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 207, in decode
    res = punycode_decode(input, errors)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 194, in punycode_decode
    return insertion_sort(base, extended, errors)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 165, in insertion_sort
    bias, errors)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 146, in decode_generalized_number
    % extended[extpos])
IndexError: string index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 358, in extract
    return TLD_EXTRACTOR(url)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 237, in __call__
    translations = [decode_punycode(label).lower() for label in labels]
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 237, in <listcomp>
    translations = [decode_punycode(label).lower() for label in labels]
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 232, in decode_punycode
    return idna.decode(label.encode('ascii'))
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/idna/core.py", line 384, in decode
    result.append(ulabel(label))
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/idna/core.py", line 302, in ulabel
    label = label.decode('punycode')
IndexError: decoding with 'punycode' codec failed (IndexError: string index out of range)
>>>
msg295278 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-06 17:16
You don't need an external package; just decoding b'xn--w&' with punycode will produce the traceback.
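
A minimal reproducer along those lines might look like the sketch below. The helper name `classify` is hypothetical. On an affected interpreter the decode is expected to fail with IndexError; on a version that includes the fix it should fail with UnicodeError (or a subclass) instead.

```python
def classify(data):
    """Decode bytes with the punycode codec and report how it fails.

    Returns "IndexError" or "UnicodeError" depending on the exception
    family, or "decoded" if no exception is raised.
    """
    try:
        data.decode("punycode")
    except IndexError:
        return "IndexError"      # behaviour described in this report
    except UnicodeError:
        return "UnicodeError"    # expected once the index is corrected
    return "decoded"


print(classify(b"xn--w&"))
```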
msg302070 - (view) Author: Vikram Hegde (vikhegde) * Date: 2017-09-13 12:57
Could someone please review my PR? It has been pending for over three months.
msg362613 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2020-02-25 03:19
New changeset ba22e8f174309979d90047c5dc64fcb63bc2c32e by Berker Peksag in branch 'master':
bpo-30566: Fix IndexError when using punycode codec (GH-18632)
https://github.com/python/cpython/commit/ba22e8f174309979d90047c5dc64fcb63bc2c32e
msg362617 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2020-02-25 03:42
New changeset daef21ce7dfd3735101d85d6ebf7554187c33ab8 by Miss Islington (bot) in branch '3.8':
bpo-30566: Fix IndexError when using punycode codec (GH-18632)
https://github.com/python/cpython/commit/daef21ce7dfd3735101d85d6ebf7554187c33ab8
msg362618 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2020-02-25 03:43
New changeset 55be9a6c09d4415f50b14212ce22eccefa83ca64 by Miss Islington (bot) in branch '3.7':
bpo-30566: Fix IndexError when using punycode codec (GH-18632)
https://github.com/python/cpython/commit/55be9a6c09d4415f50b14212ce22eccefa83ca64
msg362623 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2020-02-25 04:10
Thanks for the report and for the initial patch!
History
Date User Action Args
2020-02-25 04:10:48 berker.peksag set status: open -> closed
versions: + Python 3.7, Python 3.8, Python 3.9, - Python 3.6
messages: + msg362623
resolution: fixed
stage: patch review -> resolved
2020-02-25 03:43:49 berker.peksag set messages: + msg362618
2020-02-25 03:42:45 berker.peksag set messages: + msg362617
2020-02-25 03:19:39 miss-islington set pull_requests: + pull_request18013
2020-02-25 03:19:32 miss-islington set nosy: + miss-islington
pull_requests: + pull_request18012
2020-02-25 03:19:07 berker.peksag set messages: + msg362613
2020-02-24 01:51:46 berker.peksag set keywords: + patch
nosy: + berker.peksag
pull_requests: + pull_request17998
stage: patch review
2018-07-11 07:48:59 serhiy.storchaka set type: crash -> behavior
2017-09-13 12:57:03 vikhegde set nosy: + vikhegde
messages: + msg302070
2017-06-07 16:18:51 vikhegde set pull_requests: + pull_request2053
2017-06-06 17:16:36 r.david.murray set messages: + msg295278
2017-06-06 15:46:30 Vikram Hegde set nosy: + Vikram Hegde
messages: + msg295270
2017-06-04 23:22:53 r.david.murray set nosy: + r.david.murray
messages: + msg295149
2017-06-04 15:53:33 Vikram Hegde set nosy: - Vikram Hegde
-> (no value)
2017-06-04 15:49:54 Vikram Hegde create