Title: punycode codec raises IndexError in decode_generalized_number()
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.6
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Vikram Hegde, r.david.murray, vikhegde
Priority: normal Keywords:

Created on 2017-06-04 15:49 by Vikram Hegde, last changed 2018-07-11 07:48 by serhiy.storchaka.

Pull Requests
URL Status Linked
PR 1986 open vikhegde, 2017-06-07 16:18
Messages (5)
msg295127 - Author: Vikram Hegde (Vikram Hegde) * Date: 2017-06-04 15:49
Here is the relevant code snippet from decode_generalized_number():

        try:
            char = ord(extended[extpos])
        except IndexError:
            if errors == "strict":
                raise UnicodeError("incomplete punicode string")
            return extpos + 1, None
        extpos += 1
        if 0x41 <= char <= 0x5A: # A-Z
            digit = char - 0x41
        elif 0x30 <= char <= 0x39:
            digit = char - 22 # 0x30-26
        elif errors == "strict":
            raise UnicodeError("Invalid extended code point '%s'"
                               % extended[extpos])

When raising the UnicodeError in the last line above, the code reads extended[extpos]. However, extpos has already been incremented by 1 a few lines earlier. This causes two errors:
1) The UnicodeError message shows the wrong character (the one after the invalid character).
2) If the invalid character is the last one in the string, extended[extpos] is out of range, so an IndexError is raised instead of the intended UnicodeError.
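A minimal sketch of the corrected indexing, assuming the fix is simply to report extended[extpos - 1], i.e. the character that was actually read. The helper read_digit() is hypothetical: it isolates the digit-reading step of the snippet above and is not claimed to match the patch in PR 1986. Like the snippet, it only accepts A-Z and 0-9.

```python
def read_digit(extended, extpos):
    """Hypothetical helper: decode one base-36 digit of a punycode
    generalized variable-length integer.

    Returns (digit, new_extpos). Raises UnicodeError (never IndexError)
    for a missing or invalid character.
    """
    try:
        char = ord(extended[extpos])
    except IndexError:
        raise UnicodeError("incomplete punycode string") from None
    extpos += 1  # extpos now points one past the character just read
    if 0x41 <= char <= 0x5A:      # 'A'-'Z' -> 0..25
        return char - 0x41, extpos
    if 0x30 <= char <= 0x39:      # '0'-'9' -> 26..35
        return char - 22, extpos
    # Report the character that was actually read: it sits at extpos - 1.
    # Indexing extended[extpos] here is the bug described above.
    raise UnicodeError("Invalid extended code point '%s'"
                       % extended[extpos - 1])
```

With this version, read_digit("W&", 0) returns (22, 1), and read_digit("W&", 1) raises a UnicodeError naming '&' rather than an IndexError.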
msg295149 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-04 23:22
Can you provide a reproducer, please?
msg295270 - Author: Vikram Hegde (Vikram Hegde) * Date: 2017-06-06 15:46
I have a patch for this problem, but my contributor agreement has not been accepted yet, so I can't open a pull request.

The bug can be triggered with the Python package tldextract. Here is a sample session:

Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tldextract
>>> tldextract.extract("xn--w&")
Traceback (most recent call last):
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/", line 207, in decode
    res = punycode_decode(input, errors)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/", line 194, in punycode_decode
    return insertion_sort(base, extended, errors)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/", line 165, in insertion_sort
    bias, errors)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/", line 146, in decode_generalized_number
    % extended[extpos])
IndexError: string index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/", line 358, in extract
    return TLD_EXTRACTOR(url)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/", line 237, in __call__
    translations = [decode_punycode(label).lower() for label in labels]
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/", line 237, in <listcomp>
    translations = [decode_punycode(label).lower() for label in labels]
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/", line 232, in decode_punycode
    return idna.decode(label.encode('ascii'))
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/idna/", line 384, in decode
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/idna/", line 302, in ulabel
    label = label.decode('punycode')
IndexError: decoding with 'punycode' codec failed (IndexError: string index out of range)
msg295278 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-06 17:16
You don't need an external package; just decoding b'xn--w&' with the punycode codec produces the traceback.
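A stdlib-only check along those lines, as a sketch; the helper name classify_punycode_failure() is hypothetical. On an affected interpreter such as the 3.6.0 build in the session above, decoding b'xn--w&' escapes as IndexError, whereas a correct codec should reject the invalid input with UnicodeError (UnicodeDecodeError is a subclass of it).

```python
def classify_punycode_failure(data):
    """Hypothetical helper: report how the punycode codec handles data."""
    try:
        data.decode("punycode")
    except IndexError:
        # As in the traceback above, the off-by-one in
        # decode_generalized_number() surfaces as IndexError.
        return "IndexError (bug present)"
    except UnicodeError:
        return "UnicodeError (expected for invalid input)"
    return "decoded without error"


# b"bcher-kva" is well-formed punycode ("bücher") and serves as a control.
for sample in (b"bcher-kva", b"xn--w&"):
    print(sample, "->", classify_punycode_failure(sample))
```

The control sample should always decode; only the second sample distinguishes an affected codec from a fixed one.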
msg302070 - Author: Vikram Hegde (vikhegde) * Date: 2017-09-13 12:57
Could someone please review my PR? It has been pending for over three months.
Date                 User              Action  Args
2018-07-11 07:48:59  serhiy.storchaka  set     type: crash -> behavior
2017-09-13 12:57:03  vikhegde          set     nosy: + vikhegde
                                               messages: + msg302070
2017-06-07 16:18:51  vikhegde          set     pull_requests: + pull_request2053
2017-06-06 17:16:36  r.david.murray    set     messages: + msg295278
2017-06-06 15:46:30  Vikram Hegde      set     nosy: + Vikram Hegde
                                               messages: + msg295270
2017-06-04 23:22:53  r.david.murray    set     nosy: + r.david.murray
                                               messages: + msg295149
2017-06-04 15:53:33  Vikram Hegde      set     nosy: - Vikram Hegde
                                               -> (no value)
2017-06-04 15:49:54  Vikram Hegde      create