classification
Title: email.errors.HeaderParseError if base64url is used
Type: Stage:
Components: email, Library (Lib) Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, barry, ctheune, guettli, pconnell, r.david.murray
Priority: normal Keywords: patch

Created on 2011-07-04 14:38 by guettli, last changed 2014-04-24 01:59 by r.david.murray.

Files
File name Uploaded Description Edit
62b280b61de7.diff ctheune, 2014-04-17 15:06 review
732e7d4515c0.diff ctheune, 2014-04-17 15:07 review
Repositories containing patches
https://bitbucket.org/ctheune/cpython#3.4-12489
https://bitbucket.org/ctheune/cpython#2.7-12489
Messages (6)
msg139776 - Author: Thomas Guettler (guettli) Date: 2011-07-04 14:38
from email.header import decode_header
decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError

Thunderbird is able to decode the header:
"Anmeldung Netzanschluss Südring3p.jpg"

According to Peter Otten, base64url encoding was used.

My question in the newsgroup:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/9cf9ccd4109481cc/9f76bd627676b5f1?show_docid=9f76bd627676b5f1
msg139778 - Author: Thomas Guettler (guettli) Date: 2011-07-04 14:39
This also happens on Python 3:
root@ubuntu1004devel64:~# python3
Python 3.1.2 (r312:79147, Sep 27 2010, 09:57:50) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
Traceback (most recent call last):
  File "/usr/lib/python3.1/email/header.py", line 98, in decode_header
    word = email.base64mime.decode(encoded_string)
  File "/usr/lib/python3.1/email/base64mime.py", line 112, in decode
    return a2b_base64(string.encode('raw-unicode-escape'))
binascii.Error: Incorrect padding

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/email/header.py", line 100, in decode_header
    raise HeaderParseError('Base64 decoding error')
email.errors.HeaderParseError: Base64 decoding error
msg139793 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2011-07-04 17:16
This gives the correct result:
decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU/xkcmluZzNwLmpwZw==?=')
(I replaced _ with /)

The header was probably generated by a variant of the base64 encoding, like this one: http://www.doughellmann.com/PyMOTW/base64/#url-safe-variations

Does this header come from a real message? How was it generated?
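
As an illustration of the substitution described above, the following sketch rewrites the payload of the reported header from the URL-safe alphabet to the standard one before passing it to decode_header. The variable names are illustrative only, and splitting on "?" assumes a single, well-formed encoded word.

```python
from email.header import decode_header

# Header from the report; the payload contains "_" from the URL-safe alphabet.
raw = '=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?='

# An encoded word has the form =?charset?encoding?payload?=
_, charset, encoding, payload, _ = raw.split('?')

# Map the URL-safe characters "-" and "_" to the standard "+" and "/".
fixed_payload = payload.translate(str.maketrans('-_', '+/'))
fixed = '=?{}?{}?{}?='.format(charset, encoding, fixed_payload)

# decode_header returns a list of (bytes, charset) pairs for encoded words.
decoded_bytes, decoded_charset = decode_header(fixed)[0]
text = decoded_bytes.decode(decoded_charset)
print(text)  # Anmeldung Netzanschluss Südring3p.jpg
```

Only the payload is translated here, because "_" has a different meaning in Q-encoded words.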
msg139855 - Author: Thomas Guettler (guettli) Date: 2011-07-05 11:41
I received this email. Here is the mailer that created it:

X-Mailer: CommuniGate Pro MAPI Connector 1.52.53.10/1.53.10.1
msg216690 - Author: Christian Theune (ctheune) Date: 2014-04-17 14:31
So, in addition to "+/" and "-_" there are quite a few base64 variants. Worst of all, there are two ambiguous variants, "-_" and "_-", even though "_-" is supposedly "non-standard".

See http://en.wikipedia.org/wiki/Base64

The shortest fix I can see would be for the email module to stop calling binascii directly and instead go through the base64 module (as pointed out by the blog post), calling the urlsafe variant of the decoder. That should catch both cases.

Preparing a patch right now.
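
A minimal sketch of that approach, assuming the decoder receives the bare base64 payload as a str. The helper name decode_payload is hypothetical and this is not the code from the attached patches.

```python
import base64

def decode_payload(payload):
    """Hypothetical helper: decode a B-encoded payload in either alphabet."""
    # urlsafe_b64decode maps "-" to "+" and "_" to "/" before decoding,
    # so input that already uses "+/" passes through unchanged.
    return base64.urlsafe_b64decode(payload.encode('ascii'))

urlsafe = 'QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=='
standard = urlsafe.replace('_', '/')

# Both spellings of the payload yield the same bytes.
same = decode_payload(urlsafe) == decode_payload(standard)
decoded = decode_payload(urlsafe).decode('iso-8859-1')
print(same, decoded)
```

This covers only the "-_" ordering. Data produced with a reversed "_-" alphabet would decode without an error, but to the wrong bytes wherever those two characters occur.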
msg216705 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-04-17 16:05
The patch looks good.  I'd like the comment to say "We use urlsafe_b64decode here because some mailers apparently use the urlsafe b64 alphabet, and urlsafe_b64decode will correctly decode both the urlsafe and regular alphabets".

Also, the new header parser doesn't handle this case either, and furthermore it doesn't handle binascii errors at all (my comment in the code indicates I didn't think it could ever raise there).  The fact that it doesn't handle the error at all can be considered a different issue, but it would be nice to add the same decode fix (and a test in test_email/test_headerregistry.py) for the new header parser.  Here's one way to reproduce the issue:

>>> from email import message_from_string as mfs
>>> from email.policy import default
>>> m = mfs("From: =?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=\n\n", policy=default)
>>> m['From']
Traceback (most recent call last):
  File "/home/rdmurray/python/p34/Lib/email/_encoded_words.py", line 109, in decode_b
    return base64.b64decode(padded_encoded, validate=True), defects
  File "/home/rdmurray/python/p34/Lib/base64.py", line 89, in b64decode
    raise binascii.Error('Non-base64 digit found')
binascii.Error: Non-base64 digit found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rdmurray/python/p34/Lib/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/home/rdmurray/python/p34/Lib/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/home/rdmurray/python/p34/Lib/email/policy.py", line 145, in header_fetch_parse
    return self.header_factory(name, ''.join(value.splitlines()))
  File "/home/rdmurray/python/p34/Lib/email/headerregistry.py", line 583, in __call__
    return self[name](name, value)
  File "/home/rdmurray/python/p34/Lib/email/headerregistry.py", line 194, in __new__
    cls.parse(value, kwds)
  File "/home/rdmurray/python/p34/Lib/email/headerregistry.py", line 334, in parse
    kwds['parse_tree'] = address_list = cls.value_parser(value)
  File "/home/rdmurray/python/p34/Lib/email/headerregistry.py", line 325, in value_parser
    address_list, value = parser.get_address_list(value)
  File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 2313, in get_address_list
    token, value = get_address(value)
  File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 2290, in get_address
    token, value = get_group(value)
  File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 2246, in get_group
    token, value = get_display_name(value)
  File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 2072, in get_display_name
    token, value = get_phrase(value)
  File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 1747, in get_phrase
    token, value = get_word(value)
  File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 1728, in get_word
    token, value = get_atom(value)
  File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 1645, in get_atom
    token, value = get_encoded_word(value)
  File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 1421, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/home/rdmurray/python/p34/Lib/email/_encoded_words.py", line 166, in decode
    bstring, defects = _cte_decoders[cte](bstring)
  File "/home/rdmurray/python/p34/Lib/email/_encoded_words.py", line 124, in decode_b
    raise AssertionError("unexpected binascii.Error")
AssertionError: unexpected binascii.Error
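
One possible shape for the same fix in the new parser, sketched outside the email package. The function decode_b_tolerant and its defect string are hypothetical; only the (bytes, defects) return shape and the strict b64decode call are taken from the traceback above.

```python
import base64
import binascii

# Translation table from the URL-safe alphabet to the standard one.
_URLSAFE_TO_STD = bytes.maketrans(b'-_', b'+/')

def decode_b_tolerant(encoded):
    """Hypothetical helper: return (decoded_bytes, defects) for a bytes payload."""
    defects = []
    try:
        # Strict decode first, as in the traceback above.
        return base64.b64decode(encoded, validate=True), defects
    except binascii.Error:
        pass
    # Retry after mapping "-_" to "+/". This can still raise binascii.Error
    # for input that is invalid in both alphabets.
    decoded = base64.b64decode(encoded.translate(_URLSAFE_TO_STD), validate=True)
    defects.append('urlsafe base64 alphabet used')
    return decoded, defects

payload = b'QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=='
data, defects = decode_b_tolerant(payload)
print(data.decode('iso-8859-1'), defects)
```

Recording a defect instead of raising keeps the strict path unchanged for well-formed input while still letting callers see that a non-standard alphabet was encountered.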
History
Date                 User                Action  Args
2014-04-24 01:59:48  r.david.murray      set     nosy: + barry; components: + email
2014-04-17 16:05:20  r.david.murray      set     messages: + msg216705
2014-04-17 15:07:07  ctheune             set     files: + 732e7d4515c0.diff
2014-04-17 15:06:50  ctheune             set     files: + 62b280b61de7.diff; keywords: + patch
2014-04-17 14:36:25  ctheune             set     hgrepos: + hgrepo240
2014-04-17 14:36:20  ctheune             set     hgrepos: + hgrepo239; versions: + Python 3.4, Python 3.5
2014-04-17 14:31:55  ctheune             set     nosy: + r.david.murray, ctheune; messages: + msg216690
2013-04-19 18:46:47  pconnell            set     nosy: + pconnell
2011-07-05 11:41:23  guettli             set     messages: + msg139855
2011-07-04 17:16:24  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg139793
2011-07-04 14:39:12  guettli             set     messages: + msg139778
2011-07-04 14:38:19  guettli             create