Title: email.utils.getaddresses behavior contradicts RFC2822
Type: behavior Stage: resolved
Components: email, Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
Status: closed Resolution: fixed
Assigned To: Nosy List: Ivan.Egorov, barry, r.david.murray, v+python
Created on 2011-01-28 22:30 by Ivan.Egorov, last changed 2022-04-11 14:57 by admin.

email.utils.getaddresses.patch Ivan.Egorov, 2011-01-28 22:30 Proposed patch
Messages (2)
msg127357 - Author: Ivan Egorov (Ivan.Egorov) Date: 2011-01-28 22:30
email.utils.getaddresses behaves incorrectly in the following folding cases (the outer single quotes are not part of the value):
'"A\r\n (B)" <>'
'(A\r\n C) <>'

The misbehavior occurs in at least 2.6, 2.7 and branches/py3k.

Both strings are RFC 2822 compliant, but the current getaddresses() implementation mishandles a 'quoted-string' or a 'comment' that contains a CRLF.
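As a rough illustration of why the folding itself is legal, here is a minimal sketch of the line-break rule only. RFC 2822 section 3.2.3 defines folding white space as FWS = ([*WSP CRLF] 1*WSP), so a line break may appear inside a quoted-string or comment only as a CRLF followed by at least one space or tab. The name is_properly_folded is hypothetical, and the sketch ignores obs-FWS and the rest of the address grammar.

```python
import re

# Hypothetical checker: every line break must be CRLF immediately followed by
# a space or tab (FWS, RFC 2822 section 3.2.3). Bare CR or LF is rejected.
# Obsolete folding (obs-FWS) and the full address grammar are not checked.
_FOLDED_VALUE = re.compile(r'(?:[^\r\n]|\r\n[ \t])*')


def is_properly_folded(value):
    """Return True if every line break in value is CRLF followed by WSP."""
    return _FOLDED_VALUE.fullmatch(value) is not None


# The two strings from the report; prints True for both.
for sample in ['"A\r\n (B)" <>', '(A\r\n C) <>']:
    print(repr(sample), is_properly_folded(sample))
```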

The following are references to the related RFC sections:

The attachment contains tests and a patch for this case.
msg161800 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-05-28 19:38
The pre-3.3 email package does not do any header unfolding. You can make this work by unfolding the header yourself before passing it to getaddresses (here m is the parsed message):

  >>> email.utils.getaddresses([''.join(m['to'].splitlines())])
  [('A (B)', ''), ('', '')]
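A slightly stricter variant of the same workaround, as a sketch: str.splitlines() drops every line boundary it recognizes (including bare CR, bare LF, and characters such as \x0b or \u2028), whereas RFC 2822 section 2.2.3 defines unfolding as removing only a CRLF that is immediately followed by whitespace. The helpers unfold and getaddresses_unfolded are hypothetical names, not part of the email package; only email.utils.getaddresses is a real API here, and the address a@example.com is made up.

```python
import re
from email.utils import getaddresses

# RFC 2822 section 2.2.3: unfolding removes any CRLF that is immediately
# followed by WSP (space or tab); the whitespace itself is kept.
_FOLD = re.compile(r'\r\n(?=[ \t])')


def unfold(value):
    """Hypothetical helper: return value with RFC 2822 folding removed."""
    return _FOLD.sub('', value)


def getaddresses_unfolded(fieldvalues):
    """Hypothetical helper: unfold each header value, then call getaddresses()."""
    return getaddresses([unfold(v) for v in fieldvalues])


print(getaddresses_unfolded(['"A\r\n B" <a@example.com>']))
# [('A B', 'a@example.com')]
```

The design choice is to touch only real folding, so any stray CR or LF that is not followed by whitespace stays visible to the parser instead of being silently joined away.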

The new provisional policy that was just added to 3.3 (which will eventually become the standard interface) does do the unfolding before parsing the addresses, so it does not have this issue.  In 3.3 we now have this:

  >>> import email
  >>> from email.policy import SMTP
  >>> m = email.message_from_string("To: \"A\r\n (B)\" <>, (A\r\n C) <>\r\nSubject: test\r\n\r\nbody", policy=SMTP)
  >>> m['to'].addresses
  (Address(display_name='A (B)', username='c', domain=''), Address(display_name='', username='d', domain=''))
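On current Python 3 the policy-based parser can be used the same way. This sketch reuses the folded To: header from the report but substitutes made-up addresses with domains (a@example.com, b@example.com); it relies only on email.message_from_string with the policy keyword, email.policy.SMTP, and the addresses attribute of the parsed address header.

```python
import email
from email.policy import SMTP

# Folded To: header from the report, with made-up addresses that have domains.
raw = ('To: "A\r\n (B)" <a@example.com>, (A\r\n C) <b@example.com>\r\n'
       'Subject: test\r\n'
       '\r\n'
       'body')

# With a non-compat32 policy the header is unfolded before address parsing.
msg = email.message_from_string(raw, policy=SMTP)
for addr in msg['to'].addresses:
    print(repr(addr.display_name), addr.addr_spec)
```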
