Title: email.Utils.parseaddr fails to parse valid addresses
Type: behavior Stage: patch review
Components: email, Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.2, Python 3.3, Python 2.7
Assigned To: Nosy List: barry, christian.heimes, eric.araujo, ioanatia, martin.panter, melicertes, r.david.murray, sascha_silbe, sdgathman, thehesiod
Created on 2004-09-09 20:43 by melicertes, last changed 2022-04-11 14:56 by admin.

Messages (10)
msg22413 - (view) Author: Charles (melicertes) Date: 2004-09-09 20:43
email.Utils.parseaddr() does not successfully parse a
field value into a (comment, address) pair if the
address contains a source route with more than one hop.

i.e., it successfully parses this:

  "God" <>

to get the address <>, but it fails to do
the same if supplied with a 2-hop source route:

  "God" <,>

In this case, it gets the comment ("God") right, but
fails to extract the address.

Multi-hop source routes, while deprecated, are still
valid in rfc2822.
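
A minimal sketch for reproducing the report on a current interpreter, assuming the Python 3 module name email.utils (the report uses the older email.Utils spelling). The host names and mailbox below are hypothetical placeholders, not the addresses from the original report, and no particular output is claimed since the result depends on the Python version:

```python
from email.utils import parseaddr

# Hypothetical field values: the hosts and mailbox are made up for illustration.
single_hop = '"God" <@hop1.example:jeff@spec.example>'
multi_hop = '"God" <@hop1.example,@hop2.example:jeff@spec.example>'

def show(value):
    """Print and return the (realname, address) pair parseaddr produces."""
    result = parseaddr(value)
    print(repr(value), '->', result)
    return result

show(single_hop)
show(multi_hop)
```

Comparing the two printed pairs shows whether the address part survives the extra hop.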

msg59669 - (view) Author: Stuart D Gathman (sdgathman) Date: 2008-01-10 16:34
# A quick and very dirty fix for common broken cases, with test cases.

import rfc822

def parseaddr(t):
  """Split email into Fullname and address.

  >>> parseaddr('')
  ('', '')
  >>> parseaddr('"Full Name" <>')
  ('Full Name', '')
  >>> parseaddr(' <>')
  ('', '')
  >>> parseaddr('"God" <,>')
  ('God', '')
  """
  #return email.Utils.parseaddr(t)
  res = rfc822.parseaddr(t)
  # dirty fix for some broken cases
  if not res[0]:
    pos = t.find('<')
    if pos > 0 and t[-1] == '>':
      addrspec = t[pos+1:-1]
      pos1 = addrspec.rfind(':')
      if pos1 > 0:
        addrspec = addrspec[pos1+1:]
      return rfc822.parseaddr('"%s" <%s>' % (t[:pos].strip(),addrspec))
  if not res[1]:
    pos = t.find('<')
    if pos > 0 and t[-1] == '>':
      addrspec = t[pos+1:-1]
      pos1 = addrspec.rfind(':')
      if pos1 > 0:
        addrspec = addrspec[pos1+1:]
      return rfc822.parseaddr('%s<%s>' % (t[:pos].strip(),addrspec))
  return res
msg59670 - (view) Author: Stuart D Gathman (sdgathman) Date: 2008-01-10 16:37
Ok, I see the '@' is technically not allowed in an atom.  But it either
needs to throw an invalid syntax exception, or heuristically return
something reasonable.
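
One way to get the "throw an exception" behavior without changing the library is a thin wrapper. This is only a sketch: parseaddr_strict is a hypothetical helper name, not part of the email package, and treating "non-empty input but empty address" as a parse failure is an assumption about what counts as invalid:

```python
from email.utils import parseaddr

def parseaddr_strict(value):
    """Hypothetical wrapper: raise instead of silently returning an empty address."""
    name, addr = parseaddr(value)
    # Assumption: a non-blank field that yields no address is a parse failure.
    if value.strip() and not addr:
        raise ValueError("could not parse an address from %r" % (value,))
    return name, addr

print(parseaddr_strict('Full Name <user@example.com>'))
```

An empty input still returns the usual pair of empty strings, so callers that pass optional header values do not need special handling.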
msg59671 - (view) Author: Stuart D Gathman (sdgathman) Date: 2008-01-10 16:39
Repeating because previous real life test case was rejected as 'spam':

It also fails to parse:
>>> from email.Utils import parseaddr
>>> parseaddr(' <>')
('', '')

Getting the wrong part as the actual email to boot!  Checked 2.4 and 2.5.
msg59735 - (view) Author: Stuart D Gathman (sdgathman) Date: 2008-01-11 19:08
Same or related issues: Issue1221, Issue1409460
msg59737 - (view) Author: Stuart D Gathman (sdgathman) Date: 2008-01-11 19:14
Test cases so far:
  >>> parseaddr('')
  ('', '')
  >>> parseaddr('"Full Name" <>')
  ('Full Name', '')
  >>> parseaddr(' <>')
  ('', '')
  >>> parseaddr('God@heaven <,>')
  ('God@heaven', '')
  >>> parseaddr('Real Name ((comment)) <>')
  ('Real Name', '')
  >>> parseaddr('a(WRONG)@b')
  ('', 'a@b')
msg59751 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-01-11 21:18
An example from #1221:

>>> email.Utils.parseaddr("a(WRONG)@b")
('WRONG WRONG', 'a@b')
msg59753 - (view) Author: Stuart D Gathman (sdgathman) Date: 2008-01-11 21:21
tiran: yes, but that is the wrong answer, and that example is already in
the testcase list (with what I believe is the right answer).
msg301485 - (view) Author: Alexander Mohr (thehesiod) * Date: 2017-09-06 16:59
from 3.6:
>>> AddrlistClass('John Smith <john.smith(comment)>').getcomment()

>>> AddrlistClass('John Smith <john.smith(comment)>').getdomain()

totally messed up :)
msg301504 - (view) Author: Alexander Mohr (thehesiod) * Date: 2017-09-06 19:45
looks like these were meant to be internal methods, retracting new issues
