
classification
Title: email.utils.getaddresses improper parsing of unicode realnames
Type: enhancement Stage:
Components: email Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, konstantin2, r.david.murray, trrhodes
Priority: normal Keywords:

Created on 2020-12-30 16:01 by konstantin2, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg384069 - Author: Konstantin Ryabitsev (konstantin2) Date: 2020-12-30 16:01
What it currently does:

>>> import email.utils
>>> email.utils.getaddresses(['Shuming [范書銘] <shumingf@realtek.com>'])
[('', 'Shuming'), ('', ''), ('', '范書銘'), ('', ''), ('', 'shumingf@realtek.com')]

What it should do:

>>> import email.utils
>>> email.utils.getaddresses(['Shuming [范書銘] <shumingf@realtek.com>'])
[('Shuming [范書銘]', 'shumingf@realtek.com')]
msg384071 - Author: Ross Rhodes (trrhodes) * Date: 2020-12-30 16:47
Hi Konstantin,

Thanks for raising this issue. It appears the field in your example does not conform to RFC 2822, which this email library follows. Square brackets are treated as special characters in [section 3.2.1](https://tools.ietf.org/html/rfc2822#section-3.2.1), so they cannot appear unquoted in a display name. The parser handles them in [_parseaddr](https://github.com/python/cpython/blob/master/Lib/email/_parseaddr.py#L219).

Combined with the fact that any [failed parse returns a 2-tuple of ('', '')](https://github.com/python/cpython/blob/master/Lib/email/utils.py#L212), I believe this explains the behavior observed.
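
As a possible workaround on the caller's side, the display name can be wrapped in a quoted-string before parsing, since specials are allowed inside double quotes. The sketch below is only an illustration: `quote_realname` and `RFC2822_SPECIALS` are hypothetical names, not part of the email package, and the special-character set is copied by hand from RFC 2822 section 3.2.1.

```python
import email.utils

# Characters that RFC 2822 section 3.2.1 lists as "specials".
# (Hand-copied set; an assumption of this sketch, not a library constant.)
RFC2822_SPECIALS = set('()<>[]:;@\\,."')


def quote_realname(realname, addr):
    """Hypothetical helper: build 'realname <addr>', quoting the name if needed."""
    if any(ch in RFC2822_SPECIALS for ch in realname):
        # Escape backslashes and double quotes, then wrap in double quotes.
        escaped = realname.replace('\\', '\\\\').replace('"', '\\"')
        realname = '"' + escaped + '"'
    return realname + ' <' + addr + '>'


field = quote_realname('Shuming [范書銘]', 'shumingf@realtek.com')
parsed = email.utils.getaddresses([field])
print(field)
print(parsed)
```

With the name quoted, the brackets are read as part of the phrase rather than as delimiters, so the result is a single (realname, address) pair. This does not help when the header comes from a third party and cannot be rewritten before parsing.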
History
Date User Action Args
2022-04-11 14:59:39 admin set github: 86953
2020-12-30 16:47:32 trrhodes set nosy: + trrhodes
messages: + msg384071
2020-12-30 16:01:25 konstantin2 create