Issue 34155: [CVE-2019-16056] email.utils.parseaddr mistakenly parse an email



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/78336

classification

Title:	[CVE-2019-16056] email.utils.parseaddr mistakenly parse an email
Type:	security	Stage:	resolved
Components:	email	Versions:	Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	Anselmo Melo, Dain Dwarf, Windson Yang, aeros, barry, bortzmeyer, cnicodeme, jpic, kal.sze, larry, lukasz.langa, maxking, mcepl, miss-islington, msapiro, ned.deily, nicoe, r.david.murray, rcsanchez97, rschiron, vstinner, xtreak
Priority:	critical	Keywords:	patch, security_issue

Created on 2018-07-19 14:53 by cnicodeme, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name	Uploaded	Description
Screen Shot 2019-05-02 at 22.07.27.png	barry, 2019-05-03 02:09

Pull Requests
URL	Status	Linked
PR 13079	merged	python-dev, 2019-05-03 21:27
PR 14824	merged	miss-islington, 2019-07-17 21:54
PR 14825	merged	miss-islington, 2019-07-17 21:54
PR 14826	merged	miss-islington, 2019-07-17 21:54
PR 15317	merged	maxking, 2019-08-17 02:14
PR 16006	merged	python-dev, 2019-09-11 21:10

Messages (52)
msg321956 - (view)	Author: Cyril Nicodème (cnicodeme)	Date: 2018-07-19 14:53
Hi! I'm trying to parse some emails, and I discovered that email.utils.parseaddr parses an email address incorrectly.

Here's the corresponding header:

> From: =?utf-8?Q?zq@redacted.com.cn=E3=82=86=E2=86=91=E3=82=86?= =?utf-8?Q?=E3=82=83=E3=82=85=E3=81=87=E3=81=BA=E3=81=BD=E3=81=BC"\=E3?= =?utf-8?Q?=81=A9=E3=81=A5=E3=81=A2l=E3=81=A0=E3=81=B0=E3=81=A8=E3=81?= =?utf-8?Q?=8FKL=E3=81=84=E3=82=8C=E3=82=8B=E3=82=86>KL=E3=82=89JF?= <mxvu@redacted2.com>

Once this has been parsed via `decode_header`, we obtain this value:

> From: zq@redacted.com.cnゆ↑ゆゃゅぇぺぽぼ"\どづぢlだばとくKLいれるゆ>KLらJF <mxvu@redacted2.com>

(I agree, not really a nice looking From email ...)

Then, when this value is given to parseaddr, here's the result:

> ('', 'zq@redacted.com.cnゆ↑ゆゃゅぇぺぽぼ')

But it should be:

> ('zq@redacted.com.cnゆ↑ゆゃゅぇぺぽぼ"\どづぢlだばとくKLいれるゆ>KLらJF', 'mxvu@redacted2.com')

(Note that the email in the "name" part is not the same as the email in the "email" part!)
msg321957 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-07-19 15:18
That does appear to be a bug. Note that the new email API handles it correctly:

>>> x = """
... > From: =?utf-8?Q?zq@redacted.com.cn=E3=82=86=E2=86=91=E3=82=86?=
... =?utf-8?Q?=E3=82=83=E3=82=85=E3=81=87=E3=81=BA=E3=81=BD=E3=81=BC"\=E3?=
... =?utf-8?Q?=81=A9=E3=81=A5=E3=81=A2l=E3=81=A0=E3=81=B0=E3=81=A8=E3=81?=
... =?utf-8?Q?=8FKL=E3=81=84=E3=82=8C=E3=82=8B=E3=82=86>KL=E3=82=89JF?=
... <mxvu@redacted2.com>
... """
>>> from email import message_from_string
>>> from email.policy import default
>>> m = message_from_string(x+'\n\ntest', policy=default)
>>> m['from']
'"zq@redacted.com.cnゆ↑ゆゃゅぇぺぽぼ\\"\\\\� ��づぢlだばと� �KLいれるゆ>KLらJF" <mxvu@redacted2.com>'
>>> m['from'].addresses[0].addr_spec
'mxvu@redacted2.com'
>>> m['from'].addresses[0].display_name
'zq@redacted.com.cnゆ↑ゆゃゅぇぺぽぼ"\\\udce3 \udc81\udca9づぢlだばと\udce3\udc81 \udc8fKLいれるゆ>KLらJF'

I'm not particularly interested myself in fixing parseaddr to handle this case correctly, since it is the legacy API, but if someone else wants to I'll review the patch.
msg321958 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-07-19 15:19
Oops, I left out a step in that cut and paste. For completeness: >>> x = x[3:]
msg321959 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-07-19 15:21
Ah, maybe it doesn't handle it completely correctly; that decode looks different now that I look at it in detail.
msg321967 - (view)	Author: Jakub Wilk (jwilk)	Date: 2018-07-19 21:03
You should not use decode_header() on the whole From header, because that loses information. You should parse the header first, then decode the parts that could be RFC2047-encoded.

Quoting <https://tools.ietf.org/html/rfc2047#section-6.2>:

> NOTE: Decoding and display of encoded-words occurs after a
> structured field body is parsed into tokens. It is therefore
> possible to hide 'special' characters in encoded-words which, when
> displayed, will be indistinguishable from 'special' characters in the
> surrounding text. For this and other reasons, it is NOT generally
> possible to translate a message header containing 'encoded-word's to
> an unencoded form which can be parsed by an RFC 822 mail reader.

So I don't see a bug in parseaddr() here, except that the API is a bit of a footgun.
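The order described above (parse the structured header first, decode RFC 2047 encoded-words second) could look roughly like the following sketch. The sample header and the helper name `split_then_decode` are made up for illustration; only `parseaddr`, `decode_header` and `make_header` come from the standard library.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def split_then_decode(raw_value):
    """Parse the raw header value first, then decode only the display name.

    Hypothetical helper for illustration; not part of the email package.
    """
    raw_name, addr_spec = parseaddr(raw_value)
    # Any '@' or '<' hidden inside an encoded-word is still encoded while
    # parseaddr() runs, so it cannot be mistaken for address syntax.
    display_name = str(make_header(decode_header(raw_name)))
    return display_name, addr_spec


# In Q-encoding, '=40' is '@' and '_' stands for a space.
raw = '=?utf-8?Q?J=C3=B6rg_=40_work?= <joerg@example.org>'
name, addr = split_then_decode(raw)
print(name)
print(addr)
```

Decoding the whole value before parsing would instead expose the hidden '@' to parseaddr(), which is the situation in the original report.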
msg329372 - (view)	Author: Mark Sapiro (msapiro) *	Date: 2018-11-06 18:14
The issue is illustrated much more simply as follows:

email.utils.parseaddr('John Doe jdoe@example.com <other@example.net>')

returns

('', 'John Doe jdoe@example.com')

whereas it should return

('John Doe jdoe@example.com', 'other@example.net')

I'll look at developing a patch.
msg329376 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-11-06 19:23
>>> m = message_from_string("From: John Doe jdoe@example.com <other@example.net>\n\n", policy=default) >>> m['From'].addresses(Address(display_name='', username='John Doe jdoe', domain='example.com'),)

The new policies have more error recovery for non-RFC compliant addresses than decode_header, but the two agree in this case.

What is happening here is that (1) an unquoted/unencoded '@' is not allowed in a display name (2) if the address is not '<>' quoted, then everything before the @ is the username and (3) in the absence of a comma after the end of the fqdn (which is not allowed to contain blanks) any additional tokens are discarded.

One could argue that we could treat the blank after the FQDN as a "missing comma", and there would be some merit to that argument. You could also argue that a "<>" quoted string would trump the occurrence of the @ earlier in the token list. However, the RFC822 grammar is designed to be parsed character by character, so that would not be a typical way for an RFC822 parser to try to do postel-style error recovery.

So, I don't think there is a bug here, but I'd be curious what other email address parsing libraries do, and that could influence whether extensions to the "make a guess when the string doesn't conform to the RFC" code would be acceptable.
msg329377 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-11-06 19:24
The formatting of that doctest paragraph got messed up. Let me try again:

>>> m = message_from_string("From: John Doe jdoe@example.com <other@example.net>\n\n", policy=default)
>>> m['From'].addresses
(Address(display_name='', username='John Doe jdoe', domain='example.com'),)
msg329379 - (view)	Author: Karthikeyan Singaravelan (xtreak) *	Date: 2018-11-06 19:48
Is this a case of realname having @ inside an unquoted string? As I can see from the RFC, the acceptable characters of an atom, other than letters and digits, that make up a phrase are ['!', '#', '$', '%', '&', "'", '*', '+', '-', '/', '=', '?', '^', '_', '`', '{', '|', '}', '~']. So just curious if it's a case of @ inside unquoted string as name?

>>> for char in accepted:
...     print(parseaddr(f'John Doe jdoe{char}example.com <other@example.net>'))
...
('John Doe jdoe!example.com', 'other@example.net')
('John Doe jdoe#example.com', 'other@example.net')
('John Doe jdoe$example.com', 'other@example.net')
('John Doe jdoe%example.com', 'other@example.net')
('John Doe jdoe&example.com', 'other@example.net')
("John Doe jdoe'example.com", 'other@example.net')
('John Doe jdoe*example.com', 'other@example.net')
('John Doe jdoe+example.com', 'other@example.net')
('John Doe jdoe-example.com', 'other@example.net')
('John Doe jdoe/example.com', 'other@example.net')
('John Doe jdoe=example.com', 'other@example.net')
('John Doe jdoe?example.com', 'other@example.net')
('John Doe jdoe^example.com', 'other@example.net')
('John Doe jdoe_example.com', 'other@example.net')
('John Doe jdoe`example.com', 'other@example.net')
('John Doe jdoe{example.com', 'other@example.net')
('John Doe jdoe|example.com', 'other@example.net')
('John Doe jdoe}example.com', 'other@example.net')
('John Doe jdoe~example.com', 'other@example.net')
>>> parseaddr('"John Doe jdoe@example.com" <other@example.net>')
('John Doe jdoe@example.com', 'other@example.net')
>>> parseaddr('John Doe jdoe@example.com <other@example.net>')
('', 'John Doe jdoe@example.com')
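The quoted form is what a sender is expected to produce when the display name contains an '@'. A small sketch of that round trip, using the sample name and address from this thread: `email.utils.formataddr` adds the double quotes when the name contains RFC 5322 specials, and `parseaddr` reads the quoted form back unchanged.

```python
from email.utils import formataddr, parseaddr

display_name = 'John Doe jdoe@example.com'
addr_spec = 'other@example.net'

# formataddr() wraps the display name in double quotes because it
# contains specials such as '@' and '.'.
header_value = formataddr((display_name, addr_spec))
print(header_value)

# The quoted form round-trips through parseaddr().
parsed = parseaddr(header_value)
print(parsed)
```

The unquoted variant is deliberately left out of the sketch, since its result is exactly what is under discussion in this issue.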
msg329380 - (view)	Author: Mark Sapiro (msapiro) *	Date: 2018-11-06 19:55
I agree that my example with an @ in the 'display name', although actually seen in the wild, is non-compliant, and that the behavior of parseaddr() in this case is not a bug. Sorry for the noise.
msg329382 - (view)	Author: Karthikeyan Singaravelan (xtreak) *	Date: 2018-11-06 20:27
Ah sorry, I was typing so long and had an idle session that I didn't realize @r.david.murray added a comment with the explanation. Just to add I tried using Perl module (https://metacpan.org/release/Email-Address) that uses regex for parsing that returns me two addresses and the regex is also not much comprehensible. use v5.14; use Email::Address; my $line = 'John Doe jdoe@example.com <other@example.net>'; my @addresses = Email::Address->parse($line); say $addresses[0]; say $addresses[1]; say "Angle address regex"; say $Email::Address::angle_addr; jdoe@example.com other@example.net Angle address regex (?^:(?^:(?^:\s$(?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+)<(?^:(?^:(?^:(?^:(?^:\s$(?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+)(?^:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+))(?^:(?^:\s$(?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+))\|(?^:(?^:(?^:\s$(?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+)"(?^:(?^:[^\\"])\|(?^:\$?^:[^\x0A\x0D])))"(?^:(?^:\s\((?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+)))\@(?^:(?^:(?^:(?^:\s$(?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+)(?^:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+))(?^:(?^:\s$(?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+))\|(?^:(?^:(?^:\s$(?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+)\[(?:\s(?^:(?^:[^\[\]\\])\|(?^:\$?^:[^\x0A\x0D]))))\s\](?^:(?^:\s\((?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+))))>(?^:(?^:\s$(?:\s(?^:(?^:(?>[^()\\]+))\|(?^:\\(?^:[^\x0A\x0D]))\|))\s$\s)\|\s+)) Thanks
msg329463 - (view)	Author: Kal Sze (kal.sze)	Date: 2018-11-08 08:23
Another failure case:

>>> from email.utils import parseaddr
>>> parseaddr('fo@o@bar.com')
('', 'fo@o')

If I understand the RFC correctly, the correct results should be ('', '') because there are two '@' signs. The first '@' would need to be quoted for the address to be valid.
msg340534 - (view)	Author: Stéphane Bortzmeyer (bortzmeyer)	Date: 2019-04-19 10:16
Note that this bug was used in an actual security attack so it is serious https://medium.com/@fs0c131y/tchap-the-super-not-secure-app-of-the-french-government-84b31517d144 https://twitter.com/fs0c131y/status/1119143946687434753
msg340535 - (view)	Author: Karthikeyan Singaravelan (xtreak) *	Date: 2019-04-19 10:28
Relevant attack from matrix blog post. https://matrix.org/blog/2019/04/18/security-update-sydent-1-0-2/

> sydent uses python's email.utils.parseaddr function to parse the input email address before sending validation mail to it, but it turns out that if you hand parseaddr an malformed email address of form a@b.com@c.com, it silently discards the @c.com prefix without error. The result of this is that if one requested a validation token for 'a@malicious.org@important.com', the token would be sent to 'a@malicious.org', but the address 'a@malicious.org@important.com' would be marked as validated. This release fixes this behaviour by asserting that the parsed email address is the same as the input email address.

I am marking this as a security issue.
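The mitigation described in the quoted post (accept the input only when the parsed address equals it) can be sketched as below. The helper name `is_bare_address` is hypothetical and this is not sydent's actual code, just an illustration of the same idea.

```python
from email.utils import parseaddr


def is_bare_address(candidate):
    """Return True only if parseaddr() hands the input back unchanged.

    Hypothetical helper modelled on the check described in the quoted
    post; it is not sydent's implementation.
    """
    display_name, addr_spec = parseaddr(candidate)
    return display_name == '' and addr_spec != '' and addr_spec == candidate


for candidate in ('a@example.com',
                  'a@malicious.org@important.com',
                  'Alice <a@example.com>'):
    print(candidate, is_bare_address(candidate))
```

This check holds on interpreters with or without the later fix, because a truncated or empty result never equals the original input. It only verifies the round trip, not that the address is otherwise valid.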
msg340933 - (view)	Author: jpic (jpic) *	Date: 2019-04-26 16:42
Given the situation, could raising a SecurityWarning and a DeprecationWarning fix this issue ?
msg341069 - (view)	Author: Dain Dwarf (Dain Dwarf)	Date: 2019-04-29 11:42
Hello, kind of new here. I just wanted to note that the issue that led to Tchap's security attack still exists in the non-deprecated message_from_string function:

email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
(Address(display_name='', username='a', domain='malicious.org'),)

So, deprecating parseaddr is not enough for security purposes, unless there is another ticket for the new email API.
msg341294 - (view)	Author: Ned Deily (ned.deily) *	Date: 2019-05-02 18:07
@barry, @r.david.murray, With the additional info about attacks in the wild, should we now consider this a security issue? If so, someone needs to provide an actual PR. (Raising the priority to "deferred blocker" pending evaluation.)
msg341320 - (view)	Author: Windson Yang (Windson Yang) *	Date: 2019-05-03 01:45
I found the issue located in https://github.com/python/cpython/blob/master/Lib/email/_parseaddr.py#L277

    elif self.field[self.pos] in '.@':
        # email address is just an addrspec
        # this isn't very efficient since we start over
        self.pos = oldpos
        self.commentlist = oldcl
        addrspec = self.getaddrspec()
        returnlist = [(SPACE.join(self.commentlist), addrspec)]

The parser walks over the input string, and when it meets a '.' or an '@' it starts over from the old position and parses an addr-spec. That is why the input string 'foo@bar.com@example.com' returns ('', 'foo@bar.com'). One possible solution would be to check the string in reverse order, so that we always get the last '@' in the string.
msg341322 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2019-05-03 02:09
Well, at least we're not alone. Here's a screen capture from Mail.app Version 12.4 (3445.104.8).
msg341362 - (view)	Author: jpic (jpic) *	Date: 2019-05-03 23:57
I haven't found this specific case in an RFC, but checked Go's net/mail library behavior and it just considers it broken:

$ cat mail.go
package main

import "fmt"
import "net/mail"

func main() {
    fmt.Println((&mail.AddressParser{}).Parse("a@example.com"))
    fmt.Println((&mail.AddressParser{}).Parse("a@malicious.org@example.com "))
}
$ go run mail.go
<a@example.com> <nil>
<nil> mail: expected single address, got "@example.com"

That would fix the security issue but not the whole ticket.
msg341367 - (view)	Author: jpic (jpic) *	Date: 2019-05-04 01:10
The pull request has been updated to mimic net/mail's behavior rather than trying to work around user input.

Before:

>>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
(Address(display_name='', username='a', domain='malicious.org'),)
>>> parseaddr('a@malicious.org@important.com')
('', 'a@malicious.org')

After:

>>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
(Address(display_name='', username='', domain=''),)
>>> parseaddr('a@malicious.org@important.com')
('', 'a@')

I like what I saw under the hood, please feel free to hack me for other tasks in the email stdlib.
msg341370 - (view)	Author: Windson Yang (Windson Yang) *	Date: 2019-05-04 02:16
Going by the answer from Alnitak (https://stackoverflow.com/questions/12355858/how-many-symbol-can-be-in-an-email-address), maybe we should raise an error when the address has multiple @ in it.
msg341381 - (view)	Author: jpic (jpic) *	Date: 2019-05-04 13:02
An '@' is allowed in the local part if quoted; the proposed patch acts within get_domain() and getdomain() and does not affect local part parsing. This still works:

>>> parseaddr('"fo@bar"@bar.com')
('', '"fo@bar"@bar.com')
>>> email.message_from_string('From: "a@b"@ex.com ',policy=email.policy.default)['from'].addresses
(Address(display_name='', username='a@b', domain='ex.com'),)

I'm not against raising an exception in parseaddr, but you should know that currently nothing in parseaddr seems to raise an exception:

jpic@ci:~/cpython$ grep raise Lib/email/_parseaddr.py
jpic@ci:~/cpython$

For example:

>>> parseaddr('aoeu')
('', 'aoeu')
>>> parseaddr('a@')
('', 'a@')

None of the above calls raised an exception. That is the reason why I did not introduce a new Exception in the getdomain() change: I thought it would be more consistent with the rest of the API as such.

As for the new API, the patch does raise a parse error:

    # this detects that the next character right after the parsed domain is another @
    if value and value[0] == '@':
        raise errors.HeaderParseError('Multiple domains')

But that's in the lower level API that is planned for going public later on (probably when it will have unit tests), it's just the higher level API that the user faces that swallows it. As a user you can still know about that parse problem using the defects attribute:

>>> email.message_from_string('From: a@malicious.org@example.com', policy=email.policy.default)['from'].defects[0]
InvalidHeaderDefect('invalid address in address-list')
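A caller using the new API could inspect the `defects` attribute before trusting a parsed address. The sketch below assumes an interpreter that already contains the fix from this issue; the helper name `has_invalid_header_defect` is made up for illustration.

```python
import email
from email import errors, policy


def has_invalid_header_defect(header):
    """Hypothetical helper: True if the header recorded an InvalidHeaderDefect."""
    return any(isinstance(d, errors.InvalidHeaderDefect) for d in header.defects)


suspicious = email.message_from_string(
    'From: a@malicious.org@important.com\n\nbody\n',
    policy=policy.default,
)
clean = email.message_from_string(
    'From: a@example.com\n\nbody\n',
    policy=policy.default,
)

print(has_invalid_header_defect(suspicious['from']))
print(has_invalid_header_defect(clean['from']))
# With the fix applied, the domain of the rejected address is empty.
print(repr(suspicious['from'].addresses[0].domain))
```

Checking defects is opt-in: the header object is still returned, so code that ignores `defects` only sees the empty username and domain.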
msg344030 - (view)	Author: Abhilash Raj (maxking) *	Date: 2019-05-31 06:26
How about we go a slightly different route than suggested by jpic and instead of returning a None value, we return the entire rest of the string as the domain? That would take care of the security issue since it won't be a valid domain anymore.

msg = email.message_from_string(
    'From: SomeAbhilashRaj <abhilash@malicious.org@important.com>',
    policy=email.policy.default)
print(msg['From'].addresses)
print(msg['From'].defects)

(Address(display_name='SomeAbhilashRaj', username='abhilash', domain='malicious.org@important.com>'),)
(InvalidHeaderDefect('invalid address in address-list'), InvalidHeaderDefect("missing trailing '>' on angle-addr"), InvalidHeaderDefect("unpected '@' in domain"), ObsoleteHeaderDefect("period in 'phrase'"))

This lets us do postel-style error recovery while working in RFC 2822 style grammar. I wrote this patch to achieve this:

@@ -1573,6 +1574,11 @@ def get_domain(value):
         domain.append(DOT)
         token, value = get_atom(value[1:])
         domain.append(token)
+    if value and value[0] == '@':
+        domain.defects.append(errors.InvalidHeaderDefect(
+            "unpected '@' in domain"))
+        token = get_unstructured(value)
+        domain.append(token)
     return domain, value

Does this make sense?
msg344157 - (view)	Author: jpic (jpic) *	Date: 2019-06-01 07:51
The email API does error recovery without loading invalid domains into the domain variable, which could lead to dangerous situations. Example with "a@foo.":

>>> email.message_from_string('From: a@foo.',policy=email.policy.default)['from'].addresses[0].domain
''

In perspective with the new patch proposed by maxking that lets an invalid domain make it to the domain variable:

>>> email.message_from_string('From: a@b@c.com',policy=email.policy.default)['from'].addresses[0].domain
'b@c.com'

For me maxking's suggestion opens the question of where to draw the line between invalid domains that should be loaded into the domain variable and invalid domains that should not.

Another smaller advantage of Go's net/mail behaviour is that results between parseaddr and email are consistently empty strings for an invalid domain: parseaddr does not seem able to return a list of defects.
msg344205 - (view)	Author: Abhilash Raj (maxking) *	Date: 2019-06-01 19:20
I don't know if we can make the API consistent between parseaddr and the parsing header value since they are completely different even right now. Like you already noticed there is no way to register defects and instead parseaddr returns ('', '') to denote the failure to parse. About parsing malicious domain, my line of thinking was along the lines of presenting whatever is there to user of the API, without 'hiding' that information. It would be harder to figure out the exception if the domain is missing. Even though the domain is parsed in the `domain` value, the value itself is clearly invalid. Any attempt to ever use that Address() will definitely cause an error (perhaps, there should be a sanity check in SMTP.send_message for that?).
msg344389 - (view)	Author: jpic (jpic) *	Date: 2019-06-03 07:51
Thanks for your explanation, but in perspective with other invalid domains, such as "foo." currently resulting in an empty string too:

>>> email.message_from_string('From: a@foo.',policy=email.policy.default)['from'].addresses[0].domain
''

Do you think this should also change the value of domain to "foo." ?

Also yes with parseaddr it seems that domain is an empty string if it didn't find a valid domain at all, which is pretty safe in case of malicious injection attempt - if that's what we're trying to save python programs from, a clarification of the objective of the patch would be welcome.
msg344431 - (view)	Author: Abhilash Raj (maxking) *	Date: 2019-06-03 15:44
I agree that we currently abandon parsing (raise `HeaderParseError`) when we encounter an unexpected token when parsing domain (expected token is dot-atom-text). However, that mechanism is meant to signal the higher level parser that we should look for a different type of token. In case of domain, we don't fallback to anything. I believe we should fallback to `get_unstructured` when we do encounter an invalid domain (either `foo.` or `foo@example` or `foo@example.com`) and register defect. But, the `.domain` attribute on the address class should be None if the domain is invalid. My proposed solution of `get_unstructured` is perhaps not a great idea either since we would end up parsing more than we should (maybe we should parse until `>`?) in case of AddrList or something. I would love to know what David and Barry think about this one?
msg344432 - (view)	Author: Abhilash Raj (maxking) *	Date: 2019-06-03 15:46
slight typo in the previous message: s/fallback to `get_unstructured` /fallback to something/g
msg347157 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2019-07-02 21:43
I still think the only way to read the documentation for parseaddr('a@b@c') is to return ('', '') - a tuple of empty strings. The documentation says: "Returns a tuple of that information, unless the parse fails, in which case a 2-tuple of ('', '') is returned." Of course, it doesn't define exactly what a "failing parse" is, but I would claim that a non-RFC compliant address should fail to parse, at least for the parseaddr() interface. I'm not concerned about inconsistencies between message_from_string() and parseaddr(). They are different APIs. I'll follow up on the PR, but does anybody disagree with that reasoning?
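To make the documented contract concrete: on an interpreter that includes the fix merged for this issue (an assumption of this sketch), an address with two unquoted '@' signs is expected to come back as the failure tuple. The sample strings are illustrative only.

```python
from email.utils import parseaddr

samples = ['a@b@c', 'a@b.c@c', 'a@malicious.org@important.com']

# Map each malformed input to what parseaddr() returns for it.
results = {sample: parseaddr(sample) for sample in samples}

for sample, result in results.items():
    print(repr(sample), '->', result)
```

A caller can therefore treat `('', '')` as "could not parse" without having to compare the result against the input.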
msg347183 - (view)	Author: Cyril Nicodème (cnicodeme)	Date: 2019-07-03 07:17
This thread has been really interesting to follow, I'm glad to have opened it :) I would agree with Barry here, it should follow the documentation. BUT, I would suggest to add a "strict" parameter that would throw exceptions depending on the parsing issue (missing a @, having multiple @, etc). That way, a basic usage would return the empty strings, letting the developer know the email is invalid, and advanced case would still be possible. By default, I think having strict set to False would be logical, since it would follow the documentation.
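The suggestion above can be approximated today with a thin wrapper around `parseaddr`, without changing the standard library. This is only a sketch: `AddressParseError` and `parseaddr_or_raise` are hypothetical names, and the error messages are coarser than the per-cause exceptions proposed above.

```python
from email.utils import parseaddr


class AddressParseError(ValueError):
    """Hypothetical exception type; not part of the email package."""


def parseaddr_or_raise(value):
    """Like parseaddr(), but raise instead of returning an unusable address.

    Hypothetical wrapper for illustration only.
    """
    display_name, addr_spec = parseaddr(value)
    if not addr_spec:
        # parseaddr() signals a failed parse with an empty address.
        raise AddressParseError('could not parse an address from %r' % value)
    if '@' not in addr_spec:
        # parseaddr() accepts bare words such as 'aoeu' without complaint.
        raise AddressParseError('missing "@" in %r' % value)
    return display_name, addr_spec


print(parseaddr_or_raise('Alice <alice@example.com>'))
```

Callers that prefer the lenient behaviour keep calling `parseaddr` directly, so the default stays as documented.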
msg347223 - (view)	Author: jpic (jpic) *	Date: 2019-07-03 12:10
Thanks for the kind words Cyril, sorry that this patch doesn't address exactly the issue that you have described initially, but rather the security issue related to it. The exception depending on the parsing issue is already supported by the new API, although it's just "Invalid Domain" for now. For user interfaces it would be nice to detail parse errors indeed. Again I wonder if this should be a separate issue. Concerning the default behavior, @maxking will know but I would try to defend the "secure by default" paradigm if necessary, especially in the deprecated API. Meanwhile, I think it would create more value for Python to invest in feature development in the new API, that has a very nice private API but apparently lacks unit tests and documentation before becoming available to users.
msg348082 - (view)	Author: miss-islington (miss-islington)	Date: 2019-07-17 21:54
New changeset 8cb65d1381b027f0b09ee36bfed7f35bb4dec9a9 by Miss Islington (bot) (jpic) in branch 'master': bpo-34155: Dont parse domains containing @ (GH-13079) https://github.com/python/cpython/commit/8cb65d1381b027f0b09ee36bfed7f35bb4dec9a9
msg349278 - (view)	Author: miss-islington (miss-islington)	Date: 2019-08-09 08:30
New changeset c48d606adcef395e59fd555496c42203b01dd3e8 by Miss Islington (bot) in branch '3.7': bpo-34155: Dont parse domains containing @ (GH-13079) https://github.com/python/cpython/commit/c48d606adcef395e59fd555496c42203b01dd3e8
msg349279 - (view)	Author: miss-islington (miss-islington)	Date: 2019-08-09 08:31
New changeset 217077440a6938a0b428f67cfef6e053c4f8673c by Miss Islington (bot) in branch '3.8': bpo-34155: Dont parse domains containing @ (GH-13079) https://github.com/python/cpython/commit/217077440a6938a0b428f67cfef6e053c4f8673c
msg349292 - (view)	Author: Ned Deily (ned.deily) *	Date: 2019-08-09 15:22
New changeset 13a19139b5e76175bc95294d54afc9425e4f36c9 by Ned Deily (Miss Islington (bot)) in branch '3.6': bpo-34155: Dont parse domains containing @ (GH-13079) (GH-14826) https://github.com/python/cpython/commit/13a19139b5e76175bc95294d54afc9425e4f36c9
msg349357 - (view)	Author: Abhilash Raj (maxking) *	Date: 2019-08-10 20:31
Closing this since the PRs are merged.
msg349464 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-12 13:00
I change the issue type to security because of https://bugs.python.org/issue34155#msg340534: "Note that this bug was used in an actual security attack so it is serious".
msg349465 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-12 13:02
This issue is a security issue so Python 2.7, 3.5, 3.6 should also be fixed, no?
msg349820 - (view)	Author: Abhilash Raj (maxking) *	Date: 2019-08-15 19:19
@Victor: This is already backported to 3.6. I am not sure about what gets backported to 3.5 right now, I don't even see a 'Backport to 3.5' label on Github (which made me think we are discouraged to backport to 3.5). I can work on a manual backport if needed? This patch most probably won't backport to 2.7 without re-writing it completely since the implementation in 2.7 is much different than what we have today.
msg349891 - (view)	Author: Kyle Stanley (aeros) *	Date: 2019-08-17 00:01
> This is already backported to 3.6. I am not sure about what gets backported to 3.5 right now, I don't even see a 'Backport to 3.5' label on Github (which made me think we are discouraged to backport to 3.5). I can work on a manual backport if needed? As far as I'm aware, backports to 3.5 have to be manually approved by those with repository management permissions, such the the organization owners (https://devguide.python.org/devcycle/#current-owners) and admins (https://devguide.python.org/devcycle/#current-administrators)
msg349892 - (view)	Author: Abhilash Raj (maxking) *	Date: 2019-08-17 02:15
Created a backport PR for 3.5.
msg349957 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-08-19 14:04
> Created a backport PR for 3.5. Thanks. I reviewed it (LGTM). What about Python 2.7, it's also vulnerable, no?
msg349968 - (view)	Author: Abhilash Raj (maxking) *	Date: 2019-08-19 20:28
2.7 needs a separate PR since the code is very different and my familiarity with 2.7 version of email package is very limited. I am going to work on a separate patch later this week for 2.7.
msg350291 - (view)	Author: Łukasz Langa (lukasz.langa) *	Date: 2019-08-23 14:07
Downgraded the severity since 3.6 - 3.9 are merged.
msg351281 - (view)	Author: Larry Hastings (larry) *	Date: 2019-09-07 05:24
New changeset 063eba280a11d3c9a5dd9ee5abe4de640907951b by larryhastings (Abhilash Raj) in branch '3.5': [3.5] bpo-34155: Dont parse domains containing @ (GH-13079) (#15317) https://github.com/python/cpython/commit/063eba280a11d3c9a5dd9ee5abe4de640907951b
msg351283 - (view)	Author: Larry Hastings (larry) *	Date: 2019-09-07 05:39
All PRs merged. Thanks, everybody!
msg351364 - (view)	Author: Riccardo Schirone (rschiron)	Date: 2019-09-09 08:51
CVE-2019-16056 has been assigned to this issue. See https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-16056 .
msg351377 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-09-09 09:31
I reopen the issue since Python 2.7 is still vulnerable.
msg352230 - (view)	Author: Roberto C. Sánchez (rcsanchez97) *	Date: 2019-09-12 23:06
I am working on Debian LTS support. I have submitted a PR that contains the necessary adjustments to implement the fix in 2.7.
msg352444 - (view)	Author: miss-islington (miss-islington)	Date: 2019-09-14 17:26
New changeset 4cbcd2f8c4e12b912e4d21fd892eedf7a3813d8e by Miss Islington (bot) (Roberto C. Sánchez) in branch '2.7': [2.7] bpo-34155: Dont parse domains containing @ (GH-13079) (GH-16006) https://github.com/python/cpython/commit/4cbcd2f8c4e12b912e4d21fd892eedf7a3813d8e
msg352445 - (view)	Author: Abhilash Raj (maxking) *	Date: 2019-09-14 17:43
Merged in 2.7, closing this one finally! Thanks to everyone who helped with this :)

History
Date	User	Action	Args
2022-04-11 14:59:03	admin	set	github: 78336
2019-09-14 17:43:40	maxking	set	status: open -> closed resolution: fixed messages: + msg352445 stage: patch review -> resolved
2019-09-14 17:26:40	miss-islington	set	messages: + msg352444
2019-09-12 23:06:37	rcsanchez97	set	nosy: + rcsanchez97 messages: + msg352230
2019-09-11 21:10:36	python-dev	set	stage: resolved -> patch review pull_requests: + pull_request15632
2019-09-11 01:32:47	Anselmo Melo	set	nosy: + Anselmo Melo
2019-09-09 13:49:10	mcepl	set	nosy: + mcepl
2019-09-09 09:33:42	vstinner	set	title: email.utils.parseaddr mistakenly parse an email -> [CVE-2019-16056] email.utils.parseaddr mistakenly parse an email
2019-09-09 09:31:27	vstinner	set	status: closed -> open resolution: fixed -> (no value) messages: + msg351377
2019-09-09 08:51:52	rschiron	set	nosy: + rschiron messages: + msg351364
2019-09-07 05:39:24	larry	set	status: open -> closed resolution: fixed messages: + msg351283
2019-09-07 05:24:08	larry	set	nosy: + larry messages: + msg351281
2019-08-23 14:07:28	lukasz.langa	set	priority: deferred blocker -> critical nosy: + lukasz.langa messages: + msg350291
2019-08-19 20:28:31	maxking	set	messages: + msg349968
2019-08-19 14:04:52	vstinner	set	messages: + msg349957
2019-08-17 02:15:34	maxking	set	messages: + msg349892 stage: patch review -> resolved
2019-08-17 02:14:00	maxking	set	stage: resolved -> patch review pull_requests: + pull_request15036
2019-08-17 00:01:00	aeros	set	nosy: + aeros messages: + msg349891
2019-08-15 19:19:43	maxking	set	messages: + msg349820
2019-08-12 13:02:14	vstinner	set	status: closed -> open messages: + msg349465 versions: + Python 2.7, Python 3.5, Python 3.6
2019-08-12 13:00:38	vstinner	set	type: behavior -> security messages: + msg349464
2019-08-10 20:31:49	maxking	set	status: open -> closed messages: + msg349357 stage: patch review -> resolved
2019-08-09 15:22:25	ned.deily	set	messages: + msg349292
2019-08-09 08:31:33	miss-islington	set	messages: + msg349279
2019-08-09 08:30:50	miss-islington	set	messages: + msg349278
2019-07-17 21:54:55	miss-islington	set	pull_requests: + pull_request14620
2019-07-17 21:54:48	miss-islington	set	pull_requests: + pull_request14619
2019-07-17 21:54:41	miss-islington	set	pull_requests: + pull_request14618
2019-07-17 21:54:31	miss-islington	set	nosy: + miss-islington messages: + msg348082
2019-07-03 12:10:34	jpic	set	messages: + msg347223
2019-07-03 07:17:23	cnicodeme	set	messages: + msg347183
2019-07-02 21:43:09	barry	set	messages: + msg347157 versions: + Python 3.9
2019-06-03 15:46:14	maxking	set	messages: + msg344432
2019-06-03 15:44:34	maxking	set	messages: + msg344431
2019-06-03 07:51:50	jpic	set	messages: + msg344389
2019-06-01 19:20:33	maxking	set	messages: + msg344205
2019-06-01 07:51:22	jpic	set	messages: + msg344157
2019-05-31 06:26:22	maxking	set	nosy: + maxking messages: + msg344030
2019-05-04 13:02:09	jpic	set	messages: + msg341381
2019-05-04 02:16:20	Windson Yang	set	messages: + msg341370
2019-05-04 01:10:58	jpic	set	messages: + msg341367
2019-05-03 23:57:59	jpic	set	messages: + msg341362
2019-05-03 21:27:43	python-dev	set	keywords: + patch stage: patch review pull_requests: + pull_request12994
2019-05-03 02:09:13	barry	set	files: + Screen Shot 2019-05-02 at 22.07.27.png messages: + msg341322
2019-05-03 01:45:21	Windson Yang	set	nosy: + Windson Yang messages: + msg341320
2019-05-02 18:07:09	ned.deily	set	priority: normal -> deferred blocker nosy: + ned.deily messages: + msg341294
2019-04-29 12:03:35	jwilk	set	nosy: - jwilk
2019-04-29 11:42:55	Dain Dwarf	set	nosy: + Dain Dwarf messages: + msg341069
2019-04-26 16:42:43	jpic	set	nosy: + jpic messages: + msg340933
2019-04-23 07:45:20	vstinner	set	nosy: + vstinner
2019-04-23 07:13:33	vstinner	set	nosy: - vstinner
2019-04-19 12:02:10	nicoe	set	nosy: + nicoe
2019-04-19 10:28:43	xtreak	set	keywords: + security_issue nosy: + vstinner messages: + msg340535
2019-04-19 10:16:21	bortzmeyer	set	nosy: + bortzmeyer messages: + msg340534
2018-11-08 08:23:11	kal.sze	set	nosy: + kal.sze messages: + msg329463
2018-11-06 20:27:34	xtreak	set	messages: + msg329382
2018-11-06 19:55:26	msapiro	set	messages: + msg329380
2018-11-06 19:48:48	xtreak	set	nosy: + xtreak messages: + msg329379
2018-11-06 19:24:59	r.david.murray	set	messages: + msg329377
2018-11-06 19:23:24	r.david.murray	set	messages: + msg329376
2018-11-06 18:14:44	msapiro	set	nosy: + msapiro messages: + msg329372
2018-07-19 21:03:38	jwilk	set	nosy: + jwilk messages: + msg321967
2018-07-19 15:21:25	r.david.murray	set	messages: + msg321959
2018-07-19 15:19:13	r.david.murray	set	messages: + msg321958
2018-07-19 15:18:03	r.david.murray	set	messages: + msg321957 versions: + Python 3.7, Python 3.8, - Python 3.6
2018-07-19 14:53:43	cnicodeme	create