Issue 18022: Inconsistency between quopri.decodestring() and email.quoprimime.decode()

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/62222

classification

Title:	Inconsistency between quopri.decodestring() and email.quoprimime.decode()
Type:	behavior	Stage:
Components:	email, Library (Lib)	Versions:	Python 3.7, Python 3.6, Python 3.5, Python 2.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Jeremy.Hylton, barry, gvanrossum, martin.panter, r.david.murray, serhiy.storchaka
Priority:	normal	Keywords:

Created on 2013-05-20 14:13 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin.

Messages (12)
msg189663 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-05-20 14:13
>>> import quopri, email.quoprimime
>>> quopri.decodestring(b'==41')
b'=41'
>>> email.quoprimime.decode('==41')
'=A'

I don't see a rule about double '=' in RFC 1521-1522 or RFCs 2045-2047 and I think quopri is wrong. Other half of this bug (encoding '=' as '==') was fixed in 9bc52706d283.
msg190874 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-06-09 21:28
There are other inconsistencies. email.quoprimime.decode(), binascii.a2b_qp() and the pure Python quopri.decodestring() (binascii is used by default) return different results for the following data:

input      quoprimime  binascii  quopri
b'='       ''          b''       b'='
b'=='      '='         b'='      b'=='
b'= '      ''          b'= '     b'= '
b'= \n'    ''          b'= \n'   b''
b'=\r'     ''          b''       b'=\r'
b'==41'    '=A'        b'=41'    b'=A'
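The comparison above can be reproduced with a short script. This is only a sketch: `compare_decoders` is a hypothetical helper name, and forcing the pure Python path by setting `quopri.a2b_qp` to `None` relies on an implementation detail of CPython's quopri module rather than a documented API. The results depend on the Python version, so no expected output is shown.

```python
import binascii
import email.quoprimime
import quopri


def compare_decoders(data: bytes) -> dict:
    """Decode `data` with the three implementations discussed in this issue."""
    results = {
        # email.quoprimime operates on str, so the ASCII input is decoded first.
        "quoprimime": email.quoprimime.decode(data.decode("ascii")),
        "binascii": binascii.a2b_qp(data),
    }
    # Assumption: CPython's quopri uses its pure Python decoder when the
    # module-level name a2b_qp is None. This is an implementation detail.
    saved = quopri.a2b_qp
    quopri.a2b_qp = None
    try:
        results["quopri"] = quopri.decodestring(data)
    finally:
        # Always restore the C accelerator, even if decoding raised.
        quopri.a2b_qp = saved
    return results


if __name__ == "__main__":
    for sample in (b"=", b"==", b"= ", b"= \n", b"=\r", b"==41"):
        print(repr(sample), compare_decoders(sample))
```

Running the loop on a given interpreter prints one row per input, which can be compared against the table above.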
msg190876 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-06-09 22:51
Most of the variations represent different invalid-input recovery choices. I believe binascii's decoding of b'= \n' is incorrect, as is its decoding of b'==41'. quopri's decoding of b'=\r' is arguably incorrect as well, given that python generally supports universal line ends. Otherwise the decodings are all responses to erroneous input for which the behavior is not specified. That said, we ought to pick one error recovery scheme and implement it in all places, and IMO it shouldn't be exactly any of the ones we've got. Or better yet, use one common implementation. Untangling quopri is on my (too large) List of Things To Do :)
msg190889 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-06-10 06:35
Perl's MIME::QuotedPrint produces the same result as pure Python quopri. konwert qp-8bit produces the same result as binascii (except that it decodes '==41' as '=A'). RFC 2045 says:

"""A reasonable approach by a robust implementation might be to include the "=" character and the following character in the decoded data without any transformation and, if possible, indicate to the user that proper decoding was not possible at this point in the data. """
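As an illustration of the RFC 2045 suggestion quoted above, here is a minimal sketch of a decoder that passes an invalid "=" sequence through unchanged. The name `decode_qp_lenient` is hypothetical, and the handling of soft line breaks and trailing whitespace is one possible reading of the RFC, not a scheme agreed on in this issue. It does not report decoding problems to the caller.

```python
_HEX = b"0123456789ABCDEFabcdef"


def decode_qp_lenient(data: bytes) -> bytes:
    """Hypothetical lenient decoder: invalid '=' sequences are kept as-is."""
    out = bytearray()
    for line in data.splitlines(keepends=True):
        # Split the line into its content and its line ending (if any).
        body = line.rstrip(b"\r\n")
        eol = line[len(body):]
        # Trailing whitespace on an encoded line is treated as padding.
        body = body.rstrip(b" \t")
        # '=' directly before a line ending is a soft line break.
        soft_break = bool(eol) and body.endswith(b"=")
        if soft_break:
            body = body[:-1]
        i = 0
        while i < len(body):
            is_escape = (
                body[i:i + 1] == b"="
                and i + 2 < len(body)
                and body[i + 1] in _HEX
                and body[i + 2] in _HEX
            )
            if is_escape:
                # Valid '=XX' escape: emit the byte it encodes.
                out.append(int(body[i + 1:i + 3], 16))
                i += 3
            else:
                # Ordinary byte, or an invalid '=' that is copied unchanged.
                out.append(body[i])
                i += 1
        if not soft_break:
            out += eol
    return bytes(out)
```

With this sketch, `decode_qp_lenient(b'==41')` keeps the first "=" literally and decodes the following "=41", and a lone trailing "=" with no line ending is preserved.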
msg197665 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-09-13 21:13
So which scheme will we pick?
msg197715 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-09-14 15:07
As I said, not exactly any of the above. I'll get back to this after I finish the new email code (which should happen before the end of the month). I need to take some time to look over the RFCs and real world examples and come up with the most appropriate rules.
msg290521 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-03-26 10:13
Ping.
msg290525 - (view)	Author: Martin Panter (martin.panter) *	Date: 2017-03-26 12:08
The double equals "==" case for the “quopri” implementation in Python is now consistent with the others thanks to the fix in Issue 23681 (see also Issue 21511). According to Issue 20121, the quopri (Python) implementation only supports LF (\n) characters as line breaks, and the binascii (C) implementation also supports CRLF. So I agree that the whitespace-before-newline case "= \n" is a genuine bug (see Issue 16473). But the CR case "=\r" is not supported because neither quopri nor binascii supports universal newlines or CR line breaks on their own.
msg290533 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-03-26 15:08
Thus currently the table of discrepancies looks as:

input      quoprimime  binascii  quopri
b'='       ''          b''       b'='
b'= '      ''          b'= '     b'= '
b'= \n'    ''          b'= \n'   b''
b'=\r'     ''          b''       b'=\r'
b'==41'    '=A'        b'=41'    b'=41'
msg316563 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-05-14 19:41
OK, I've finally gotten around to looking at this. It looks like quopri and binascii are not stripping trailing whitespace.

input        quoprimime  binascii     quopri       preferred
b'='         ''          b''          b'='         '='
b'= '        ''          b'= '        b'= '        '='
b'= \n'      ''          b'= \n'      b''          '=\n'
b'=\r'       ''          b''          b'=\r'       '=\r'
b'==41'      '=A'        b'=41'       b'=41'       '=A'
b'= \n f\n'  ' f\n'      b'= \n f\n'  b'= \n f\n'  ' f\n'

The RFC recommends that a trailing = be preserved, but that trailing whitespace be ignored. It doesn't speak directly to the ==41 case, but one can infer that the first = in the == pair is most likely to have "come from the source text" and not been encoded, while the =41 was an intentional encoding and so should be decoded.

Now, that said, the actual behavior that our libraries have had for a long time is to treat the "last line" just like all other lines, and strip a trailing =. So I would be inclined to keep that behavior for backward compatibility reasons rather than change it to be more RFC compliant, given that we don't have any actual bug report related to it, and "fixing" it could break things. Given that, the current quoprimime behavior becomes the reference.

However, backward compatibility concerns also arise around starting to strip trailing space in quopri and binascii. Maybe we only make that change in 3.8?
msg316622 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-05-15 05:28
Many thanks David! But sorry, your table confused me. I can't read it. Could you please reformat it?
msg316642 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-05-15 11:59
I should have just deleted the table, actually. The only important info in it is that per RFC '=', '=\n', and '= \n' all ought to become '='. But I don't think we should make that change; I think we should continue to turn those into ''. So I consider the (current!) behavior of quoprimime to be the correct behavior. I also gave the example of '= \n foo\n' to show that quopri and binascii aren't stripping trailing blanks, as Martin noted in the other issue. They fold lines if they see '=\n', but not if they see '= \n', which is wrong per the (email!) RFC. I'm not clear if it is wrong for non-email uses of quopri; I haven't tried to research that.
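For callers affected by the '= \n' case today, one possible workaround is to strip trailing blanks from each line before decoding. The sketch below is an assumption about how that might be done, not a fix proposed in this issue; the helper names `strip_transport_padding` and `decode_after_stripping` are hypothetical, and only whitespace directly before a line ending is removed.

```python
import binascii
import re

# Spaces and tabs that sit directly before an LF or CRLF line ending.
_TRAILING_WS = re.compile(rb"[ \t]+(?=\r?\n)")


def strip_transport_padding(data: bytes) -> bytes:
    """Remove trailing blanks from every line that ends in LF or CRLF."""
    return _TRAILING_WS.sub(b"", data)


def decode_after_stripping(data: bytes) -> bytes:
    """Decode with binascii after removing trailing blanks (hypothetical helper)."""
    return binascii.a2b_qp(strip_transport_padding(data))
```

After stripping, '= \n foo\n' becomes '=\n foo\n', which binascii then folds as a soft line break.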

History
Date	User	Action	Args
2022-04-11 14:57:45	admin	set	github: 62222
2018-05-15 11:59:26	r.david.murray	set	messages: + msg316642
2018-05-15 05:28:59	serhiy.storchaka	set	messages: + msg316622
2018-05-14 19:41:13	r.david.murray	set	messages: + msg316563
2017-03-26 15:08:41	serhiy.storchaka	set	messages: + msg290533
2017-03-26 12:08:43	martin.panter	set	messages: + msg290525
2017-03-26 10:13:29	serhiy.storchaka	set	messages: + msg290521 versions: + Python 3.5, Python 3.6, Python 3.7, - Python 3.3, Python 3.4
2015-04-11 21:48:00	gvanrossum	unlink	issue21511 dependencies
2015-01-17 22:43:15	martin.panter	set	nosy: + martin.panter
2014-05-16 19:22:01	r.david.murray	link	issue21511 dependencies
2013-09-14 15:07:05	r.david.murray	set	messages: + msg197715
2013-09-13 21:13:51	serhiy.storchaka	set	messages: + msg197665
2013-06-10 06:35:36	serhiy.storchaka	set	messages: + msg190889
2013-06-09 22:51:15	r.david.murray	set	messages: + msg190876
2013-06-09 21:28:22	serhiy.storchaka	set	messages: + msg190874
2013-05-20 14:13:28	serhiy.storchaka	create