classification
Title: Inconsistency between quopri.decodestring() and email.quoprimime.decode()
Type: behavior Stage:
Components: email, Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Jeremy.Hylton, barry, gvanrossum, martin.panter, r.david.murray, serhiy.storchaka
Priority: normal Keywords:

Created on 2013-05-20 14:13 by serhiy.storchaka, last changed 2018-05-15 11:59 by r.david.murray.

Messages (12)
msg189663 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-20 14:13
>>> import quopri, email.quoprimime
>>> quopri.decodestring(b'==41')
b'=41'
>>> email.quoprimime.decode('==41')
'=A'

I don't see a rule about double '=' in RFC 1521-1522 or RFCs 2045-2047 and I think quopri is wrong.

Other half of this bug (encoding '=' as '==') was fixed in 9bc52706d283.
msg190874 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-06-09 21:28
There are other inconsistencies. email.quoprimime.decode(), binascii.a2b_qp() and the pure Python quopri.decodestring() (binascii is used by default) return different results for the following data:

           quoprimime  binascii  quopri

 b'='      ''          b''       b'='
 b'=='     '='         b'='      b'=='
 b'= '     ''          b'= '     b'= '
 b'= \n'   ''          b'= \n'   b''
 b'=\r'    ''          b''       b'=\r'
 b'==41'   '=A'        b'=41'    b'=A'
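
The comparison above can be reproduced with a short script. This is a minimal sketch, not part of the original report: compare_decoders is a hypothetical helper, and forcing the pure Python path by temporarily setting the module-level name quopri.a2b_qp to None relies on an implementation detail of quopri.py, so treat that part as an assumption. The results for malformed input depend on the Python version, so no expected output is shown.

```python
import binascii
import email.quoprimime
import quopri


def compare_decoders(data: bytes) -> dict:
    """Decode `data` with the three stdlib decoders and collect the results."""
    results = {
        # quoprimime works on str, so map bytes to code points 1:1 first.
        "quoprimime": email.quoprimime.decode(data.decode("latin-1")),
        "binascii": binascii.a2b_qp(data),
    }
    # Assumption: quopri.decodestring() falls back to its pure Python code
    # when the module-level name a2b_qp is None (an implementation detail).
    saved = quopri.a2b_qp
    quopri.a2b_qp = None
    try:
        results["quopri"] = quopri.decodestring(data)
    finally:
        quopri.a2b_qp = saved  # always restore the C-accelerated decoder
    return results


if __name__ == "__main__":
    for sample in (b"=", b"==", b"= ", b"= \n", b"=\r", b"==41"):
        print(repr(sample), compare_decoders(sample))
```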
msg190876 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-06-09 22:51
Most of the variations represent different invalid-input recovery choices.  I believe binascii's decoding of b'= \n' is incorrect, as is its decoding of b'==41'.  quopri's decoding of b'=\r' is arguably incorrect as well, given that Python generally supports universal newlines.  Otherwise the decodings are all responses to erroneous input for which the behavior is not specified.

That said, we ought to pick one error recovery scheme and implement it in all places, and IMO it shouldn't be exactly any of the ones we've got.  Or better yet, use one common implementation.  Untangling quopri is on my (too large) List of Things To Do :)
msg190889 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-06-10 06:35
Perl's MIME::QuotedPrint produces the same result as pure Python quopri. konwert qp-8bit produces the same result as binascii (except that it decodes '==41' as '=A').

RFC 2045 says:

"""A
          reasonable approach by a robust implementation might be
          to include the "=" character and the following
          character in the decoded data without any
          transformation and, if possible, indicate to the user
          that proper decoding was not possible at this point in
          the data.
"""
msg197665 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-13 21:13
So which scheme will we pick?
msg197715 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-14 15:07
As I said, not exactly any of the above.

I'll get back to this after I finish the new email code (which should happen before the end of the month).  I need to take some time to look over the RFCs and real world examples and come up with the most appropriate rules.
msg290521 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-26 10:13
Ping.
msg290525 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-03-26 12:08
The double equals "==" case for the “quopri” implementation in Python is now consistent with the others thanks to the fix in Issue 23681 (see also Issue 21511).

According to Issue 20121, the quopri (Python) implementation only supports LF (\n) characters as line breaks, while the binascii (C) implementation also supports CRLF. So I agree that the whitespace-before-newline case "= \n" is a genuine bug (see Issue 16473). But the CR case "=\r" is not supported because neither quopri nor binascii supports universal newlines or CR line breaks on their own.
msg290533 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-26 15:08
Thus the table of discrepancies currently looks like this:

           quoprimime  binascii  quopri

 b'='      ''          b''       b'='
 b'= '     ''          b'= '     b'= '
 b'= \n'   ''          b'= \n'   b''
 b'=\r'    ''          b''       b'=\r'
 b'==41'   '=A'        b'=41'    b'=41'
msg316563 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-05-14 19:41
OK, I've finally gotten around to looking at this.   It looks like quopri and binascii are not stripping trailing whitespace.

              quoprimime  binascii     quopri       preferred

 b'='         ''          b''          b'='         '='
 b'= '        ''          b'= '        b'= '        '='
 b'= \n'      ''          b'= \n'      b''          '=\n'
 b'=\r'       ''          b''          b'=\r'       '=\r'
 b'==41'      '=A'        b'=41'       b'=41'       '=A'
 b'= \n f\n'  ' f\n'      b'= \n f\n'  b'= \n f\n'  ' f\n'

The RFC recommends that a trailing = be preserved, but that trailing whitespace be ignored.  It doesn't speak directly to the ==41 case, but one can infer that the first = in the == pair is most likely to have "come from the source text" and not been encoded, while the =41 was an intentional encoding and so should be decoded.

Now, that said, the actual behavior that our libraries have had for a long time is to treat the "last line" just like all other lines, and strip a trailing =.  So I would be inclined to keep that behavior for backward compatibility reasons rather than change it to be more RFC compliant, given that we don't have any actual bug report related to it, and "fixing" it could break things.  Given that, the current quoprimime behavior becomes the reference.

However, backward compatibility concerns also arise around starting to strip trailing space in quopri and binascii. Maybe we only make that change in 3.8?
msg316622 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-15 05:28
Many thanks David! But sorry, your table confused me. I can't read it. Could you please reformat it?
msg316642 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-05-15 11:59
I should have just deleted the table, actually.

The only important info in it is that per RFC '=', '=\n', and '= \n' all ought to become '='.  But I don't think we should make that change; I think we should continue to turn those into ''.  So I consider the (current!) behavior of quoprimime to be the correct behavior.

I also gave the example of '= \n foo\n', to show that quopri and binascii aren't stripping trailing blanks, as Martin noted in the other issue.  They fold lines if they see '=\n', but not if they see '= \n', which is wrong per the (email!) RFC.  I'm not clear whether it is wrong for non-email uses of quopri; I haven't tried to research that.
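
The '= \n foo\n' case can be illustrated with a short sketch. It is a caller-side workaround under stated assumptions, not the fix discussed in this issue: a2b_qp_stripping_blanks is a hypothetical helper that removes blanks before each line ending and then hands the data to binascii.a2b_qp(), so that '= \n' is folded like '=\n'. It does not touch blanks at the very end of data that lacks a final newline, and whether stripping is appropriate for non-email uses is left open, as noted above.

```python
import binascii
import email.quoprimime
import re

# Spaces and tabs immediately before a line ending (LF or CRLF).
_TRAILING_BLANKS = re.compile(rb"[ \t]+(\r?\n)")


def a2b_qp_stripping_blanks(data: bytes) -> bytes:
    """Remove trailing blanks from each line, then decode with binascii."""
    return binascii.a2b_qp(_TRAILING_BLANKS.sub(rb"\1", data))


sample = b"= \n foo\n"
# quoprimime already strips trailing blanks and folds the line.
print(repr(email.quoprimime.decode(sample.decode("ascii"))))
# With the blanks removed first, binascii sees '=\n' and folds it too.
print(repr(a2b_qp_stripping_blanks(sample)))
```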
History
Date User Action Args
2018-05-15 11:59:26 r.david.murray set messages: + msg316642
2018-05-15 05:28:59 serhiy.storchaka set messages: + msg316622
2018-05-14 19:41:13 r.david.murray set messages: + msg316563
2017-03-26 15:08:41 serhiy.storchaka set messages: + msg290533
2017-03-26 12:08:43 martin.panter set messages: + msg290525
2017-03-26 10:13:29 serhiy.storchaka set messages: + msg290521
versions: + Python 3.5, Python 3.6, Python 3.7, - Python 3.3, Python 3.4
2015-04-11 21:48:00 gvanrossum unlink issue21511 dependencies
2015-01-17 22:43:15 martin.panter set nosy: + martin.panter
2014-05-16 19:22:01 r.david.murray link issue21511 dependencies
2013-09-14 15:07:05 r.david.murray set messages: + msg197715
2013-09-13 21:13:51 serhiy.storchaka set messages: + msg197665
2013-06-10 06:35:36 serhiy.storchaka set messages: + msg190889
2013-06-09 22:51:15 r.david.murray set messages: + msg190876
2013-06-09 21:28:22 serhiy.storchaka set messages: + msg190874
2013-05-20 14:13:28 serhiy.storchaka create