classification
Title: quopri module differences in quoted-printable text with whitespace
Type: behavior Stage: needs patch
Components: Documentation, email, Tests Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: aleperalta, barry, berker.peksag, brett.cannon, docs@python, jcea, martin.panter, ncoghlan, python-dev, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2012-11-14 21:22 by aleperalta, last changed 2019-02-24 22:39 by BreamoreBoy.

Files
File name | Uploaded | Description
test_quopri.diff | aleperalta, 2012-11-14 21:22 |
codec-impl.patch | martin.panter, 2015-01-19 05:46 | Document and test quotetabs=True for quopri-codec
Messages (13)
msg175593 - Author: Alejandro Javier Peralta Frías (aleperalta) Date: 2012-11-14 21:22
I'm new to python-dev. I grabbed a beginner task, "increase test coverage", and decided to add coverage for this bit of code in the quopri module:

# quopri.py, lines 138-139
    while n > 0 and line[n-1:n] in b" \t\r":
        n = n-1

As far as I understand, to enter that while loop the line being decoded must end with whitespace (space, tab or carriage return) followed by a newline, e.g. " \t\r\n".
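To make the condition concrete, here is a standalone sketch of what that loop does to one line. The helper name strip_trailing_ws is hypothetical (it is not part of quopri), and it assumes, as the surrounding decode() code appears to, that the trailing newline has already been detected and dropped before the loop runs.

```python
def strip_trailing_ws(line: bytes) -> bytes:
    """Hypothetical helper mirroring the quopri.py loop quoted above.

    If the line ends with a newline, return its content with the newline
    removed and any spaces, tabs or carriage returns just before it
    stripped. Lines without a final newline are returned unchanged.
    """
    if not line.endswith(b"\n"):
        return line
    n = len(line) - 1  # drop the newline itself
    # Same test as quopri.py: walk back over space, tab and CR bytes
    while n > 0 and line[n - 1:n] in b" \t\r":
        n -= 1
    return line[:n]


print(strip_trailing_ws(b"hello     \t\r\n"))  # b'hello'
print(strip_trailing_ws(b"hello\n"))            # b'hello'
```

The loop is only entered when there is something to strip, which is why a test needs input like the one below to cover it.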

So I added the following test:

    def test_decodestring_badly_enconded(self):
        # Trailing whitespace before the newline should be dropped on decode
        e = b"hello     \t\r\n"
        p = b"hello\n"
        s = self.module.decodestring(e)
        self.assertEqual(s, p)

but it only passes when the module doesn't use binascii. In fact, I changed test_quopri to use support.import_fresh_module to disable binascii, and I removed a decorator that was previously used.
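A minimal sketch of the same comparison outside the test suite, assuming Python 3 (so the input is bytes) and assuming, as the interactive session in this message does, that quopri.decodestring() falls back to its pure-Python decoder when the module-level a2b_qp name is None. unittest.mock.patch.object stands in for the test-suite helper so the snippet runs without the test package; the function name decode_both_ways is illustrative only.

```python
import quopri
from typing import Tuple
from unittest import mock


def decode_both_ways(data: bytes) -> Tuple[bytes, bytes]:
    """Illustrative helper: decode `data` with and without binascii."""
    # Default path: quopri delegates to binascii.a2b_qp when it is available
    with_binascii = quopri.decodestring(data)
    # Temporarily hide a2b_qp so decodestring() falls back to pure Python
    with mock.patch.object(quopri, "a2b_qp", None):
        pure_python = quopri.decodestring(data)
    return with_binascii, pure_python


fast, slow = decode_both_ways(b"hello \t\r\n")
print("binascii:   ", fast)
print("pure Python:", slow)
```

The patch is undone when the with block exits, so the rest of the program keeps using the binascii path.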

The decoded text when binascii is used is:

>>> quopri.decodestring("hello \t\r\n")
'hello \t\r\n'

which differs from

>>> quopri.a2b_qp = None
>>> quopri.b2a_qp = None
>>> quopri.decodestring("hello \t\r\n")
'hello\n'

And what's the deal with:

>>> import quopri
>>> quopri.encodestring("hello \t\r")
'hello \t\r'
>>> "hello \t\r".encode("quopri")
'hello=20=09\r'
msg175594 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-11-14 21:32
I think I can answer your last question.  There are two quopri algorithms, one where spaces are allowed (message body) and one where they aren't (email headers).
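For reference, the header flavor is selected with the header keyword that quopri.encodestring() and quopri.decodestring() accept (RFC 1522 style, where a space is written as an underscore). A minimal Python 3 sketch with an arbitrary input; note that, per later messages in this thread, the codec difference reported above actually comes from quotetabs rather than from header:

```python
import quopri

text = b"hello world"

body = quopri.encodestring(text)                 # body style: space kept
header = quopri.encodestring(text, header=True)  # header style: space -> "_"
print(body, header)

# Decoding with header=True maps "_" back to a space
print(quopri.decodestring(header, header=True))
```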

For the rest, I'd have to take a closer look than I have time for right now.
msg175595 - Author: Alejandro Javier Peralta Frías (aleperalta) Date: 2012-11-14 21:35
> I think I can answer your last question.  There are two quopri algorithms,
> one where spaces are allowed (message body) and one where they aren't
> (email headers).

OK, thank you. Good to know.
msg179744 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2013-01-11 23:14
Ping.
msg222121 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-02 20:11
I'll take this on if I can. Is binascii available on all platforms? If it is, the quopri code could be simplified slightly, along with the test code.
msg222122 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-02 20:26
The first problem is determining the "best" error recovery algorithms by reading through the RFCs and considering use cases.
msg234300 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-01-19 05:46
Three slightly different points here:

1. Decoding trailing whitespace: My understanding is quoted-printable encoding aims to be tolerant of whitespace being added to and removed from the end of encoded lines. So I assume the “binascii” module is wrong to leave trailing whitespace in the decoded output, and the native “quopri” implementation is correct to ignore it.

2. CRLF handling: See Issue 20121. It seems CRLF newlines should be valid, and I have added a patch to that issue to make the native Python implementation handle CRLF newlines.

3. Whitespace encoding: The quopri-codec actually sets quotetabs=True. Here is a patch to document and test that, as well as to correct the equivalent functions listed for the other codecs.
msg234304 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-01-19 06:26
Regarding decoding trailing whitespace, <https://tools.ietf.org/html/rfc1521.html#section-5.1> rule #3 says:

“When decoding a Quoted-Printable body, any trailing white space on a line must be deleted, as it will necessarily have been added by intermediate transport agents.”
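One way to get that rule out of the binascii-backed path, sketched here only as an illustration (decode_rfc1521 is a hypothetical name, not a proposed API), is to delete trailing spaces and tabs from each line before calling binascii.a2b_qp. Unlike the pure-Python loop in quopri, this sketch keeps CRLF line endings intact, in line with point 2 of the previous message.

```python
import binascii


def decode_rfc1521(data: bytes) -> bytes:
    """Hypothetical decoder: strip trailing whitespace per line, then decode.

    Only spaces and tabs are removed; each line's ending (CRLF or LF)
    is preserved, so CRLF input stays CRLF after decoding.
    """
    cleaned = []
    for line in data.splitlines(keepends=True):
        if line.endswith(b"\r\n"):
            body, eol = line[:-2], b"\r\n"
        elif line.endswith(b"\n"):
            body, eol = line[:-1], b"\n"
        else:
            body, eol = line, b""  # final line without a line ending
        cleaned.append(body.rstrip(b" \t") + eol)
    return binascii.a2b_qp(b"".join(cleaned))


print(decode_rfc1521(b"hello \t\r\n"))     # b'hello\r\n'
print(decode_rfc1521(b"soft= \r\nbreak"))  # b'softbreak'
```

The second call shows why stripping matters: whitespace added after a trailing "=" by a transport agent would otherwise stop it from being recognized as a soft line break.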
msg250506 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-12 00:50
Will commit a slightly modified version of my doc patch to 3.4+, since mentioning the wrong functions is confusing. But I think we still need to fix the “binascii” decoding, and have a look at Alejandro’s test suite patch.
msg250508 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-12 01:44
New changeset de82f41d6669 by Martin Panter <vadmium> in branch '3.4':
Issue #16473: Fix byte transform codec documentation; test quotetabs=True
https://hg.python.org/cpython/rev/de82f41d6669

New changeset 28cd11dc2915 by Martin Panter <vadmium> in branch '3.5':
Issue #16473: Merge codecs doc and test from 3.4 into 3.5
https://hg.python.org/cpython/rev/28cd11dc2915

New changeset 3ecb5766ba15 by Martin Panter <vadmium> in branch 'default':
Issue #16473: Merge codecs doc and test from 3.5
https://hg.python.org/cpython/rev/3ecb5766ba15
msg250509 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-12 02:56
New changeset cfb0481c89d7 by Martin Panter <vadmium> in branch '2.7':
Issue #16473: Fix byte transform codec documentation; test quotetabs=True
https://hg.python.org/cpython/rev/cfb0481c89d7
msg250514 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-12 08:13
The mentioned functions are not exact equivalents of the codecs. They are the preferable way to obtain similar output (apart from minor details).
msg250520 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-12 12:16
The list of functions was added in Issue 17844. I made the change today because I forgot that the listed functions weren’t exactly equivalent when investigating Issue 25075.

Base64-codec encodes to multiple lines, but b64encode() returns the raw encoding without line breaks. I see that base64.encodebytes() is listed as a “legacy interface”, but as far as I can tell nothing outside the legacy interface does any line splitting.
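The difference shows up with any input longer than one output line (a Python 3 sketch; the 100-byte input is arbitrary). The "base64" codec appears to be backed by base64.encodebytes(), so the two agree, while b64encode() returns one unbroken string:

```python
import base64
import codecs

data = b"x" * 100

via_codec = codecs.encode(data, "base64")  # split into newline-terminated lines
legacy = base64.encodebytes(data)          # legacy interface, same line splitting
raw = base64.b64encode(data)               # bare encoding, no line breaks

print(via_codec.count(b"\n"), legacy.count(b"\n"), raw.count(b"\n"))  # 2 2 0
```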

Hex-codec encodes to lowercase, but b16encode() returns uppercase, following RFC 4648.
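A quick check of the case difference (Python 3, arbitrary input). binascii.hexlify() is included only as an example of a function whose output happens to match the codec's lowercase form; which function the documentation should list is not settled here.

```python
import base64
import binascii
import codecs

data = b"\xab\xcd"

print(codecs.encode(data, "hex"))  # lowercase: b'abcd'
print(binascii.hexlify(data))      # lowercase, same as the codec
print(base64.b16encode(data))      # uppercase per RFC 4648: b'ABCD'
```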

Quopri-codec encodes all whitespace, but quopri.encodestring() lets most whitespace through verbatim by default. In this case I think it would be reasonable to change back to encodestring() if we say that quotetabs=True is passed in.
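A Python 3 sketch of this point, using an arbitrary input with an interior space and tab. It also reproduces the behavior from the original report: encodestring() passes the whitespace through by default, the codec quotes it, and passing quotetabs=True makes the two agree.

```python
import codecs
import quopri

data = b"hello \tworld"

default = quopri.encodestring(data)               # interior whitespace kept
tabs = quopri.encodestring(data, quotetabs=True)  # spaces and tabs quoted
codec = codecs.encode(data, "quopri")             # quopri-codec

print(default)  # b'hello \tworld'
print(tabs)     # b'hello=20=09world'
print(codec)    # same as tabs
```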
History
Date | User | Action | Args
2019-02-24 22:39:40 | BreamoreBoy | set | nosy: - BreamoreBoy
2015-09-12 12:16:04 | martin.panter | set | messages: + msg250520
2015-09-12 08:13:33 | serhiy.storchaka | set | nosy: + serhiy.storchaka, ncoghlan
    messages: + msg250514
2015-09-12 02:56:31 | python-dev | set | messages: + msg250509
2015-09-12 01:44:43 | python-dev | set | nosy: + python-dev
    messages: + msg250508
2015-09-12 00:50:28 | martin.panter | set | versions: + Python 2.7, Python 3.4, Python 3.5, Python 3.6, - Python 3.3
    nosy: + berker.peksag
    messages: + msg250506
    type: behavior
    stage: needs patch
2015-07-23 01:54:38 | martin.panter | link | issue20132 dependencies
2015-01-19 06:26:53 | martin.panter | set | messages: + msg234304
2015-01-19 05:46:36 | martin.panter | set | files: + codec-impl.patch
    assignee: docs@python
    components: + Documentation
    title: quopri module minor difference in decoding quoted-printable text -> quopri module differences in quoted-printable text with whitespace
    nosy: + docs@python, martin.panter
    messages: + msg234300
2014-07-02 20:26:57 | r.david.murray | set | messages: + msg222122
2014-07-02 20:11:58 | BreamoreBoy | set | nosy: + BreamoreBoy
    messages: + msg222121
    title: Minor difference in decoding quoted-printable text -> quopri module minor difference in decoding quoted-printable text
2013-01-11 23:14:44 | jcea | set | messages: + msg179744
2012-11-14 22:00:37 | jcea | set | nosy: + jcea
2012-11-14 21:35:10 | aleperalta | set | messages: + msg175595
2012-11-14 21:32:36 | r.david.murray | set | nosy: + barry, r.david.murray
    messages: + msg175594
    components: + email
2012-11-14 21:22:57 | aleperalta | set | nosy: + brett.cannon
2012-11-14 21:22:06 | aleperalta | create |