quopri_codec newline handling #64320

fredstober · 2014-01-04T11:54:51Z

BPO	20121
Nosy	@malemburg, @bitdancer, @vadmium, @serhiy-storchaka, @vajrasky
Files	quopri-newline.patch quopri-newline.v2.patch: Also fix line break issues

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2014-01-04.11:54:50.808>
labels = ['type-bug', 'library', 'docs']
title = 'quopri_codec newline handling'
updated_at = <Date 2015-01-20.04:44:00.458>
user = 'https://bugs.python.org/fredstober'

bugs.python.org fields:

activity = <Date 2015-01-20.04:44:00.458>
actor = 'martin.panter'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'Library (Lib)']
creation = <Date 2014-01-04.11:54:50.808>
creator = 'fredstober'
dependencies = []
files = ['37756', '37783']
hgrepos = []
issue_num = 20121
keywords = ['patch']
message_count = 8.0
messages = ['207281', '207413', '232812', '232814', '232826', '232827', '234223', '234343']
nosy_count = 7.0
nosy_names = ['lemburg', 'r.david.murray', 'docs@python', 'martin.panter', 'serhiy.storchaka', 'vajrasky', 'fredstober']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20121'
versions = ['Python 2.7', 'Python 3.4']

fredstober · 2014-01-04T11:54:51Z

While trying to encode some binary data, I encountered this behaviour of the quopri_codec:

>>> '\r\n\n'.encode('quopri_codec').decode('quopri_codec')
'\r\n\r\n'
>>> '\n\r\n'.encode('quopri_codec').decode('quopri_codec')
'\n\n'

If this behaviour is really intended, it should be mentioned in the documentation that this coded is not bijective.

vajrasky · 2014-01-06T07:24:25Z

The quopri_codec uses binascii.b2a_qp method.

>>> binascii.b2a_qp('\r\n\n\n\n')
'\r\n\r\n\r\n\r\n'

The logic in b2a_qp when dealing with newlines is check whether the first line uses \r\n or \n.

If it uses \r\n, then all remaning lines' new lines will be converted to \r\n. if it uses \n, then all remaning lines' new lines will be converted to \n.

It has comment on the source code.

/* See if this string is using CRLF line ends */
/* XXX: this function has the side effect of converting all of
 * the end of lines to be the same depending on this detection
 * here */

I am not sure what the appropriate action here. But doc fix should be acceptable.

vadmium · 2014-12-17T11:52:24Z

RFC 1521 says that a text newline should be encoded as CRLF, and that any combination of 0x0D and 0x0A bytes that do not represent newlines should be encoded like other control characters as =0D and =0A.

Since in Python 3 the codec outputs bytes, I don’t think there is any excuse for it to be outputting plain CR or LF bytes. The question is, do they represent newlines to be encoded as CRLF, or just data bytes that need ordinary encoding.

malemburg · 2014-12-17T12:26:48Z

I agree with Vajrasky: a patch for the documentation would probably be a good idea.

Note that mixing line end conventions in a single text is never a good idea. If you stick to one line end convention, there's no problem with the codec, AFAICT.

>>> codecs.encode(b'\r\n\r\n', 'quopri_codec')
b'\r\n\r\n'
>>> codecs.decode(_, 'quopri_codec')
b'\r\n\r\n'
>>> codecs.encode(b'\n\n', 'quopri_codec')
b'\n\n'
>>> codecs.decode(_, 'quopri_codec')
b'\n\n'

vadmium · 2014-12-17T20:11:15Z

Okay so maybe the documentation should include these restrictions on encoding:

The data being encoded should only include \r or \n bytes that are part of \n or \r\n newline sequences. Encoding arbitrary non-text data is not supported.
The two kinds of newlines should not be mixed
If \n is used for newlines in the input, the encoder will output \n newlines, and they will need converting to CRLF in a later step to conform to the RFC

serhiy-storchaka · 2014-12-17T20:50:36Z

Pure Python implementation returns different result.

>>> import quopri
>>> quopri.encodestring(b'\r\n')
b'\r\n'
>>> quopri.a2b_qp = quopri.b2a_qp = None
>>> quopri.encodestring(b'\r\n')
b'=0D\n'

See also bpo-18022.

vadmium · 2015-01-18T06:32:33Z

Here is a patch that clarifies in the documentation and test suite how newlines work in the “quopri” and “binascii” modules. It also fixes the native Python implementation to support CRLFs.

\n is used by default (e.g. for soft line breaks if the input has no hard line breaks)
CRLF is used instead if found in input (even in non-text mode!)
Typo errors in documentation
quopri uses istext=True
header flag does not affect newline encoding; only istext affects it

One corner case concerns me slightly: binascii.b2a_qp(istext=False) will use \n for soft line breaks by default, but will suddenly switch to CRLF if the input data happens to contain a CRLF sequence. This is despite the CRLFs from the data being encoded and therefore not appearing in the output themselves.

vadmium · 2015-01-20T04:44:00Z

Here is patch v2, which fixes some more bugs I uncovered in the quoted-printable encoders:

The binascii version would unnecessarily break a 76-character line (maximum length) if it would end with an =XX escape code
The native Python version would insert soft line breaks in the middle of =XX escape codes

fredstober mannequin added the stdlib Python modules in the Lib dir label Jan 4, 2014

vadmium added the docs Documentation in the Doc dir label Jan 18, 2015

vadmium assigned docspython Jan 18, 2015

vadmium added the type-bug An unexpected behavior, bug, or error label Jan 20, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quopri_codec newline handling #64320

quopri_codec newline handling #64320

fredstober mannequin commented Jan 4, 2014

fredstober mannequin commented Jan 4, 2014

vajrasky mannequin commented Jan 6, 2014

vadmium commented Dec 17, 2014

malemburg commented Dec 17, 2014

vadmium commented Dec 17, 2014

serhiy-storchaka commented Dec 17, 2014

vadmium commented Jan 18, 2015

vadmium commented Jan 20, 2015

quopri_codec newline handling #64320

quopri_codec newline handling #64320

Comments

fredstober mannequin commented Jan 4, 2014

fredstober mannequin commented Jan 4, 2014

vajrasky mannequin commented Jan 6, 2014

vadmium commented Dec 17, 2014

malemburg commented Dec 17, 2014

vadmium commented Dec 17, 2014

serhiy-storchaka commented Dec 17, 2014

vadmium commented Jan 18, 2015

vadmium commented Jan 20, 2015