quopri_codec newline handling #64320
While trying to encode some binary data, I encountered this behaviour of the quopri_codec:
>>> '\r\n\n'.encode('quopri_codec').decode('quopri_codec')
'\r\n\r\n'
>>> '\n\r\n'.encode('quopri_codec').decode('quopri_codec')
'\n\n'
If this behaviour is really intended, the documentation should mention that this codec is not bijective. |
The quopri_codec uses the binascii.b2a_qp function.
>>> binascii.b2a_qp('\r\n\n\n\n')
'\r\n\r\n\r\n\r\n'
When dealing with newlines, b2a_qp checks whether the first line ends with \r\n or \n. If it ends with \r\n, the newlines of all remaining lines are converted to \r\n; if it ends with \n, they are converted to \n. A comment in the source code describes this.
I am not sure what the appropriate action is here, but a doc fix should be acceptable. |
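As a rough illustration of the rule described above, here is a pure-Python sketch of the newline handling only. The function name `normalize_newlines_like_b2a_qp` is made up for this example. It is not the C implementation, and it ignores everything else the encoder does (escaping, soft line breaks, lone CR bytes).

```python
def normalize_newlines_like_b2a_qp(data: bytes) -> bytes:
    """Hypothetical model of the newline rule only, not the real encoder.

    The convention of the first line (CRLF or LF) is detected, and every
    CRLF or LF in the data is then rewritten to that convention.
    """
    first_lf = data.find(b"\n")
    # The first line uses CRLF only if its LF is directly preceded by CR.
    use_crlf = first_lf > 0 and data[first_lf - 1:first_lf] == b"\r"
    newline = b"\r\n" if use_crlf else b"\n"

    # Collapse both conventions to LF first, then expand to the chosen one.
    unified = data.replace(b"\r\n", b"\n")
    return unified.replace(b"\n", newline)


if __name__ == "__main__":
    for sample in (b"\r\n\n", b"\n\r\n", b"\r\n\n\n\n"):
        print(repr(sample), "->", repr(normalize_newlines_like_b2a_qp(sample)))
```

With the inputs from the report, this model maps `b'\r\n\n'` to `b'\r\n\r\n'` and `b'\n\r\n'` to `b'\n\n'`, which is why the encode/decode round trip is lossy for mixed line endings.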
RFC 1521 says that a text newline should be encoded as CRLF, and that any combination of 0x0D and 0x0A bytes that does not represent a newline should be encoded like other control characters, as =0D and =0A. Since in Python 3 the codec outputs bytes, I don’t think there is any excuse for it to be outputting plain CR or LF bytes. The question is: do they represent newlines to be encoded as CRLF, or just data bytes that need ordinary encoding? |
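For data that really is binary, one way to get the "ordinary encoding" of CR and LF bytes is to call `binascii.b2a_qp` directly with `istext=False`. The following is a minimal sketch, assuming short inputs that do not trigger soft line breaks; the helper names `encode_binary_qp` and `decode_qp` are hypothetical and not part of any module.

```python
import binascii


def encode_binary_qp(data: bytes) -> bytes:
    # istext=False asks the encoder to treat CR and LF as plain data bytes,
    # so they are escaped instead of being interpreted as line endings.
    return binascii.b2a_qp(data, istext=False)


def decode_qp(encoded: bytes) -> bytes:
    # Decoding turns the =XX escapes back into the original bytes.
    return binascii.a2b_qp(encoded)


if __name__ == "__main__":
    for sample in (b"\r\n\n", b"\n\r\n"):
        encoded = encode_binary_qp(sample)
        print(repr(sample), "->", repr(encoded), "->", repr(decode_qp(encoded)))
```

This bypasses the `quopri_codec` wrapper entirely, so it says nothing about what the codec itself should do; it only shows that a lossless round trip is available when the caller knows the input is not text.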
I agree with Vajrasky: a patch for the documentation would probably be a good idea. Note that mixing line end conventions in a single text is never a good idea. If you stick to one line end convention, there's no problem with the codec, AFAICT.
>>> codecs.encode(b'\r\n\r\n', 'quopri_codec')
b'\r\n\r\n'
>>> codecs.decode(_, 'quopri_codec')
b'\r\n\r\n'
>>> codecs.encode(b'\n\n', 'quopri_codec')
b'\n\n'
>>> codecs.decode(_, 'quopri_codec')
b'\n\n' |
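If mixed line endings are the only problematic input, a caller could check for them before encoding. This is a small sketch of such a check; `newline_styles` and `has_mixed_newlines` are illustrative names, not functions from `codecs`, `quopri`, or `binascii`.

```python
import re


def newline_styles(data: bytes) -> set:
    """Return the set of line-ending sequences found in data."""
    # The alternation tries CRLF first, so a CRLF pair is reported once
    # rather than as a separate CR and LF.
    return {match.group() for match in re.finditer(rb"\r\n|\r|\n", data)}


def has_mixed_newlines(data: bytes) -> bool:
    """True if data uses more than one line-ending convention."""
    return len(newline_styles(data)) > 1


if __name__ == "__main__":
    for sample in (b"\r\n\r\n", b"\n\n", b"\r\n\n", b"\n\r\n"):
        print(repr(sample), "mixed:", has_mixed_newlines(sample))
```

Inputs flagged as mixed are the ones from the original report; the two consistent inputs are the ones shown round-tripping cleanly above.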
Okay so maybe the documentation should include these restrictions on encoding:
|
The pure Python implementation returns a different result.
>>> import quopri
>>> quopri.encodestring(b'\r\n')
b'\r\n'
>>> quopri.a2b_qp = quopri.b2a_qp = None
>>> quopri.encodestring(b'\r\n')
b'=0D\n'
See also bpo-18022. |
Here is a patch that clarifies in the documentation and test suite how newlines work in the “quopri” and “binascii” modules. It also fixes the native Python implementation to support CRLFs.
One corner case concerns me slightly: binascii.b2a_qp(istext=False) will use \n for soft line breaks by default, but will suddenly switch to CRLF if the input data happens to contain a CRLF sequence. This is despite the CRLFs from the data being encoded and therefore not appearing in the output themselves. |
Here is patch v2, which fixes some more bugs I uncovered in the quoted-printable encoders:
|