Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quopri_codec newline handling #64320

Open
fredstober mannequin opened this issue Jan 4, 2014 · 8 comments
Open

quopri_codec newline handling #64320

fredstober mannequin opened this issue Jan 4, 2014 · 8 comments
Labels
docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@fredstober
Copy link
Mannequin

fredstober mannequin commented Jan 4, 2014

BPO 20121
Nosy @malemburg, @bitdancer, @vadmium, @serhiy-storchaka, @vajrasky
Files
  • quopri-newline.patch
  • quopri-newline.v2.patch: Also fix line break issues
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2014-01-04.11:54:50.808>
    labels = ['type-bug', 'library', 'docs']
    title = 'quopri_codec newline handling'
    updated_at = <Date 2015-01-20.04:44:00.458>
    user = 'https://bugs.python.org/fredstober'

    bugs.python.org fields:

    activity = <Date 2015-01-20.04:44:00.458>
    actor = 'martin.panter'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation', 'Library (Lib)']
    creation = <Date 2014-01-04.11:54:50.808>
    creator = 'fredstober'
    dependencies = []
    files = ['37756', '37783']
    hgrepos = []
    issue_num = 20121
    keywords = ['patch']
    message_count = 8.0
    messages = ['207281', '207413', '232812', '232814', '232826', '232827', '234223', '234343']
    nosy_count = 7.0
    nosy_names = ['lemburg', 'r.david.murray', 'docs@python', 'martin.panter', 'serhiy.storchaka', 'vajrasky', 'fredstober']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue20121'
    versions = ['Python 2.7', 'Python 3.4']

    @fredstober
    Copy link
    Mannequin Author

    fredstober mannequin commented Jan 4, 2014

    While trying to encode some binary data, I encountered this behaviour of the quopri_codec:

    >>> '\r\n\n'.encode('quopri_codec').decode('quopri_codec')
    '\r\n\r\n'
    >>> '\n\r\n'.encode('quopri_codec').decode('quopri_codec')
    '\n\n'

    If this behaviour is really intended, it should be mentioned in the documentation that this coded is not bijective.

    @fredstober fredstober mannequin added the stdlib Python modules in the Lib dir label Jan 4, 2014
    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 6, 2014

    The quopri_codec uses binascii.b2a_qp method.

    >>> binascii.b2a_qp('\r\n\n\n\n')
    '\r\n\r\n\r\n\r\n'

    The logic in b2a_qp when dealing with newlines is check whether the first line uses \r\n or \n.

    If it uses \r\n, then all remaning lines' new lines will be converted to \r\n. if it uses \n, then all remaning lines' new lines will be converted to \n.

    It has comment on the source code.

    /* See if this string is using CRLF line ends */
    /* XXX: this function has the side effect of converting all of
     * the end of lines to be the same depending on this detection
     * here */
    

    I am not sure what the appropriate action here. But doc fix should be acceptable.

    @vadmium
    Copy link
    Member

    vadmium commented Dec 17, 2014

    RFC 1521 says that a text newline should be encoded as CRLF, and that any combination of 0x0D and 0x0A bytes that do not represent newlines should be encoded like other control characters as =0D and =0A.

    Since in Python 3 the codec outputs bytes, I don’t think there is any excuse for it to be outputting plain CR or LF bytes. The question is, do they represent newlines to be encoded as CRLF, or just data bytes that need ordinary encoding.

    @malemburg
    Copy link
    Member

    I agree with Vajrasky: a patch for the documentation would probably be a good idea.

    Note that mixing line end conventions in a single text is never a good idea. If you stick to one line end convention, there's no problem with the codec, AFAICT.

    >>> codecs.encode(b'\r\n\r\n', 'quopri_codec')
    b'\r\n\r\n'
    >>> codecs.decode(_, 'quopri_codec')
    b'\r\n\r\n'
    >>> codecs.encode(b'\n\n', 'quopri_codec')
    b'\n\n'
    >>> codecs.decode(_, 'quopri_codec')
    b'\n\n'

    @vadmium
    Copy link
    Member

    vadmium commented Dec 17, 2014

    Okay so maybe the documentation should include these restrictions on encoding:

    • The data being encoded should only include \r or \n bytes that are part of \n or \r\n newline sequences. Encoding arbitrary non-text data is not supported.
    • The two kinds of newlines should not be mixed
    • If \n is used for newlines in the input, the encoder will output \n newlines, and they will need converting to CRLF in a later step to conform to the RFC

    @serhiy-storchaka
    Copy link
    Member

    Pure Python implementation returns different result.

    >>> import quopri
    >>> quopri.encodestring(b'\r\n')
    b'\r\n'
    >>> quopri.a2b_qp = quopri.b2a_qp = None
    >>> quopri.encodestring(b'\r\n')
    b'=0D\n'

    See also bpo-18022.

    @vadmium
    Copy link
    Member

    vadmium commented Jan 18, 2015

    Here is a patch that clarifies in the documentation and test suite how newlines work in the “quopri” and “binascii” modules. It also fixes the native Python implementation to support CRLFs.

    • \n is used by default (e.g. for soft line breaks if the input has no hard line breaks)
    • CRLF is used instead if found in input (even in non-text mode!)
    • Typo errors in documentation
    • quopri uses istext=True
    • header flag does not affect newline encoding; only istext affects it

    One corner case concerns me slightly: binascii.b2a_qp(istext=False) will use \n for soft line breaks by default, but will suddenly switch to CRLF if the input data happens to contain a CRLF sequence. This is despite the CRLFs from the data being encoded and therefore not appearing in the output themselves.

    @vadmium vadmium added the docs Documentation in the Doc dir label Jan 18, 2015
    @vadmium
    Copy link
    Member

    vadmium commented Jan 20, 2015

    Here is patch v2, which fixes some more bugs I uncovered in the quoted-printable encoders:

    • The binascii version would unnecessarily break a 76-character line (maximum length) if it would end with an =XX escape code
    • The native Python version would insert soft line breaks in the middle of =XX escape codes

    @vadmium vadmium added the type-bug An unexpected behavior, bug, or error label Jan 20, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Development

    No branches or pull requests

    3 participants