Header.decode_header eats up spaces #43184

Closed
mgoutell mannequin opened this issue Apr 10, 2006 · 10 comments
Labels
topic-email type-bug An unexpected behavior, bug, or error

Comments

@mgoutell
Mannequin

mgoutell mannequin commented Apr 10, 2006

BPO 1467619
Nosy @warsaw, @birkenfeld, @bitdancer
Superseder
  • bpo-1079: decode_header does not follow RFC 2047
Files
  • emailheader.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-06-03.16:14:50.396>
    created_at = <Date 2006-04-10.10:33:54.000>
    labels = ['type-bug', 'expert-email']
    title = 'Header.decode_header eats up spaces'
    updated_at = <Date 2012-06-03.16:14:50.394>
    user = 'https://bugs.python.org/mgoutell'

    bugs.python.org fields:

    activity = <Date 2012-06-03.16:14:50.394>
    actor = 'r.david.murray'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-06-03.16:14:50.396>
    closer = 'r.david.murray'
    components = ['email']
    creation = <Date 2006-04-10.10:33:54.000>
    creator = 'mgoutell'
    dependencies = []
    files = ['1963']
    hgrepos = []
    issue_num = 1467619
    keywords = []
    message_count = 10.0
    messages = ['28181', '28182', '28183', '28184', '28185', '114651', '114722', '150459', '150470', '162219']
    nosy_count = 7.0
    nosy_names = ['barry', 'georg.brandl', 'alexanderweb', 'mgoutell', 'r.david.murray', 'BreamoreBoy', 'runtux']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '1079'
    type = 'behavior'
    url = 'https://bugs.python.org/issue1467619'
    versions = ['Python 3.3']

    @mgoutell
    Mannequin Author

    mgoutell mannequin commented Apr 10, 2006

    The Header.decode_header function eats up spaces in
    the non-encoded part of a header.

    See the following source:
    # -*- coding: iso-8859-1 -*-
    from email.Header import Header, decode_header
    h = Header('Essai ', None)
    h.append('éè', 'iso-8859-1')
    print h
    print decode_header(h)

    This prints:
    Essai =?iso-8859-1?q?=E9=E8?=
    [('Essai', None), ('\xe9\xe8', 'iso-8859-1')]

    This should print:
    Essai =?iso-8859-1?q?=E9=E8?=
    [('Essai ', None), ('\xe9\xe8', 'iso-8859-1')]
            ^ This space disappears

    This appears in Python 2.3 but the source code of the
    function didn't change in 2.4 so the same problem
    should still exist. Bug "[ 1372770 ] email.Header
    should preserve original FWS" may be linked to that one
    although I'm not sure this is exactly the same.

    This patch (not extensively tested though) seems to
    solve this problem:

    --- /usr/lib/python2.3/email/Header.py  2005-09-05 00:20:03.000000000 +0200
    +++ Header.py   2006-04-10 12:27:27.000000000 +0200
    @@ -90,7 +90,7 @@
                 continue
             parts = ecre.split(line)
             while parts:
    -            unenc = parts.pop(0).strip()
    +            unenc = parts.pop(0).rstrip()
                 if unenc:
                     # Should we continue a long line?
                     if decoded and decoded[-1][1] is None:

    @mgoutell mgoutell mannequin assigned warsaw Apr 10, 2006
    @mgoutell mgoutell mannequin added the stdlib Python modules in the Lib dir label Apr 10, 2006
    @alexanderweb
    Mannequin

    alexanderweb mannequin commented May 12, 2006

    Logged In: YES
    user_id=254738

    I can confirm this bug and have been bitten by it as well.

    @mgoutell
    Mannequin Author

    mgoutell mannequin commented May 16, 2007

    Hello,
    Any news about this bug? It still seems to be there in 2.5, a year after it was reported...
    Regards,

    @birkenfeld
    Member

    I propose the attached patch. RFC 2047 specifies to ignore whitespace between encoded-words, but IMHO not between ordinary text and encoded-words.
    File Added: emailheader.diff

    @warsaw
    Member

    warsaw commented May 16, 2007

    IIRC, I tried the OP's patch and it broke too many tests in the email package's test suite. I made an attempt at fixing the problem to be much more RFC compliant, but couldn't get the test suite to pass completely. This points to a much deeper problem with email package header management. I don't think the problem is a bug, I think it's a design flaw.

    @devdanzin devdanzin mannequin added type-bug An unexpected behavior, bug, or error labels Mar 21, 2009
    @warsaw warsaw assigned bitdancer and unassigned warsaw May 5, 2010
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Aug 22, 2010

    Would someone like to comment on Georg's patch?

    @bitdancer
    Member

    Georg's patch no longer applies to py3k. I ported it, but the result is not functional. It causes extra spaces during header generation, because that is where email4/5 "deals" with "ignoring" spaces between encoded words: by *adding* spaces when they are adjacent to non-encoded words. (In general email4/5 compresses runs of whitespace into single spaces.) I tried fixing that, but then ran into the fact that header parsing/generation currently depends on the whitespace compression in order to handle the header folding cases. So, the logic used for header parsing and generation in email5 does not allow for an easy patch to fix this bug. I'm deferring it to email6, where I am rewriting the header parser/generator.

    @runtux
    Mannequin

    runtux mannequin commented Jan 2, 2012

    I've been bitten by this too (in Python up to 2.7, in Roundup, the bug tracker). We're currently using a workaround that re-inserts spaces; see git on roundup.sourceforge.net, file mailgw.py, method _decode_header_to_utf8.

    RFC 2047 even has test cases at the end. It specifies:

    encoded form                displayed as
    (=?ISO-8859-1?Q?a?= b)      (a b)

    note the space between 'a' and 'b' above. Spaces between non-encoded and encoded parts should be preserved. And it's probably a good idea to put the examples from the RFC into the regression test.
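
As a rough illustration of what such a regression check could look like, here is a minimal sketch for Python 3.3 or later. The helper name `decode_to_text` and the selection of cases are assumptions made for illustration, not the tests that were actually committed; the two inputs follow the examples table in RFC 2047, section 8.

```python
from email.header import decode_header


def decode_to_text(raw):
    """Join the chunks returned by decode_header into a single str.

    Hypothetical helper, not part of the email package.
    """
    pieces = []
    for chunk, charset in decode_header(raw):
        # Chunks are bytes when the header contains encoded-words
        # and plain str when it does not.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        pieces.append(chunk)
    return "".join(pieces)


# (encoded form, displayed as), following RFC 2047 section 8.
rfc_cases = [
    # Space between an encoded-word and ordinary text is kept.
    ("(=?ISO-8859-1?Q?a?= b)", "(a b)"),
    # Space between two adjacent encoded-words is ignored.
    ("(=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)", "(ab)"),
]
results = [(decode_to_text(raw), expected) for raw, expected in rfc_cases]
```

On versions affected by this bug, the first case would be expected to lose the space before `b`.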

    @bitdancer
    Member

    Antoine, I marked this for Python 3.3 only because there is no good way to fix it in 2.7/3.2. (If someone comes up with a way I'll be happy to review it, though!)

    @bitdancer bitdancer added topic-email and removed stdlib Python modules in the Lib dir labels May 24, 2012
    @bitdancer bitdancer removed their assignment May 24, 2012
    @bitdancer
    Member

    This is fixed by the fix in bpo-1079. Ralf found a *relatively* backward compatible way to fix it, but since the point is preserving whitespace that wasn't preserved before, there is an unavoidable behavior change, so it can't be backported.
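
For reference, the header from the original report can be re-checked against a fixed version. This is a sketch that assumes Python 3.3 or later; it passes the already-encoded string to `decode_header` rather than a `Header` object, and in Python 3 the chunks come back as bytes when encoded-words are present.

```python
from email.header import decode_header

# Encoded form printed by the original reproduction script.
raw = "Essai =?iso-8859-1?q?=E9=E8?="

chunks = decode_header(raw)
first_chunk, first_charset = chunks[0]

# On a fixed version the unencoded chunk should keep its trailing space.
space_preserved = first_chunk.endswith(b" ")
```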

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022