Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add decode_header_as_string method to email.utils #50551

Closed
bitdancer opened this issue Jun 18, 2009 · 13 comments
Closed

Add decode_header_as_string method to email.utils #50551

bitdancer opened this issue Jun 18, 2009 · 13 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@bitdancer
Copy link
Member

BPO 6302
Nosy @warsaw, @pitrou, @bitdancer
Dependencies
  • bpo-4661: email.parser: impossible to read messages encoded in a different encoding
  • Files
  • decode_header.patch
  • decode_header.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/bitdancer'
    closed_at = <Date 2012-05-16.01:18:13.427>
    created_at = <Date 2009-06-18.01:36:14.728>
    labels = ['type-feature', 'library']
    title = 'Add decode_header_as_string method to email.utils'
    updated_at = <Date 2012-05-16.01:18:13.426>
    user = 'https://github.com/bitdancer'

    bugs.python.org fields:

    activity = <Date 2012-05-16.01:18:13.426>
    actor = 'r.david.murray'
    assignee = 'r.david.murray'
    closed = True
    closed_date = <Date 2012-05-16.01:18:13.427>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2009-06-18.01:36:14.728>
    creator = 'r.david.murray'
    dependencies = ['4661']
    files = ['19115', '19118']
    hgrepos = []
    issue_num = 6302
    keywords = ['patch']
    message_count = 13.0
    messages = ['89488', '109900', '109926', '117906', '117910', '117912', '117913', '117956', '117957', '117958', '117959', '123889', '160791']
    nosy_count = 3.0
    nosy_names = ['barry', 'pitrou', 'r.david.murray']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'test needed'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue6302'
    versions = ['Python 3.3']

    @bitdancer
    Copy link
    Member Author

    decode_header only accepts str as input. If the input contains no
    encoded words, the output is str (ie: the input string) and None. If it
    does contain encoded words, the output is pairs of bytes plus str
    charset identifiers (or None). This makes it difficult to use this
    function to decode headers, since one has to test for the special case
    of str output when there are no encoded words.

    I think decode_header should take bytes as input, and output (bytes.
    str) pairs consistently.

    In any case, the documentation is wrong since it says it returns
    strings. The example is also wrong, since the actual output is:

    [(b'p\xf6stal', 'iso-8859-1')]

    @bitdancer bitdancer added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Jun 18, 2009
    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 10, 2010

    Anyone got any comments to make on this? Should 2.7 also be included?

    @bitdancer
    Copy link
    Member Author

    No, this is a 3.x only problem. And my main comment is that decode_headers ought to go away as an API :)

    But I'll try to fix the inconsistent data types problem before 3.2.

    @bitdancer bitdancer self-assigned this Jul 10, 2010
    @bitdancer
    Copy link
    Member Author

    Here is a patch that makes the output consistently (bytes, string) pairs. This is definitely a potential backward compatibility issue, but in general code which compensates for the old behavior should work fine with the new behavior, since it was always possible to get a (bytes, None) tuple back as a result, so most code should be handling that case correctly already.

    IMO this change is nevertheless worthwhile; especially since if the patch in bpo-4661 is accepted decode_header can be enhanced so that it will provide a way to obtain the bytes version of a header containing (RFC invalid) non-ASCII bytes.

    Note that this breaks one of the tests in nttplib, so backward compatibility really is an issue, unfortunately. I think nttplib's use case can be satisfied via the bpo-4661 patch coupled with the decode_header bytes-recovery enhancement.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 3, 2010

    I think nttplib's use case can be satisfied via the bpo-4661 patch
    coupled with the decode_header bytes-recovery enhancement.

    I don't really understand how that could.
    nntplib needs to "decode" (in the decode_header sense) headers containing unescaped as well as escaped non-ASCII chars. Encoding with "ascii" clearly breaks the first use case.

    Since the decode_header() API is so silly to begin with, I'd suggest providing another higher-level API instead. One which takes an str and returns another str (not bytes!) instead of that list of tuples.

    If you really want to "fix" the current decode_header(), I'd recommend adding an optional "encoding" parameter so that nntplib can give utf-8 instead of ascii. nntplib and other consumers will still have to decode back in order to get an str, which makes the whole encoding thing a bit useless.

    Oh, and instead of None, it would be nicer to give the actual encoding (e.g. "ascii"). It's no fun trying to guess.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 3, 2010

    Here is a proposal for decode_header_as_string().

    @bitdancer
    Copy link
    Member Author

    Yes, that was a late night post and as I was falling asleep I realized that I was wrong.

    Certainly decode_header_as_string is a function most people using the email package will want and will re-implement in one form or another, so I think it is a good idea to add it. I will take a look at the patch later. And with such a function added we can leave decode_header alone for backward compatibility.

    @warsaw
    Copy link
    Member

    warsaw commented Oct 4, 2010

    In email6, can we at least make tuple returning methods return namedtuples instead?

    @warsaw
    Copy link
    Member

    warsaw commented Oct 4, 2010

    I agree that it makes sense to have consistent types in the output. As for whether to add a new method or fix the existing one, I'm a bit torn, but I'd probably opt for fixing the existing function rather than adding a new one, just because I think there are few Python 3 applications out there that are counting on the old behavior. (I could be wrong though ;).

    @pitrou
    Copy link
    Member

    pitrou commented Oct 4, 2010

    I agree that it makes sense to have consistent types in the output.
    As for whether to add a new method or fix the existing one, I'm a bit
    torn, but I'd probably opt for fixing the existing function rather
    than adding a new one, just because I think there are few Python 3
    applications out there that are counting on the old behavior. (I
    could be wrong though ;).

    The point of a new method is to return the header as a human-readable
    string, rather than a list of tuples. It has added value besides leaving
    the old method alone ;)

    @warsaw
    Copy link
    Member

    warsaw commented Oct 4, 2010

    The point of a new method is to return the header as a human-readable
    string, rather than a list of tuples. It has added value besides
    leaving the old method alone ;)

    +1 then! :)

    @bitdancer
    Copy link
    Member Author

    Drat, missed this one when I was reviewing my issues for feature requests because I didn't change the type :(

    @bitdancer bitdancer changed the title email.header.decode_header data types are inconsistent and incorrectly documented Add decode_header_as_string method to email.utils Dec 13, 2010
    @bitdancer bitdancer added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Dec 13, 2010
    @bitdancer
    Copy link
    Member Author

    All of this is going to be fixed a different (better :) way in an upcoming patch that will add a new header parsing/folding policy to the email package.

    For the record, the way to spell the "decode a header and return a string" function using the existing API is:

      str(make_header(decode_header(<someheader>)))

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants