Issue20747
Created on 2014-02-23 17:59 by rednaw, last changed 2022-04-11 14:57 by admin.
Messages (10) | |||
---|---|---|---|
msg212011 - (view) | Author: Rik (rednaw) | Date: 2014-02-23 17:59 | |
If you look at the `header_encode` method in the `Charset` class in `email.charset`, you'll see that depending on the `header_encoding` that is set on the `Charset` instance, it will either encode it using base64 or quoted-printable (QP): http://hg.python.org/cpython/file/3a1db0d2747e/Lib/email/charset.py#l351

However, QP always uses `maxlinelen=None` and base64 doesn't. This results in the following behaviour:

- If you use base64 encoding and your header size is longer than the default `maxlinelen`, it will be split over multiple lines.
- If you use QP encoding with the same header, it doesn't get split over multiple lines.

You can easily test it with this snippet:

    from email.charset import Charset, BASE64, QP

    header = (
        'tejkstj tlkjes takldjf aseio neaoiflk asnfoieas nflkdan foeias '
        'naskln ioeasn kldan flkansoie naslk dnaslk fndaslk fneoisaf '
        'neklasn dfklasnf oiasenf lkadsn lkfanldk fas dfknaioe nas'
    )
    charset = Charset('utf-8')
    charset.header_encoding = BASE64
    print 'BASE64:'
    print charset.header_encode(header)
    charset.header_encoding = QP
    print 'QP:'
    print charset.header_encode(header)

Which will output:

    BASE64:
    =?utf-8?b?dGVqa3N0aiB0bGtqZXMgdGFrbGRqZiBhc2VpbyBuZWFvaWZsayBhc25mb2llYXMg?=
    =?utf-8?b?bmZsa2RhbiBmb2VpYXMgbmFza2xuIGlvZWFzbiBrbGRhbiBmbGthbnNvaWUgbmFz?=
    =?utf-8?b?bGsgZG5hc2xrIGZuZGFzbGsgZm5lb2lzYWYgbmVrbGFzbiBkZmtsYXNuZiBvaWFz?=
    =?utf-8?b?ZW5mIGxrYWRzbiBsa2ZhbmxkayBmYXMgZGZrbmFpb2UgbmFz?=
    QP:
    =?utf-8?q?tejkstj_tlkjes_takldjf_aseio_neaoiflk_asnfoieas_nflkdan_foeias_naskln_ioeasn_kldan_flkansoie_naslk_dnaslk_fndaslk_fneoisaf_neklasn_dfklasnf_oiasenf_lkadsn_lkfanldk_fas_dfknaioe_nas?=

This is inconsistent behavior. Aside from that, I think the `header_encode` method should accept an argument `maxlinelen` that defaults to an appropriate value (probably 76), but which you can override at will. This is (I think) also necessary because the `Header` class in `email.header` has a `maxlinelen` attribute that is used for the same purpose.
Normally this works fine, but when you specify a charset for your header, it uses the `Charset` class and the `maxlinelen` is lost. This is happening here: http://hg.python.org/cpython/file/3a1db0d2747e/Lib/email/header.py#l368

You see, `_encode_chunks` takes the `maxlinelen` argument but doesn't pass it on to the `header_encode` method of `charset` (which is a `Charset` instance). As such, you can see this issue in action with the following snippet:

    from email.header import Header

    maxlinelen = 9999999
    print 'No charset:'
    print Header(
        u'asdfjk lasjdf sajdfl ajsdfaj sdlkfjas kfladjs flkajsdflk jsadklf jadslkfj adslkfj asdlkjf lksadjfkldas jfkldasj fkadsj fladsjf kladsjfk asdjfkldasasd kfaj kfladsj fkadsjf asdf ',
        maxlinelen=maxlinelen
    ).encode()
    print 'Charset with special characters:'
    print Header(
        u'attachment; filename="ajdsklfj klasdjfkl asdjfkl jadsfja sdflkads fad fads adsf dasjfkl jadslkfj dlasf asd \u6211\u6211\u6211 jo \u6211\u6211 jo \u6211\u6211"',
        charset='utf-8',
        maxlinelen=9999999
    ).encode()

Which will output:

    No charset:
    asdfjk lasjdf sajdfl ajsdfaj sdlkfjas kfladjs flkajsdflk jsadklf jadslkfj adslkfj asdlkjf lksadjfkldas jfkldasj fkadsj fladsjf kladsjfk asdjfkldasasd kfaj kfladsj fkadsjf asdf
    Charset with special characters:
    =?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9ImFqZHNrbGZqIGtsYXNkamZrbCBhc2RqZmts?=
    =?utf-8?b?IGphZHNmamEgc2RmbGthZHMgZmFkIGZhZHMgYWRzZiBkYXNqZmtsIGphZHNsa2Zq?=
    =?utf-8?b?IGRsYXNmIGFzZCDmiJHmiJHmiJEgam8g5oiR5oiRIGpvIOaIkeaIkSI=?=

This is currently an issue we're experiencing in Django, see our issue in the issue tracker: https://code.djangoproject.com/ticket/20889#comment:4 |
|||
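For readers on Python 3, the first snippet in the report can be re-run in a form like the one below. This is a sketch rather than part of the original report: `encode_with` is a hypothetical helper name introduced here, and the script counts output lines instead of assuming a result. Going by the discussion in this thread, the splitting difference is specific to 2.7, so on Python 3 both encodings would be expected to come back as a single encoded word.

```python
from email.charset import BASE64, QP, Charset

header = (
    'tejkstj tlkjes takldjf aseio neaoiflk asnfoieas nflkdan foeias '
    'naskln ioeasn kldan flkansoie naslk dnaslk fndaslk fneoisaf '
    'neklasn dfklasnf oiasenf lkadsn lkfanldk fas dfknaioe nas'
)


def encode_with(header_encoding, text):
    """Hypothetical helper: encode `text` with a fresh utf-8 Charset."""
    charset = Charset('utf-8')
    charset.header_encoding = header_encoding
    return charset.header_encode(text)


b64_result = encode_with(BASE64, header)
qp_result = encode_with(QP, header)

for label, result in (('BASE64', b64_result), ('QP', qp_result)):
    # Report the number of physical lines so the two encodings can be compared.
    print(label, 'lines:', len(result.splitlines()), 'length:', len(result))
```

If the two line counts differ on your interpreter, you are most likely seeing the inconsistency described in the report.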
msg212045 - (view) | Author: R. David Murray (r.david.murray) | Date: 2014-02-23 23:47 | |
The line wrapping is done by Header, not header_encode. The bug appears to be that maxlinelen=None is not passed to base64mime's header_encode the way it is to quoprimime's header_encode...and that base64mime doesn't handle a maxlinelen of None. Using maxlinelen=9999999 in the base64mime.header_encode call, your base64 example also results in a single line header. This should be fixed. It does not affect Python 3, which uses a different folding algorithm. |
|||
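To make the `maxlinelen=None` point concrete, here is a minimal standalone sketch of a base64 header encoder that treats `None` as "never split". It is not the stdlib `base64mime` code: `b64_header_encode` and `_split_on_characters` are hypothetical names, and the continuation style (newline plus one space) is an assumption made for illustration.

```python
import base64


def _split_on_characters(text, charset, max_raw):
    """Group characters into byte chunks of at most max_raw bytes.

    Splitting happens between characters, so a multi-byte character is
    never divided across two chunks.
    """
    chunks, current = [], b""
    for ch in text:
        encoded = ch.encode(charset)
        if current and len(current) + len(encoded) > max_raw:
            chunks.append(current)
            current = b""
        current += encoded
    if current:
        chunks.append(current)
    return chunks


def b64_header_encode(text, charset="utf-8", maxlinelen=76):
    """Hypothetical helper, not the stdlib implementation.

    Returns one or more base64 encoded-words joined by "\n ".
    maxlinelen=None disables splitting entirely.
    """
    if not text:
        return ""
    if maxlinelen is None:
        chunks = [text.encode(charset)]
    else:
        chrome = len("=?%s?b??=" % charset)  # characters used by the wrapper
        # base64 turns every 3 input bytes into 4 output characters.
        max_raw = max(3, (maxlinelen - chrome) // 4 * 3)
        chunks = _split_on_characters(text, charset, max_raw)
    return "\n ".join(
        "=?%s?b?%s?=" % (charset, base64.b64encode(chunk).decode("ascii"))
        for chunk in chunks
    )


sample = 'tejkstj tlkjes takldjf aseio neaoiflk asnfoieas nflkdan foeias naskln'
print(b64_header_encode(sample))                   # split at the default limit
print(b64_header_encode(sample, maxlinelen=None))  # one encoded word
```

The design choice at issue in this thread is only the `if maxlinelen is None` branch: without it, a caller has to pass a very large number to get the same effect.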
msg212072 - (view) | Author: Rik (rednaw) | Date: 2014-02-24 09:02 | |
Line wrapping is indeed done by `Header`, but why do `base64mime` and `quoprimime` then have their own line wrapping? I assume so that you can also use them independently. So that's why I would think `Charset.header_encode` should also accept a `maxlinelen` so that you can use `Charset` independently too. |
|||
msg212093 - (view) | Author: R. David Murray (r.david.murray) | Date: 2014-02-24 13:26 | |
I've no clue, to tell you the truth. Those APIs evolved long before I took over email package maintenance. And since we are talking about 2.7, we can't change the existing API. In Python3, Charset.header_encode will as of 3.5 become a legacy interface, so there's not much point in changing it there either, although it is not out of the question if there is a use case. |
|||
msg212098 - (view) | Author: Rik (rednaw) | Date: 2014-02-24 13:46 | |
Ok, so you suggest to use `maxlinelen=None` for the `base64mime.header_encode` which will act the same as giving `maxlinelen=None` to `email.quoprimime`, so that we don't need to change the API? And this change would then also be reflected in the Python 3.5 legacy interface? |
|||
msg212101 - (view) | Author: R. David Murray (r.david.murray) | Date: 2014-02-24 14:28 | |
Well, we have to make base64mime.header_encode also handle a None value...so perhaps instead we should just use 10000, which is what the Header wrapping code in python3 does. Python3's Header doesn't have this bug. |
|||
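The claim that Python 3's `Header` is unaffected can be checked with a short script along these lines. It is a sketch under the assumption that you run it on Python 3; the header text is the one from the original report, and the variable names `wide` and `default` are introduced here for illustration.

```python
from email.header import Header, decode_header

value = (
    'attachment; filename="ajdsklfj klasdjfkl asdjfkl jadsfja sdflkads '
    'fad fads adsf dasjfkl jadslkfj dlasf asd \u6211\u6211\u6211 jo '
    '\u6211\u6211 jo \u6211\u6211"'
)

# Same header encoded once with a huge line limit and once with the default.
wide = Header(value, charset='utf-8', maxlinelen=9999999).encode()
default = Header(value, charset='utf-8').encode()

print('maxlinelen=9999999 ->', len(wide.splitlines()), 'line(s)')
print('default maxlinelen ->', len(default.splitlines()), 'line(s)')

# Decoding the wide form should give back the original text.
decoded = ''.join(raw.decode(cs) for raw, cs in decode_header(wide))
print('round trip ok:', decoded == value)
```

If `maxlinelen` is honoured, the first count is 1 while the second is larger, which is the behaviour the 2.7 snippet in the report was expecting.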
msg212107 - (view) | Author: Rik (rednaw) | Date: 2014-02-24 15:32 | |
Ok, do you think there's any risk in making `base64mime.header_encode` handle `maxlinelen=None`? I think it would be more consistent if `base64mime.header_encode` and `quoprimime.header_encode` interpret their arguments similarly. |
|||
msg212111 - (view) | Author: R. David Murray (r.david.murray) | Date: 2014-02-24 16:06 | |
Well, there's the usual API change risk: something that works on 2.7.x doesn't work on 2.7.x-1. So since we can fix the bug without making the API change, I think we should. |
|||
msg212112 - (view) | Author: R. David Murray (r.david.murray) | Date: 2014-02-24 16:09 | |
That wasn't clear. By "something that works" I mean exactly what you are talking about: someone writing code using these functions would naturally try to use None with base64mime, and if we make it work, that would work fine in 2.7.x, but mysteriously break if run on an earlier version of 2.7. So instead we force the author of new code to use a non-None value that will in fact work in previous versions of 2.7. |
|||
msg376145 - (view) | Author: Zackery Spytz (ZackerySpytz) | Date: 2020-08-31 09:59 | |
Python 2.7 is no longer supported, so I think this issue should be closed. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:59 | admin | set | github: 64946 |
2020-08-31 09:59:28 | ZackerySpytz | set | nosy: + ZackerySpytz; messages: + msg376145 |
2014-02-24 16:09:08 | r.david.murray | set | messages: + msg212112 |
2014-02-24 16:06:48 | r.david.murray | set | messages: + msg212111 |
2014-02-24 15:32:17 | rednaw | set | messages: + msg212107 |
2014-02-24 14:28:42 | r.david.murray | set | messages: + msg212101 |
2014-02-24 13:46:47 | rednaw | set | messages: + msg212098 |
2014-02-24 13:26:05 | r.david.murray | set | messages: + msg212093 |
2014-02-24 09:02:34 | rednaw | set | messages: + msg212072 |
2014-02-23 23:47:24 | r.david.murray | set | messages: + msg212045 |
2014-02-23 17:59:37 | rednaw | create |