msg94100 - (view) |
Author: Andreas Thomas (athomas) |
Date: 2009-10-15 18:06 |
The attachment of my sample email contains '123\n', but get_payload
(decode=True) returns '123'.
The source of this error is the function _bdecode from the email.utils
module, which gets called for decoding base64 encoded payloads. Here
the last \n gets stripped away, because (comment in the function):
# We can't quite use base64.encodestring() since it tacks on a "courtesy
# newline". Blech!
I don't think that's true. At least not for Python 2.5.4 and upwards.
Also note the "encodestring" instead of "decodestring".
In fact calling
base64.decodestring(part.get_payload())
returns the correct value, so I propose to remove this function
completely and just use the standard decoding method.
Please note that base64.encodestring really tacks on a \n.
|
msg97570 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-01-11 01:47 |
The behavior you object to is in fact correct per RFC 2046, 5.5.1:
The boundary delimiter MUST occur at the beginning of a line, i.e.,
following a CRLF, and the initial CRLF is considered to be attached
to the boundary delimiter line rather than part of the preceding
part.
So in your example message your attatched file does not in fact have a final newline. Note that you do not get a newline regardless of whether decode is true ('p' is the part extracted from your sample message):
>>> p.get_payload()
'MTIzCg=='
>>> p.get_payload(decode=True)
'123'
|
msg100338 - (view) |
Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela) |
Date: 2010-03-03 11:22 |
Hi,
RFC 2046, 5.1.1 refers to the CRLF that happens just before the boundary. It says nothing about an encoded CRLF.
From Andreas example, if you have:
Content-Type: text/plain; name="test.txt"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="test.txt"
MTIzCg==
--------------040103020004000509010404--
You are correctly eating the CRLF that exists between MTIzCg== and --------------040103020004000509010404--, because it's part of the boundary.
You are also eating the CRLF that is inside the base64 encoded text, and I agree with Andreas that this is incorrect. Will you please consider reopening this bug?
|
msg100343 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-03-03 14:08 |
Ah, I misunderstood, and did not see that the newline in question was inside the base64 string. Thank you for pointing out my mistake.
Would either of you like to propose a patch, including a test case?
(I've removed 2.5 because it is in security-fix-only mode).
|
msg100394 - (view) |
Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela) |
Date: 2010-03-04 16:03 |
Hi,
I've never before made a patch to Python, so take it with care.
A couple of comments, I reused a test where all the attachments contained an ending newline, except for the base64 one (conveniently...)
I think the comment in _bdecode that Andreas quoted before refers to base64.encodestring add an extra \n to de encoded string, but base64.decodestring can work with and without this extra \n:
>>> import base64
>>> base64.decodestring('MTIzCg==')
'123\n'
>>> base64.decodestring('MTIzCg==\n')
'123\n'
So it's not necessary to work around this in _bdecode. Let me know if it looks good to you.
|
msg100403 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-03-04 18:01 |
Thanks for working on this.
For the most part your patch looks fine. Two comments: (1) it concerns me that by co-opting the existing test, we are no longer testing that decoding does not introduce a spurious newline :). (2) I think we should add a comment in _bdecode that it used to do more and is retained now only for backward compatibility.
I don't particularly like proliferating test files, and I plan to fix that in email6, but for now I think you should just add a new test file based on msg_10, and a test that uses it.
|
msg100427 - (view) |
Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela) |
Date: 2010-03-04 22:22 |
I added a new subpart to msg_10.txt, that keeps the previous test and also tests the new behavior. Let me know if it's ok like this or if you still prefer to create a different msg file for testing this.
Thanks,
|
msg100592 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-03-07 16:58 |
Your patch looks good, thank you.
I just realized that Barry isn't nosy on this issue. I've checked, and the code in question dates back to email version 1.0...code of that long standing that exists specifically to implement the behavior we propose to change makes me wonder what we will break if we fix it. Yet the current behavior seems clearly contrary to the RFCs to me.
Barry? Any memory of why _bdecode exists and what will break if we fix it?
In any case I suspect that backporting this fix might be a bad idea, since existing code may be compensating for the current behavior and break if the behavior is fixed.
|
msg100596 - (view) |
Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela) |
Date: 2010-03-07 18:04 |
Unfortunately the only way that I can see to reliably work around this is to bypass entirely get_payload, in this case fixing this bug will not affect people that do that negatively.
Some people may have more control over their attachments and they are unconditionally adding a \n to certain type of attachments. In this case they will get an extra \n after this patch.
If they are doing this, they certainly realized there was a bug, because this is only necessary for base64 encoded attachments and not for any other encoding. In this case I guess it's reasonable to expect that this bug will be fixed one day.
But you're right that the change is risky, an that we (at least I) are not sure why the code is the way it is.
Cheers,
|
msg100614 - (view) |
Author: Barry A. Warsaw (barry) * |
Date: 2010-03-07 21:46 |
Thanks for adding me to the nosy list. Yep, this code is pretty old so it doesn't surprise me that its implementation isn't quite right. Of course, I hate get_payload(decode=True) anyway and hope that goes away in email 6.
Having said that, I don't think this can be changed in Python 2.6. It's pretty common for people to have workarounds for the devil they know and I'd bet that changing this behavior in 2.6.x would break people's code. I'm okay for doing the sensible thing in Python 2.7 (though we should probably bump the email package's micro version).
|
msg100626 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-03-08 02:23 |
OK, patch (with comment tweaks to refer to this issue), and email minor version bump, applied to trunk in r78778. It turns out that bdecode was already deleted in email 5 in py3k. I did port the test in r78780.
Thanks Joaquin Cuenca Abela, and you now have the interesting distinction of being the first name listed in Misc/ACKS (because we've been maintaining it in sorted order by last name....)
|
msg100631 - (view) |
Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela) |
Date: 2010-03-08 08:14 |
Thanks David and Barry!
As much as it flatters my ego to be in the first place is MISC/ACK, I have to confess that my family name is "Cuenca Abela", Cuenca is not a middle name.
Cheers,
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:54 | admin | set | github: 51392 |
2010-03-08 08:14:47 | Joaquin.Cuenca.Abela | set | messages:
+ msg100631 |
2010-03-08 02:23:34 | r.david.murray | set | status: open -> closed versions:
- Python 3.2 messages:
+ msg100626
resolution: fixed stage: patch review -> resolved |
2010-03-07 21:46:08 | barry | set | messages:
+ msg100614 |
2010-03-07 18:04:02 | Joaquin.Cuenca.Abela | set | messages:
+ msg100596 |
2010-03-07 16:58:17 | r.david.murray | set | nosy:
+ barry
messages:
+ msg100592 versions:
- Python 2.6, Python 2.5, Python 3.1 |
2010-03-04 22:22:46 | Joaquin.Cuenca.Abela | set | files:
+ b64crlf-2.patch
messages:
+ msg100427 versions:
+ Python 2.5 |
2010-03-04 18:01:59 | r.david.murray | set | messages:
+ msg100403 stage: test needed -> patch review |
2010-03-04 16:03:45 | Joaquin.Cuenca.Abela | set | files:
+ b64crlf.patch keywords:
+ patch messages:
+ msg100394
|
2010-03-03 14:12:06 | r.david.murray | set | versions:
- Python 3.0 |
2010-03-03 14:08:55 | r.david.murray | set | status: closed -> open
assignee: r.david.murray title: get_payload(decode=True) eats last newline -> get_payload(decode=True) eats last newline in base64 encoded payload keywords:
+ easy resolution: not a bug -> (no value) versions:
- Python 2.5 messages:
+ msg100343 stage: resolved -> test needed |
2010-03-03 11:22:56 | Joaquin.Cuenca.Abela | set | nosy:
+ Joaquin.Cuenca.Abela messages:
+ msg100338
|
2010-01-11 01:47:21 | r.david.murray | set | status: open -> closed priority: normal
nosy:
+ r.david.murray messages:
+ msg97570 resolution: not a bug stage: resolved |
2009-10-16 06:47:02 | esam | set | nosy:
+ esam
|
2009-10-15 18:06:47 | athomas | create | |