This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: email: collapse_rfc2231_value has inconsistent unquoting
Type: behavior Stage:
Components: email Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Eric Lafontaine, barry, bpoaugust, r.david.murray
Priority: normal Keywords:

Created on 2016-12-20 00:30 by bpoaugust, last changed 2022-04-11 14:58 by admin.

Messages (5)
msg283658 - (view) Author: (bpoaugust) Date: 2016-12-20 00:30
collapse_rfc2231_value unquotes the value before returning it except here:

rawbytes = bytes(text, 'raw-unicode-escape')
return str(rawbytes, charset, errors)

Why is the text not unquoted in this case?

Actually I wonder whether the function should do any unquoting at all.
There is at least one case where it is called with an already unquoted value (get_boundary, see issue28945).

But if it is intended to do unquoting, it should be consistent.
msg283936 - (view) Author: (bpoaugust) Date: 2016-12-24 10:41
Another case is get_filename.

The second call to unquote will only change the incoming parameter if the original value was enclosed in <> or "". This is not a common scenario, and was only discovered because a mailer used the form <<<abcd>>> as a boundary marker (yes, this is invalid). So existing tests are unlikely to notice any difference.

Note that it is wrong to unquote more times than the original value has been quoted. So the only possible reason for keeping unquote in the function is if unquote has not already been called on a quoted value. (It is not possible to tell from the parameter whether or not it is currently quoted).

The two sample callers are get_boundary and get_filename.
Both pass the value already unquoted.

This method should be fixed to remove the unquote calls.
If there is a caller (is there one?) that does not unquote the value first, then that needs to be fixed as well.
msg284293 - (view) Author: (bpoaugust) Date: 2016-12-29 20:59
I have just checked and AFAICT collapse_rfc2231_value is only called by get_filename and get_boundary in message.py.

Both of these call get_param and default to unquote=True.

So in all cases the parameter value passed to collapse_rfc2231_value will already have been unquoted. Calling unquote again is a bug.

The obvious fix is to remove the calls to unquote from collapse_rfc2231_value.

This will also fix issue28945.
msg284311 - (view) Author: Eric Lafontaine (Eric Lafontaine) * Date: 2016-12-30 05:33
Hi all,

The fix is already provided in issue28945 .  Please review it and I would like to flag it as needing more test on more version of python...

The patch would just require more Test Case that would check for the 
filename for preventing re-occurence in the future on the file-name.

Thanks for the analysis bpoaugust :).  I'm hoping no client relied on this function for the unquote though...

Regards,
Eric Lafontaine
msg284501 - (view) Author: (bpoaugust) Date: 2017-01-02 20:52
If there are concerns about 3rd party code relying on the current behaviour of the function, then just create a new function without the unquoting, and deprecate the original function.

The existing function does not always unquote, so any code which relies on it is liable to break.
History
Date User Action Args
2022-04-11 14:58:40adminsetgithub: 73206
2017-01-03 12:22:37vstinnersettitle: collapse_rfc2231_value has inconsistent unquoting -> email: collapse_rfc2231_value has inconsistent unquoting
2017-01-02 20:52:12bpoaugustsetmessages: + msg284501
2016-12-30 05:33:57Eric Lafontainesetmessages: + msg284311
2016-12-30 01:24:58Eric Lafontainesetnosy: + Eric Lafontaine
2016-12-29 20:59:47bpoaugustsetmessages: + msg284293
2016-12-24 10:41:00bpoaugustsetmessages: + msg283936
2016-12-20 00:30:31bpoaugustsettype: behavior
2016-12-20 00:30:12bpoaugustcreate