classification
Title: unicode support in Cookie module
Type:
Stage:
Components: Documentation, Unicode
Versions: Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Alexander.Tsepkov, docs@python, eric.araujo, ezio.melotti, martin.panter, orsenthil, r.david.murray, vstinner, whit537
Priority: normal
Keywords: patch

Created on 2011-02-25 04:05 by Alexander.Tsepkov, last changed 2019-01-27 13:21 by BreamoreBoy.

Files
File name Uploaded Description
cookie_patch.patch Alexander.Tsepkov, 2011-02-25 04:05
failing-tests-for-type-regression.patch whit537, 2015-10-30 15:13
fix-the-two-new-tests.patch whit537, 2015-10-30 15:39
start-converting-to-bytes-everywhere.patch whit537, 2015-10-30 17:44
Messages (9)
msg129330 - Author: Alexander Tsepkov (Alexander.Tsepkov) Date: 2011-02-25 04:05
In Lib/Cookie.py, the BaseCookie.load() method performs the following comparison on line 624:

str(rawdata) == str("")

This breaks when a unicode string is passed in as rawdata. I've included a patch that fixes this by using an isinstance(rawdata, basestring) check instead. Additionally, the patch encodes rawdata as ASCII before passing it to __ParseString(), since that method does not support unicode.
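The patch itself targets Python 2. For comparison, a minimal Python 3 sketch of the same idea (type check, then normalize to ASCII text before parsing) follows; `load_header` is a hypothetical helper, not part of `http.cookies`. Because the two string types swap roles in Python 3, it decodes bytes instead of encoding unicode.

```python
from http.cookies import SimpleCookie


def load_header(jar, rawdata):
    """Hypothetical helper: load a Cookie header given as str or bytes.

    Bytes are decoded as ASCII first (a non-ASCII byte raises
    UnicodeDecodeError); str and mappings are handed to load() unchanged.
    """
    if isinstance(rawdata, bytes):
        rawdata = rawdata.decode("ascii")
    # load() parses a str as a header and copies a mapping item by item
    jar.load(rawdata)
    return jar


jar = load_header(SimpleCookie(), b"a=1; b=two")
print(jar["a"].value, jar["b"].value)  # 1 two
```

Rejecting non-ASCII input outright, rather than guessing an encoding, matches the patch's choice of a plain ASCII conversion.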
msg129448 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-02-25 22:43
Thanks for the report and patch. This is not as easy as it sounds: the doc ambiguously uses “string”; the code clearly wants only byte strings (str); in the 3.x version, only character strings (unicode) are accepted; and “raw” would suggest to me that only bytes make sense.

I would tend to edit the documentation but not the behavior, given that 2.7 is stable and this behavior has been present and documented for a long time. Senthil, what’s your opinion?
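To illustrate the 3.x behavior described above, here is a short check against Python 3's `http.cookies` (the cookie name and value are made up). It relies on the `isinstance(rawdata, str)` check in the CPython 3 source rather than on any documented guarantee: anything that is not a `str` is treated as a mapping, so bytes fail when `load()` tries to call `.items()` on them.

```python
from http.cookies import SimpleCookie

# str input: parsed as the value of a Cookie header
jar = SimpleCookie()
jar.load("token=abc123")
print(jar["token"].value)  # abc123

# bytes input: not a str, so load() falls through to its mapping branch
bytes_rejected = False
try:
    SimpleCookie().load(b"token=abc123")
except AttributeError as exc:
    bytes_rejected = True
    print("bytes rejected:", exc)
```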
msg236540 - Author: Mark Lawrence (BreamoreBoy) * Date: 2015-02-24 20:13
The code was changed to do this in r83361 (issue #3788).
msg253735 - Author: Chad Whitacre (whit537) Date: 2015-10-30 14:47
> in the 3.x version, only character strings (unicode) are accepted
> The code was changed to do this in r83361 #3788.

That seems like a bug to me. It looks like the intention was to avoid the `type("")` check for stylistic reasons, so `isinstance(rawdata, str)` is an understandable translation under 3.1 (which #3788 targets), but I suspect that `type("")` should never have survived the transition to Python 3 in the first place. The 2.7 branch still has `type("")` (not `str("")` as originally reported):

https://hg.python.org/cpython/file/2.7/Lib/Cookie.py#l639


> “raw” would suggest to me that only bytes make sense.

Agreed. Cookie names and values are constrained to a subset of ASCII:

https://tools.ietf.org/html/rfc6265#section-4.1.1
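A rough validator for the `cookie-value` grammar in that section, as a sketch: it checks values only (cookie names follow the separate `token` rule, which is not checked here), and `is_cookie_value` is a hypothetical helper name. The octet ranges come from the RFC's `cookie-octet` rule, which excludes controls, whitespace, DQUOTE, comma, semicolon, backslash and everything outside US-ASCII.

```python
def _is_cookie_octet(ch):
    """RFC 6265 cookie-octet: %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E."""
    code = ord(ch)
    return (
        code == 0x21
        or 0x23 <= code <= 0x2B
        or 0x2D <= code <= 0x3A
        or 0x3C <= code <= 0x5B
        or 0x5D <= code <= 0x7E
    )


def is_cookie_value(value):
    """Return True if value is *cookie-octet, optionally wrapped in DQUOTEs."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return all(_is_cookie_octet(ch) for ch in value)


for candidate in ["abc123", '"quoted"', "has space", "semi;colon", "caf\u00e9"]:
    print(repr(candidate), is_cookie_value(candidate))
```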

I suggest cookies be thought of as a binary data store, with programmers responsible for encoding/decoding their data at the boundary with the cookies library.
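One way to apply that boundary rule in Python 3, sketched under the assumption that percent-encoding is an acceptable wire format (RFC 6265 does not mandate any particular encoding): the application turns its data into bytes, and the hypothetical helpers `set_bytes`/`get_bytes` map those bytes to and from a pure-ASCII cookie value with `urllib.parse.quote_from_bytes` and `unquote_to_bytes`.

```python
from http.cookies import SimpleCookie
from urllib.parse import quote_from_bytes, unquote_to_bytes


def set_bytes(jar, name, data):
    """Hypothetical helper: store arbitrary bytes as an ASCII cookie value."""
    # safe="" escapes everything except letters, digits and "_.-~"
    jar[name] = quote_from_bytes(data, safe="")


def get_bytes(jar, name):
    """Hypothetical helper: recover the original bytes from a cookie value."""
    return unquote_to_bytes(jar[name].value)


# Outgoing: the application encodes its text to bytes before storing it
outgoing = SimpleCookie()
set_bytes(outgoing, "prefs", "naïve café".encode("utf-8"))
header = outgoing["prefs"].OutputString()
print(header)  # prefs=na%C3%AFve%20caf%C3%A9

# Incoming: the cookie comes back and the application decodes the bytes
incoming = SimpleCookie()
incoming.load(header)
print(get_bytes(incoming, "prefs").decode("utf-8"))  # naïve café
```

The cookies library only ever sees ASCII text here; choosing UTF-8 for the text is the application's decision, not the library's.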


> I would tend to edit the documentation but no[t] the behavior, given that 2.7 is stable and this behavior has been present and documented for a long time.

Leaving 2.7 as-is makes sense, but it now looks to me like we have a regression in Python 3 that should be fixed.

----

P.S. I arrived here from https://github.com/gratipay/aspen.py/pull/524.
msg253736 - Author: Chad Whitacre (whit537) Date: 2015-10-30 15:13
Here's a patch with a couple of failing tests for the type regression in 3.x.
msg253740 - Author: Chad Whitacre (whit537) Date: 2015-10-30 15:39
Here's a patch that fixes the two new failing tests. Now a bunch of other tests are busted. :-)
msg253750 - Author: Chad Whitacre (whit537) Date: 2015-10-30 17:44
Here's a start on converting to bytes everywhere for cookies. I'm not sure I fully understand the library's original worldview on type conversion. There's a value_{decode,encode} facility, and in the test suite attribute values in particular are often not even strings of any kind.
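As a small illustration of that type-conversion behavior in Python 3's `http.cookies` (the cookie name and numbers are made up): assigning a non-`Morsel` value passes it through `value_encode()`, which `SimpleCookie` implements with `str()`, while a reserved attribute such as `max-age` can hold a plain `int` that `Morsel.OutputString()` formats itself.

```python
from http.cookies import SimpleCookie

jar = SimpleCookie()
jar["session"] = 12345            # not a string: value_encode() stringifies it
jar["session"]["max-age"] = 3600  # attribute values need not be strings

morsel = jar["session"]
print(type(morsel.value).__name__, morsel.value)  # str 12345
print(jar.output())  # Set-Cookie: session=12345; Max-Age=3600
```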
msg253752 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-10-30 19:14
If names and values are constrained to be ASCII, there is no obvious reason to use bytes. I don't see a bug here in Python 3.
msg334410 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2019-01-27 02:00
Sorry, but changing to bytes after ten years of using str in this module in Python 3 is not going to happen. Let’s just document the state of Python 2 (see Éric: https://bugs.python.org/issue11315#msg129448).
History
Date User Action Args
2019-01-27 13:21:09 BreamoreBoy set nosy: - BreamoreBoy
2019-01-27 02:03:56 martin.panter link issue2212 superseder
2019-01-27 02:00:35 martin.panter set assignee: docs@python
components: + Documentation
title: Fix type regression in http.cookies.load (want bytes, not str) -> unicode support in Cookie module
nosy: + martin.panter, ezio.melotti, docs@python, vstinner
versions: + Python 2.7, - Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
messages: + msg334410
2015-10-30 19:14:28 r.david.murray set nosy: + r.david.murray
messages: + msg253752
2015-10-30 17:44:48 whit537 set files: + start-converting-to-bytes-everywhere.patch
messages: + msg253750
2015-10-30 15:39:41 whit537 set files: + fix-the-two-new-tests.patch
messages: + msg253740
2015-10-30 15:13:15 whit537 set files: + failing-tests-for-type-regression.patch
messages: + msg253736
2015-10-30 14:51:23 whit537 set versions: + Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6, - Python 2.7
title: Fix/add unicode support in Cookie module? -> Fix type regression in http.cookies.load (want bytes, not str)
2015-10-30 14:48:00 whit537 set nosy: + whit537
messages: + msg253735
2015-02-24 20:13:36 BreamoreBoy set nosy: + BreamoreBoy
messages: + msg236540
2011-02-25 22:43:57 eric.araujo set nosy: + orsenthil
2011-02-25 22:43:49 eric.araujo set nosy: + eric.araujo
title: Cookie.py breaks when passed unicode, fix included -> Fix/add unicode support in Cookie module?
messages: + msg129448
versions: + Python 2.7, - Python 2.6
2011-02-25 04:05:01 Alexander.Tsepkov create