Message 119312 - Python tracker

Author	aclover
Recipients	BM, BreamoreBoy, aclover, akuchling, dstanek, georg.brandl, jerry.seutter, jjlee, tim.peters
Date	2010-10-21.15:55:57
Message-id	<1287676565.53.0.0376969638445.issue2193@psf.upfronthosting.co.za>

Content
The various attempts by RFCs to codify HTTP cookies are useless and bear no resemblance to what browsers actually do.

In the real world, every byte in the range 0x20-0x7E is allowed, except for the semicolon, the equals sign (in names) and, in some places in Opera, the double-quote. Many browsers even allow most of the control codes! The question of non-ASCII Unicode characters is tricky, but none of them causes a token break.

Contrary to RFC 2109 and its successors, no browser takes any notice of quoted-string cookies or backslash-escaping, so the effort Cookie.py puts into producing an encoded string and 'parsing' input cookies is completely wasted. It should do what everyone else does: split on semicolons, left-strip the whitespace from each piece, and split each cookie on the first equals sign.
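
To make the suggested behaviour concrete, here is a rough sketch of that three-step parse. It is not Cookie.py's code and not a proposed patch; the function name parse_cookie_header is made up for illustration, and treating a piece with no equals sign as a name with an empty value is an assumption, since the message does not say what should happen in that case.

```python
def parse_cookie_header(header):
    """Split a Cookie header value into (name, value) pairs.

    Hypothetical helper illustrating the approach described above:
    no quoted-string handling and no backslash-unescaping.
    """
    cookies = []
    for piece in header.split(";"):      # 1. split on semicolon
        piece = piece.lstrip()           # 2. left-strip whitespace only
        if not piece:
            continue                     # ignore empty pieces such as a trailing ';'
        # 3. split on the first equals sign; later ones stay in the value
        name, _sep, value = piece.partition("=")
        # Assumption: a piece without '=' becomes a name with an empty value.
        cookies.append((name, value))
    return cookies
```

With this sketch, parse_cookie_header('a=1; b="x y"; c=d=e') keeps the double-quotes of b as part of the value and leaves the second equals sign inside the value of c.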

(In reality cookie names and values have no inherent encoding scheme, so if you want to include out-of-band characters like semicolon, control characters or non-ASCII characters you have to use an ad-hoc encoding scheme, often URL-encoding.)
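
As an example of such an ad-hoc scheme, the sketch below URL-encodes a value before it goes into a cookie and decodes it on the way back, using the standard urllib.parse module. The helper names are hypothetical, and UTF-8 percent-encoding is only one possible convention; both ends of the application have to agree on it, because nothing in the cookie itself signals the encoding.

```python
from urllib.parse import quote, unquote


def encode_cookie_value(value):
    """Percent-encode a string so it is safe to place in a cookie value.

    Hypothetical helper. safe="" makes quote() escape every reserved
    character, including ';' and '=', and non-ASCII text is encoded
    as UTF-8 bytes (quote()'s default).
    """
    return quote(value, safe="")


def decode_cookie_value(raw):
    """Reverse encode_cookie_value(); assumes the same UTF-8 convention."""
    return unquote(raw)
```

Round-tripping a value containing a semicolon, a control character, or a non-ASCII character through these two helpers should return the original string.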
