Issue 43910: cgi.parse_header does not handle escaping correctly


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/88076

classification

Title:	cgi.parse_header does not handle escaping correctly
Type:	behavior	Stage:	patch review
Components:	Library (Lib)	Versions:	Python 3.10

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	msg555
Priority:	normal	Keywords:	patch

Created on 2021-04-22 08:03 by msg555, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL	Status	Linked	Edit
PR 25519	open	msg555, 2021-04-22 08:14

Messages (1)
msg391580	Author: Mark Gordon (msg555)	Date: 2021-04-22 08:03

cgi.parse_header incorrectly handles unescaping of quoted-strings.

The RFC text describing how HTTP encodes the Content-Type header is at https://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html, and further discussion of how quoted-string is defined is at https://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-16.html#rfc.section.3.2.1.p.3.

The way parse_header is written, it has no context to tell whether a backslash is escaping a double quote or whether the backslash is itself the escaped character and the double quote that follows is free-standing and unescaped. For this reason it fails to parse quoted values that end in a literal backslash. For example, the following Content-Type header fails to parse:

a/b; foo="moo\\"; bar=baz

Example run on current cpython master demonstrating the bug:

Python 3.10.0a7+ (heads/master:660592f67c, Apr 21 2021, 22:51:04) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi
>>> query = 'a; foo="moo\\\\"; bar=cow' 
>>> print(query)
a; foo="moo\\"; bar=cow
>>> cgi.parse_header(query)
('a', {'foo': '"moo\\\\"; bar=cow'})
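For comparison, here is a minimal sketch of an escape-aware parse that scans the header left to right and remembers whether the previous character was an escaping backslash. The names split_params, unquote, and parse_header_sketch are hypothetical; this is not the code in PR 25519 or in the cgi module, only an illustration of the state tracking described above.

```python
def split_params(header):
    """Split on ';' outside quoted-strings, honoring backslash escapes."""
    parts, current = [], []
    in_quotes = False
    escaped = False  # True when the previous char was an escaping backslash
    for ch in header:
        if escaped:
            # This character is escaped, so it can never open or close a quote.
            current.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def unquote(value):
    """Strip surrounding quotes and resolve backslash escapes in one pass."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        return value
    out = []
    escaped = False
    for ch in value[1:-1]:
        if escaped:
            out.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        else:
            out.append(ch)
    return "".join(out)


def parse_header_sketch(line):
    """Return (main_value, params) for a Content-Type style header."""
    key, *params = split_params(line)
    pdict = {}
    for param in params:
        name, sep, value = param.partition("=")
        if sep:
            pdict[name.strip().lower()] = unquote(value.strip())
    return key, pdict
```

With the query from the session above, parse_header_sketch('a; foo="moo\\\\"; bar=cow') returns ('a', {'foo': 'moo\\', 'bar': 'cow'}): foo ends in a single literal backslash and bar is parsed as its own parameter.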

History
Date	User	Action	Args
2022-04-11 14:59:44	admin	set	github: 88076
2021-04-22 08:14:58	msg555	set	keywords: + patch stage: patch review pull_requests: + pull_request24236
2021-04-22 08:03:45	msg555	create