Title: urllib.request.parse_http_list incorrectly strips backslashes
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, labrat, orsenthil, r.david.murray
Priority: normal Keywords:

Created on 2018-03-06 06:51 by labrat, last changed 2022-04-11 14:58 by admin.

Messages (1)
msg313308 - Author: W. Trevor King (labrat) Date: 2018-03-06 06:51
urllib.request.parse_http_list currently strips backslashes from inside quoted strings:

  $ echo 'a="b\"c",d=e' | python3 -c 'from sys import stdin; from urllib.request import parse_http_list; print(parse_http_list(stdin.read()))'
  ['a="b"c"', 'd=e']

It should be printing:

  ['a="b\\"c"', 'd=e']
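The same reproduction can be run directly in Python, without the shell pipeline. This is a minimal sketch: the raw string is used so that the single backslash reaches parse_http_list exactly as in the echo input above. Because the result depends on whether the running interpreter still has the backslash-stripping behavior, the comments list both the reported and the expected output rather than promising one.

```python
from urllib.request import parse_http_list

# Raw string: the header value is  a="b\"c",d=e  with one backslash
# escaping the inner quote, matching the echo input in the report.
header = r'a="b\"c",d=e'
result = parse_http_list(header)
print(result)
# Reported (buggy) output:  ['a="b"c"', 'd=e']
# Expected output:          ['a="b\\"c"', 'd=e']
```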

The bug is this continue statement [1], which should be removed. The original implementation [2] did not have this problem. The continue was introduced in [3] as a fix for #735248, along with explicit tests asserting the broken behavior [3]. Stripping backslashes from inside quoted strings is not appropriate, because it breaks explicit unquoting with email.utils.unquote [4]:

  import email.utils
  import urllib.request
  value = r'"b\\"c"'  # raw string: two literal backslashes inside the quotes
  entry = urllib.request.parse_http_list(value)[0]
  entry  # '"b\\"c"', should be '"b\\\\"c"'
  email.utils.unquote(entry)  # 'b"c', should be 'b\\"c'
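To illustrate the proposed one-line removal, here is a standalone copy of the parsing loop with the continue after the backslash deleted. The name parse_http_list_keep_escapes is hypothetical, and the loop is a sketch that assumes the stdlib function is structured as a simple character scanner with escape and quote flags; it is not the patched library code.

```python
def parse_http_list_keep_escapes(s):
    """Split a comma-separated HTTP list, keeping quoted strings verbatim.

    Hypothetical variant of urllib.request.parse_http_list with the
    backslash-dropping `continue` removed.
    """
    res = []
    part = ''
    escape = quote = False
    for cur in s:
        if escape:
            # Character after a backslash inside quotes: keep it literally.
            part += cur
            escape = False
            continue
        if quote:
            if cur == '\\':
                # Mark the next character as escaped, but keep the
                # backslash itself (no `continue` here).
                escape = True
            elif cur == '"':
                quote = False
            part += cur
            continue
        if cur == ',':
            # Unquoted comma: end of this list element.
            res.append(part)
            part = ''
            continue
        if cur == '"':
            quote = True
        part += cur
    if part:
        res.append(part)
    # Trim surrounding whitespace (including a trailing newline) per element.
    return [p.strip() for p in res]


print(parse_http_list_keep_escapes(r'a="b\"c",d=e'))
```

With this variant the quoted element keeps its backslash, so the call above prints ['a="b\\"c"', 'd=e'], and r'"b\\"c"' comes back unchanged, leaving any unescaping to a separate step such as email.utils.unquote.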

I'm happy to file patches against the various branches if that would help, but as a one-line removal (plus adjusting the tests), it might be easier if a maintainer files the patches.

