classification
Title: Case of headers in urllib.request.Request
Type: behavior Stage:
Components: Library (Lib) Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: emphoeller
Priority: normal Keywords:

Created on 2021-09-09 00:39 by emphoeller, last changed 2021-09-09 00:39 by emphoeller.

Messages (1)
msg401429 - (view) Author: E. M. P. Höller (emphoeller) Date: 2021-09-09 00:39
urllib.request.Request internally .capitalize()s header names before adding them, as can be seen here:  https://github.com/python/cpython/blob/3.9/Lib/urllib/request.py#L399

Since HTTP headers are case-insensitive, but dicts are not, this ensures that add_header and add_unredirected_header overwrite an existing header (as documented) even if they were passed in different cases.  However, this also carries two problems with it:

1. has_header, get_header, and remove_header do not apply this normalisation to their header_name parameter, causing them to fail unexpectedly when the header is passed in the wrong case.
2. Some servers do not comply with the standard and check some headers case-sensitively.  If the case they expect is different from the result of .capitalize(), those headers effectively cannot be passed to them via urllib.

These problems have already been discussed quite some time ago, and yet they still are present:
https://bugs.python.org/issue2275
https://bugs.python.org/issue12455
Or did I overlook something and there is a good reason why things are this way?

If not, I suggest that add_header and add_unredirected_header store the headers in the case they were passed (while preserving the case-insensitive overwriting behaviour) and that has_header, get_header, and remove_header find headers independent of case.

Here is a possible implementation:

# Helper outside class

# Stops after the first hit since there should be at most one of each header in the dict
def _find_key_insensitive(d, key):
  key = key.lower()
  for key2 in d:
    if key2.lower() == key:
      return key2
  return None # Unnecessary, but explicit is better than implicit ;-)

# Methods of Request

def add_header(self, key, val):
  # useful for something like authentication
  existing_key = _find_key_insensitive(self.headers, key)
  if existing_key:
    self.headers.pop(existing_key)
  self.headers[key] = val

def add_unredirected_header(self, key, val):
  # will not be added to a redirected request
  existing_key = _find_key_insensitive(self.unredirected_hdrs, key)
  if existing_key:
    self.unredirected_hdrs.pop(existing_key)
  self.unredirected_hdrs[key] = val

def has_header(self, header_name):
  return bool(_find_key_insensitive(self.headers, header_name) or
      _find_key_insensitive(self.unredirected_hdrs, header_name))

def get_header(self, header_name, default=None):
  key = _find_key_insensitive(self.headers, header_name)
  if key:
    return self.headers[key]
  key = _find_key_insensitive(self.unredirected_hdrs, header_name)
  if key:
    return self.unredirected_hdrs[key]
  return default

def remove_header(self, header_name):
  key = _find_key_insensitive(self.headers, header_name)
  if key:
    self.headers.pop(key)
  key = _find_key_insensitive(self.unredirected_hdrs, header_name)
  if key:
    self.unredirected_hdrs.pop(key)

I’m sorry if it is frowned upon to post code suggestions here like that; I didn’t have the confidence to create a pull request right away.
History
Date User Action Args
2021-09-09 00:39:40emphoellercreate