Title: does parse_header really belong in CGI module?
Type: enhancement
Stage: needs patch
Components: email, Library (Lib)
Versions: Python 3.5
Status: languishing
Resolution:
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: barry, berker.peksag, flox, janssen, martin.panter, mgiuca, orsenthil, r.david.murray
Priority: normal
Keywords: patch

Created on 2008-08-20 00:51 by janssen, last changed 2015-02-22 01:25 by martin.panter.

File name Uploaded Description Edit
issue3609-py27.diff orsenthil, 2009-04-02 21:46 review
Messages (10)
msg71500 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2008-08-20 00:51
Not sure how to class this, but shouldn't the "parse_header" function in
cgi really be in email.header?  And what about parse_multipart?
msg71512 - (view) Author: Matt Giuca (mgiuca) Date: 2008-08-20 05:08
These functions are for generic MIME headers and bodies, so are
applicable to CGI, HTTP, Email, and any other protocols based on MIME.
So I think having them in email.header makes about as much sense as
having them in cgi.

Isn't mimetools a better package for this?

Also I think there's an exodus of functions from cgi -- there's talk
about parse_qs/parse_qsl being moved to urllib (I thought that was
almost finalised). Traditionally the cgi module has had way too much
stuff in it which only superficially applies to cgi.

I'm also thinking of cgi.escape, which I'd rather see in htmllib than
cgi (except that htmllib is described as "A parser for HTML documents").

But I'm worried that these functions are too ingrained in people's
memories (I type "cgi.escape" several times a day and I'd get confused
if it moved). So perhaps these moves are too late.

I imagine if they were moved (at least for a few versions) the old ones
would still work, with a deprecation warning?
msg71515 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2008-08-20 06:21
> I imagine if they were moved (at least for a few versions) the old ones
> would still work, with a deprecation warning?

Yes, that's what I was thinking.
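
A minimal sketch of such a compatibility shim, assuming the function is relocated somewhere else. Both names below are hypothetical: parse_header stands in for the relocated function (built here on the legacy Message.get_params() API only so the example is self-contained), and cgi_parse_header stands in for the old entry point that would remain in cgi.

```python
import warnings
from email.message import Message


def parse_header(line):
    """Stand-in for the relocated function (its real home is undecided).

    Uses the legacy Message.get_params() API to split a Content-Type
    style value into (main_value, params_dict).
    """
    msg = Message()
    msg["Content-Type"] = line
    params = msg.get_params()
    # The first pair holds the main value; the rest are parameters.
    return params[0][0], dict(params[1:])


def cgi_parse_header(line):
    """Hypothetical shim kept at the old location for a few releases."""
    warnings.warn(
        "cgi.parse_header is deprecated; use the relocated function",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller
    )
    return parse_header(line)
```

Callers of the old name keep getting the same result, and the warning shows up when DeprecationWarning is enabled (for example with -W always).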
msg85270 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-04-02 21:46
The attached patch relocates the parse_header function from the cgi
module to the email.header module in Python 2.7.

A few comments:

1) parse_multipart need not be moved from cgi, because its use is
discouraged; the FieldStorage class methods, which do the same thing,
are recommended instead.

2) Should the relocation happen in Python 2.7 as well as in Python 3k, or
only in Python 3k? (The patch is for Python 2.7, but can be ported to
Python 3k.)

3) If the change happens in Python 2.7, then cgi.parse_header will emit a
DeprecationWarning, in case there are further releases in the Python 2.x line.

4) Does anyone have any concerns with this change? I plan to ask on
python-dev as well.
msg166015 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-07-21 12:29
This refactoring between the cgi and email modules has been languishing for a few years.
Any thoughts?
msg235674 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-10 07:02
Good idea to move this to somewhere more visible and obvious. I would have been using parse_header() much earlier if I had known it existed.

However, maybe it would be better off in the “email.message” module. The rest of the “email.header” module only seems to be about internationalized header fields with special encodings.
msg236200 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-02-18 23:06
There is no reason to move this to the email package.  email can already parse headers just fine.  It might be useful to have a parse_header utility method, though, since currently the easiest way to parse a single header using the email package is:

  >>> from email.policy import HTTP
  >>> from email.parser import Parser
  >>> m = Parser(policy=HTTP).parsestr('Content-Type: text/plain; filename="foo"\n\n')
  >>> m['Content-Type'].content_type
  'text/plain'
  >>> m['Content-type'].params
  mappingproxy({'filename': 'foo'})

Which isn't as straightforward as the parse_header API when you are only interested in a single header.

It might also be useful to have the email MIME headers grow a 'value' attribute to return whatever the value is (in this case, text/plain), so it can be accessed regardless of the header type.

I would make parse_header be a utility method of the policy, I think.  Then the email API would be:

  from email.policy import HTTP
  h = HTTP.parse_header('Content-Type: ...')
  value, params = h.value, h.params

It would then be trivial to implement the backward compatibility shim for the CGI parse_header method using the above.
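
Until such a policy method exists, something close to the proposed call can be sketched with the header_factory of the existing HTTP policy. The helper name parse_content_type is hypothetical and not part of the email package, and this sketch only handles Content-Type values (other header types expose different attributes).

```python
from email.policy import HTTP


def parse_content_type(value):
    """Hypothetical helper (not part of the email package).

    Parses a Content-Type header value with the HTTP policy's header
    factory and returns (content_type, params).
    """
    # header_factory(name, value) returns a header object; for
    # Content-Type it provides .content_type and .params.
    header = HTTP.header_factory('Content-Type', value)
    return header.content_type, dict(header.params)


value, params = parse_content_type('text/plain; filename="foo"')
print(value, params)
```

This avoids building a whole message with Parser just to inspect one header, which is the inconvenience described above.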
msg236210 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-19 02:39
HTTP policy sounds as good a place as any, especially if/when it is blessed as a stable API.

Another related function is the undocumented http.cookiejar.split_header_words(), which seems more flexible than cgi.parse_header(). It accepts multiple and comma-separated header values, splits on spaces as well as semicolons, and retains parameters without equal signs. Currently I have code that abuses the Message.get_params() and get_param() methods, which could probably benefit from split_header_words():

# Parse RTSP Transport headers like
# Transport: RTP/AVP/TCP;interleaved=0-1, RTP/AVP;unicast;client_port=5004
for value in header_list(self.headers, "Transport"):  # Splits at commas
    header = email.message.Message()
    # Default get_params() header is Content-Type
    header["Content-Type"] = value
    [transport, _] = header.get_params()[0]  # Gets the RTP/AVP part
    mode = header.get_param("mode", "PLAY")
    channel = header.get_param("interleaved")
    if header.get_param("unicast") is not None:
        port = header.get_param("client_port")
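
For comparison, a rough sketch of how the same Transport value looks when passed through split_header_words(). The function is undocumented, so its behaviour is not a stable API; the variable names here are only for illustration.

```python
from http.cookiejar import split_header_words  # undocumented helper

transport = "RTP/AVP/TCP;interleaved=0-1, RTP/AVP;unicast;client_port=5004"

# The function expects a list of header values, not a single string.
# Each comma-separated element becomes its own list of (name, value) pairs.
for pairs in split_header_words([transport]):
    name = pairs[0][0]        # the leading token, e.g. "RTP/AVP/TCP"
    params = dict(pairs[1:])  # parameters without "=" map to None
    print(name, params)
```

A parameter such as unicast therefore appears as a key with the value None, so its presence has to be tested with "in" rather than by comparing against None.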
msg236211 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-02-19 04:08
We can also add a parser for that format to headerregistry.  Is there an RFC that describes it?  (It should probably be a separate issue.)

The new email API will be promoted to stable in 3.5.  Mostly I just need to update the docs.
msg236399 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-22 01:25
I opened Issue 23498 about exposing split_header_words() or similar. So this issue can focus on moving parse_header() to an email.policy.HTTP method or whatever.

RTSP 1.0 and its Transport header is defined in RFC 2326: <>. However it makes more sense to me to provide the generic header value parsing routines where possible, like parse_header() and Message.get_param/s(), rather than hard-coding them for specific header names (my vague understanding of what the header registry module does).

The python-dev post mentioned above seems to be at <>, with one response.
Date                 User            Action  Args
2015-02-22 01:25:44  martin.panter   set     messages: + msg236399
2015-02-19 04:27:27  berker.peksag   set     nosy: + berker.peksag
2015-02-19 04:08:55  r.david.murray  set     messages: + msg236211
2015-02-19 02:39:32  martin.panter   set     messages: + msg236210
2015-02-18 23:06:04  r.david.murray  set     stage: patch review -> needs patch
                                             messages: + msg236200
                                             versions: + Python 3.5, - Python 3.4
2015-02-10 07:02:29  martin.panter   set     nosy: + r.david.murray, barry, martin.panter
                                             messages: + msg235674
                                             components: + email
2012-07-21 12:29:46  flox            set     status: open -> languishing
                                             messages: + msg166015
                                             versions: + Python 3.4, - Python 3.2
2010-08-27 03:26:48  flox            set     nosy: + flox
                                             stage: patch review
                                             versions: + Python 3.2, - Python 3.0
2009-04-02 21:46:04  orsenthil       set     files: + issue3609-py27.diff
                                             assignee: orsenthil
                                             messages: + msg85270
                                             keywords: + patch
2008-09-19 12:07:12  orsenthil       set     nosy: + orsenthil
2008-08-20 06:21:24  janssen         set     messages: + msg71515
2008-08-20 05:08:05  mgiuca          set     nosy: + mgiuca
                                             messages: + msg71512
2008-08-20 00:51:16  janssen         create