On Thu, Mar 26, 2009 at 4:29 PM, Barry A. Warsaw <report@bugs.python.org> wrote:

Barry A. Warsaw <barry@python.org> added the comment:

I propose that you only document the getitem header access API.  I.e.
the thing that info() gives you can be used to access the message
headers via message['content-type'].  That's an API common to both
rfc822.Messages (the ultimate base class of mimetools.Message) and
email.message.Message.

Since I've found myself in the awkward position of having to explain the new 3.0 API to my students, I've thought about this and have some ideas and questions.

I'm also willing to help with the documentation or any enhancements.

>>> x = urllib.request.urlopen('http://knuth.luther.edu/python/test.html')
>>> x['Date']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'addinfourl' object is unsubscriptable

I wish I knew what an addinfourl object was.

>>> x.info()['Date']
'Fri, 27 Mar 2009 00:41:34 GMT'

>>> x.headers['Date']
'Fri, 27 Mar 2009 00:41:34 GMT'

>>> x.headers.keys()
['Date', 'Server', 'Last-Modified', 'ETag', 'Accept-Ranges', 'Content-Length', 'Connection', 'Content-Type']

Using x.headers rather than x.info() makes the most sense to me, but I don't know that I can give a good rationale.  Which one do we want to document?
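For what it's worth, here is a small offline sketch of how the two spellings relate. It assumes that info() simply returns the headers attribute, which is how urllib.response.addinfourl behaves in the Python 3 versions I am aware of. The URL and header values are made-up stand-ins, and an HTTP urlopen() in other releases may hand back a different response class.

```python
import email
import io
import urllib.response

# Stand-in for the headers a server would send (values are illustrative).
headers = email.message_from_string(
    "Date: Fri, 27 Mar 2009 00:41:34 GMT\n"
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "\n"
)

# addinfourl wraps a file-like body together with its headers and URL.
x = urllib.response.addinfourl(
    io.BytesIO(b"<html></html>"), headers, "http://example.invalid/test.html"
)

same_object = x.info() is x.headers    # both names reach the same headers object
date_via_info = x.info()["Date"]
date_via_headers = x.headers["date"]   # header lookup is case-insensitive
x.close()
```

If that assumption holds, the choice between x.headers and x.info() is one of spelling rather than behavior.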

>>> x.headers['Content-Type']
'text/html; charset=ISO-8859-1'

I guess this is technically correct, since in HTTP the charset is a parameter of the Content-Type header, but it makes life difficult for what I think will be a pretty common use case in the new urllib: read from the URL (as bytes) and then decode those bytes into a string using the appropriate character set.
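To make the use case concrete, here is a minimal sketch of the read-then-decode step. The helper name decode_body and the UTF-8 fallback are my own assumptions, not part of urllib, and the headers object is built by hand so the example runs without a network connection.

```python
import email


def decode_body(raw_bytes, headers, default="utf-8"):
    """Decode raw_bytes using the charset parameter of Content-Type.

    decode_body is a hypothetical helper; the fallback encoding is an
    assumption for responses that do not declare a charset.
    """
    charset = headers.get_content_charset() or default
    return raw_bytes.decode(charset)


# Stand-in for x.headers: a message parsed from a bare header block.
headers = email.message_from_string(
    "Content-Type: text/html; charset=ISO-8859-1\n\n"
)
raw = "caf\xe9".encode("iso-8859-1")   # stand-in for x.read()
text = decode_body(raw, headers)

# With a live response the call would look like:
#     text = decode_body(x.read(), x.headers)
```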

If you follow this road, you are faced with a confusing choice among these three calls:

>>> x.headers.get_charset()
>>> x.headers.get_content_charset()
'iso-8859-1'
>>> x.headers.get_charsets()
['iso-8859-1']

I think it should be considered a bug that get_charset() returns nothing in this case.  It is not at all clear why get_content_charset() and get_charset() should behave differently.
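The difference can be reproduced without urllib at all, using a plain email.message.Message. As far as I can tell from the email package documentation, get_charset() reports the Charset instance attached to the payload (for example via set_charset()), while get_content_charset() reads the charset parameter of the Content-Type header. The sketch below only illustrates that reading; the header value is copied from the session above.

```python
import email.message

msg = email.message.Message()
msg["Content-Type"] = "text/html; charset=ISO-8859-1"

# No Charset object was ever attached to the payload, so this is None.
payload_charset = msg.get_charset()

# These two read the charset parameter of the Content-Type header,
# coerced to lower case.
header_charset = msg.get_content_charset()
all_charsets = msg.get_charsets()
```

So for the decoding use case, get_content_charset() looks like the call to reach for, whatever is decided about get_charset().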

Brad

 

----------
nosy: +barry

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue4773>
_______________________________________



--
Brad Miller
Assistant Professor, Computer Science
Luther College