Message 64680 - Python tracker


Author	gregory.p.smith
Recipients	gregory.p.smith, weijie90
Date	2008-03-29.04:19:06
Message-id	<1206764349.55.0.983852676416.issue2464@psf.upfronthosting.co.za>

Content
I'm not sure what the best solution for this is.  If I truncate the
header values at a \x00 character, the request ends up in an infinite
redirect loop (which urllib2 detects and raises on).  If I simply remove
all \x00 characters, the resulting URL is not accepted by wikispaces.com
due to having an extra / in it.

Verdict: wikispaces.com is broken.

urllib2 could do better.  wget and Firefox deal with it properly, but
I'll leave deciding which patch to use up to someone who cares about
handling broken sites.

Patch to implement either behavior for dealing with nulls where they
shouldn't be:

Index: Lib/httplib.py
===================================================================
--- Lib/httplib.py      (revision 62033)
+++ Lib/httplib.py      (working copy)
@@ -291,9 +291,18 @@
                 break
             headerseen = self.isheader(line)
             if headerseen:
+                # Some bad web servers reply with headers with a \x00 null
+                # embedded in the value.  Other http clients deal with
+                # this by treating it as a value terminator, ignoring the
+                # rest so we will too.  http://bugs.python.org/issue2464.
+                if '\x00' in line:
+                    line = line[:line.find('\x00')]
+                    # if you want to just remove nulls instead use this:
+                    #line = line.replace('\x00', '')
                 # It's a legal header line, save it.
                 hlist.append(line)
-                self.addheader(headerseen, line[len(headerseen)+1:].strip())
+                value = line[len(headerseen)+1:].strip()
+                self.addheader(headerseen, value)
                 continue
             else:
                 # It's not a header line; throw it back and stop here.
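
To make the difference between the two behaviors in the patch concrete, here is a minimal standalone sketch. It is not the httplib code itself: split_header, its strategy argument, and the example.invalid header line are hypothetical names made up for illustration, and the header parsing is deliberately simplified.

```python
def split_header(line, strategy="truncate"):
    """Return (name, value) for one raw header line with embedded NULs.

    Hypothetical helper, not part of httplib.
    strategy="truncate": drop the first \x00 and everything after it.
    strategy="remove":   delete every \x00 and keep the other characters.
    """
    if strategy == "truncate":
        # Treat the null as a value terminator.
        line = line.partition("\x00")[0]
    elif strategy == "remove":
        # Keep the text on both sides of each null.
        line = line.replace("\x00", "")
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))
    # Simplified parsing: split on the first colon, strip whitespace.
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


# Made-up header line with a null in the middle of the value.
raw = "Location: http://example.invalid/a\x00/b\r\n"

truncated = split_header(raw, "truncate")
removed = split_header(raw, "remove")
print(truncated)
print(removed)
```

With truncation the value stops at the null; with removal the text after the null is kept, which is how an extra path segment can end up in the resulting URL.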
