#! /usr/bin/env python

"""
Small redirect test case - Python 2.3
"""

import urllib

# Request a page that answers with a chain of 302 redirects.
x = urllib.urlopen("http://www.weather.com/search/search?where=united+kingdom")
string = x.read()
print string

"""
Comments
========

When a server answers a request with a redirect URL that contains spaces,
urllib does not correct the URL before requesting the new page.

I think this is technically a server error; however, it works from web
browsers (Mozilla, Safari) but not from Python urllib.

I would suggest that when urllib follows "moved temporarily" links (or
similar) from a server, it translates spaces to %20.

Excerpts from the log of a "tcpdump" session...

From Python > web1.weather.com:

GET /search/search?where=united+kingdom HTTP/1.0
Host: www.weather.com
User-agent: Python-urllib/1.15

From web1.weather.com > Python:

HTTP/1.1 302 Moved Temporarily
Date: Fri, 12 Mar 2004 22:56:18 GMT
Server: Apache
Set-Cookie: UserPreferences=2| |0|real|fast|-1|-1|-1|-1|-1|+| |+| | |-1|Undeclared| | | ;Domain=.weather.com;Expires=Sat, 12-Mar-2005 22:56:18 GMT;Path=/
Location: http://www.weather.com/search/drilldown/?geoCd=4&geoCdChild=1&itemCd=UK&countryCd=UK&itemName=United+Kingdom&countryName=United+Kingdom&what=WeatherLocalUndeclared
Connection: close
Content-Type: text/plain

(notice the United+Kingdom)

From Python > web1.weather.com:

GET /search/drilldown/?geoCd=4&geoCdChild=1&itemCd=UK&countryCd=UK&itemName=United+Kingdom&countryName=United+Kingdom&what=WeatherLocalUndeclared HTTP/1.0
Host: www.weather.com
User-agent: Python-urllib/1.15

From web1.weather.com > Python:

HTTP/1.1 302 Moved Temporarily
Date: Fri, 12 Mar 2004 22:56:19 GMT
Server: Apache
Set-Cookie: UserPreferences=2| |0|real|fast|-1|-1|-1|-1|-1|+| |+| | |-1|Undeclared| | | ;Domain=.weather.com;Expires=Sat, 12-Mar-2005 22:56:19 GMT;Path=/
Location: http://www.weather.com/common/drilldown/UK.html?itemName=United Kingdom&geoCd=4&geoCdChild=1&countryName=United Kingdom&countryCd=UK&what=WeatherLocalUndeclared&itemCd=UK
Connection: close
Content-Type: text/plain

(notice the "United Kingdom"s without a %20 or + in the middle)

From Python > web

GET /common/drilldown/UK.html?itemName=United Kingdom&geoCd=4&geoCdChild=1&countryName=United Kingdom&countryCd=UK&what=WeatherLocalUndeclared&itemCd=UK HTTP/1.0
Host: www.weather.com
User-agent: Python-urllib/1.15

(again the "United Kingdom"s without a %20 or + in the middle)

From web1.weather.com > Python:

HTTP/1.1 400 Bad Request
Date: Fri, 12 Mar 2004 22:56:20 GMT
Server: Apache
Connection: close
Content-Type: text/html; charset=iso-8859-1

Other Notes
===========

Replacing the open line with this will make it work properly (i.e. changing
the spaces to %20)...

x = urllib.urlopen("http://web1.weather.com/common/drilldown/UK.html?itemName=United%20Kingdom&geoCd=4&geoCdChild=1&countryName=United%20Kingdom&countryCd=UK&what=WeatherLocalUndeclared&itemCd=UK")

Python Program Output
=====================

400 Bad Request

Bad Request

Your browser sent a request that this server could not understand.

The request line contained invalid characters following the protocol string.


Apache/1.3.27 Server at www.weather.com Port 80
"""