
Author cosoleto
Recipients cosoleto
Date 2007-09-26.07:55:12
SpamBayes Score 0.02637627
Marked as misclassified No
Message-id <1190793332.9.0.29899287923.issue1205@psf.upfronthosting.co.za>
In-reply-to
Content
urllib fails to read URL contents, urllib2 crashes Python

Python versions:
-------------------------
Python 2.5.1 (r251:54863, May 18 2007, 16:56:43) 
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)]

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32

Python 2.4.4 (#2, Aug 16 2007, 00:34:54) 
[GCC 4.1.3 20070812 (prerelease) (Debian 4.1.2-15)] on linux2

-------------------------

It works with GNU wget:
-------------------------
$ wget -S http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
--08:42:21--  http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud
           => `Thomas-Robert_Bugeaud'
Resolving www.recherche.fr... 88.191.11.214
Connecting to www.recherche.fr|88.191.11.214:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 26 Sep 2007 06:42:53 GMT
  Server: Apache/2.2.3 (Debian) PHP/5.2.3-0.dotdeb.1 with Suhosin-Patch
  X-Powered-By: PHP/5.2.3-0.dotdeb.1
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=UTF-8
Length: unspecified [text/html]

    [                             <=>                         ] 267,080       --.--K/s

08:42:42 (14.11 KB/s) - "Thomas-Robert_Bugeaud" saved [267080]
-------------------------
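The wget headers show `Transfer-Encoding: chunked` and no Content-Length ("Length: unspecified"), so the body arrives as a series of chunks, each prefixed by its size in hexadecimal, and the end is marked by a zero-size chunk rather than a byte count. The sketch below frames byte strings that way to show the expected wire format; `encode_chunked` is a hypothetical helper (Python 3.5+, for bytes %-formatting), not part of any library.

```python
def encode_chunked(parts):
    """Frame byte strings as an HTTP/1.1 chunked body (hypothetical helper)."""
    out = []
    for part in parts:
        if part:  # skip empty parts: a zero-size chunk would end the body early
            out.append(b"%x\r\n" % len(part))  # chunk-size in hex, then CRLF
            out.append(part)                    # chunk data
            out.append(b"\r\n")                 # CRLF after the data
    out.append(b"0\r\n\r\n")  # last-chunk followed by an empty trailer
    return b"".join(out)


print(encode_chunked([b"Wiki", b"pedia"]))  # prints b'4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n'
```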

Python:
-------------------------
>>> import urllib
>>> a = urllib.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')
>>> c = a.read(1024*1024*2)
>>> len(c)       
1035220

>>> c[63000:64000]
'he.fr en page d\'accueil</a><br>\n      <span>Partenaires :</span> <a 
href="http://www.cartes.fr/" target="_blank">Cartes\n      
postales</a>&nbsp; <a href="http://www.deux.fr/script/" 
target="_blank">Rencontres\n      gratuites\n      </a>&nbsp; <a 
href="http://www.new.fr/" target="_blank">Noms\n      de domaine 
gratuits</a>&nbsp; <a href="http://www.netencyclo.com/" 
target="_blank">Encyclopedia</a>&nbsp;</p>\n      <p style="text-
align:center;"><a href="http://www.futureobject.com/" 
target="_blank"><img src="http://www.recherche.fr/images/logo_fo.gif" 
border="0" height="25" width="96"></a></p>\n\n  </p>\n </div>\n 
</div><!-- site -->\n</body>\n</html>\n\r\n\x00\x00\x00\x00\x00\x00\x00
\x00\x00[...omission...]\x00\x00\x00\x00'
-------------------------
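The tail of the data shows the end of the HTML document, a stray `\r\n`, and then a long run of NUL bytes, which likely accounts for `len(c)` (1035220) being much larger than the 267,080 bytes wget saved. A quick way to measure that padding is to strip trailing NULs; `split_trailing_nuls` is a hypothetical diagnostic helper (Python 3, bytes input) and assumes the genuine document never ends in NUL bytes itself. On Python 2, the same check is `c.rstrip('\x00')` on the `str` returned by `read()`.

```python
def split_trailing_nuls(data):
    """Return (payload, count), where count is the number of trailing NUL bytes."""
    payload = data.rstrip(b"\x00")  # remove only NULs at the very end
    return payload, len(data) - len(payload)


sample = b"</html>\n\r\n" + b"\x00" * 5
payload, padding = split_trailing_nuls(sample)
print(len(payload), padding)  # prints 10 5
```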

As above, but with the urllib2 module instead of urllib:

-------------------------
  File "/usr/lib/python2.5/socket.py", line 291, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00[...omission...]\x00\x00\x00\x00\x00\x00\x00\
-------------------------
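The failing statement is httplib's `chunk_left = int(line, 16)`: where the next chunk-size line should be, the line read contains only NUL bytes, so the hex conversion fails. The following is a minimal Python 3 sketch of chunked decoding as described in RFC 2616, section 3.6.1; it is not httplib's implementation, and `parse_chunk_size` and `decode_chunked` are hypothetical names. It validates the size field strictly, so the same kind of malformed input produces a `ValueError` that quotes the offending line.

```python
import io
import re

# A chunk-size field is one or more hex digits (RFC 2616, section 3.6.1).
_HEX_DIGITS = re.compile(rb"[0-9A-Fa-f]+")


def parse_chunk_size(line):
    """Return the size from a chunk-size line such as b"1a;ext=v\\r\\n"."""
    # Drop any chunk extension after ';', then surrounding whitespace/CRLF.
    field = line.split(b";", 1)[0].strip()
    if not _HEX_DIGITS.fullmatch(field):
        # A line of NUL bytes (as in the traceback above) ends up here.
        raise ValueError("invalid chunk-size line: %r" % line[:32])
    return int(field, 16)


def decode_chunked(stream):
    """Decode a chunked body from a binary file-like object."""
    body = []
    while True:
        size = parse_chunk_size(stream.readline())
        if size == 0:  # last-chunk reached
            break
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk: wanted %d bytes, got %d" % (size, len(data)))
        body.append(data)
        if stream.read(2) != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
    # Skip optional trailer headers up to the terminating blank line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return b"".join(body)


example = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
print(decode_chunked(example))  # prints b'Wikipedia'
```

Checking the field with a regular expression rather than relying on bare `int(field, 16)` matters in Python 3, where `int()` would also accept forms such as `0x1a`, `+1a` or `1_a` that are not valid chunk sizes.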

As above, but with Python 2.4:
-------------------------
>>> import urllib2
>>> a = urllib2.urlopen('http://www.recherche.fr/encyclopedie/Thomas-Robert_Bugeaud')

>>> 
>>> c = a.read(1024*1024*2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/socket.py", line 295, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.4/httplib.py", line 460, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int(): 
-------------------------

Regards,
Francesco Cosoleto
History
Date                 User      Action  Args
2007-09-26 07:55:35  cosoleto  set     spambayes_score: 0.0263763 -> 0.02637627
                                       recipients: + cosoleto
2007-09-26 07:55:33  cosoleto  set     spambayes_score: 0.0263763 -> 0.0263763
                                       messageid: <1190793332.9.0.29899287923.issue1205@psf.upfronthosting.co.za>
2007-09-26 07:55:32  cosoleto  link    issue1205 messages
2007-09-26 07:55:13  cosoleto  create