Message 214947 - Python tracker



Author	ned.deily
Recipients	dfarrell07, ned.deily, orsenthil, python-dev, r.david.murray
Date	2014-03-27.10:48:38
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1395917320.23.0.220381937098.issue21069@psf.upfronthosting.co.za>
In-reply-to

Content

After looking at why the 2.7 version of the test does not fail, the problem became apparent.  In 2.7, test_errno tests urlopen() of the original deprecated urllib module.  In 3.x, the test was ported over but now uses urlopen() of urllib.request, which is based on the urllib2 module of 2.x.

2.7:
>>> x = urllib.urlopen("http://www.example.com")
[79234 refs]
>>> x
<addinfourl at 3068742324L whose fp = <socket._fileobject object at 0xb6e7eea4>>
[79234 refs]
>>> os.fdopen(x.fileno()).read()
'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 50px;\n        background-color: #fff;\n        border-radius: 1em;\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        body {\n            background-color: #fff;\n        }\n        div {\n            width: auto;\n            margin: 0 auto;\n            border-radius: 0;\n            padding: 1em;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is established to be used for illustrative examples in documents. You may use this\n    domain in examples without prior coordination or asking for permission.</p>\n    <p><a href="http://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'
[79234 refs]

3.4 (when the read doesn't fail):
>>> x = urllib.request.urlopen("http://www.example.com")
>>> x
<http.client.HTTPResponse object at 0xb6bc7114>
>>> os.fdopen(x.fileno()).read()
__main__:1: ResourceWarning: unclosed file <_io.TextIOWrapper name=4 mode='r' encoding='UTF-8'>
' without prior coordination or asking for permission.</p>\n    <p><a href="http://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'

In the 3.x case (and the 2.7 urllib2 case), the read from the file descriptor starts mid-response or at the end (returning an empty byte string).  In the past, the test passed because of the amount of data returned by the previous test URL.  Now, with the short response from www.example.com, it's clear that the file descriptor read is not returning the whole response.  I don't know whether the file descriptor read is expected to be meaningful for urllib2/urllib.request.
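
The partial read can be reproduced without the network. The sketch below is not from the test suite, and the cause it illustrates is an assumption: that a buffered reader sitting on top of the socket has already pulled part (or all) of the response into its own buffer, so a later read straight from the descriptor only sees what is left. The helper name read_fd_after_buffered_header and the sample payload are made up for illustration, and the os.read() on a socket descriptor assumes a POSIX system.

```python
import os
import socket


def read_fd_after_buffered_header(payload):
    """Send payload over a socket pair, read one line through a buffered
    file object, then read whatever is still available on the raw fd.

    Returns (first_line, leftover_bytes_from_fd).
    """
    writer, reader = socket.socketpair()
    try:
        writer.sendall(payload)
        writer.close()  # signal EOF so the raw read below cannot block

        # Buffered file object over the socket, similar in spirit to how an
        # HTTP response object wraps its connection (assumption).
        buffered = reader.makefile("rb")
        first_line = buffered.readline()  # may buffer far more than one line

        # Bypass the buffer and read directly from the descriptor.
        leftover = b""
        while True:
            chunk = os.read(reader.fileno(), 4096)
            if not chunk:
                break
            leftover += chunk

        buffered.close()
        return first_line, leftover
    finally:
        reader.close()


if __name__ == "__main__":
    sample = b"HTTP/1.1 200 OK\r\n\r\n<html>short body</html>\n"
    line, rest = read_fd_after_buffered_header(sample)
    print("buffered readline:", line)
    print("raw fd read:      ", rest)
```

With a payload this small, the raw read typically comes back with little or nothing, which matches the "mid-response or at the end" behaviour described above. A much larger payload would leave data on the descriptor, which would be consistent with the test having passed against the previous, longer URL.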

Senthil, what do you think?

History
Date	User	Action	Args
2014-03-27 10:48:40	ned.deily	set	recipients: + ned.deily, orsenthil, r.david.murray, python-dev, dfarrell07
2014-03-27 10:48:40	ned.deily	set	messageid: <1395917320.23.0.220381937098.issue21069@psf.upfronthosting.co.za>
2014-03-27 10:48:40	ned.deily	link	issue21069 messages
2014-03-27 10:48:38	ned.deily	create