Message 69586 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	mgiuca
Recipients	mgiuca
Date	2008-07-12.13:46:13
SpamBayes Score	0.00023638742
Marked as misclassified	No
Message-id	<1215870377.1.0.267386206374.issue3347@psf.upfronthosting.co.za>
In-reply-to

Content

urllib.robotparser is broken in Python 3.0, due to a bytes object
appearing where a str is expected.

Example:

>>> import urllib.robotparser
>>> r = urllib.robotparser.RobotFileParser('http://www.python.org/robots.txt')
>>> r.read()
TypeError: expected an object with the buffer interface

This is because the variable f in RobotFileParser.read is the response
returned by urlopen, which is a binary file-like object, so f.read()
returns a bytes object rather than a str.
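The snippet below is a minimal illustration of the kind of mismatch involved, using a literal bytes value as a stand-in for what f.read() returns. It does not reproduce the exact code path inside RobotFileParser.read in the py3k branch, and the wording of the TypeError differs between Python versions.

```python
# Stand-in for the body returned by f.read() on a urlopen() response: bytes, not str.
data = b"User-agent: *\nDisallow: /private/  # comment\n"

# Splitting bytes yields bytes lines, e.g. b'Disallow: /private/  # comment'.
bytes_line = data.splitlines()[1]
try:
    # str-based parsing code passes a str argument to a bytes method.
    bytes_line.find("#")
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("TypeError:", exc)

# Decoding first produces str lines, which str-based parsing code expects.
text_line = data.decode("utf-8").splitlines()[1]
print(text_line.find("#"))
```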

I've included a patch which checks whether the result is a bytes object
and, if so, decodes it as UTF-8. A more thorough fix would determine the
document's charset (from f.headers['Content-Type']), but this at least
works and will be sufficient in almost all cases.
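The patch itself is not reproduced here. As a minimal sketch covering both the UTF-8 decoding and the more thorough charset-aware idea, the hypothetical helper decode_robots_body prefers the charset named in Content-Type and falls back to UTF-8 otherwise. It assumes the headers object behaves like an email.message.Message (as the .headers attribute of a urlopen() response does in current Python 3), which provides get_content_charset().

```python
import codecs
import email.message
import urllib.robotparser


def decode_robots_body(data, headers=None):
    """Return robots.txt content as a list of str lines.

    Hypothetical helper: `headers` is assumed to be an
    email.message.Message-like object providing get_content_charset().
    """
    if isinstance(data, str):
        # Already text; nothing to decode.
        return data.splitlines()
    charset = None
    if headers is not None:
        # Returns the lowercased charset parameter, or None if absent.
        charset = headers.get_content_charset()
    if charset:
        try:
            codecs.lookup(charset)
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            charset = None
    # As in the patch: decode as UTF-8 when nothing better is known.
    return data.decode(charset or "utf-8").splitlines()


# Usage: feed the decoded str lines to the parser instead of raw bytes.
rp = urllib.robotparser.RobotFileParser()
rp.parse(decode_robots_body(b"User-agent: *\nDisallow: /private/\n"))
print(rp.can_fetch("ExampleBot", "http://www.example.com/private/x.html"))  # False
```

Falling back to UTF-8 keeps the common case working when a server omits the charset or names one Python does not recognise.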

Also, there are no test cases for urllib.robotparser.

Patch (robotparser.py.patch) is for branch /branches/py3k, revision 64891.

Commit log:

Lib/urllib/robotparser.py: Fixed robotparser for Python 3.0. urlopen
returns bytes objects where str is expected. Decode the bytes using UTF-8.

History
Date	User	Action	Args
2008-07-12 13:46:17	mgiuca	set	spambayes_score: 0.000236387 -> 0.00023638742 recipients: + mgiuca
2008-07-12 13:46:17	mgiuca	set	spambayes_score: 0.000236387 -> 0.000236387 messageid: <1215870377.1.0.267386206374.issue3347@psf.upfronthosting.co.za>
2008-07-12 13:46:15	mgiuca	link	issue3347 messages
2008-07-12 13:46:14	mgiuca	create