Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = <Date 2008-07-18.21:00:27.948>
created_at = <Date 2008-07-12.13:46:16.049>
labels = ['type-bug', 'library']
title = "urllib.robotparser doesn't work in Py3k"
updated_at = <Date 2008-07-18.21:00:27.938>
user = 'https://bugs.python.org/mgiuca'
urllib.robotparser is broken in Python 3.0, due to a bytes object
appearing where a str is expected.
Example:
>>> import urllib.robotparser
>>> r = urllib.robotparser.RobotFileParser('http://www.python.org/robots.txt')
>>> r.read()
TypeError: expected an object with the buffer interface
This is because the variable f in RobotFileParser.read is opened by
urlopen as a binary file, so f.read() returns a bytes object.
I've included a patch which checks whether the result is a bytes object
and, if so, decodes it as 'utf-8'. A more thorough fix might work out the
document's charset (from f.headers['Content-Type']), but at least this
works, and will be sufficient in almost all cases.
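
A minimal sketch of the idea, not the attached patch itself: decode the bytes before handing them to the parser, using the charset from the Content-Type header when one is given and falling back to UTF-8 otherwise. The helper names decode_body and parser_from_bytes are hypothetical, and the example feeds in literal bytes instead of calling urlopen so that it runs without network access.

```python
import urllib.robotparser
from email.message import Message


def decode_body(raw, content_type=None, default="utf-8"):
    """Return raw as str, decoding bytes with the header charset if present.

    Hypothetical helper for illustration; not part of urllib.robotparser.
    """
    if isinstance(raw, str):
        return raw
    charset = default
    if content_type:
        # Reuse the stdlib header parser to extract the charset parameter.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset() or default
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: fall back to the default, replacing
        # undecodable bytes instead of raising.
        return raw.decode(default, errors="replace")


def parser_from_bytes(raw, content_type=None):
    """Build a RobotFileParser from an already-fetched robots.txt body."""
    parser = urllib.robotparser.RobotFileParser()
    # RobotFileParser.parse() expects an iterable of str lines.
    parser.parse(decode_body(raw, content_type).splitlines())
    return parser


if __name__ == "__main__":
    body = b"User-agent: *\nDisallow: /private/\n"
    rp = parser_from_bytes(body, "text/plain; charset=utf-8")
    print(rp.can_fetch("*", "http://example.com/private/page"))
```

In RobotFileParser.read, the same decoding step would sit between f.read() and the call to parse(), with f.headers['Content-Type'] supplying the content_type argument.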
Also there are no test cases for urllib.robotparser.
Patch (robotparser.py.patch) is for branch /branches/py3k, revision 64891.
Commit log:
Lib/urllib/robotparser.py: Fixed robotparser for Python 3.0. urlopen
returns bytes objects where str expected. Decode the bytes using UTF-8.