This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Title: urllib.robotparser doesn't work in Py3k
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.0
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: jhylton Nosy List: brett.cannon, jhylton, mgiuca
Priority: normal Keywords: patch

Created on 2008-07-12 13:46 by mgiuca, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files: one patch, uploaded by mgiuca on 2008-07-12 13:46
Messages (2)
msg69586 - Author: Matt Giuca (mgiuca) Date: 2008-07-12 13:46
urllib.robotparser is broken in Python 3.0, due to a bytes object
appearing where a str is expected.


>>> import urllib.robotparser
>>> r =
TypeError: expected an object with the buffer interface

This is because the variable f in is opened by
urlopen as a binary file, so reading from it returns a bytes object.
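To make the bytes/str mismatch concrete without touching the network, here is a minimal sketch written against current Python 3. It uses io.BytesIO as a stand-in for the urlopen response. The helper parse_from_binary_file is hypothetical and not part of urllib.robotparser. It decodes before splitting into lines because RobotFileParser.parse() expects an iterable of str.

```python
import io
import urllib.robotparser


def parse_from_binary_file(parser, f, encoding="utf-8"):
    """Feed a file-like object (e.g. an urlopen response) to a RobotFileParser.

    Hypothetical helper: read() on a binary file yields bytes, so decode them
    to str first; RobotFileParser.parse() expects an iterable of str lines.
    """
    raw = f.read()
    if isinstance(raw, bytes):
        raw = raw.decode(encoding)
    parser.parse(raw.splitlines())


# io.BytesIO stands in for the urlopen response, so no network is needed.
fake_response = io.BytesIO(b"User-agent: *\nDisallow: /private/\n")
rp = urllib.robotparser.RobotFileParser()
parse_from_binary_file(rp, fake_response)
print(rp.can_fetch("AnyBot", "http://example.com/private/page"))  # False
print(rp.can_fetch("AnyBot", "http://example.com/public/page"))   # True
```

The isinstance check means the same helper also accepts a text-mode file, which mirrors the "check if it's a bytes" approach described in the patch.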

I've included a patch that checks whether the data is a bytes object and,
if so, decodes it as UTF-8. A more thorough fix would determine the
document's charset (from f.headers['Content-Type']), but this at least
works and should be sufficient in almost all cases.
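The "more thorough fix" could look roughly like the sketch below. It is an assumption, not the committed patch. The helper decode_robots_body is hypothetical. It parses a Content-Type value with email.message.Message.get_content_charset() and falls back to UTF-8 when no charset is declared or Python does not recognize it. On current Python 3, the headers of an urlopen response are an http.client.HTTPMessage, a subclass of email.message.Message, so f.headers.get_content_charset() should give the same answer directly.

```python
import codecs
from email.message import Message


def decode_robots_body(raw, content_type=None, default="utf-8"):
    """Decode a robots.txt body, preferring the charset from Content-Type.

    Hypothetical helper: uses `default` when no charset is declared or the
    declared charset is unknown to Python's codec registry.
    """
    if isinstance(raw, str):
        return raw  # already text; nothing to decode
    charset = default
    if content_type:
        headers = Message()
        headers["Content-Type"] = content_type
        # get_content_charset() returns the lowercased charset parameter, or None
        charset = headers.get_content_charset() or default
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = default  # unknown charset name: fall back
    # errors="replace" keeps one bad byte from aborting the whole parse
    return raw.decode(charset, errors="replace")
```

Using errors="replace" is a design choice. A strict decode, as in the UTF-8 patch, would raise on malformed input instead of substituting U+FFFD.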

Also there are no test cases for urllib.robotparser.
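A test for urllib.robotparser does not need network access, because RobotFileParser.parse() accepts lines directly. The following is an illustrative sketch, not the unittest added in the eventual commit. The fixture ROBOTS_TXT and the class name are hypothetical.

```python
import unittest
import urllib.robotparser

# Hypothetical robots.txt fixture; parsed in memory, no network access needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


class RobotParserTest(unittest.TestCase):
    def setUp(self):
        self.parser = urllib.robotparser.RobotFileParser()
        # parse() takes an iterable of str lines, not bytes
        self.parser.parse(ROBOTS_TXT.splitlines())

    def test_disallowed_paths(self):
        for path in ("/cgi-bin/search", "/tmp/file.html"):
            url = "http://example.com" + path
            self.assertFalse(self.parser.can_fetch("TestBot", url))

    def test_allowed_path(self):
        url = "http://example.com/index.html"
        self.assertTrue(self.parser.can_fetch("TestBot", url))
```

Saved to a file, it can be run with `python -m unittest <module name>`.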

Patch ( is for branch /branches/py3k, revision 64891.

Commit log:

Lib/urllib/ Fixed robotparser for Python 3.0. urlopen
returns bytes objects where str expected. Decode the bytes using UTF-8.
msg69989 - Author: Jeremy Hylton (jhylton) (Python triager) Date: 2008-07-18 21:00
Committed revision 65118.

I applied a simple version of this patch and added a unittest.

Date                 User          Action  Args
2022-04-11 14:56:36  admin         set     github: 47597
2008-07-18 21:00:27  jhylton       set     status: open -> closed
                                           assignee: jhylton
                                           messages: + msg69989
                                           nosy: + jhylton
2008-07-12 18:42:27  brett.cannon  set     nosy: + brett.cannon
2008-07-12 13:46:16  mgiuca        create