Message193346
Lars, I see.
For the uninitiated, the issue is that the original URL (containing only ASCII characters) redirects to a URL containing non-ASCII characters, which upsets urllib.
To handle that situation, you can do something like this:
---------------------
import urllib.request
from urllib.parse import quote

url = "http://www.libon.it/libon/search/isbn/3499155443"
req = urllib.request.Request(url)
# Percent-encode the selector (the path part sent on the request line).
req.selector = quote(req.selector)
response = urllib.request.urlopen(req, timeout=30)
the_page = response.read().decode('utf-8')
print(the_page)
---------------------
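Since the non-ASCII characters only show up in the redirect target, another option might be to quote the Location URL at the moment urllib follows the redirect. Here is a rough sketch of that idea. The names quote_non_ascii and QuotingRedirectHandler are made up for illustration and are not part of the standard library, and the set of "safe" characters is a simplifying assumption. Newer Python versions may already quote the redirect URL themselves, in which case this should pass the URL through unchanged.

```python
import urllib.request
from urllib.parse import quote, urlsplit, urlunsplit


def quote_non_ascii(url, encoding="utf-8"):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    '%' is treated as safe, so an already-quoted URL passes through unchanged.
    (Hypothetical helper; the safe-character sets are a simplification.)
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%", encoding=encoding)
    query = quote(parts.query, safe="=&%+", encoding=encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


class QuotingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: quote the redirect target before following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # http.client decodes header values as ISO-8859-1, so encoding with
        # the same codec recovers the bytes the server actually sent.
        safe_url = quote_non_ascii(newurl, encoding="iso-8859-1")
        return super().redirect_request(req, fp, code, msg, headers, safe_url)
```

It would be used with urllib.request.build_opener(QuotingRedirectHandler), calling open(url, timeout=30) on the resulting opener instead of urlopen.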
I admit that this code is clunky and not Pythonic.
I also believe that the Python standard library should offer an easy way to access URLs containing non-ASCII characters.
At the very least, maybe we can give a proper error message. Something like this would be nice:
"The url is not valid and contains non-ascii characters: http://www.libon.it/ricerca/7817940/3499155443/dettaglio/3102314/Onkel-Oswald-und-der-Sudan-Käfer/order/date_desc. This url is redirected from this url: http://www.libon.it/libon/search/isbn/3499155443"
Otherwise users can be confused: they gave urllib an ASCII-only URL (http://www.libon.it/libon/search/isbn/3499155443), so why did they get an encoding error?
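To make the error message idea concrete, here is a minimal sketch of how such a message could be produced today from user code, again with a custom redirect handler. ExplainingRedirectHandler is a hypothetical name, raising ValueError is my assumption rather than anything urllib does, and the check only fires on Python versions where the redirect target reaches redirect_request unquoted. It needs Python 3.7+ for str.isascii.

```python
import urllib.request


class ExplainingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: fail early with a message naming both URLs."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if not newurl.isascii():
            # Mention the original URL too, so the user can see where
            # the non-ASCII characters came from.
            raise ValueError(
                "The url is not valid and contains non-ascii characters: "
                f"{newurl}. This url is redirected from this url: "
                f"{req.full_url}"
            )
        return super().redirect_request(req, fp, code, msg, headers, newurl)
```

Note that the redirect URL shown in the message may look garbled, because the header value is decoded as ISO-8859-1 before the handler sees it.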
What do you say, Christian?
Date | User | Action | Args
2013-07-19 04:45:12 | vajrasky | set | recipients: + vajrasky, terry.reedy, orsenthil, christian.heimes, ezio.melotti, Mi.Zou, LDTech
2013-07-19 04:45:12 | vajrasky | set | messageid: <1374209112.46.0.0649274757557.issue17214@psf.upfronthosting.co.za>
2013-07-19 04:45:12 | vajrasky | link | issue17214 messages
2013-07-19 04:45:11 | vajrasky | create |