Author jonathan-lp
Recipients jonathan-lp
Date 2018-03-09.08:24:00
(Confirmed in 2.7.14, 3.5.4, and 3.6.3)

I have this really bad URL from a crawl:
"http://Server=sde; Service=sde:oracle$sde:oracle11g:geopp; User=bodem; Version=SDE.DEFAULT"

If I try to parse it with either urlparse or urlsplit, it works with no errors. But when I try to get the port, I get a ValueError.

> from urllib.parse import urlparse
> r = urlparse('http://Server=sde; Service=sde:oracle$sde:oracle11g:geopp; User=bodem; Version=SDE.DEFAULT')
ParseResult(scheme='http', netloc='Server=sde; Service=sde:oracle$sde:oracle11g:geopp; User=bodem; Version=SDE.DEFAULT', path='', params='', query='', fragment='')

Ok, great, now to use the result:
> print(r.port)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "E:\Software\_libs\Python36\lib\urllib\", line 167, in port
    port = int(port, 10)
ValueError: invalid literal for int() with base 10: 'oracle$sde:oracle11g:geopp; User=bodem; Version=SDE.DEFAULT'

I'm not a Python guru, but to me at least this is inconsistent with how other Python functions work. Every other builtin function I've used raises its exception when I call the function, not when I later read the result. This caused a good few minutes of head-scratching while I tried to work out why my try/except around the parse call wasn't catching it.

This inconsistency makes the results more difficult to use. Now a user needs to wrap all calls to the *results* in a try/except, or write an entire function just to "read" the results into a won't-except tuple/dict. Seems sub-optimal.
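
For illustration, a minimal sketch of the kind of wrapper this forces on users. The name safe_port and its default parameter are hypothetical, not part of urllib; the point is only that the try/except has to cover the attribute access as well as the parse call:

```python
from urllib.parse import urlsplit


def safe_port(url, default=None):
    """Return the port of `url`, or `default` if it is missing or invalid.

    Hypothetical helper, not part of urllib.
    """
    try:
        # The ValueError is raised when .port is read, not by urlsplit(),
        # so the attribute access must sit inside the try block.
        port = urlsplit(url).port
    except ValueError:
        return default
    # .port is None when the URL has no explicit port.
    return default if port is None else port


bad = ("http://Server=sde; Service=sde:oracle$sde:oracle11g:geopp; "
       "User=bodem; Version=SDE.DEFAULT")
print(safe_port(bad))
print(safe_port("http://example.com:8080/path"))
```

Every other lazily computed attribute that can raise would need the same treatment, which is the overhead described above.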

(May relate to:
Date                 User         Action  Args
2018-03-09 08:24:01  jonathan-lp  set     recipients: + jonathan-lp
2018-03-09 08:24:01  jonathan-lp  set     messageid: <>
2018-03-09 08:24:01  jonathan-lp  link    issue33034 messages
2018-03-09 08:24:00  jonathan-lp  create