Message 322652 - Python tracker



Author	chris.jerdonek
Recipients	chris.jerdonek
Date	2018-07-30.04:39:02
Message-id	<1532925543.25.0.56676864532.issue34276@psf.upfronthosting.co.za>

Content

urllib.parse doesn't seem to round-trip file URIs whose path begins with multiple slashes. For example, this:

    import urllib.parse

    def round_trip(url):
        parsed = urllib.parse.urlsplit(url)
        new_url = urllib.parse.urlunsplit(parsed)
        print(f'{url} [{parsed}]\n{new_url}')
        print('ROUNDTRIP: {}\n'.format(url == new_url))

    for i in range(4):
        round_trip('file://{}root/a'.format(i * '/'))

results in:

    file://root/a [SplitResult(scheme='file', netloc='root', path='/a', query='', fragment='')]
    file://root/a
    ROUNDTRIP: True

    file:///root/a [SplitResult(scheme='file', netloc='', path='/root/a', query='', fragment='')]
    file:///root/a
    ROUNDTRIP: True

    file:////root/a [SplitResult(scheme='file', netloc='', path='//root/a', query='', fragment='')]
    file://root/a
    ROUNDTRIP: False

    file://///root/a [SplitResult(scheme='file', netloc='', path='///root/a', query='', fragment='')]
    file:///root/a
    ROUNDTRIP: False

URIs of the form file:////<host>/<share>/<path> occur, for example, when one wants to git-clone a UNC path on Windows:
https://stackoverflow.com/a/2520121/262819

Here is where CPython defines urlunsplit():
https://github.com/python/cpython/blob/4e11c461ed39085b8495a35c9367b46d8a0d306d/Lib/urllib/parse.py#L465-L482
(The '//' special-casing seems to occur in this line here:
https://github.com/python/cpython/blob/4e11c461ed39085b8495a35c9367b46d8a0d306d/Lib/urllib/parse.py#L473 )
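
For reference, here is a rough paraphrase of what that line appears to do. It is a simplified sketch based on my reading of the linked commit, not the actual source: join_authority and the small USES_NETLOC set are illustrative stand-ins, and the query and fragment handling is left out.

```python
# Simplified paraphrase of the netloc/path joining step in urlunsplit().
# USES_NETLOC and join_authority are illustrative names (assumptions),
# not identifiers from CPython.
USES_NETLOC = {'file', 'ftp', 'http', 'https'}


def join_authority(scheme, netloc, path):
    url = path
    # '//' is re-added only when there is a netloc, or when the scheme
    # normally takes one and the path does not already begin with '//'.
    if netloc or (scheme in USES_NETLOC and url[:2] != '//'):
        if url and url[:1] != '/':
            url = '/' + url
        url = '//' + netloc + url
    if scheme:
        url = scheme + ':' + url
    return url
```

If this reading is right, a path such as '//root/a' with an empty netloc skips the branch entirely, so the two slashes that marked the empty authority are never written back.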
 
And here is where the round-tripping is tested:
https://github.com/python/cpython/blob/4e11c461ed39085b8495a35c9367b46d8a0d306d/Lib/test/test_urlparse.py#L156
(The case of three leading slashes is tested, but not the problem case of four or more.)
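
In the meantime, one possible workaround is to rebuild the URL by hand instead of calling urlunsplit(). The following is only a sketch under stated assumptions: unsplit_keeping_slashes and NETLOC_SCHEMES are names made up for illustration, and the helper always writes '//' for the listed schemes, which may not be appropriate for every URL.

```python
from urllib.parse import urlsplit

# Hypothetical set of schemes for which '//' is always written back.
NETLOC_SCHEMES = {'file', 'ftp', 'http', 'https'}


def unsplit_keeping_slashes(parts):
    """Rebuild a URL from a 5-tuple without dropping leading slashes."""
    scheme, netloc, path, query, fragment = parts
    url = path
    # Unlike the stdlib logic, do not skip '//' when the path itself
    # already starts with '//'.
    if netloc or scheme in NETLOC_SCHEMES:
        if url and not url.startswith('/'):
            url = '/' + url
        url = '//' + netloc + url
    if scheme:
        url = scheme + ':' + url
    if query:
        url += '?' + query
    if fragment:
        url += '#' + fragment
    return url


for i in range(4):
    url = 'file://{}root/a'.format(i * '/')
    print(url, unsplit_keeping_slashes(urlsplit(url)) == url)
```

With this helper, all four URLs from the example above compare equal after the round trip.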
