Message 231099 - Python tracker


Author	soilandreyes
Recipients	martin.panter, orsenthil, soilandreyes
Date	2014-11-13.09:46:28
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1415871989.55.0.214093610086.issue22852@psf.upfronthosting.co.za>
In-reply-to

Content
I tried to make a patch for this, but found it quite hard because urllib/parse.py is fairly low-level: it constantly encodes and decodes between bytes and strings within each URI component. Basically, the code assumes tuples of strings, with support for both bytes and strings added later.

As you can see in

https://github.com/stain/cpython/compare/issue-2285-urllib-empty-fragment?expand=1

the patch to parse.py is small, but its effect on test_urlparse.py is somewhat bigger, as lots of tests check that the result of urlsplit contains '' rather than None. It is uncertain how much real-life client code also checks for '' directly. ("if not p.fragment" would of course still work, but "if p.fragment == ''" would no longer work.)
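
To make the compatibility concern concrete, here is a minimal sketch. The None value only simulates what the patched urlsplit would report for a missing fragment; truthiness_check and equality_check are hypothetical stand-ins for client code, not anything from the patch.

```python
from urllib.parse import urlsplit

# Unpatched behaviour: a URL without '#' reports an empty-string fragment.
current_fragment = urlsplit("http://example.com/path").fragment

# Assumption: the patched urlsplit would report None for a missing fragment.
patched_fragment = None


def truthiness_check(fragment):
    """Client code written as: if not p.fragment"""
    return not fragment


def equality_check(fragment):
    """Client code written as: if p.fragment == ''"""
    return fragment == ""


for label, fragment in (("current", current_fragment), ("patched", patched_fragment)):
    print(label, truthiness_check(fragment), equality_check(fragment))
```

Only the equality check changes its answer between the two values, and that is the kind of client code the patch would break.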

I therefore suggest an alternative to my patch above: add boolean fields such as has_fragment. The existing component fields could then keep their backwards-compatible '' and b'' values even when a component is actually missing, while geturl() could still reconstitute the URI according to the RFC.
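
A rough sketch of how such flags might behave, written as a wrapper outside urllib rather than as a change to parse.py. FlaggedSplit, split_with_flags and has_query are hypothetical names used only for illustration (only has_fragment comes from the suggestion above), and the sketch handles str input only, not bytes.

```python
from typing import NamedTuple
from urllib.parse import SplitResult, urlsplit, urlunsplit


class FlaggedSplit(NamedTuple):
    """urlsplit() result plus flags recording which delimiters were present."""

    parts: SplitResult
    has_query: bool      # hypothetical flag: a '?' delimiter was present
    has_fragment: bool   # flag from the proposal: a '#' delimiter was present

    def geturl(self) -> str:
        # Rebuild scheme, netloc and path with the stdlib, then re-add the
        # '?' and '#' delimiters whenever they were present in the input,
        # even if the component after them is empty.
        p = self.parts
        url = urlunsplit((p.scheme, p.netloc, p.path, "", ""))
        if self.has_query:
            url += "?" + p.query
        if self.has_fragment:
            url += "#" + p.fragment
        return url


def split_with_flags(url: str) -> FlaggedSplit:
    # The fragment starts at the first '#'; a query can only start at a '?'
    # that appears before that '#'.
    before_fragment, hash_sep, _ = url.partition("#")
    return FlaggedSplit(
        parts=urlsplit(url),
        has_query="?" in before_fragment,
        has_fragment=bool(hash_sep),
    )


s = split_with_flags("http://example.com/path#")
print(repr(s.parts.fragment), s.has_fragment, s.geturl())
```

Here s.parts.fragment keeps its backwards-compatible '' value, while has_fragment records that the '#' was there, so geturl() can put it back.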

History
Date	User	Action	Args
2014-11-13 09:46:29	soilandreyes	set	recipients: + soilandreyes, orsenthil, martin.panter
2014-11-13 09:46:29	soilandreyes	set	messageid: <1415871989.55.0.214093610086.issue22852@psf.upfronthosting.co.za>
2014-11-13 09:46:29	soilandreyes	link	issue22852 messages
2014-11-13 09:46:28	soilandreyes	create