
Author soilandreyes
Recipients martin.panter, orsenthil, soilandreyes
Date 2014-11-13.09:46:28
Message-id <1415871989.55.0.214093610086.issue22852@psf.upfronthosting.co.za>
Content
I tried to make a patch for this, but found it quite hard because urllib/parse.py is fairly low-level: it constantly encodes and decodes between bytes and strings within each URI component. Basically, the code assumes tuples of strings, with support for both bytes and strings baked in later.

As you see in 

https://github.com/stain/cpython/compare/issue-2285-urllib-empty-fragment?expand=1

the patch to parse.py is small, but its effect on test_urlparse.py is somewhat bigger, as lots of tests expect the result of urlsplit to contain '' rather than None. It is uncertain how much real-life client code also checks for '' directly. ("if not p.fragment" would of course still work, but "if p.fragment == ''" would not work anymore.)
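
To illustrate the compatibility concern, here is a minimal sketch against the existing urlsplit behaviour. The example URLs and variable names are only for illustration. Both an empty fragment and a missing fragment are reported as '', so the two checks quoted above agree today; only the truthiness check would keep working if missing components became None.

```python
from urllib.parse import urlsplit

# One URL with an empty fragment, one with no fragment at all.
with_empty = urlsplit("http://example.com/#")
without = urlsplit("http://example.com/")

# Today both report '' for the fragment, so the result alone
# cannot tell the two inputs apart.
print(repr(with_empty.fragment), repr(without.fragment))

# Style 1: truthiness check. Works for '' and would also work for None.
style_truthy = not without.fragment

# Style 2: equality check. Relies on '' specifically and would start
# returning False if a missing fragment were reported as None.
style_equality = without.fragment == ""

print(style_truthy, style_equality)

# Round trip of the URL that had an empty fragment.
print(with_empty.geturl())
```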

I therefore suggest an alternative to my patch above: add boolean fields such as has_fragment. The existing component fields could then keep their backwards-compatible '' and b'' values even when a component is actually missing, while still allowing geturl() to reconstitute the URI according to the RFC.
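
A rough sketch of how such flags could behave, written as a wrapper outside urllib.parse rather than as the actual patch. The names FlaggedSplit, urlsplit_with_flags and has_query are hypothetical. The sketch handles str input only and detects delimiters with a plain string search, so it is an approximation of the idea, not a proposal for the real implementation.

```python
from typing import NamedTuple
from urllib.parse import SplitResult, urlsplit


class FlaggedSplit(NamedTuple):
    """Hypothetical wrapper: the usual SplitResult plus presence flags."""

    parts: SplitResult
    has_query: bool
    has_fragment: bool

    def geturl(self) -> str:
        # Rebuild scheme, netloc and path with the standard machinery,
        # then append the delimiters whenever the component was present,
        # even if its value is empty.
        url = self.parts._replace(query="", fragment="").geturl()
        if self.has_query:
            url += "?" + self.parts.query
        if self.has_fragment:
            url += "#" + self.parts.fragment
        return url


def urlsplit_with_flags(url: str) -> FlaggedSplit:
    # The first '#' starts the fragment; a '?' only starts a query
    # if it appears before that '#'.
    before_fragment, hash_sep, _ = url.partition("#")
    return FlaggedSplit(
        parts=urlsplit(url),
        has_query="?" in before_fragment,
        has_fragment=bool(hash_sep),
    )


p = urlsplit_with_flags("http://example.com/#")
print(repr(p.parts.fragment), p.has_fragment, p.geturl())
```

With this shape, code that compares p.parts.fragment to '' keeps working, and only callers that care about the distinction need to look at the flags.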
History
Date User Action Args
2014-11-13 09:46:29 soilandreyes set recipients: + soilandreyes, orsenthil, martin.panter
2014-11-13 09:46:29 soilandreyes set messageid: <1415871989.55.0.214093610086.issue22852@psf.upfronthosting.co.za>
2014-11-13 09:46:29 soilandreyes link issue22852 messages
2014-11-13 09:46:28 soilandreyes create