
Author Mike.Lissner
Recipients Mike.Lissner, apollo13, gregory.p.smith, lukasz.langa, mgorny, miss-islington, ned.deily, odd_bloke, orsenthil, sethmlarson, xtreak
Date 2021-05-06.20:36:08
Message-id <1620333368.98.0.904016749428.issue43882@roundup.psfhosted.org>
Content
>  With the fix for this bug, urlsplit silently removes (some of) those characters before we can replace them, modifying the output of our sanitisation code

I don't have any good solutions for 3.9.5, but going forward, this feels like another example of why we should just do parsing right (the way browsers do). That'd maintain tabs and whatnot in your output, and it'd fix the security issue by putting `java\nscript` into the scheme attribute instead of the path.
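To make the scheme-versus-path point concrete, here is a minimal sketch. The exact output of `urlsplit` on this input depends on the interpreter version (whether it carries the fix for this issue), so no expected output is shown for that call. The helper `looks_like_javascript_url` is a hypothetical name invented for illustration, not part of `urllib.parse`; it assumes that removing ASCII tab, CR and LF before parsing is an acceptable approximation of what browsers do.

```python
from urllib.parse import urlsplit

url = "java\nscript:alert(1)"
parts = urlsplit(url)

# With the fix for this issue, the newline is stripped before parsing and the
# scheme is recognised. Without it, "\n" is not a valid scheme character, so
# no scheme is found and the whole string ends up in the path.
print(repr(parts.scheme), repr(parts.path))


def looks_like_javascript_url(candidate: str) -> bool:
    """Hypothetical helper: check the scheme the way a browser would see it.

    Assumption: dropping ASCII tab, CR and LF before parsing is close enough
    to browser behaviour for this check.
    """
    cleaned = candidate.translate({ord(c): None for c in "\t\r\n"})
    return urlsplit(cleaned).scheme.lower() == "javascript"


print(looks_like_javascript_url(url))
```

The helper gives the same answer on patched and unpatched interpreters because it normalises the input itself rather than relying on what `urlsplit` does with the newline.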

> One solution that presents itself to me: add a `strip_insecure_characters: bool = True` parameter.

Doesn't this lose sight of what this tool is supposed to do? It's not supposed to have a good (new, correct) and a bad (old, obsolete) way of parsing. Woe unto whoever has to write the documentation for that parameter. 

Also, I should reiterate that these aren't "insecure" characters, so if we did have a parameter for this, it'd be more like `do_rfc_3986_parsing` or maybe `do_naive_parsing`. The chars aren't insecure in themselves. They're fine. Python just gets tripped up on them.