Message 393146 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	Mike.Lissner
Recipients	Mike.Lissner, apollo13, gregory.p.smith, lukasz.langa, mgorny, miss-islington, ned.deily, odd_bloke, orsenthil, sethmlarson, xtreak
Date	2021-05-06.20:36:08
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1620333368.98.0.904016749428.issue43882@roundup.psfhosted.org>
In-reply-to

Content
> With the fix for this bug, urlsplit silently removes (some of) those characters before we can replace them, modifying the output of our sanitisation code I don't have any good solutions for 3.9.5, but going forward, this feels like another example of why we should just do parsing right (the way browsers do). That'd maintain tabs and whatnot in your output, and it'd fix the security issue by putting `java\nscript` into the scheme attribute instead of the path. > One solution that presents itself to me: add a `strip_insecure_characters: bool = True` parameter. Doesn't this lose sight of what this tool is supposed to do? It's not supposed to have a good (new, correct) and a bad (old, obsolete) way of parsing. Woe unto whoever has to write the documentation for that parameter. Also, I should reiterate that these aren't "insecure" characters so if we did have a parameter for this, it'd be more like `do_rfc_3986_parsing` or maybe `do_naive_parsing`. The chars aren't insecure in themselves. They're fine. Python just gets tripped up on them.

>  With the fix for this bug, urlsplit silently removes (some of) those characters before we can replace them, modifying the output of our sanitisation code

I don't have any good solutions for 3.9.5, but going forward, this feels like another example of why we should just do parsing right (the way browsers do). That'd maintain tabs and whatnot in your output, and it'd fix the security issue by putting `java\nscript` into the scheme attribute instead of the path.

> One solution that presents itself to me: add a `strip_insecure_characters: bool = True` parameter.

Doesn't this lose sight of what this tool is supposed to do? It's not supposed to have a good (new, correct) and a bad (old, obsolete) way of parsing. Woe unto whoever has to write the documentation for that parameter. 

Also, I should reiterate that these aren't "insecure" characters so if we did have a parameter for this, it'd be more like `do_rfc_3986_parsing` or maybe `do_naive_parsing`. The chars aren't insecure in themselves. They're fine. Python just gets tripped up on them.

History
Date	User	Action	Args
2021-05-06 20:36:09	Mike.Lissner	set	recipients: + Mike.Lissner, gregory.p.smith, orsenthil, ned.deily, odd_bloke, lukasz.langa, mgorny, apollo13, miss-islington, xtreak, sethmlarson
2021-05-06 20:36:08	Mike.Lissner	set	messageid: <1620333368.98.0.904016749428.issue43882@roundup.psfhosted.org>
2021-05-06 20:36:08	Mike.Lissner	link	issue43882 messages
2021-05-06 20:36:08	Mike.Lissner	create