
Author martin.panter
Recipients Nam.Nguyen, martin.panter, serhiy.storchaka, vstinner
Date 2017-07-02.11:44:04
Message-id <1498995844.64.0.369462620102.issue30713@psf.upfronthosting.co.za>
Content
It might help if you explained why you want to make these changes; otherwise I have to guess. Is it a compromise between strictly rejecting all non-URL characters (not just control characters) and leaving it up to user applications to validate their URLs?
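To make the two ends of that spectrum concrete, here is a rough sketch of both checks. The function names are hypothetical, not urllib.parse APIs, and the "strict" character set is an assumption based on the characters RFC 3986 permits literally in a URI; it does not check that each "%" begins a valid percent-encoded triplet.

```python
import string

# Assumption: characters RFC 3986 allows literally in a URI (unreserved,
# gen-delims, sub-delims) plus "%" for percent-encoded octets.
# Hypothetical helpers for illustration; not part of urllib.parse.
URI_CHARS = frozenset(
    string.ascii_letters + string.digits
    + "-._~"          # unreserved punctuation
    + ":/?#[]@"       # gen-delims
    + "!$&'()*+,;="   # sub-delims
    + "%"             # introduces a percent-encoded octet
)


def has_control_chars(url: str) -> bool:
    """Narrow policy: flag only C0 control characters and DEL."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url)


def has_non_uri_chars(url: str) -> bool:
    """Strict policy: flag anything outside the RFC 3986 character set."""
    return any(ch not in URI_CHARS for ch in url)


for url in ["http://example.com/a\nb", "http://example.com/a b", "http://example.com/ü"]:
    print(repr(url), has_control_chars(url), has_non_uri_chars(url))
```

Only the URL containing a newline trips the narrow check; the space and the non-ASCII letter are caught only by the strict one, which is exactly the gap between the two policies.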

I guess it could partially prevent some newline injection problems like Issue 29606 (FTP) and Issue 30458 (HTTP). But how do we know it closes more security holes than it opens?
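For readers unfamiliar with the injection problem, here is a minimal sketch, assuming a hypothetical client that copies the URL path verbatim into an HTTP/1.1 request head. It is not how http.client builds requests; it only shows why an unchecked "\r\n" in a URL lets the URL's author add header lines.

```python
def build_request_head(host: str, path: str) -> str:
    """Hypothetical naive client: copies the path into the request line verbatim."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def header_lines(head: str) -> list[str]:
    """Return the header lines between the request line and the blank line."""
    lines = head.split("\r\n")
    return lines[1:lines.index("")]


benign = build_request_head("example.com", "/index.html")
injected = build_request_head("example.com", "/\r\nX-Injected: yes")

print(header_lines(benign))
print(header_lines(injected))
```

The second request carries an extra "X-Injected" line (with the displaced " HTTP/1.1" glued to its end) that the caller never asked for. Rejecting control characters when the URL is parsed would stop this particular case, but only this case, which is why it is at best a partial fix.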

I don’t understand the focus on these three functions. They are undocumented and more or less deprecated (Issue 27485). Why not focus on the “urlsplit” and “urlparse” functions first?

Some of the changes seem to go too far. For example, in the splithost("//hostname/u\nrl") test case, the hostname is fine, but it is not recognized. This would partially conflict with the patch in Issue 13359, which proposes to percent-encode newlines after they pass through “splithost”. It would also make the URL look like a relative URL, which is a potential security hole and reminds me of the open redirect bug report (Issue 23505).
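The difference between the two treatments of splithost("//hostname/u\nrl") can be sketched roughly as follows. All three functions are hypothetical stand-ins, not the urllib.parse code: split_host_ignoring models a splitter that gives up on any newline, and split_host_escaping models keeping the host and percent-encoding the newline, loosely in the spirit of Issue 13359. The regular expression is an assumption modeled on a "//host[:port]/path" split.

```python
import re
from urllib.parse import quote

# Assumed pattern: "//", then the authority (up to "/", "?" or "#"), then the rest.
# re.DOTALL lets ".*" run across newlines so the path is captured whole.
_HOST_RE = re.compile(r"//([^/?#]*)(.*)", re.DOTALL)


def split_host(url: str):
    """Split '//host[:port]/path' into (host, path); (None, url) if no authority."""
    match = _HOST_RE.match(url)
    if match is None:
        return None, url
    return match.group(1), match.group(2)


def split_host_ignoring(url: str):
    """Hypothetical policy criticized above: any newline means 'no host'."""
    if "\n" in url:
        return None, url  # the caller now sees something that looks relative
    return split_host(url)


def split_host_escaping(url: str):
    """Hypothetical policy: keep the host, percent-encode the newline in the path."""
    host, path = split_host(url)
    # Simplification: safe="/" also encodes characters such as "?" and "=".
    return host, quote(path, safe="/")


url = "//hostname/u\nrl"
print(split_host_ignoring(url))
print(split_host_escaping(url))
```

The first call returns (None, '//hostname/u\nrl'), so a caller that falls back to treating the input as a path sees what looks like a relative reference; the second returns ('hostname', '/u%0Arl'), so the host is still recognized and the newline can no longer break out of the path.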