martin.panter
Recipients Björn.Lindqvist, martin.panter, orsenthil, r.david.murray
2016-07-31.02:37:10
The main backward compatibility consideration would be Issue 754016, but I don’t agree with the changes made, and would support reverting them. The original bug reporter wanted urlparse("", "http") to be treated as the URL, but the IP address was being parsed as a scheme, so the default “http” scheme was ignored.

The original fix (r83701) affected any URL that had a digit 0–9 immediately after the “scheme:” prefix. In such URLs, the scheme component was no longer parsed. A test case for “path:80” was added, and a demonstration of not parsing any scheme from was added in the documentation.

Later, the logic was altered to test if the URL looked like an integer (revision 495d12196487, Issue 11467). This restored proper parsing of clsid:85bbd92o-42a0-1o69-a2e4-08002b30309d and, although another URL given, javascript:123, remains misparsed. The documentation was subsequently adjusted in Issue 16932 to just demonstrate being parsed as a path.

The logic was watered down to its current form by revision 9f6b7576c08c, Issue 14072. Now it tests for a non-digit anywhere after the scheme, so that tel:+31641044153 is again parsed properly. But it was pointed out that tel:1234 remains misparsed.
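
To make the three stages easier to compare, here is a rough sketch of each heuristic as I have described it above. It is not the actual urllib code: the names split_scheme, first_digit, int_test and all_digits are made up for illustration, and the scheme syntax check (ASCII letter first, then letters, digits, “+”, “-”, “.”) is my own simplification.

```python
import string

# Characters allowed in a scheme (simplified; an assumption of this sketch).
SCHEME_CHARS = set(string.ascii_letters + string.digits + "+-.")
DIGITS = "0123456789"


def first_digit(rest):
    """Stage 1 (r83701, as described): a digit right after "scheme:"."""
    return bool(rest) and rest[0] in DIGITS


def int_test(rest):
    """Stage 2 (495d12196487, as described): the rest looks like an integer."""
    try:
        int(rest)
    except ValueError:
        return False
    return True


def all_digits(rest):
    """Stage 3 (9f6b7576c08c, as described): no non-digit after the scheme."""
    return bool(rest) and all(c in DIGITS for c in rest)


def split_scheme(url, looks_like_port):
    """Return (scheme, rest); scheme is "" when the heuristic rejects it."""
    scheme, sep, rest = url.partition(":")
    valid = (
        bool(sep)
        and bool(scheme)
        and scheme[0] in string.ascii_letters
        and all(c in SCHEME_CHARS for c in scheme)
    )
    if not valid or looks_like_port(rest):
        # Treat the whole input as a path, with no scheme component.
        return "", url
    return scheme.lower(), rest


if __name__ == "__main__":
    urls = [
        "path:80",
        "clsid:85bbd92o-42a0-1o69-a2e4-08002b30309d",
        "javascript:123",
        "tel:+31641044153",
        "tel:1234",
    ]
    for url in urls:
        for heuristic in (first_digit, int_test, all_digits):
            scheme, _ = split_scheme(url, heuristic)
            print(f"{url!r:50} {heuristic.__name__:12} scheme={scheme!r}")
```

Under this sketch, every stage still returns an empty scheme for javascript:123 and tel:1234, which is the problem I am pointing out.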

What’s the next step in the watering-down process? All the attempts so far break valid URLs in favour of special-casing inputs that are not valid URLs.
