
Author martin.panter
Recipients Arfrever, martin.panter, orsenthil, serhiy.storchaka, vstinner
Date 2015-09-21.22:31:22
Message-id <1442874682.77.0.0262967864815.issue25184@psf.upfronthosting.co.za>
Content
Serhiy’s patch essentially encodes the filename with the local filesystem encoding and then percent-encodes the result, whereas the current behaviour is strict UTF-8 encoding followed by percent encoding. This is similar to what the “pathlib” make_uri() methods do, so maybe we could let “pathlib” do the work instead.
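
To make the difference concrete, here is a rough sketch of the two approaches. It is not the patch itself; the helper names quote_strict_utf8() and quote_fs_encoding() are made up for illustration, and the explicit "encoding" argument is only there so the result does not depend on the local machine.

```python
import sys
from urllib.parse import quote, quote_from_bytes


def quote_strict_utf8(name):
    # Hypothetical helper: quote() encodes str as UTF-8 with
    # errors="strict" by default, so a surrogate-escaped name fails.
    return quote(name)


def quote_fs_encoding(name, encoding=None):
    # Hypothetical helper: encode with the filesystem encoding and the
    # "surrogateescape" handler, then percent-encode the raw bytes.
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    raw = name.encode(encoding, "surrogateescape")
    return quote_from_bytes(raw)


if __name__ == "__main__":
    print(quote_strict_utf8("caf\xe9.txt"))             # caf%C3%A9.txt
    print(quote_fs_encoding("caf\xe9.txt", "latin-1"))  # caf%E9.txt
    # An undecodable byte 0xFF shows up in the str as U+DCFF.
    print(quote_fs_encoding("bad\udcff.txt", "utf-8"))  # bad%FF.txt
```

With the strict UTF-8 variant, the last name raises UnicodeEncodeError instead of producing a link.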

This draft RFC discusses encoding “file:” URLs:

https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-03#section-4

It suggests leaving Unicode characters alone (in IRIs) if possible, or using UTF-8 and percent encoding even if the filesystem uses a non-UTF-8 encoding. Perhaps we could leave the filename in the HTML as Unicode characters without percent encoding, and only percent encode the undecodable (surrogate-escaped) bytes.
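
A minimal sketch of that last idea, assuming the filename was decoded with the “surrogateescape” error handler (so an undecodable byte 0xNN appears as U+DCNN). The function name quote_iri_style() is hypothetical, and percent-encoding the reserved ASCII characters via quote() is my own assumption rather than something the draft spells out here.

```python
from urllib.parse import quote


def quote_iri_style(name):
    # Hypothetical helper: keep decodable non-ASCII characters as they
    # are and percent-encode only the surrogate-escaped bytes.
    parts = []
    for ch in name:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Surrogate escape: recover the original byte value.
            parts.append("%{:02X}".format(code - 0xDC00))
        elif code < 0x80:
            # ASCII: let quote() handle reserved characters
            # such as space and "%"; "/" is left alone.
            parts.append(quote(ch))
        else:
            # Ordinary Unicode character: leave it for the IRI.
            parts.append(ch)
    return "".join(parts)


if __name__ == "__main__":
    print(quote_iri_style("caf\xe9 \udcff.txt"))  # café%20%FF.txt
```

The result would still need HTML escaping before being written into the page, and other lone surrogates (outside U+DC80 to U+DCFF) are passed through untouched in this sketch.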

This “IRI” scheme is also recommended by <http://blogs.msdn.com/b/ie/archive/2006/12/06/file-uris-in-windows.aspx>, which says that on Windows, “in file URIs, percent-encoded octets are interpreted as a byte in the user’s current codepage”. That codepage interpretation contradicts the draft RFC and the “pathlib” implementation, which both use UTF-8.
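
To illustrate why the two interpretations conflict, the same percent-encoded octet can be decoded both ways. This is only a sketch: cp1252 stands in for “the user’s current codepage” as an assumption, and the helper names are invented.

```python
from urllib.parse import unquote


def decode_as_codepage(url_path, codepage="cp1252"):
    # Hypothetical helper: treat each percent-encoded octet as a byte
    # in the given codepage (cp1252 is just an example).
    return unquote(url_path, encoding=codepage)


def decode_as_utf8(url_path):
    # Hypothetical helper: treat the octets as UTF-8, as the draft RFC
    # does; invalid sequences become U+FFFD (errors="replace").
    return unquote(url_path, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_as_codepage("caf%E9.txt"))  # café.txt
    print(decode_as_utf8("caf%C3%A9.txt"))   # café.txt
    # The codepage-style URL is not valid UTF-8:
    print(ascii(decode_as_utf8("caf%E9.txt")))  # 'caf\ufffd.txt'
```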