Message251271
Serhiy’s patch essentially uses the local filesystem encoding and then percent encoding, rather than the current behaviour of strict UTF-8 encoding and percent encoding. This is similar to what the “pathlib” make_uri() methods do, so maybe we could let “pathlib” do the work instead.
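To make the difference concrete, here is a minimal sketch using only `urllib.parse.quote()`. It is not the patch itself: Latin-1 is an assumed stand-in for a non-UTF-8 local filesystem encoding, and the file name is made up for illustration.

```python
from urllib.parse import quote

# Hypothetical file name containing one non-ASCII character ("café.txt").
name = "caf\u00e9.txt"

# Strict UTF-8 encoding followed by percent encoding.
utf8_quoted = quote(name, encoding="utf-8", errors="strict")

# Assumed non-UTF-8 filesystem encoding (Latin-1 here), then percent encoding.
latin1_quoted = quote(name, encoding="latin-1", errors="strict")

print(utf8_quoted)    # caf%C3%A9.txt
print(latin1_quoted)  # caf%E9.txt
```

The same file name yields two different URLs, so whichever side decodes the link has to agree on the encoding.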
This draft RFC discusses encoding “file:” URLs:
https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-03#section-4
It suggests leaving Unicode characters alone (in IRIs) if possible, or using UTF-8 and percent encoding even if the filesystem uses a non-UTF-8 encoding. Perhaps we could leave the filename in the HTML as Unicode characters without percent encoding, and only percent encode the undecodable (surrogate-escaped) bytes.
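A rough sketch of that last idea follows. The helper name `quote_undecodable` is hypothetical, and quoting reserved ASCII characters with `urllib.parse.quote()` is an extra assumption on my part rather than something the draft RFC spells out.

```python
from urllib.parse import quote

def quote_undecodable(name):
    """Hypothetical helper: keep decodable non-ASCII characters as-is and
    percent-encode only the lone surrogates (U+DC80..U+DCFF) that the
    surrogateescape error handler produces for undecodable bytes."""
    parts = []
    for ch in name:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Recover the original byte value and percent-encode it.
            parts.append("%{:02X}".format(code - 0xDC00))
        elif code < 0x80:
            # Assumption: still quote reserved ASCII such as space or "%".
            parts.append(quote(ch))
        else:
            parts.append(ch)
    return "".join(parts)

# A name with one undecodable byte (0xFF), decoded the way os.fsdecode()
# would on a UTF-8 filesystem.
raw = "caf\u00e9 ".encode("utf-8") + b"\xff.txt"
name = raw.decode("utf-8", "surrogateescape")
print(quote_undecodable(name))
```

With this approach the readable part of the name survives in the HTML, and only the byte that has no Unicode meaning is escaped.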
This “IRI” scheme is also recommended by <http://blogs.msdn.com/b/ie/archive/2006/12/06/file-uris-in-windows.aspx>, which says that on Windows, “in file URIs, percent-encoded octets are interpreted as a byte in the user’s current codepage”. That codepage interpretation contradicts the draft RFC and the “pathlib” implementation, which both use UTF-8.
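The conflict shows up as soon as a percent-encoded URL is decoded. In this sketch, cp1252 is only an assumed example of a “current codepage”; the real one depends on the user’s Windows configuration.

```python
from urllib.parse import unquote

# Percent-encoded under a single-byte codepage vs. under UTF-8.
codepage_url = "caf%E9.txt"
utf8_url = "caf%C3%A9.txt"

# Each URL decodes cleanly with the encoding that produced it.
ok_codepage = unquote(codepage_url, encoding="cp1252")
ok_utf8 = unquote(utf8_url, encoding="utf-8")

# Decoding with the other convention gives a different name.
# unquote() defaults to errors="replace", so invalid UTF-8 becomes U+FFFD.
bad_utf8 = unquote(codepage_url, encoding="utf-8")
bad_codepage = unquote(utf8_url, encoding="cp1252")

print(ok_codepage, ok_utf8, bad_utf8, bad_codepage)
```

Neither mismatch raises an error by default, so a link generated under one convention can silently point at the wrong file name under the other.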
Date | User | Action | Args
2015-09-21 22:31:22 | martin.panter | set | recipients: + martin.panter, orsenthil, vstinner, Arfrever, serhiy.storchaka
2015-09-21 22:31:22 | martin.panter | set | messageid: <1442874682.77.0.0262967864815.issue25184@psf.upfronthosting.co.za>
2015-09-21 22:31:22 | martin.panter | link | issue25184 messages
2015-09-21 22:31:22 | martin.panter | create |