Message 128382 - Python tracker

Author	vstinner
Recipients	docs@python, vstinner
Date	2011-02-11.12:55:08
Message-id	<1297428909.69.0.247564466299.issue11186@psf.upfronthosting.co.za>
In-reply-to

Content
If you have undecodable filenames on UNIX, Python 3 escapes the undecodable bytes using surrogates (PEP 383). pydoc: HTMLDoc.index() indirectly uses os.listdir(), which performs this escaping, and the filenames are later encoded to UTF-8 (the whole HTML content is encoded to UTF-8), which fails on lone surrogates.
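To make the failure concrete, here is a minimal sketch. It assumes a UTF-8 filesystem encoding and uses a made-up filename, caf\xe9.py; the explicit decode() call stands in for what os.listdir() does internally on such a system.

```python
# Hypothetical Latin-1 encoded filename, as raw bytes from the filesystem.
raw = b"caf\xe9.py"

# PEP 383: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# A strict UTF-8 encode (as used for the HTML page) rejects the surrogate.
try:
    name.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Encoding with the same error handler restores the original bytes.
roundtrip = name.encode("utf-8", "surrogateescape")
```

The round trip at the end is what lets the name still be passed back to the operating system, even though it cannot be written into a UTF-8 document as is.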

In practice, you cannot import such a .py file; you would run it using "python script.py", so maybe we can just ignore modules with undecodable filenames. For example:

def isUndecodableFilename(filename):
    # True if any character is a surrogate code point (U+D800..U+DFFF)
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in filename)

Or we can escape the surrogate characters, but I don't know how. Writing "\uDC80" in an HTML document is not a good idea, especially in a URL (e.g. Firefox replaces \ with / in URLs).
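One possible way to do the escaping, offered only as a sketch and not as what pydoc does: turn the name back into its original bytes, percent-encode those bytes for the URL, and show a replacement character in the visible text. The helper name escape_filename and the sample filename are hypothetical, and a UTF-8 filesystem encoding is assumed.

```python
import html
from urllib.parse import quote_from_bytes, unquote_to_bytes


def escape_filename(name):
    """Return (href, label) for a filename that may contain surrogate escapes.

    Hypothetical helper; assumes the filesystem encoding is UTF-8.
    """
    # Recover the original bytes from the surrogate-escaped string.
    raw = name.encode("utf-8", "surrogateescape")
    # Percent-encode the raw bytes so the URL contains only ASCII.
    href = quote_from_bytes(raw)
    # For display, undecodable bytes become U+FFFD; then escape HTML specials.
    label = html.escape(raw.decode("utf-8", "replace"))
    return href, label


href, label = escape_filename("caf\udce9.py")
# The server side can reverse the URL part back to the escaped name.
restored = unquote_to_bytes(href).decode("utf-8", "surrogateescape")
```

The URL part stays reversible, while the label is lossy on purpose: it is only meant for display, so no backslash or surrogate ever reaches the HTML.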

History
Date	User	Action	Args
2011-02-11 12:55:09	vstinner	set	recipients: + vstinner, docs@python
2011-02-11 12:55:09	vstinner	set	messageid: <1297428909.69.0.247564466299.issue11186@psf.upfronthosting.co.za>
2011-02-11 12:55:09	vstinner	link	issue11186 messages
2011-02-11 12:55:08	vstinner	create