Author vstinner
Recipients docs@python, vstinner
Date 2011-02-11.12:55:08
Message-id <1297428909.69.0.247564466299.issue11186@psf.upfronthosting.co.za>
In-reply-to
Content
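If you have undecodable filenames on UNIX, Python 3 escapes the undecodable bytes using surrogates (PEP 383, the surrogateescape error handler). In pydoc, HTMLDoc.index() indirectly calls os.listdir(), which performs this escaping, and the filenames are later encoded to UTF-8 (the whole HTML content is encoded to UTF-8). That encoding step fails, because lone surrogates cannot be encoded to UTF-8 with the default strict error handler.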
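To make the failure concrete, here is a minimal sketch of the round trip. It does not touch the filesystem or pydoc: it assumes a UTF-8 filesystem encoding and applies the surrogateescape handler by hand to a made-up filename (`caf\xe9.py`, encoded as Latin-1), which is roughly what os.listdir() would hand back for such a file.

```python
# Hypothetical filename whose bytes are Latin-1, not valid UTF-8.
raw = b"caf\xe9.py"

# Roughly what os.listdir() produces on a UTF-8 system (PEP 383):
# the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# A strict UTF-8 encode, as used for the HTML page, rejects lone surrogates.
try:
    name.encode("utf-8")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Encoding with the same handler restores the original bytes.
roundtrip = name.encode("utf-8", "surrogateescape")

print(ascii(name), encode_failed, roundtrip == raw)
```

The escaped name is still a valid str inside Python; it only breaks when something encodes it without the surrogateescape handler.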

In practice, you cannot import such a .py file; you can only run it using "python script.py". So we could simply ignore modules with undecodable filenames. For example:

def isUndecodableFilename(filename):
    # True if the name contains a surrogate code point (U+D800..U+DFFF)
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in filename)

Or we could escape the surrogate characters, but I don't know how. Writing "\uDC80" in an HTML document is not a good idea, especially in a URL (e.g. Firefox replaces \ with / in URLs).
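One possible way to escape, offered only as a sketch and not as what pydoc does: recover the original bytes and percent-encode them for the URL, and use a lossy replacement for the visible label. The helper names `url_for_filename` and `label_for_filename` are hypothetical, and the code assumes the names were decoded as UTF-8 with surrogateescape.

```python
import html
from urllib.parse import quote

def url_for_filename(filename):
    # Hypothetical helper: undo surrogateescape to get the raw bytes,
    # then percent-encode them so the URL is pure ASCII.
    raw = filename.encode("utf-8", "surrogateescape")
    return quote(raw)

def label_for_filename(filename):
    # Hypothetical helper: lossy but safe display text. The "replace"
    # handler substitutes "?" for each character that cannot be encoded.
    text = filename.encode("utf-8", "replace").decode("utf-8")
    return html.escape(text)

name = b"caf\xe9.py".decode("utf-8", "surrogateescape")
link = '<a href="%s">%s</a>' % (url_for_filename(name), label_for_filename(name))
print(link)
```

Whether a browser and a server would map the percent-encoded URL back to the right file is not something this sketch checks.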
History
Date                 User      Action  Args
2011-02-11 12:55:09  vstinner  set     recipients: + vstinner, docs@python
2011-02-11 12:55:09  vstinner  set     messageid: <1297428909.69.0.247564466299.issue11186@psf.upfronthosting.co.za>
2011-02-11 12:55:09  vstinner  link    issue11186 messages
2011-02-11 12:55:08  vstinner  create