classification
Title: pydoc: HTMLDoc.index() doesn't support PEP 383
Type: Stage:
Components: Documentation, Library (Lib) Versions: Python 3.1, Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, eric.araujo, haypo, lemburg, loewis, python-dev
Priority: normal Keywords:

Created on 2011-02-11 12:55 by haypo, last changed 2011-04-15 17:13 by eric.araujo. This issue is now closed.

Messages (6)
msg128382 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-02-11 12:55
If you have undecodable filenames on UNIX, Python 3 escapes the undecodable bytes using surrogates. pydoc: HTMLDoc.index() indirectly uses os.listdir(), which performs this escaping, and the filenames are later encoded to UTF-8 (the whole HTML content is encoded to UTF-8), which fails on lone surrogates.

In practice, you cannot import such a .py file; you run it using "python script.py", so maybe we can just ignore modules with undecodable filenames. For example:

def isUndecodableFilename(filename):
  return any((0xD800 <= ord(ch) <= 0xDFFF) for ch in filename)

Or we can escape the surrogate characters, but I don't know how. Writing "\uDC80" in an HTML document is not a good idea, especially in a URL (e.g. Firefox replaces \ with / in URLs).
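
One possible way to escape such a name for a URL, sketched here as an illustration only (it is not what pydoc does), is to recover the original bytes and percent-encode them. The helper name url_escape_filename is hypothetical, and the sketch assumes a UTF-8 filesystem encoding; os.fsencode() would perform the same byte recovery using the real filesystem encoding.

```python
from urllib.parse import quote

def url_escape_filename(filename, fs_encoding='utf-8'):
    """Percent-encode a filename that may contain PEP 383 surrogates.

    Hypothetical helper: fs_encoding is assumed to be the encoding that
    was used to decode the filename in the first place.
    """
    # surrogateescape turns each U+DC80..U+DCFF code point back into
    # the original undecodable byte.
    raw = filename.encode(fs_encoding, 'surrogateescape')
    # quote() accepts bytes and emits %XX for every non-safe byte.
    return quote(raw)
```

With this approach no backslash or surrogate ever reaches the HTML output; the browser only sees ASCII percent-escapes.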
msg128383 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-02-11 12:56
Oops, my isUndecodableFilename() example is wrong. PEP 383 only uses the U+DC80..U+DCFF range:

def isUndecodableFilename(filename):
  return any((0xDC80 <= ord(ch) <= 0xDCFF) for ch in filename)

Example of an undecodable filename: b'bla\xe9\xff.py' is decoded as 'bla\uDCE9\uDCFF.py' with a UTF-8 filesystem encoding.
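
The example above can be reproduced without touching the filesystem. This is a minimal sketch that assumes a UTF-8 filesystem encoding and decodes the bytes by hand with the surrogateescape error handler; is_undecodable_filename and can_encode_utf8 are illustrative names, not pydoc functions.

```python
# Bytes that are not valid UTF-8: 0xE9 starts a multi-byte sequence
# that 0xFF cannot continue, and 0xFF is never valid in UTF-8.
raw = b'bla\xe9\xff.py'

# PEP 383: each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
name = raw.decode('utf-8', 'surrogateescape')

def is_undecodable_filename(filename):
    """Return True if the name contains a PEP 383 escape (U+DC80..U+DCFF)."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in filename)

def can_encode_utf8(text):
    """Return True if text survives a strict UTF-8 encode."""
    try:
        text.encode('utf-8')
    except UnicodeEncodeError:
        # Lone surrogates are rejected by the strict UTF-8 codec.
        return False
    return True
```

Encoding the name back with the same error handler, name.encode('utf-8', 'surrogateescape'), returns the original bytes, which is why the escaping is lossless for filesystem calls but breaks when the HTML page is encoded strictly.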
msg133604 - Author: Roundup Robot (python-dev) Date: 2011-04-12 21:44
New changeset 506cab8fc329 by Victor Stinner in branch 'default':
Issue #11186: pydoc ignores a module if its name contains a surrogate character
http://hg.python.org/cpython/rev/506cab8fc329
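
The behavior described by the changeset message can be sketched as a filter over module names. This is an illustration rather than the committed code: has_surrogate and indexable_modules are hypothetical names, and checking the whole surrogate range (U+D800..U+DFFF) is an assumption based on the wording "contains a surrogate character".

```python
def has_surrogate(name):
    """Return True if any code point of name is a surrogate (assumed range)."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in name)

def indexable_modules(names):
    """Keep only the module names that can safely be written to the index."""
    return [name for name in names if not has_surrogate(name)]
```

For instance, indexable_modules(['os', 'bla\udce9\udcff', 'sys']) drops the middle entry, so the remaining names can be encoded to UTF-8 when the index page is generated.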
msg133668 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-04-13 15:12
The wording “pydoc ignores a module” is confusing to me: I can’t tell whether it is a description of the bug (“pydoc ignored a module”) or the new, correct behavior (“pydoc now ignores a module”).

Regarding the problem and fix itself, I’m wondering.  If a user unknowingly creates such a module with an unencodable filename, will they understand why pydoc does not display it?
msg133670 - Author: STINNER Victor (haypo) * (Python committer) Date: 2011-04-13 15:24
> If a user unknowingly creates such a module with an unencodable
> filename, will they understand why pydoc does not display it?

It is really a bad idea to choose an *undecodable* name for a module. You will not be able to write its name using "import name" syntax.

(It is possible to import such a module using __import__, but it is just ugly.)

For the changelog, feel free to rephrase it.
msg133851 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-04-15 17:13
> It is really a bad idea to choose an *undecodable* name for a module.
> You will not be able to write its name using "import name" syntax.

Okay, makes sense that pydoc ignores those.  You speak about a user choosing to create such a filename though; is it possible to create such a name without knowing it?

> For the changelog, feel free to rephrase it.

I don’t currently have SSH access, so please do it.
History
Date                 User         Action  Args
2011-04-15 17:13:52  eric.araujo  set     messages: + msg133851
2011-04-13 15:24:05  haypo        set     messages: + msg133670
2011-04-13 15:12:31  eric.araujo  set     nosy: + eric.araujo, lemburg, loewis; messages: + msg133668
2011-04-12 21:45:14  haypo        set     status: open -> closed; resolution: fixed
2011-04-12 21:44:45  python-dev   set     nosy: + python-dev; messages: + msg133604
2011-02-11 12:56:53  haypo        set     nosy: haypo, docs@python; messages: + msg128383
2011-02-11 12:55:09  haypo        create