Author dlitz
Recipients dlitz
Date 2008-08-26.17:05:58
On Linux/ext3, filenames are stored natively as sequences of octets.  On
Win32/NTFS, they are stored natively as sequences of Unicode code points.

In Python 2.x, the way to unambiguously open a particular file was to
pass the filename as a str object on Linux/ext3 and as a unicode object
on Win32/NTFS.  os.listdir(".") would return every filename as a str
object, while os.listdir(u".") would decode each filename according to
the current locale settings and return it as a unicode object, *except*
that filenames that couldn't be decoded that way were returned as
undecoded str objects.
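
The fallback described above can be modelled in a few lines. This is only a sketch of the observable behaviour, not CPython's actual implementation; the function name decode_names and the hard-coded "utf-8" default are assumptions made for illustration, and it is written for Python 3, where the raw names are bytes objects.

```python
def decode_names(raw_names, encoding="utf-8"):
    """Decode each raw filename, keeping undecodable ones as bytes.

    Hypothetical helper: it mimics the mixed result list described
    above, it is not how os.listdir() is implemented.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # The name is not valid in this encoding: return it undecoded.
            result.append(raw)
    return result
```

Because the result can mix two types, a caller has to be prepared to handle both decoded and undecoded names.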

Consider this bash script (executed on Linux under a UTF-8 locale):

  export LC_CTYPE=en_CA.UTF-8   # requires the en_CA.UTF-8 locale
  mkdir /tmp/foo
  cd /tmp/foo
  touch $'UTF-8 compatible filename\xc2\xa2'
  touch $'UTF-8 incompatible filename\xc0'

Under Python 2.5.2, you get this:
  >>> import os
  >>> os.listdir(u".")
  ['UTF-8 incompatible filename\xc0', u'UTF-8 compatible filename\xa2']
  >>> os.listdir(".")
  ['UTF-8 incompatible filename\xc0', 'UTF-8 compatible filename\xc2\xa2']
  >>> [open(f, "r") for f in os.listdir(u".")]
  [<open file 'UTF-8 incompatible filename\xc0', mode 'r' at 0xb7cee578>,
<open file 'UTF-8 compatible filename¢', mode 'r' at 0xb7cee6e0>]

Under Python 3.0b3, you get this:
  >>> import os
  >>> os.listdir(".")
  [b'UTF-8 incompatible filename\xc0', 'UTF-8 compatible filename¢']
  >>> os.listdir(b".")
  [b'UTF-8 incompatible filename\xc0', b'UTF-8 compatible filename\xc2\xa2']
  >>> [open(f, "r") for f in os.listdir(".")]
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 1, in <listcomp>
    File "/home/dwon/python3.0b3/lib/python3.0/", line 284, in __new__
      return open(*args, **kwargs)
    File "/home/dwon/python3.0b3/lib/python3.0/", line 184, in open
      raise TypeError("invalid file: %r" % file)
  TypeError: invalid file: b'UTF-8 incompatible filename\xc0'

This behaviour of open() makes it impossible to write code that opens
arbitrarily-named files on Linux/ext3.
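
For comparison, here is a minimal sketch of the pattern this report asks for: list a directory as bytes and open each entry by its raw name. It assumes a POSIX filesystem and a Python 3 release in which both os.listdir() and open() accept bytes paths, which is not the 3.0b3 behaviour shown above. The names open_all and demo are hypothetical.

```python
import os
import tempfile


def open_all(directory):
    """Open every file in `directory` by its raw bytes name.

    Assumes the directory contains only regular files and that open()
    accepts bytes paths.
    """
    directory = os.fsencode(directory)
    handles = []
    for name in os.listdir(directory):  # bytes in, bytes out
        handles.append(open(os.path.join(directory, name), "rb"))
    return handles


def demo():
    """Recreate the two files from the bash script and open both."""
    with tempfile.TemporaryDirectory() as tmp:
        names = [b"UTF-8 compatible filename\xc2\xa2",
                 b"UTF-8 incompatible filename\xc0"]
        for name in names:
            with open(os.path.join(os.fsencode(tmp), name), "wb"):
                pass  # create an empty file
        handles = open_all(tmp)
        opened = sorted(os.path.basename(h.name) for h in handles)
        for h in handles:
            h.close()
        return opened
```

Staying in bytes from listing to opening avoids any decoding step, so a filename that is invalid in the locale encoding is handled the same way as any other.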

Date                 User   Action  Args
2008-08-26 17:06:02  dlitz  set     recipients: + dlitz
2008-08-26 17:06:02  dlitz  set     messageid: <>
2008-08-26 17:06:01  dlitz  link    issue3688 messages
2008-08-26 17:05:59  dlitz  create