classification
Title: glob doesn't return unicode with no dir in unicode filename
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, lcantey, leve, loewis, nnorwitz, nyamatongwe
Priority: normal Keywords: patch

Created on 2004-08-01 19:20 by leve, last changed 2008-08-22 00:15 by lcantey. This issue is now closed.

Files
File name Uploaded Description
glob.patch nnorwitz, 2004-08-01 22:31 patch 1 return unicode files when no dir is specified
Messages (8)
msg46495 - Author: leve (leve) Date: 2004-08-01 19:20
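# Here is the script
# Python 2.3 on W2K
# (assumes at least one .mp3 file in the current directory)

import glob

# Pattern with an explicit directory component
name = glob.glob(u"./*.mp3")[0]
print type(name)
# Same pattern without a directory component
name = glob.glob(u"*.mp3")[0]
print type(name)

## OUTPUT ##
# <type 'unicode'>
# <type 'str'>

# The second line should be <type 'unicode'> too.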
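To reproduce the report on a current interpreter, the sketch below transposes it to Python 3. It assumes that Python 3's str/bytes split plays the role of Python 2's unicode/str. It creates a throwaway directory holding one placeholder file (song.mp3, a name made up for this example) and records which types glob.glob returns for the pattern with and without the ./ prefix.

```python
import glob
import os
import tempfile

def result_types(pattern):
    """Return the set of element types glob.glob() yields for *pattern*."""
    return {type(name) for name in glob.glob(pattern)}

# Work in a throwaway directory holding one placeholder file, song.mp3.
old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "song.mp3"), "w").close()
    os.chdir(tmp)
    try:
        # Text patterns, with and without an explicit directory.
        text_dir = result_types("./*.mp3")
        text_bare = result_types("*.mp3")
        # Byte patterns, with and without an explicit directory.
        bytes_dir = result_types(b"./*.mp3")
        bytes_bare = result_types(b"*.mp3")
    finally:
        # Leave the directory before TemporaryDirectory removes it.
        os.chdir(old_cwd)

print(text_dir, text_bare)    # {<class 'str'>} {<class 'str'>}
print(bytes_dir, bytes_bare)  # {<class 'bytes'>} {<class 'bytes'>}
```

In this transposed form, both rows agree with the argument type. That is the behaviour the report asks for in the bare-pattern case.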

msg46496 - Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2004-08-01 22:31

The attached patch fixes the problem and all tests pass on
Linux.  But I'm not really sure if this should be fixed or
not.  Perhaps someone more familiar with unicode and
filenames (like Martin or Marc-Andre?) could provide
feedback.  I don't know if this could create any problems on
Windows.

Changing to a patch.
msg46497 - Author: Neil Hodgson (nyamatongwe) Date: 2004-08-02 00:09

I wrote a slightly different patch that converts the
os.curdir to Unicode inside glob but the patch here is just
as good. The nnorwitz patch works well for me on Windows
with Unicode file names:

>>> glob.glob("*")
['a.bat', 'abc', 'ascii', 'b.bat', 'fileobject.c',
'fileobject.c.diff', 'Gr\xfc\xdf-Gott', 'pep-0277.txt',
'posixmodule.c', 'posixmodule.c.diff', 'uni.py',
'winunichanges.zip', 'Ge??-sa?', '????????????', '??????',
'???', '????G\xdf', '???']

>>> glob.glob(u"*")
[u'a.bat', u'abc', u'ascii', u'b.bat', u'fileobject.c',
u'fileobject.c.diff', u'Gr\xfc\xdf-Gott', u'pep-0277.txt',
u'posixmodule.c', u'posixmodule.c.diff', u'uni.py',
u'winunichanges.zip',
u'\u0393\u03b5\u03b9\u03ac-\u03c3\u03b1\u03c2',
u'\u0417\u0434\u0440\u0430\u0432\u0441\u0442\u0432\u0443\u0439\u0442\u0435',
u'\u05d4\u05e9\u05e7\u05e6\u05e5\u05e1',
u'\u306b\u307d\u3093',
u'\u66e8\u05e9\u3093\u0434\u0393\xdf', u'\u66e8\u66e9\u66eb']

 Here is my patch if you are interested:

--- glob.py     Wed Jun 06 06:24:38 2001
+++ g:\Python23\Lib\glob.py     Sun Aug 01 23:50:43 2004
@@ -19,7 +19,10 @@
             return []
     dirname, basename = os.path.split(pathname)
     if not dirname:
-        return glob1(os.curdir, basename)
+        # Use the current directory but match the argument
+        # string form, either unicode or string.
+        dirname = type(dirname)(os.curdir)
+        return glob1(dirname, basename)
     elif has_magic(dirname):
         list = glob(dirname)
     else:
@@ -40,7 +43,7 @@
     return result

 def glob1(dirname, pattern):
-    if not dirname: dirname = os.curdir
+    assert dirname
     try:
         names = os.listdir(dirname)
     except os.error:
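The idea behind both patches can be illustrated with a stripped-down, hypothetical glob_no_dir helper. This is not the real glob module. It is again written in Python 3, with bytes standing in for Python 2's byte strings. When the pattern has no directory part, the helper spells the current directory in the pattern's own type, so os.listdir returns names of that type and fnmatch.filter keeps them. The sketch ignores glob's dot-file handling and rejects patterns that have a directory component.

```python
import fnmatch
import os
import tempfile

def glob_no_dir(pattern):
    """Hypothetical, simplified glob for a pattern without a directory part."""
    dirname, basename = os.path.split(pattern)
    if dirname:
        raise ValueError("sketch only handles patterns without a directory")
    if isinstance(pattern, bytes):
        # os.curdir is the ASCII string '.', so an ASCII encode is enough here;
        # this plays the role of type(dirname)(os.curdir) in the patch above.
        dirname = os.curdir.encode("ascii")
    else:
        dirname = os.curdir
    # os.listdir returns names of the same type as its argument.
    return fnmatch.filter(os.listdir(dirname), basename)

# Demo in a throwaway directory holding one placeholder file.
old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.mp3"), "w").close()
    os.chdir(tmp)
    try:
        text_hits = glob_no_dir("*.mp3")
        byte_hits = glob_no_dir(b"*.mp3")
    finally:
        os.chdir(old_cwd)

print(text_hits)  # ['a.mp3']
print(byte_hits)  # [b'a.mp3']
```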
msg46498 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-08-12 14:58

The patch in itself is fine - it is policy that file name
functions return Unicode strings for unicode arguments.

The test is flawed, though: there is no guarantee that the
string and the Unicode result compare equal (see
nyamatongwe's message) - there isn't even a guarantee that
they compare without raising an exception.

Also, there is currently no guarantee that
unicode-in-unicode-out works on all platforms. For example,
on Windows 95, this hasn't been implemented (nor on
OS/2). So I would:

a) drop the test checking for equality
b) consider returns-not-unicode a failure only if os.listdir
does return unicode for unicode arguments.
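Martin's two suggestions for the test might look roughly like the following hypothetical check_glob_type helper. It is again transposed to Python 3, with str for unicode and bytes for byte strings. Following (a), it never compares the text and byte results with each other. Following (b), it reports "skipped" rather than failing when os.listdir itself does not return names of the argument's type. Under Python 3 the skip branch is not expected to trigger. It corresponds to the Python 2 platforms, such as Windows 95 and OS/2, where unicode-in-unicode-out was not implemented.

```python
import glob
import os

def check_glob_type(pattern):
    """Hypothetical test helper: returns "ok" or "skipped", or raises AssertionError."""
    kind = type(pattern)
    curdir = os.curdir.encode("ascii") if kind is bytes else os.curdir
    # (b) Only demand type preservation if os.listdir already provides it.
    if not all(isinstance(name, kind) for name in os.listdir(curdir)):
        return "skipped"
    # (a) No equality check against the other type's results.
    results = glob.glob(pattern)
    assert all(isinstance(r, kind) for r in results), results
    return "ok"

print(check_glob_type("*"), check_glob_type(b"*"))
```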
msg46499 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-03-07 01:19
Shouldn't the conversion to Unicode use the filesystem encoding?
msg46500 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-03-07 08:10
gbrandl: you are right, it should.
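Georg's point can be sketched in Python 3 terms. This is an assumption, since the issue itself concerns Python 2. A bare type constructor has no knowledge of the filesystem encoding: in Python 2 it falls back to the default codec, and in Python 3 bytes(os.curdir) is rejected outright. The standard os.fsencode and os.fsdecode functions use the encoding reported by sys.getfilesystemencoding(). The helper name curdir_like is hypothetical.

```python
import os
import sys

def curdir_like(pattern):
    """Hypothetical helper: os.curdir in the same type as *pattern*."""
    if isinstance(pattern, bytes):
        # os.fsencode uses the filesystem encoding and its error handler.
        return os.fsencode(os.curdir)
    return os.curdir

# A bare constructor has no encoding to go on and raises TypeError.
try:
    bytes(os.curdir)
    naive_ok = True
except TypeError:
    naive_ok = False

print(sys.getfilesystemencoding())
print(curdir_like(b"*.mp3"), curdir_like("*.mp3"))  # b'.' .
print(naive_ok)  # False
```

Because "." is ASCII, the result is the same under any common filesystem encoding. The encoding-aware call matters for the general rule rather than for this particular string.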
msg46501 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-03-07 08:32
Committed corrected patch as rev. 54197, 54198 (2.5).
msg71706 - Author: Lee Cantey (lcantey) Date: 2008-08-22 00:15
2.5.1 (r251:54863, Jul 10 2008, 17:24:48)

Fails for me with 2.5.1 on Linux, OS X, and Windows.

>>> glob.glob("*")
['t.txt', 't\xd0\xb4.txt', 't\xe2\xbd\x94.txt']
>>> glob.glob(u"*")
['t.txt', 't\xd0\xb4.txt', 't\xe2\xbd\x94.txt']
>>> glob.glob(u"./*")
[u'./t.txt', u'./t\u0434.txt', u'./t\u2f54.txt']
History
Date                 User     Action  Args
2008-08-22 00:15:32  lcantey  set     nosy: + lcantey
                                      type: behavior
                                      messages: + msg71706
                                      components: + Library (Lib), - None
                                      versions: + Python 2.5
2004-08-01 19:20:15  leve     create