glob doesn't return unicode with no dir in unicode filename #40668

Closed
leve mannequin opened this issue Aug 1, 2004 · 8 comments
Labels
stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@leve
Mannequin

leve mannequin commented Aug 1, 2004

BPO 1001604
Nosy @loewis, @birkenfeld
Files
  • glob.patch: patch 1 return unicode files when no dir is specified
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2008-08-22.00:15:32.301>
    created_at = <Date 2004-08-01.19:20:15.000>
    labels = ['type-bug', 'library']
    title = "glob doesn't return unicode with no dir in unicode filename"
    updated_at = <Date 2008-08-22.00:15:32.301>
    user = 'https://bugs.python.org/leve'

    bugs.python.org fields:

    activity = <Date 2008-08-22.00:15:32.301>
    actor = 'lcantey'
    assignee = 'none'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2004-08-01.19:20:15.000>
    creator = 'leve'
    dependencies = []
    files = ['6129']
    hgrepos = []
    issue_num = 1001604
    keywords = ['patch']
    message_count = 8.0
    messages = ['46495', '46496', '46497', '46498', '46499', '46500', '46501', '71706']
    nosy_count = 6.0
    nosy_names = ['loewis', 'nnorwitz', 'georg.brandl', 'nyamatongwe', 'leve', 'lcantey']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1001604'
    versions = ['Python 2.5']

    @leve
    Mannequin Author

    leve mannequin commented Aug 1, 2004

    #Here is the script
    #Python 2.3 on W2K
     
    import glob
     
    name = glob.glob(u"./*.mp3")[0]
    print type(name)
    name = glob.glob(u"*.mp3")[0]
    print type(name)
     
    ##OUTPUT##
    #<type 'unicode'>
    #<type 'str'>
     
    #The second line should be <type 'unicode'> too.

    @leve leve mannequin closed this as completed Aug 1, 2004
    @nnorwitz
    Mannequin

    nnorwitz mannequin commented Aug 1, 2004

    The attached patch fixes the problem and all tests pass on
    Linux. But I'm not really sure if this should be fixed or
    not. Perhaps someone more familiar with unicode and
    filenames (like Martin or Marc-Andre?) could provide
    feedback. I don't know if this could create any problems on
    Windows.

    Changing to a patch.

    @nyamatongwe
    Mannequin

    nyamatongwe mannequin commented Aug 2, 2004

    I wrote a slightly different patch that converts the
    os.curdir to Unicode inside glob but the patch here is just
    as good. The nnorwitz patch works well for me on Windows
    with Unicode file names:

    >>> glob.glob("*")
    ['a.bat', 'abc', 'ascii', 'b.bat', 'fileobject.c',
    'fileobject.c.diff', 'Gr\xfc\xdf-Gott', 'pep-0277.txt',
    'posixmodule.c', 'posixmodule.c.diff', 'uni.py',
    'winunichanges.zip', 'Ge??-sa?', '????????????', '??????',
    '???', '????G\xdf', '???']
    
    >>> glob.glob(u"*")
    [u'a.bat', u'abc', u'ascii', u'b.bat', u'fileobject.c',
    u'fileobject.c.diff', u'Gr\xfc\xdf-Gott', u'pep-0277.txt',
    u'posixmodule.c', u'posixmodule.c.diff', u'uni.py',
    u'winunichanges.zip',
    u'\u0393\u03b5\u03b9\u03ac-\u03c3\u03b1\u03c2',
    u'\u0417\u0434\u0440\u0430\u0432\u0441\u0442\u0432\u0443\u0439\u0442\u0435',
    u'\u05d4\u05e9\u05e7\u05e6\u05e5\u05e1',
    u'\u306b\u307d\u3093',
    u'\u66e8\u05e9\u3093\u0434\u0393\xdf', u'\u66e8\u66e9\u66eb']

    Here is my patch if you are interested:

    --- glob.py     Wed Jun 06 06:24:38 2001
    +++ g:\Python23\Lib\glob.py     Sun Aug 01 23:50:43 2004
    @@ -19,7 +19,10 @@
                 return []
         dirname, basename = os.path.split(pathname)
         if not dirname:
    -        return glob1(os.curdir, basename)
    +        # Use the current directory but match the argument
    +        # string form, either unicode or string.
    +        dirname = type(dirname)(os.curdir)
    +        return glob1(dirname, basename)
         elif has_magic(dirname):
             list = glob(dirname)
         else:
    @@ -40,7 +43,7 @@
         return result
    
     def glob1(dirname, pattern):
    -    if not dirname: dirname = os.curdir
    +    assert dirname
         try:
             names = os.listdir(dirname)
         except os.error:
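
The idea in the patch above, choosing a current-directory value in the same string type as the argument, can be sketched outside of glob.py. This is only an illustration, not the committed fix. It assumes Python 3, where the str/unicode split discussed in this thread corresponds to bytes/str, and `curdir_like` and `split_with_default_dir` are hypothetical names that do not exist in the glob module.

```python
import os


def curdir_like(pathname):
    """Return os.curdir in the same string type as pathname.

    Hypothetical helper for illustration; not part of the glob module.
    """
    if isinstance(pathname, bytes):
        # os.curdir is plain ASCII, so this encoding step cannot fail.
        return os.curdir.encode("ascii")
    return os.curdir


def split_with_default_dir(pathname):
    """Split pathname, using a type-matched curdir when dirname is empty."""
    dirname, basename = os.path.split(pathname)
    if not dirname:
        dirname = curdir_like(pathname)
    return dirname, basename
```

Because the directory passed on to `os.listdir` then has the same type as the pattern, the listing comes back in that type as well, which is the behaviour the report asks for.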

    @loewis
    Mannequin

    loewis mannequin commented Aug 12, 2004

    The patch in itself is fine - it is policy that file name
    functions return Unicode strings for unicode arguments.

    The test is flawed, though: there is no guarantee that the
    string and the Unicode result compare equal (see
    nyamatongwe's message) - there isn't even a guarantee that
    they compare without raising an exception.

    Also, there is currently no guarantee that
    unicode-in-unicode-out works on all platforms. For example,
    it hasn't been implemented on Windows 95, nor on OS/2.
    So I would:

    a) drop the test checking for equality
    b) consider returns-not-unicode a failure only if os.listdir
    does return unicode for unicode arguments.
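
Points (a) and (b) could translate into a check along the following lines. This is a sketch under assumptions, not the test that was committed: `check_glob_preserves_text_type` and `demo` are hypothetical names, and `type(u"")` is used so that the same code means `unicode` on Python 2 and `str` on Python 3.

```python
import glob
import os
import tempfile

# `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def check_glob_preserves_text_type(directory):
    """Return True or False, or None when the platform cannot answer.

    Hypothetical helper: it changes into `directory`, globs with a text
    pattern that has no directory part, and inspects only result types.
    """
    old_cwd = os.getcwd()
    os.chdir(directory)
    try:
        listed = os.listdir(text_type(os.curdir))
        # (b) If os.listdir itself does not return text for a text
        # argument, glob is not at fault, so the check is skipped.
        if not all(isinstance(name, text_type) for name in listed):
            return None
        # (a) Only types are inspected; byte-string and text results are
        # never compared for equality.
        return all(isinstance(name, text_type) for name in glob.glob(u"*"))
    finally:
        os.chdir(old_cwd)


def demo():
    """Run the check against a scratch directory holding one file."""
    scratch = tempfile.mkdtemp()
    open(os.path.join(scratch, "a.txt"), "w").close()
    return check_glob_preserves_text_type(scratch)
```

Returning `None` rather than failing keeps platforms without unicode-in-unicode-out support in `os.listdir` from being reported as glob bugs.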

    @birkenfeld
    Member

    Shouldn't the conversion to Unicode use the filesystem encoding?

    @loewis
    Mannequin

    loewis mannequin commented Mar 7, 2007

    gbrandl: you are right, it should.
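
A conversion that uses the filesystem encoding instead of the default codec might look like the sketch below. It is an illustration under assumptions, not the code from the committed revisions: `to_text_path` is a hypothetical helper, and the fallback to `sys.getdefaultencoding()` is there because `sys.getfilesystemencoding()` could return `None` on some Python 2 builds.

```python
import os
import sys

# `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def to_text_path(path):
    """Decode a byte-string path with the filesystem encoding.

    Hypothetical helper; text input is returned unchanged.
    """
    if isinstance(path, text_type):
        return path
    # Fall back to the default encoding when no filesystem encoding
    # is reported.
    encoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
    return path.decode(encoding)


# os.curdir is a native string; encode it first so that the decode step
# is exercised on Python 3 as well.
curdir_bytes = os.curdir.encode("ascii")
curdir_text = to_text_path(curdir_bytes)
```

On Python 3, `os.fsdecode` covers the same need and is probably the better choice there; the explicit version is shown only because it mirrors the question asked in this thread.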

    @birkenfeld
    Member

    Committed corrected patch as rev. 54197, 54198 (2.5).

    @lcantey
    Mannequin

    lcantey mannequin commented Aug 22, 2008

    2.5.1 (r251:54863, Jul 10 2008, 17:24:48)

    Fails for me with 2.5.1 on Linux, OS X, and Windows.

    >>> glob.glob("*")
    ['t.txt', 't\xd0\xb4.txt', 't\xe2\xbd\x94.txt']
    >>> glob.glob(u"*")
    ['t.txt', 't\xd0\xb4.txt', 't\xe2\xbd\x94.txt']
    >>> glob.glob(u"./*")
    [u'./t.txt', u'./t\u0434.txt', u'./t\u2f54.txt']

    @lcantey lcantey mannequin added the stdlib and type-bug labels Aug 22, 2008
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022