Title: os.path.isfile doesn't work with some greek characters
Type: behavior
Stage: resolved
Components: Unicode, Windows
Versions: Python 2.6
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: belopolsky, brian.curtin, loewis, r.david.murray, tim.golden, wyj1046
Priority: normal
Keywords:

Created on 2010-12-22 01:51 by wyj1046, last changed 2010-12-22 06:18 by loewis. This issue is now closed.

Messages (7)
msg124473 - (view) Author: Wang Yanjin (wyj1046) Date: 2010-12-22 01:51
There is a file named "µTorrent.lnk" in the folder.

Here is the code:


import os
for i in os.listdir('.'):
    print os.path.isfile(i), '\t', i
a = input()

and the output:

False   aμ汉字.txt
True    uTorrent.lnk
False   μTorrent.lnk
False   μTorrent1.lnk
False   μ汉字.txt
False   μ汉字.txt.lnk
True    αγβδο
True    φχ.txt
True    φχ.txt.lnk

The function just doesn't work with the character "μ".
msg124475 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-22 02:09
I am unable to reproduce this on any python from py3k trunk down to 2.6.6.  Can you provide a complete test program that demonstrates the failure?  (That is, it creates the file and then fails to detect it as a file with isfile.)
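
For reference, a self-contained test of the kind requested might look like the sketch below. It is not from the reporter: the helper name check_isfile and the sample filenames are assumptions made for illustration. It creates the files in a temporary directory, lists that directory once with a unicode path and once with a byte-string path, and records what os.path.isfile returns for each entry.

```python
# -*- coding: utf-8 -*-
import os
import shutil
import sys
import tempfile


def check_isfile(names):
    """Create `names` in a temp directory and report isfile() per listing API.

    Hypothetical helper: returns {"unicode": [...], "bytes": [...]} where each
    list holds one boolean per directory entry.
    """
    encoding = sys.getfilesystemencoding()
    root = tempfile.mkdtemp()
    if isinstance(root, bytes):  # Python 2 returns a byte string here
        root = root.decode(encoding)
    results = {"unicode": [], "bytes": []}
    try:
        for name in names:
            with open(os.path.join(root, name), "w") as handle:
                handle.write("x")
        # Unicode path in, unicode names out.
        for entry in os.listdir(root):
            results["unicode"].append(os.path.isfile(os.path.join(root, entry)))
        # Byte-string path in, byte-string names out.
        byte_root = root.encode(encoding)
        for entry in os.listdir(byte_root):
            results["bytes"].append(os.path.isfile(os.path.join(byte_root, entry)))
    finally:
        shutil.rmtree(root)
    return results


# Sample names (assumed): one with MICRO SIGN, one with GREEK SMALL LETTER MU.
SAMPLE_NAMES = [u"\u00b5Torrent.lnk", u"\u03bc-notes.txt"]

if __name__ == "__main__":
    print(check_isfile(SAMPLE_NAMES))
```

Comparing the two lists shows whether the result depends on which API produced the names.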
msg124476 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-22 02:27
Oh, yes, and it is likely to be important to know what OS you are on.  I tested on Linux.
msg124482 - (view) Author: Wang Yanjin (wyj1046) Date: 2010-12-22 03:12
I encountered this problem on Windows XP SP3.

I have retested it on Windows 7, and it returns the correct value, as it did on Linux.
msg124483 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-22 03:21
Since the os functions tend to be small wrappers around system functions, this sounds like it is probably a platform issue and not a Python issue.  I'm adding our Windows experts as nosy; they can reopen the issue if they disagree.
msg124485 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-22 03:48
Just a random thought (no, I don't know anything about Windows): there are two "mu" characters: GREEK SMALL LETTER MU (μ) and MICRO SIGN (µ).  Normalization turns one into the other:

>>> from unicodedata import *
>>> name(normalize('NFKC', '\N{MICRO SIGN}'))
'GREEK SMALL LETTER MU'

It is possible that the two characters somehow get confused by the OP's system.

I could not reproduce the issue on OSX, so I won't reopen it.
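
The distinction can be checked directly with unicodedata. A minimal sketch (the describe helper is a made-up name for illustration) showing that the two characters are different code points, compare unequal, and coincide only after compatibility normalization:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import unicodedata

MICRO = u"\u00b5"  # MICRO SIGN
MU = u"\u03bc"     # GREEK SMALL LETTER MU


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


print(describe(MICRO))  # U+00B5 MICRO SIGN
print(describe(MU))     # U+03BC GREEK SMALL LETTER MU

# The two look alike but are not equal as strings.
print(MICRO == MU)  # False

# Compatibility normalization (NFKC) folds MICRO SIGN into GREEK MU;
# canonical normalization (NFC) leaves it unchanged.
print(unicodedata.normalize("NFKC", MICRO) == MU)  # True
print(unicodedata.normalize("NFC", MICRO) == MU)   # False
```

So two filenames that display identically may still differ in their first character.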
msg124488 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-22 06:18
On Windows, using the bytes APIs for filenames is unreliable and fails for characters that are not in the ANSI code page. So you should pass a unicode path to os.listdir, which then returns unicode filenames:

import os
for i in os.listdir(u'.'):
    print os.path.isfile(i), '\t', i
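
To see which names are at risk, one can test whether a filename is representable in a given code page. The sketch below is only an approximation: fits_code_page is a hypothetical helper, cp1252 is an arbitrary example (the reporter's actual ANSI code page is not stated in this issue), and Python's strict codec raises an error where the Windows conversion may instead substitute another character.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def fits_code_page(name, code_page):
    """Return True if every character of `name` can be encoded in `code_page`."""
    try:
        name.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


MICRO_NAME = u"\u00b5Torrent.lnk"  # starts with MICRO SIGN
MU_NAME = u"\u03bcTorrent.lnk"     # starts with GREEK SMALL LETTER MU

# cp1252 is an assumed example code page, not the reporter's setting.
for label, name in (("MICRO SIGN", MICRO_NAME), ("GREEK MU", MU_NAME)):
    print(label, fits_code_page(name, "cp1252"))
```

With cp1252 the MICRO SIGN name is representable and the GREEK MU name is not; another code page can give a different answer, which is why the unicode API is the safer choice.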

Date                 User            Action  Args
2010-12-22 06:18:56  loewis          set     nosy: + loewis; messages: + msg124488
2010-12-22 03:48:17  belopolsky      set     nosy: + belopolsky; messages: + msg124485
2010-12-22 03:21:31  r.david.murray  set     status: open -> closed; nosy: + tim.golden, brian.curtin; messages: + msg124483; resolution: not a bug; stage: resolved
2010-12-22 03:12:39  wyj1046         set     messages: + msg124482
2010-12-22 02:27:06  r.david.murray  set     messages: + msg124476
2010-12-22 02:09:20  r.david.murray  set     nosy: + r.david.murray; messages: + msg124475
2010-12-22 01:51:56  wyj1046         create