Author Atle.Pedersen
Recipients Atle.Pedersen, ezio.melotti
Date 2012-01-05.20:12:51
Message-id <1325794375.32.0.250961574863.issue13717@psf.upfronthosting.co.za>
In-reply-to
Content
I've made a short program to traverse a file tree and print the file names.

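import os

# path is the directory to scan (set earlier in the script)
for root, dirs, files in os.walk(path):
    for f in files:
        # hex code point of every character in the file name
        hexa = ' '.join("%02X" % ord(x) for x in f)
        print('file is', hexa, f)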

This fails with the following file:

file is 67 72 DCE5 6B 61 6C 6C 65 6E 2E 6A 70 67 2E 68 74 6D 6C Traceback (most recent call last):
  File "/home/atle/bin/findpictures.py", line 16, in <module>
    print('file is',hexa,f)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce5' in position 2: surrogates not allowed

I don't really understand the issue, but this works with Python 2 and fails with Python 3.1.4 (Gentoo: dev-lang/python-3.1.4-r3).

Same code using Python 2.7.2 gives:
('file is', '67 72 E5 6B 61 6C 6C 65 6E 2E 6A 70 67 2E 68 74 6D 6C', 'gr\xe5kallen.jpg.html')
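
A possible explanation, sketched under assumptions rather than confirmed: the E5 byte in the Python 2 output suggests the file name is stored in Latin-1, while the Python 3 run uses UTF-8 as the file system encoding. Python 3 decodes file-name bytes it cannot decode with the 'surrogateescape' error handler, which would account for the DCE5 in the hex dump, and a lone surrogate cannot be encoded when printing. The helper name printable below is hypothetical, the hard-coded 'utf-8' stands in for whatever the real file system encoding is, and decoding with 'backslashreplace' needs Python 3.5 or later, so it would not run on 3.1.4 as written.

```python
# Raw bytes of the file name from the report (assumed to be Latin-1 on disk).
raw = b'gr\xe5kallen.jpg.html'

# Undecodable bytes are mapped to lone surrogates: 0xE5 becomes U+DCE5.
name = raw.decode('utf-8', 'surrogateescape')
assert name[2] == '\udce5'

# The mapping is reversible, so the original bytes can be recovered.
assert name.encode('utf-8', 'surrogateescape') == raw


def printable(filename):
    """Return a str with no lone surrogates (hypothetical helper).

    Undecodable bytes are shown as \\xNN escapes instead of raising.
    """
    data = filename.encode('utf-8', 'surrogateescape')
    return data.decode('utf-8', 'backslashreplace')


print('file is', printable(name))
```

In the loop above, passing printable(f) to print instead of f should avoid the UnicodeEncodeError, at the cost of showing the odd byte as an escape sequence.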
History
Date                 User           Action  Args
2012-01-05 20:12:55  Atle.Pedersen  set     recipients: + Atle.Pedersen, ezio.melotti
2012-01-05 20:12:55  Atle.Pedersen  set     messageid: <1325794375.32.0.250961574863.issue13717@psf.upfronthosting.co.za>
2012-01-05 20:12:52  Atle.Pedersen  link    issue13717 messages
2012-01-05 20:12:51  Atle.Pedersen  create