Message 74162 - Python tracker

Author	a.badger
Recipients	a.badger, loewis, vstinner
Date	2008-10-02.14:32:21
Message-id	<1222957943.28.0.508732032829.issue4006@psf.upfronthosting.co.za>

Content
It's not a feature, it's a bug! :-)  (I hope you meant to have a smiley
too ;-)

As stated in the os.listdir() related bug, on Unix filesystems a filename
is a sequence of bytes.  The system encoding allows user-level tools to
display filenames as characters instead of byte sequences and allows you
to manipulate filenames using characters instead of byte sequences.  But
if you change your locale, the user-level tools will interpret the same
byte sequences as different characters and will let you freely create
files whose names are in a different encoding.
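
To make the locale point concrete, here is a minimal sketch (not taken from the bug report). The byte string and the three encodings are arbitrary choices for illustration; the only claim is that one on-disk name can decode to different characters, or fail to decode, depending on which encoding a tool assumes.

```python
# One filename as it is stored on disk: just bytes (illustrative value).
raw_name = b"\xe9.txt"

# A tool running under a Latin-1 locale displays an e-acute.
as_latin1 = raw_name.decode("latin-1")

# A tool running under a KOI8-R locale displays a different character
# for the very same byte.
as_koi8r = raw_name.decode("koi8-r")

# Under UTF-8 the same bytes are not valid text at all.
try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(ascii(as_latin1), ascii(as_koi8r), utf8_ok)
```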

So in order to work correctly on Unix, you must be able to accept byte
sequences in place of filenames.
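
A rough sketch of what accepting byte sequences can look like, written against current Python 3, where os.listdir() and open() take bytes paths. It assumes a POSIX filesystem that stores names as arbitrary bytes (some filesystems reject such names); list_raw and the sample filename are made up for this example.

```python
import os
import tempfile


def list_raw(directory):
    """Return the entries of `directory` as bytes, without decoding them."""
    # Passing a bytes path makes os.listdir() return bytes names.
    return os.listdir(os.fsencode(directory))


with tempfile.TemporaryDirectory() as tmp:
    # A name that is valid Latin-1 but not valid UTF-8 (illustrative).
    raw_name = b"caf\xe9.txt"
    with open(os.path.join(os.fsencode(tmp), raw_name), "wb") as f:
        f.write(b"data")

    names = list_raw(tmp)

    # The bytes name can be handed straight back to open().
    with open(os.path.join(os.fsencode(tmp), names[0]), "rb") as f:
        content = f.read()

print(names, content)
```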

The sad fact of the matter is that while we can be all Unicode with data
and strings inside of Python, we will always have to be prepared to
handle supposed strings as byte sequences when talking to some things
outside of ourselves.  Sometimes the border has a specification that
tells us what encoding to expect and we can do the conversion
automatically.  But when it doesn't, we have to be prepared to 1) tell
the user that the data exists even though it isn't the string type
expected and 2) make the byte sequence available to the user.
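
The two requirements above could be sketched roughly like this. It is only an illustration, not a proposed API: split_by_decodability is a hypothetical helper, and UTF-8 stands in for whatever encoding the program expects.

```python
def split_by_decodability(raw_names, encoding="utf-8"):
    """Split bytes names into decoded strings and names that cannot be decoded.

    Nothing is dropped: names that fail to decode are returned as bytes.
    """
    decoded, undecodable = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            undecodable.append(raw)
    return decoded, undecodable


# Illustrative input: two names that are valid UTF-8, one that is not.
raw_names = [b"readme.txt", b"caf\xe9.txt", b"na\xc3\xafve.txt"]
decoded, undecodable = split_by_decodability(raw_names)

for raw in undecodable:
    # 1) say that the entry exists, 2) hand over the raw bytes.
    print("entry exists but is not valid utf-8:", raw)
```

The point of the split is that the caller always sees every entry; whether to show, rename, or skip the undecodable ones is left to the caller rather than decided silently.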

Silently pretending that the data doesn't exist at all is a bug (maybe a
minor bug, depending on how often we expect the situation to arise, but
still a bug).
