Message 209453 - Python tracker


Author	r.david.murray
Recipients	m123orning, r.david.murray
Date	2014-01-27.17:27:02
Message-id	<1390843622.9.0.404935420646.issue20409@psf.upfronthosting.co.za>
In-reply-to

Content
The files use different encodings. In the first case, the first two bytes (which don't appear in the second example) I believe are the BOM. I'm not an expert, but I believe it is a utf-16 file (thus all the \x00 bytes). The second file is presumably utf-8, with no BOM. Notepad++ handles both automatically. For Python, you have to tell it to look for the BOM by specifying the appropriate codec in the open call. This is because Python's philosophy is to not guess at the encoding of files (though it does have a default encoding, taken from the locale; often utf-8, but not on every platform).
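
As a rough illustration of the point about specifying the codec, here is a minimal sketch, assuming Python 3. The file names and sample text are made up for the example and are not the reporter's actual files. It writes one utf-16 file and one utf-8 file, looks at the raw bytes, and then reads each back with the codec named explicitly in the open call.

```python
import codecs
import os
import tempfile

text = "hello\n"  # hypothetical sample content

with tempfile.TemporaryDirectory() as tmp:
    utf16_path = os.path.join(tmp, "utf16.txt")  # hypothetical file names
    utf8_path = os.path.join(tmp, "utf8.txt")

    # The "utf-16" codec writes a BOM first, then two bytes per character here.
    with open(utf16_path, "w", encoding="utf-16", newline="") as f:
        f.write(text)
    # The plain "utf-8" codec writes no BOM.
    with open(utf8_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    # Inspect the raw bytes, as one would see them when reading in binary mode.
    with open(utf16_path, "rb") as f:
        raw16 = f.read()
    with open(utf8_path, "rb") as f:
        raw8 = f.read()

    # The first two bytes of the utf-16 file are the BOM (either byte order).
    has_bom = raw16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

    # Naming the codec tells Python to consume the BOM and decode correctly.
    with open(utf16_path, encoding="utf-16", newline="") as f:
        decoded16 = f.read()
    with open(utf8_path, encoding="utf-8", newline="") as f:
        decoded8 = f.read()

print(has_bom, b"\x00" in raw16, decoded16 == decoded8)
```

If a utf-8 file does start with a BOM, the "utf-8-sig" codec plays the same role: it strips the BOM when reading if one is present.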

Questions like this are better directed to the python-list mailing list, by the way.

History
Date	User	Action	Args
2014-01-27 17:27:02	r.david.murray	set	recipients: + r.david.murray, m123orning
2014-01-27 17:27:02	r.david.murray	set	messageid: <1390843622.9.0.404935420646.issue20409@psf.upfronthosting.co.za>
2014-01-27 17:27:02	r.david.murray	link	issue20409 messages
2014-01-27 17:27:02	r.david.murray	create