Message 72776 - Python tracker

Author	hdima
Recipients	amaury.forgeotdarc, hdima, orsenthil
Date	2008-09-08.13:26:40
SpamBayes Score	0.00021830526
Marked as misclassified	No
Message-id	<1220880434.4.0.740565186703.issue3714@psf.upfronthosting.co.za>
In-reply-to

Content
Actually, RFC 977 said all characters must be ASCII, but RFC 3977
changed the default character set to UTF-8. So I think UTF-8 should be
the default encoding, not Latin-1. Moreover, Latin-1 maps every byte to
a character, so decoding never fails and it can silently hide the real
encoding, for example:

>>> u'\u0422\u0435\u0441\u0442'.encode("koi8-r").decode("latin1")
u'\xf4\xc5\xd3\xd4'
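
To make the difference concrete, here is a minimal Python 3 sketch of the same bytes decoded both ways. The helper name try_decode is made up for illustration and is not part of nntplib.

```python
# The KOI8-R bytes from the example above: b'\xf4\xc5\xd3\xd4'
raw = "\u0422\u0435\u0441\u0442".encode("koi8-r")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`.

    Hypothetical helper, only for this illustration.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Latin-1 accepts any byte sequence, so the wrong guess goes unnoticed.
latin1_result = try_decode(raw, "latin-1")

# UTF-8 rejects these bytes, so the mismatch is detected immediately.
utf8_result = try_decode(raw, "utf-8")
```

With UTF-8 as the default, a wrongly encoded article is at least reported as an error instead of being turned into plausible-looking garbage.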

Additionally, in the future it would be a good idea to look at the
article headers to determine the encoding of the article body.
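
A rough sketch of what such a header lookup could look like, using the standard email package in Python 3. The function name decode_article and the fallback policy are assumptions for illustration, not an existing nntplib API, and the sketch only handles non-multipart articles.

```python
from email import message_from_bytes


def decode_article(raw, default="utf-8"):
    """Decode a non-multipart article body using its Content-Type charset.

    Hypothetical helper: falls back to `default` (UTF-8, as in RFC 3977)
    when no charset is declared, or when the declared one is unknown
    or does not match the bytes.
    """
    msg = message_from_bytes(raw)
    # Charset parameter of the Content-Type header, or None if absent.
    charset = msg.get_content_charset() or default
    # Body as bytes, with any Content-Transfer-Encoding undone.
    payload = msg.get_payload(decode=True)
    try:
        return payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: decode with the default, marking bad bytes.
        return payload.decode(default, errors="replace")


article = (
    b"Content-Type: text/plain; charset=koi8-r\n"
    b"\n"
    + "\u0422\u0435\u0441\u0442".encode("koi8-r")
)
text = decode_article(article)
```

Here the declared charset wins over the default, so the KOI8-R body is decoded correctly instead of being misread.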

History
Date	User	Action	Args
2008-09-08 13:27:14	hdima	set	recipients: + hdima, amaury.forgeotdarc, orsenthil
2008-09-08 13:27:14	hdima	set	messageid: <1220880434.4.0.740565186703.issue3714@psf.upfronthosting.co.za>
2008-09-08 13:26:40	hdima	link	issue3714 messages
2008-09-08 13:26:40	hdima	create