
Author hdima
Recipients amaury.forgeotdarc, hdima, orsenthil
Date 2008-09-08.13:26:40
Message-id <1220880434.4.0.740565186703.issue3714@psf.upfronthosting.co.za>
In-reply-to
Content
Actually, RFC 977 required all characters to be ASCII, but RFC 3977
changed the default character set to UTF-8. So I think UTF-8 should be
the default encoding, not Latin-1. Moreover, Latin-1 can silently hide
the real encoding, because every byte sequence decodes as Latin-1
without raising an error. For example:

>>> u'\u0422\u0435\u0441\u0442'.encode("koi8-r").decode("latin1")
u'\xf4\xc5\xd3\xd4'
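
For comparison, here is a minimal Python 3 sketch of the same bytes decoded both ways. It only uses the standard codecs; the variable names are illustrative and not part of nntplib. It suggests why a UTF-8 default would at least surface the mismatch instead of hiding it:

```python
# The Cyrillic word from the example above, encoded as KOI8-R bytes.
raw = "\u0422\u0435\u0441\u0442".encode("koi8-r")

# Latin-1 maps every byte to a code point, so this never fails,
# even though the result is not the original text.
as_latin1 = raw.decode("latin-1")

# UTF-8 is stricter: these bytes are not a valid UTF-8 sequence,
# so the mismatch is reported instead of being hidden.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(repr(as_latin1), utf8_failed)
```

Note that a decoding error is only raised when the bytes happen to be invalid UTF-8; it signals a mismatch but does not identify the real encoding.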

Additionally, in the future it would be a good idea to look at the
article headers to determine the encoding of the article body.
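
One possible shape for that header lookup, as a rough Python 3 sketch: it assumes a single-part article whose Content-Type header carries a charset parameter, and it uses the standard email package rather than anything in nntplib. The function name decode_article and the fallback parameter are hypothetical.

```python
from email.parser import BytesParser


def decode_article(raw_article: bytes, fallback: str = "utf-8") -> str:
    """Decode an article body using the charset named in its headers.

    Hypothetical helper: assumes a non-multipart article. Falls back to
    `fallback` (UTF-8 here, per RFC 3977) when no charset is declared
    or the declared one is unknown or wrong.
    """
    msg = BytesParser().parsebytes(raw_article)
    # charset parameter of Content-Type, or None if absent
    charset = msg.get_content_charset() or fallback
    # get_payload(decode=True) returns bytes (None for multipart messages)
    payload = msg.get_payload(decode=True) or b""
    try:
        return payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or bytes that do not match the declared charset
        return payload.decode(fallback, errors="replace")


article = (
    b"Subject: test\r\n"
    b"Content-Type: text/plain; charset=koi8-r\r\n"
    b"\r\n"
    + "\u0422\u0435\u0441\u0442".encode("koi8-r")
)
print(decode_article(article))
```

Real articles may be multipart or declare a charset that does not match the body, so this would need more care before being used as a default.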
History
Date User Action Args
2008-09-08 13:27:14 hdima set recipients: + hdima, amaury.forgeotdarc, orsenthil
2008-09-08 13:27:14 hdima set messageid: <1220880434.4.0.740565186703.issue3714@psf.upfronthosting.co.za>
2008-09-08 13:26:40 hdima link issue3714 messages
2008-09-08 13:26:40 hdima create