
Author doerwalter
Recipients Rhamphoryncus, doerwalter, ggenellina, gvanrossum, jgsack
Date 2007-11-15.12:57:09
Message-id <1195131429.89.0.409413961237.issue1328@psf.upfronthosting.co.za>
In-reply-to
Content
jgsack wrote:
>
> If codec utf_8 or utf_8_sig were to accept input with or without the
> 3-byte BOM, and write it as currently specified without/with the BOM
> respectively, then _I_ can reread again with either utf_8 or utf_8_sig.

That's exactly what the utf_8_sig codec does. The decoder accepts input
with or without the BOM (a leading BOM is stripped rather than returned;
only the first one is dropped). The encoder always prepends a BOM.
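
A minimal sketch of that behaviour, assuming Python 3 and using only the
standard codecs (the sample string is just an illustration):

```python
import codecs

text = "h\u00e9llo"  # sample string, chosen only for illustration

plain = text.encode("utf_8")        # utf_8 encoder: no BOM
signed = text.encode("utf_8_sig")   # utf_8_sig encoder: BOM is prepended

# The utf_8_sig output starts with the 3-byte UTF-8 BOM.
starts_with_bom = signed.startswith(codecs.BOM_UTF8)

# The utf_8_sig decoder accepts both forms and drops a leading BOM.
from_plain = plain.decode("utf_8_sig")
from_signed = signed.decode("utf_8_sig")

# Only the first BOM is dropped; a second one survives as U+FEFF.
double = (codecs.BOM_UTF8 + signed).decode("utf_8_sig")
```

Both from_plain and from_signed compare equal to text, so data written
with either codec can be read back with utf_8_sig.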

Or do you want a codec that behaves like utf_8 on reading and like
utf_8_sig on writing? Such a codec indeed wouldn't roundtrip.
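
To illustrate why, here is a rough sketch of such a hybrid, assuming
Python 3. The names hybrid_encode and hybrid_decode are hypothetical
helpers made up for this example, not part of any library:

```python
def hybrid_encode(s):
    # Hypothetical: write like utf_8_sig (always prepend a BOM).
    return s.encode("utf_8_sig")


def hybrid_decode(b):
    # Hypothetical: read like utf_8 (a leading BOM is kept as U+FEFF).
    return b.decode("utf_8")


text = "abc"
once = hybrid_decode(hybrid_encode(text))   # one write/read cycle
twice = hybrid_decode(hybrid_encode(once))  # a second cycle
```

Each write/read cycle leaves one more U+FEFF at the start of the string,
so once differs from text, and twice differs from once.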