Message 64189 - Python tracker


Author	doerwalter
Recipients	Rhamphoryncus, doerwalter, ggenellina, jafo, jgsack
Date	2008-03-20.18:16:11
SpamBayes Score	0.16362882
Marked as misclassified	No
Message-id	<1206036973.18.0.778841029058.issue1328@psf.upfronthosting.co.za>
In-reply-to

Content

I don't see exactly what James is proposing.

> For my needs, I would like the decoding parts of the utf_8 module
> to treat an initial BOM as an optional signature and skip it if
> there is one (just like the utf_8_sig decoder). In fact I have
> a working patch that replaces the utf_8_sig decode,
> IncrementalDecoder and StreamReader components by direct
> transplants from utf_8_sig (as recently repaired -- there was a
> StreamReader error).

If you want a decoder that behaves like the utf-8-sig decoder, use the
utf-8-sig decoder. I don't see how changing the utf-8 decoder helps here.
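
A minimal sketch of that behaviour, written in Python 3 syntax (the sample
bytes and variable names are illustrative assumptions): the utf-8-sig
decoder skips a leading BOM when one is present and decodes the input
unchanged when there is none.

```python
import codecs

# Sample input: the same text with and without a leading UTF-8 BOM.
payload = "abc".encode("utf-8")
signed = codecs.BOM_UTF8 + payload

# utf-8-sig treats the BOM as an optional signature and drops it.
from_signed = signed.decode("utf-8-sig")
from_plain = payload.decode("utf-8-sig")

print(repr(from_signed), repr(from_plain))
```

Both calls yield the same text, so callers need no BOM check of their own.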

> I can imagine there might be utf_8 client code out there which
> expects to see a leading U+feff as (perhaps) a clue that the
> output should be returned with a BOM-signature (say) to
> accommodate the guessed input requirements of the remote
> correspondent.

In this case use UTF-8: The leading BOM will be passed to the application.
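
As a rough sketch of that use case (Python 3 syntax; the helper names
`split_signature` and `encode_reply` are hypothetical, not part of the
codecs module): the plain utf-8 decoder leaves the BOM in the text as
U+FEFF, so the application can see it and decide whether to sign its reply.

```python
import codecs

def split_signature(raw):
    """Decode with plain utf-8 and report whether a leading BOM was seen.

    Hypothetical helper for illustration only.
    """
    text = raw.decode("utf-8")          # a BOM survives as U+FEFF
    had_bom = text.startswith("\ufeff")
    return (text[1:] if had_bom else text), had_bom

def encode_reply(text, with_bom):
    """Encode a reply, adding a BOM only if the input carried one."""
    return text.encode("utf-8-sig" if with_bom else "utf-8")

body, had_bom = split_signature(codecs.BOM_UTF8 + b"hello")
reply = encode_reply(body.upper(), had_bom)
print(had_bom, reply)
```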

> I can just live with code like
>  if input[0] == u"\ufeff":
>    input = input[1:]
> spread around, and of course slightly different for incremental
> and stream inputs.

Can you post an example that requires this code?
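
For the incremental and stream cases mentioned in the quote, here is a
sketch of how the existing utf-8-sig codec can be used without any manual
BOM check (Python 3 syntax; the sample data and the one-byte chunking are
assumptions chosen so that the BOM is split across calls):

```python
import codecs
import io

data = codecs.BOM_UTF8 + "abc".encode("utf-8")

# Incremental decoding: feed one byte at a time, so the three BOM bytes
# arrive in separate calls and must be buffered by the decoder.
decoder = codecs.getincrementaldecoder("utf-8-sig")()
pieces = [decoder.decode(data[i:i + 1]) for i in range(len(data))]
pieces.append(decoder.decode(b"", final=True))
incremental_text = "".join(pieces)

# Stream decoding: wrap a byte stream in the codec's StreamReader.
reader = codecs.getreader("utf-8-sig")(io.BytesIO(data))
stream_text = reader.read()

print(repr(incremental_text), repr(stream_text))
```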

History
Date	User	Action	Args
2008-03-20 18:16:13	doerwalter	set	spambayes_score: 0.163629 -> 0.16362882 recipients: + doerwalter, jafo, jgsack, ggenellina, Rhamphoryncus
2008-03-20 18:16:13	doerwalter	set	spambayes_score: 0.163629 -> 0.163629 messageid: <1206036973.18.0.778841029058.issue1328@psf.upfronthosting.co.za>
2008-03-20 18:16:12	doerwalter	link	issue1328 messages
2008-03-20 18:16:11	doerwalter	create