
Author sjmachin
Recipients dangra, ezio.melotti, lemburg, sjmachin
Date 2010-04-01.06:08:58
Message-id <1270102141.43.0.916956714929.issue8271@psf.upfronthosting.co.za>
Content
@ezio.melotti: Your second sentence is true, but it is not the whole truth. Bytes in the range C0-FF (whose high bit *is* set) ALSO shouldn't be considered part of the sequence, because they (like 00-7F) are invalid as continuation bytes; they are either starter bytes (C2-F4) or invalid for any purpose (C0-C1 and F5-FF). Further, some bytes in the range 80-BF are NOT always valid as the first continuation byte; validity depends on which starter byte they follow (for example, after E0 only A0-BF is allowed, and after F4 only 80-8F).
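
A minimal Python sketch of the byte classes described above, following the well-formed UTF-8 table in RFC 3629. The names `byte_class` and `first_continuation_range` are illustrative only; they are not part of CPython or any library API.

```python
# Hypothetical helpers that classify byte values by the role they can play
# in well-formed UTF-8 (per RFC 3629).

def byte_class(b: int) -> str:
    """Return the role a byte value can play in well-formed UTF-8."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value: %r" % (b,))
    if b <= 0x7F:
        return "ascii"         # 00-7F: a complete character, never a continuation
    if b <= 0xBF:
        return "continuation"  # 80-BF: valid only after a starter byte
    if b <= 0xC1:
        return "invalid"       # C0-C1: could only start overlong forms of 00-7F
    if b <= 0xF4:
        return "starter"       # C2-F4: begins a 2-, 3- or 4-byte sequence
    return "invalid"           # F5-FF: cannot appear anywhere in UTF-8


# Starters whose FIRST continuation byte is restricted to a narrower range.
_FIRST_CONTINUATION = {
    0xE0: range(0xA0, 0xC0),  # 80-9F would give overlong 3-byte forms
    0xED: range(0x80, 0xA0),  # A0-BF would give surrogates U+D800-U+DFFF
    0xF0: range(0x90, 0xC0),  # 80-8F would give overlong 4-byte forms
    0xF4: range(0x80, 0x90),  # 90-BF would give code points above U+10FFFF
}


def first_continuation_range(starter: int) -> range:
    """Return the byte values that may immediately follow a starter byte."""
    if byte_class(starter) != "starter":
        raise ValueError("not a starter byte: 0x%02X" % starter)
    return _FIRST_CONTINUATION.get(starter, range(0x80, 0xC0))
```

For example, `0x80 in first_continuation_range(0xE0)` is False, while `0x80 in first_continuation_range(0xE1)` is True: the same byte is a valid first continuation byte after one starter and a failing byte after another.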

The simple way of summarising the above is to say that a byte that is not a valid continuation byte in the current state (the "failing byte") is not part of the current (now known to be invalid) sequence, and the decoder must try again ("resync") starting at the failing byte.
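
A minimal Python sketch of this resync rule, assuming the "replace" policy of one U+FFFD per invalid subpart. The names `decode_with_resync`, `REPLACEMENT` and `FIRST_CONTINUATION` are hypothetical; this is an illustration, not CPython's decoder.

```python
# Hypothetical decoder illustrating the "resync" rule: a byte that cannot
# continue the current sequence ends it and is then decoded afresh.

REPLACEMENT = "\ufffd"

# Narrower (inclusive) ranges for the first continuation byte after these starters.
FIRST_CONTINUATION = {
    0xE0: (0xA0, 0xBF),
    0xED: (0x80, 0x9F),
    0xF0: (0x90, 0xBF),
    0xF4: (0x80, 0x8F),
}


def decode_with_resync(data: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD per maximal invalid subpart."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead <= 0x7F:
            out.append(chr(lead))
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF4:
            length, cp = 4, lead & 0x07
        else:
            # A stray continuation byte (80-BF) or one of C0, C1, F5-FF.
            out.append(REPLACEMENT)
            i += 1
            continue
        j = i + 1
        while j < i + length:
            if j == i + 1:
                lo, hi = FIRST_CONTINUATION.get(lead, (0x80, 0xBF))
            else:
                lo, hi = 0x80, 0xBF
            if j >= n or not lo <= data[j] <= hi:
                break  # data[j] is the failing byte (or the input ended)
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        if j == i + length:
            out.append(chr(cp))
        else:
            # Bytes i..j-1 form one invalid subpart; data[j] is NOT consumed.
            out.append(REPLACEMENT)
        i = j  # resync: the failing byte is examined as a possible starter
    return "".join(out)
```

With this rule `decode_with_resync(b"\xe2\x82A")` returns `"\ufffdA"`: the failing byte "A" survives instead of being swallowed into the broken sequence, and `b"\xe0\x80"` yields two replacement characters because 80 cannot follow E0.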

Do you agree with my example 3?
History
Date                 User      Action  Args
2010-04-01 06:09:01  sjmachin  set     recipients: + sjmachin, lemburg, ezio.melotti, dangra
2010-04-01 06:09:01  sjmachin  set     messageid: <1270102141.43.0.916956714929.issue8271@psf.upfronthosting.co.za>
2010-04-01 06:08:59  sjmachin  link    issue8271 messages
2010-04-01 06:08:58  sjmachin  create