
Author sjmachin
Recipients dangra, ezio.melotti, lemburg, sjmachin
Date 2010-04-01.03:19:31
Message-id <1270091973.22.0.435495612508.issue8271@psf.upfronthosting.co.za>
Content
@lemburg: "failing byte" seems rather obvious: it is the first byte you meet that is not valid in the current state. I don't understand your explanation, especially "does not have the high bit set". I think you mean "is a valid starter byte"; see example 3 below.
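
To make the distinction concrete, here is a small sketch contrasting the two tests on bytes from the examples. The names `has_high_bit` and `is_valid_starter` are hypothetical; the starter ranges (00-7F and C2-F4) follow the Unicode standard's table of well-formed UTF-8 byte sequences.

```python
def has_high_bit(b: int) -> bool:
    """True for any byte in 0x80..0xFF."""
    return (b & 0x80) != 0


def is_valid_starter(b: int) -> bool:
    """True if b can begin a well-formed UTF-8 sequence.

    00..7F start a 1-byte sequence and C2..F4 start 2- to 4-byte sequences.
    80..BF are continuation bytes; C0, C1 and F5..FF never occur in
    well-formed UTF-8.
    """
    return b <= 0x7F or 0xC2 <= b <= 0xF4


for b in (0x41, 0xC2, 0x81, 0xFF):
    print(f"{b:02X}: high bit {has_high_bit(b)}, valid starter {is_valid_starter(b)}")
# 41: high bit False, valid starter True
# C2: high bit True, valid starter True
# 81: high bit True, valid starter False
# FF: high bit True, valid starter False
```

C2 and 81 both have the high bit set, yet only C2 may start a sequence, which is why "high bit set" is not the right test for where to resync.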

Example 1: F1 80 41 42 43. F1 implies a 4-byte sequence. 80 is OK. 41 is not in 80-BF, so it is the "failing byte"; its high bit is not set. The required action is to emit FFFD, then resync on the 41, which causes 0041 0042 0043 to be emitted. Total output: FFFD 0041 0042 0043. Current code emits FFFD 0043.

Example 2: F1 80 FF 42 43. F1 implies a 4-byte sequence. 80 is OK. FF is not in 80-BF, so it is the "failing byte". The required action is to emit FFFD, then resync on the FF. FF is not a valid starter byte, so emit another FFFD and resync on the 42, which causes 0042 0043 to be emitted. Total output: FFFD FFFD 0042 0043. Current code emits FFFD 0043.

Example 3: F1 80 C2 81 43. F1 implies a 4-byte sequence. 80 is OK. C2 is not in 80-BF, so it is the "failing byte". The required action is to emit FFFD, then resync on the C2. C2 and 81 both have the high bit set, but C2 is a valid starter byte and C2 81 is a well-formed 2-byte sequence, so 0081 and then 0043 are emitted. Total output: FFFD 0081 0043. Current code emits FFFD 0043.
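
A minimal sketch of a decoder that follows the resync rule used in the three examples: on a failing byte, emit FFFD and restart decoding at that byte instead of skipping past it. This is not CPython's implementation; `lead_info` and `decode_resync` are hypothetical names, and the second-byte ranges are assumed from the Unicode standard's table of well-formed UTF-8 byte sequences.

```python
def lead_info(b):
    """Return (length, lo, hi) for lead byte b, or None if b cannot start a sequence.

    lo..hi is the allowed range for the second byte; later bytes must be 80..BF.
    """
    if b <= 0x7F:
        return 1, 0, 0
    if 0xC2 <= b <= 0xDF:
        return 2, 0x80, 0xBF
    if b == 0xE0:
        return 3, 0xA0, 0xBF   # excludes overlong 3-byte forms
    if b == 0xED:
        return 3, 0x80, 0x9F   # excludes surrogates D800..DFFF
    if 0xE1 <= b <= 0xEF:
        return 3, 0x80, 0xBF
    if b == 0xF0:
        return 4, 0x90, 0xBF   # excludes overlong 4-byte forms
    if 0xF1 <= b <= 0xF3:
        return 4, 0x80, 0xBF
    if b == 0xF4:
        return 4, 0x80, 0x8F   # caps the result at U+10FFFF
    return None                # 80..C1 and F5..FF


def decode_resync(data: bytes) -> list:
    """Decode UTF-8 into a list of code points, emitting 0xFFFD for each
    invalid subsequence and resyncing on the failing byte."""
    out = []
    i, n = 0, len(data)
    while i < n:
        info = lead_info(data[i])
        if info is None:           # not a valid starter: replace it alone
            out.append(0xFFFD)
            i += 1
            continue
        length, lo, hi = info
        # Keep only the payload bits of the lead byte.
        cp = data[i] if length == 1 else data[i] & (0xFF >> (length + 1))
        j = i + 1
        while j < i + length:
            if j >= n:
                break              # input ends mid-sequence
            low, high = (lo, hi) if j == i + 1 else (0x80, 0xBF)
            if not low <= data[j] <= high:
                break              # data[j] is the failing byte
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        if j == i + length:        # every expected byte was valid
            out.append(cp)
        else:
            out.append(0xFFFD)     # the failing byte is NOT consumed
        i = j
    return out


for sample in (b"\xF1\x80\x41\x42\x43", b"\xF1\x80\xFF\x42\x43", b"\xF1\x80\xC2\x81\x43"):
    print(" ".join(f"{cp:04X}" for cp in decode_resync(sample)))
# FFFD 0041 0042 0043
# FFFD FFFD 0042 0043
# FFFD 0081 0043
```

The key line is `i = j` in the failure branch: because `j` still points at the failing byte, the next iteration treats it as a potential starter, which is what produces 0041, the second FFFD, and 0081 in examples 1 to 3.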
History
Date User Action Args
2010-04-01 03:19:33 sjmachin set recipients: + sjmachin, lemburg, ezio.melotti, dangra
2010-04-01 03:19:33 sjmachin set messageid: <1270091973.22.0.435495612508.issue8271@psf.upfronthosting.co.za>
2010-04-01 03:19:32 sjmachin link issue8271 messages
2010-04-01 03:19:31 sjmachin create