Author lemburg
Recipients dangra, ezio.melotti, lemburg, sjmachin
Date 2010-03-31.18:07:43
I guess the term "failing" byte is somewhat underdefined.

Page 95 of the standard PDF suggests to "Replace each maximal subpart of an ill-formed subsequence by a single U+FFFD".

Fortunately, they explain what they are after: if a subsequent byte in the sequence does not have the high bit set, it's not to be considered part of the UTF-8 sequence of the code point.

Implementing that should be fairly straightforward by adjusting the endinpos variable accordingly.
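To make the rule concrete, here is a rough pure-Python sketch of "replace each maximal subpart by a single U+FFFD". It is not the C decoder and not a proposed patch: the names `_trail_ranges` and `decode_utf8_replace` are made up for illustration, and the local variable `end` only loosely plays the role that endinpos has in the real code. The sketch checks trailing bytes against the byte ranges in the standard's table of well-formed UTF-8 sequences, which is somewhat stricter than just testing the high bit.

```python
def _trail_ranges(lead):
    """Return the allowed (low, high) range for each byte following `lead`,
    or None if `lead` cannot start a well-formed multi-byte sequence.
    (Hypothetical helper; ranges follow the standard's well-formed UTF-8 table.)"""
    if 0xC2 <= lead <= 0xDF:
        return [(0x80, 0xBF)]
    if lead == 0xE0:
        return [(0xA0, 0xBF), (0x80, 0xBF)]
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return [(0x80, 0xBF)] * 2
    if lead == 0xED:
        return [(0x80, 0x9F), (0x80, 0xBF)]
    if lead == 0xF0:
        return [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]
    if 0xF1 <= lead <= 0xF3:
        return [(0x80, 0xBF)] * 3
    if lead == 0xF4:
        return [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]
    return None


def decode_utf8_replace(data):
    """Decode `data` (bytes) as UTF-8, emitting one U+FFFD per maximal
    subpart of an ill-formed subsequence. Illustrative sketch only."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            # Plain ASCII byte.
            out.append(chr(lead))
            i += 1
            continue
        ranges = _trail_ranges(lead)
        if ranges is None:
            # Stray continuation byte or invalid lead byte: replace it alone.
            out.append("\ufffd")
            i += 1
            continue
        # Extend `end` only over bytes that can legally continue the sequence.
        end = i + 1
        for low, high in ranges:
            if end < n and low <= data[end] <= high:
                end += 1
            else:
                break
        if end - i == len(ranges) + 1:
            # Complete, well-formed sequence.
            out.append(data[i:end].decode("utf-8"))
        else:
            # Truncated sequence: one U+FFFD for the bytes consumed so far;
            # the offending byte is re-examined on the next iteration.
            out.append("\ufffd")
        i = end
    return "".join(out)
```

With this sketch, b"\xe2\x82A" decodes to U+FFFD followed by "A": the "A" does not have the high bit set, so it is not swallowed into the broken three-byte sequence.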

Any takers ?
