Author sjmachin
Recipients sjmachin
Date 2010-03-31.02:28:10
SpamBayes Score 4.65013e-11
Marked as misclassified No
Message-id <1270002492.52.0.790856673013.issue8271@psf.upfronthosting.co.za>
In-reply-to
Content
Unicode 5.2.0 chapter 3 (Conformance) has a new section (headed "Constraints on Conversion Processes") after definition D93. Recent Pythons, e.g. 3.1.2, don't comply. Using the Unicode example:

 >>> print(ascii(b"\xc2\x41\x42".decode('utf8', 'replace')))
 '\ufffdB'
 # should produce '\ufffdAB'

Resynchronisation currently starts at a position derived by considering the length implied by the start byte:

 >>> print(ascii(b"\xf1ABCD".decode('utf8', 'replace')))
 '\ufffdD'
 # should produce '\ufffdABCD'; resync should start from the *failing* byte.
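
The resynchronisation rule asked for above could be sketched roughly as follows. This is only an illustrative pure-Python decoder, not CPython's implementation: the name decode_utf8_replace is made up for this sketch, and the second-byte ranges are my reading of the table of well-formed UTF-8 byte sequences in chapter 3 of the standard.

```python
def decode_utf8_replace(data: bytes) -> str:
    """Hypothetical sketch: UTF-8 decoding with 'replace' semantics that
    resumes at the first byte that could not continue the sequence."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                      # ASCII passes through unchanged
            out.append(chr(b0))
            i += 1
            continue
        # For each start byte: number of trail bytes needed, and the
        # allowed range for the *second* byte of the sequence.
        if 0xC2 <= b0 <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF   # excludes overlong forms
        elif b0 == 0xED:
            need, lo, hi = 2, 0x80, 0x9F   # excludes surrogates
        elif 0xE1 <= b0 <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF   # excludes overlong forms
        elif 0xF1 <= b0 <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F   # nothing above U+10FFFF
        else:
            # 0x80..0xC1 and 0xF5..0xFF can never start a sequence
            out.append("\ufffd")
            i += 1
            continue
        j = i + 1
        ok = True
        for _ in range(need):
            if j >= n or not (lo <= data[j] <= hi):
                ok = False                 # data[j] is the failing byte
                break
            j += 1
            lo, hi = 0x80, 0xBF            # later trail bytes: plain range
        if ok:
            # data[i:j] is now known to be well-formed
            out.append(data[i:j].decode("utf-8"))
        else:
            # One U+FFFD for the bytes consumed so far ...
            out.append("\ufffd")
        # ... and resume at j, i.e. at the failing byte, not at the
        # position implied by the length of the start byte.
        i = j
    return "".join(out)
```

With this sketch, decode_utf8_replace(b"\xf1ABCD") gives '\ufffdABCD': the start byte \xf1 announces three trail bytes, but "A" already fails the check, so decoding restarts at "A" and nothing after it is swallowed.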

Notes: This applies to the 'ignore' option as well as the 'replace' option. The Unicode discussion mentions "security exploits".
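
A quick way to check a given interpreter against both error handlers might look like the sketch below. The helper name check_resync and the CASES list are made up for illustration. The expected 'ignore' results are my assumption: they are simply the expected 'replace' results with U+FFFD dropped.

```python
# (input bytes, expected result with errors='replace')
CASES = [
    (b"\xc2\x41\x42", "\ufffdAB"),
    (b"\xf1ABCD", "\ufffdABCD"),
]


def builtin_decode(data, errors):
    """Decode with the interpreter's own UTF-8 codec."""
    return data.decode("utf-8", errors)


def check_resync(decode=builtin_decode):
    """Return a list of (input, handler, got, wanted) for every mismatch."""
    failures = []
    for raw, want_replace in CASES:
        wanted = {
            "replace": want_replace,
            # Assumption: 'ignore' should keep exactly the same characters,
            # minus the replacement character.
            "ignore": want_replace.replace("\ufffd", ""),
        }
        for errors, want in wanted.items():
            got = decode(raw, errors)
            if got != want:
                failures.append((raw, errors, got, want))
    return failures


if __name__ == "__main__":
    problems = check_resync()
    for raw, errors, got, want in problems:
        print("%r with %r: got %s, wanted %s"
              % (raw, errors, ascii(got), ascii(want)))
    if not problems:
        print("no resynchronisation problems found")
```

An interpreter with the behaviour reported here would list all four combinations as mismatches; an empty list means valid characters following an ill-formed sequence are no longer being swallowed.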
History
Date User Action Args
2010-03-31 02:28:12 sjmachin set recipients: + sjmachin
2010-03-31 02:28:12 sjmachin set messageid: <1270002492.52.0.790856673013.issue8271@psf.upfronthosting.co.za>
2010-03-31 02:28:10 sjmachin link issue8271 messages
2010-03-31 02:28:10 sjmachin create