Author sjmachin
Recipients dangra, ezio.melotti, lemburg, sjmachin
Date 2010-04-01.14:43:21
SpamBayes Score 5.92863e-05
Marked as misclassified No
Message-id <1270133004.19.0.674258327128.issue8271@psf.upfronthosting.co.za>
In-reply-to
Content
Patch review:

Preamble: pardon my ignorance of how the codebase works, but trunk unicodeobject.c is r79494 (and allows encoding of surrogate codepoints), py3k unicodeobject.c is r79506 (and bans the surrogate caper) and I can't find the r79542 that the patch mentions ... help, please!

length 2 case: 
1. the loop can be hand-unrolled into oblivion. It can be entered only when s[1] & 0xC0 != 0x80 (previous if test).
2. the over-long check (if (ch < 0x80)) hasn't been touched. It could be removed and the entries for C0 and C1 in the utf8_code_length array set to 0.

length 3 case:
1. the tests involving s[0] being 0xE0 or 0xED are misplaced.
2. the test s[0] == 0xE0 && s[1] < 0xA0 if not misplaced would be shadowing the over-long test (ch < 0x800). It seems better to use the over-long test (with endinpos set to 1).
3. The test s[0] == 0xED relates to the surrogates caper which in the py3k version is handled in the same place as the over-long test.
4. unrolling loop: needs no loop, only 1 test ... if s[1] is good, then we know s[2] must be bad without testing it, because we start the for loop only when s[1] is bad || s[2] is bad.

length 4 case: as for the len 3 case generally ... misplaced tests, F1 test shadows over-long test, F4 test shadows max value test, too many loop iterations.
History
Date User Action Args
2010-04-01 14:43:24sjmachinsetrecipients: + sjmachin, lemburg, ezio.melotti, dangra
2010-04-01 14:43:24sjmachinsetmessageid: <1270133004.19.0.674258327128.issue8271@psf.upfronthosting.co.za>
2010-04-01 14:43:21sjmachinlinkissue8271 messages
2010-04-01 14:43:21sjmachincreate