Message183446
I prefer a slightly different form (simpler, to my eye):
    for (p = collstart; p < collend;) {
        Py_UCS4 ch = *p++;
        if ((0xD800 <= ch && ch <= 0xDBFF) &&
            (p < collend) &&
            (0xDC00 <= *p && *p <= 0xDFFF)) {
            ch = ((((ch & 0x03FF) << 10) |
                   ((Py_UCS4)*p++ & 0x03FF)) + 0x10000);
        }
        str += sprintf(str, "&#%d;", (int)ch);
    }
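To make the behaviour of this loop easy to check outside CPython, here is a minimal standalone sketch. The names unit_t, ucs4_t and xmlcharref_encode are hypothetical stand-ins (for a narrow-build code unit, for Py_UCS4, and for the encoder's inner loop); this is not the actual patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for CPython's types on a narrow build. */
typedef uint16_t unit_t;   /* one UTF-16 code unit */
typedef uint32_t ucs4_t;   /* plays the role of Py_UCS4 */

/* Write "&#N;" for each code point in [start, end), joining a high
   surrogate with the low surrogate that follows it.  Returns the number
   of bytes written, not counting the terminating NUL.  The caller must
   supply a large enough buffer; 10 bytes per code unit plus one for the
   NUL is a safe upper bound. */
static size_t
xmlcharref_encode(const unit_t *start, const unit_t *end, char *out)
{
    const unit_t *p;
    char *str = out;

    assert(start != NULL && end != NULL && out != NULL);
    *str = '\0';   /* keep the result valid for empty input */
    for (p = start; p < end;) {
        ucs4_t ch = *p++;
        if ((0xD800 <= ch && ch <= 0xDBFF) &&
            (p < end) &&
            (0xDC00 <= *p && *p <= 0xDFFF)) {
            /* Valid pair: combine both halves and consume the low one. */
            ch = ((((ch & 0x03FF) << 10) |
                   ((ucs4_t)*p++ & 0x03FF)) + 0x10000);
        }
        str += sprintf(str, "&#%d;", (int)ch);
    }
    return (size_t)(str - out);
}
```

With the units 0xD83D, 0xDC9D as input this produces a single reference for U+1F49D; a high surrogate that is not followed by a low surrogate is written out as its own reference.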
Please also look at the loop above it ("determine replacement size"); it needs the same correction. It would be simpler to use a fixed-size buffer (``char buffer[2+29+1+1];``) as the charmap encoder does. Perhaps the charmap encoder should be fixed too, with the common code extracted into a separate function.
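One possible shape for that, as a rough sketch only: a shared helper that reads one code point (joining a surrogate pair), used by a sizing pass that formats into the fixed-size buffer. The names unit_t, ucs4_t, next_code_point and xmlcharref_size are hypothetical and do not exist in CPython.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for CPython's types on a narrow build. */
typedef uint16_t unit_t;   /* one UTF-16 code unit */
typedef uint32_t ucs4_t;   /* plays the role of Py_UCS4 */

/* Shared helper: read the code point at *pp, joining a valid surrogate
   pair, and advance *pp past the unit(s) consumed.  Requires *pp < end. */
static ucs4_t
next_code_point(const unit_t **pp, const unit_t *end)
{
    const unit_t *p = *pp;
    ucs4_t ch = *p++;

    if ((0xD800 <= ch && ch <= 0xDBFF) &&
        (p < end) &&
        (0xDC00 <= *p && *p <= 0xDFFF)) {
        ch = (((ch & 0x03FF) << 10) | ((ucs4_t)*p++ & 0x03FF)) + 0x10000;
    }
    *pp = p;
    return ch;
}

/* Sizing pass: total length of the "&#N;" replacements for [start, end).
   Because it uses the same helper as the output pass would, the two
   cannot disagree about how surrogate pairs are counted. */
static size_t
xmlcharref_size(const unit_t *start, const unit_t *end)
{
    const unit_t *p = start;
    size_t size = 0;
    char buffer[2+29+1+1];   /* "&#" + digits + ";" + NUL */

    assert(start != NULL && end != NULL);
    while (p < end) {
        ucs4_t ch = next_code_point(&p, end);
        size += (size_t)snprintf(buffer, sizeof(buffer), "&#%d;", (int)ch);
    }
    return size;
}
```

The point of the helper is that the sizing loop and the output loop share one definition of "next character", so a fix to pair handling lands in both places at once.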
I have doubts about '\ud83d\udc9d' on a wide build. Is it right to encode it as b'&#128157;' and not as b'&#55357;&#56477;'? That would be compatible with the narrow build but would break compatibility with 3.3+. Which is the lesser evil?
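To make the two candidate behaviours concrete, here is a small standalone sketch for the wide-build case, where each array element is already a full code point. The join_pairs flag and the names ucs4_t and encode_wide are hypothetical, introduced only to show both outputs side by side.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t ucs4_t;   /* wide build: one array element per code point */

/* Encode [start, end) as "&#N;" references.  If join_pairs is nonzero, a
   high surrogate followed by a low surrogate is combined into one code
   point (the narrow-build-compatible result); otherwise each surrogate
   is written out separately.  Returns the number of bytes written, not
   counting the terminating NUL.  The caller supplies the buffer. */
static size_t
encode_wide(const ucs4_t *start, const ucs4_t *end, int join_pairs, char *out)
{
    const ucs4_t *p;
    char *str = out;

    assert(start != NULL && end != NULL && out != NULL);
    *str = '\0';
    for (p = start; p < end;) {
        ucs4_t ch = *p++;
        if (join_pairs &&
            (0xD800 <= ch && ch <= 0xDBFF) &&
            (p < end) &&
            (0xDC00 <= *p && *p <= 0xDFFF)) {
            ch = (((ch & 0x03FF) << 10) | (*p++ & 0x03FF)) + 0x10000;
        }
        str += sprintf(str, "&#%lu;", (unsigned long)ch);
    }
    return (size_t)(str - out);
}
```

For the input 0xD83D, 0xDC9D, join_pairs=1 gives one reference to U+1F49D (decimal 128157), while join_pairs=0 gives two references, one per surrogate (55357 and 56477).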
Date | User | Action | Args
2013-03-04 13:20:33 | serhiy.storchaka | set | recipients: + serhiy.storchaka, lemburg, vstinner, ezio.melotti, wiml
2013-03-04 13:20:33 | serhiy.storchaka | set | messageid: <1362403233.83.0.893127781964.issue15866@psf.upfronthosting.co.za>
2013-03-04 13:20:33 | serhiy.storchaka | link | issue15866 messages
2013-03-04 13:20:33 | serhiy.storchaka | create |