Message121568
On Fri, Nov 19, 2010 at 3:06 PM, STINNER Victor <report@bugs.python.org> wrote:
> .. Whereas PyUnicode_FromFormatV() converts the format string
> (bytes) to unicode (characters). If you would like a comparison in C, it's
> like printf()+mbstowcs() in the same function.
>
I see. So it is really the

    else
        *s++ = *f;

that surreptitiously widens the characters.
..
> I chose to use ASCII instead of UTF-8, because a UTF-8 decoder is long (210
> lines) and complex (see PyUnicode_DecodeUTF8Stateful()), whereas an ASCII
> decoder is just "unicode_char = (Py_UNICODE)byte;" plus an if beforehand to
> check that 0 <= byte <= 127.
I don't think we need 210 lines to replace "*s++ = *f" with proper
UTF-8 logic. Even if we do, the code can be shared with
PyUnicode_DecodeUTF8 and a UTF-8 iterator may be a welcome addition to
Python C API.
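
To make the size argument concrete, here is a minimal sketch of what such a UTF-8 iterator could look like. The name utf8_next and its signature are hypothetical (nothing like it exists in the Python C API), and it is written against plain C types rather than Py_UNICODE, so treat it as an illustration under those assumptions and not as a proposed patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Python C API.
 * Decodes one UTF-8 sequence starting at *p, where end points one past
 * the last readable byte. On success, advances *p past the sequence and
 * returns the code point. On malformed or truncated input, returns -1
 * and leaves *p unchanged. */
static int32_t utf8_next(const unsigned char **p, const unsigned char *end)
{
    const unsigned char *s = *p;
    int32_t ch, min;
    int n, i;

    if (s >= end)
        return -1;

    ch = *s++;
    if (ch < 0x80) {                        /* ASCII: one byte */
        n = 0; min = 0;
    } else if (ch >= 0xC2 && ch <= 0xDF) {  /* two-byte sequence */
        ch &= 0x1F; n = 1; min = 0x80;
    } else if ((ch & 0xF0) == 0xE0) {       /* three-byte sequence */
        ch &= 0x0F; n = 2; min = 0x800;
    } else if (ch >= 0xF0 && ch <= 0xF4) {  /* four-byte sequence */
        ch &= 0x07; n = 3; min = 0x10000;
    } else {
        return -1;  /* stray continuation byte or invalid lead byte */
    }

    if (end - s < n)
        return -1;  /* sequence is truncated */

    for (i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;  /* expected a continuation byte */
        ch = (ch << 6) | (s[i] & 0x3F);
    }
    s += n;

    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (ch < min || ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF))
        return -1;

    *p = s;
    return ch;
}
```

With something along these lines, the "*s++ = *f" branch would decode one code point from the format string and store that instead of copying a single byte. On a build where Py_UNICODE is 16 bits wide, code points above 0xFFFF would still have to be written as a surrogate pair, which this sketch does not attempt.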

Date | User | Action | Args
2010-11-19 20:58:25 | belopolsky | set | recipients: + belopolsky, amaury.forgeotdarc, vstinner, ezio.melotti
2010-11-19 20:58:22 | belopolsky | link | issue9769 messages
2010-11-19 20:58:22 | belopolsky | create |