Author jj
Recipients ezio.melotti, jj, loewis, vstinner, zach.ware
Date 2014-07-31.20:03:00
Message-id <1406836980.31.0.548665855711.issue22108@psf.upfronthosting.co.za>
Content
Martin, I think the most intuitive and easiest way to work with strings in C is to just use char arrays.

Since main() already receives its argv as char*, most programmers probably just stick with char*, and the encoding just works.
This is because the encoding only has to be handled by the software that takes user input (Xorg, keyboard input) and the software that produces user output (your terminal emulator, the GUI, ...).
No matter what data your program receives, the encoding only matters to the software that actually displays it, which needs it to select the correct visual representation.
Requiring a conversion to wide chars just increases the interface complexity and adds data transformations that are completely unnecessary with UTF-8.
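
To illustrate what I mean, here is a minimal sketch in plain C. The helper names copy_bytes and utf8_codepoints are made up for this example and do not come from CPython or any library. The first function moves a string around without knowing anything about its encoding; the second shows that interpreting the bytes as UTF-8 is only needed at the point where characters actually matter, and that it assumes the input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Copy a NUL-terminated string byte for byte, without any knowledge of
 * its encoding. Returns the number of bytes copied (excluding the NUL),
 * or 0 if dst is too small to hold the string and its terminator. */
size_t copy_bytes(char *dst, size_t dst_size, const char *src)
{
    size_t n = strlen(src);
    if (n + 1 > dst_size)
        return 0;
    memcpy(dst, src, n + 1);
    return n;
}

/* Count the code points in a UTF-8 string by counting every byte that
 * is not a continuation byte (bit pattern 10xxxxxx).
 * Assumption: the input is valid UTF-8; no validation is done here. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

With the UTF-8 bytes for "grüße" in a char array, copy_bytes passes them through unchanged, and only utf8_codepoints needs to care that two of the five characters take two bytes each.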

What I'd really like to see in CPython is that the internal storage (and the way it's exposed in the C-API) is just raw bytes (=> char*).

This allows super-easy integration in C projects that probably all just use char as their string type (see the doc example mentioned earlier).

PEP 393 states: "(..) the specification chooses UTF-8 as the recommended way of exposing strings to C code."

And for that, I think using char instead of wchar_t is a better solution for interface developers.