Message 224444 - Python tracker


Author	jj
Recipients	ezio.melotti, jj, loewis, vstinner, zach.ware
Date	2014-07-31.20:03:00
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1406836980.31.0.548665855711.issue22108@psf.upfronthosting.co.za>
In-reply-to

Content
Martin, I think the most intuitive and easiest way to work with strings in C is plain char arrays.

Starting with main()'s argv being char**, most programmers probably just stick with char* and the encoding just works.
This is because encoding only comes into play in the user input software (Xorg, keyboard input) and the user output software (your terminal emulator, the GUI, ...).
No matter what data your program receives, the encoding only matters to the software that actually displays it, so that it can select the correct visual representation.
Requiring a conversion to wide chars just increases the interface complexity and adds unneeded data transformations that UTF-8 makes obsolete.
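
To illustrate the paragraph above, here is a minimal sketch of UTF-8 text moving through ordinary byte-oriented C routines without the program ever decoding it. The helper names utf8_codepoints and copy_is_lossless are made up for this example, and the code assumes its input is valid UTF-8. The encoding only shows up when something needs to count characters rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * string, assuming it holds valid UTF-8. Continuation bytes have the
 * bit pattern 10xxxxxx and are skipped. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Hypothetical helper: copy src into dst using only byte-oriented
 * routines. Returns 1 if the copy fits and is byte-identical, 0 otherwise.
 * No knowledge of the encoding is needed for this. */
int copy_is_lossless(const char *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL);
    size_t len = strlen(src);
    if (len + 1 > dst_size)
        return 0;
    memcpy(dst, src, len + 1);
    return memcmp(src, dst, len + 1) == 0;
}
```

For a string such as "gr\xc3\xbc\xc3\x9f" (the UTF-8 bytes of "grüß"), strlen reports 6 bytes while utf8_codepoints reports 4 code points, and the copy stays byte-identical either way.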

What I'd really like to see in CPython is that the internal storage (and the way it's exposed in the C-API) is just raw bytes (=> char*).

This allows super-easy integration in C projects that probably all just use char as their string type (see the doc example mentioned earlier).

PEP 393 states: "(..) the specification chooses UTF-8 as the recommended way of exposing strings to C code."

And for that, I think using char instead of wchar_t is a better solution for interface developers.
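
For comparison, this is roughly the extra step a caller has to write when an interface wants wchar_t* but the program holds char*. It is only a sketch using standard C (mbsrtowcs); the function name to_wide is made up for this example, and the result depends on the current C locale, which is an assumption of the sketch and not something the C-API prescribes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string to a newly allocated
 * wide string according to the current C locale. Returns NULL on an
 * invalid sequence or allocation failure. The caller must free() it. */
wchar_t *to_wide(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    const char *p = s;

    /* First pass: with a NULL destination, mbsrtowcs only reports the
     * number of wide characters needed (excluding the terminator). */
    memset(&state, 0, sizeof state);
    size_t n = mbsrtowcs(NULL, &p, 0, &state);
    if (n == (size_t)-1)
        return NULL;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    /* Second pass: perform the conversion, including the terminator. */
    p = s;
    memset(&state, 0, sizeof state);
    if (mbsrtowcs(w, &p, n + 1, &state) == (size_t)-1) {
        free(w);
        return NULL;
    }
    return w;
}
```

Every call site that starts from argv needs this allocation, the error handling, and a matching free(), none of which is needed when the interface accepts char* directly.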

History
Date	User	Action	Args
2014-07-31 20:03:00	jj	set	recipients: + jj, loewis, vstinner, ezio.melotti, zach.ware
2014-07-31 20:03:00	jj	set	messageid: <1406836980.31.0.548665855711.issue22108@psf.upfronthosting.co.za>
2014-07-31 20:03:00	jj	link	issue22108 messages
2014-07-31 20:03:00	jj	create