Message 168345 - Python tracker



Author	ncoghlan
Recipients	loewis, ncoghlan, skrah, teoliphant
Date	2012-08-16.01:04:08
Message-id	<1345079050.28.0.028981423401.issue15625@psf.upfronthosting.co.za>

Content

I admit that the main thing that bothers me about the proposal in PEP 3118 is the inconsistency between 'c' -> bytes and 'u', 'w' -> str.

This was less of an issue in 2.x (the main frame of reference when the PEP was written), where str and unicode interoperated implicitly, but it seems quite jarring in the 3.x world.

Status quo:
struct module: 'c' = individual bytes, 's' = multi-byte sequence
array module: the 'u' typecode is Py_UNICODE, which may be either 2 bytes or 4 bytes (the addition of the 'w' typecode has been reverted)
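A quick way to see this status quo from Python (a sketch written against a current CPython, where the 'u' typecode is deprecated and newer releases may warn when it is used): struct's 'c' yields length-1 bytes objects, 's' yields a single bytes object, the struct module rejects PEP 3118's 'u' outright, and an array('u') item is a str whose size depends on the build.

```python
import array
import struct
import warnings

# struct 'c': each item unpacks to a length-1 bytes object.
print(struct.unpack("3c", b"abc"))   # (b'a', b'b', b'c')
# struct 's': a repeat count gives one multi-byte bytes object.
print(struct.unpack("3s", b"abc"))   # (b'abc',)

# The struct module does not accept PEP 3118's 'u' code at all.
try:
    struct.calcsize("u")
except struct.error as exc:
    print("struct rejects 'u':", exc)

# array 'u': items are str, but the item size is that of Py_UNICODE,
# so it is 2 or 4 bytes depending on the platform/build.
with warnings.catch_warnings():
    # 'u' is deprecated; silence any DeprecationWarning for this demo.
    warnings.simplefilter("ignore", DeprecationWarning)
    a = array.array("u", "abc")
print(a.itemsize)        # 2 or 4, depending on the build
print(type(a[0]), a[0])  # a str, not bytes
```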

My current inclination is still to apply Victor's patch from #13072 (which changes array to export the appropriate integer typecodes for 'u' arrays), otherwise punt on this for 3.3, and try to sort out the mess for 3.4.
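To make the idea behind exporting integer typecodes concrete, the sketch below reinterprets an array('u') as unsigned integer code units from the consumer's side. It does not reproduce the patch from #13072; the helper name unicode_array_as_code_units and the choice of 'H' for 2-byte items and 'I' for 4-byte items are assumptions for illustration.

```python
import array
import struct
import warnings

def unicode_array_as_code_units(a):
    """Hypothetical helper: view an array('u') as integer code units.

    Picks an integer struct code from the item size ('H' for 2 bytes,
    'I' for 4 bytes); '=' means native byte order with standard sizes,
    matching how the array stores its items in memory.
    """
    unit = {2: "H", 4: "I"}[a.itemsize]
    return struct.unpack("=%d%s" % (len(a), unit), a.tobytes())

with warnings.catch_warnings():
    # 'u' is deprecated; silence any DeprecationWarning for this demo.
    warnings.simplefilter("ignore", DeprecationWarning)
    a = array.array("u", "hi\u00e9")

print(unicode_array_as_code_units(a))  # (104, 105, 233)
```

All three characters are in the BMP, so the result is the same on 2-byte and 4-byte builds; only the integer code used to describe each item differs.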

For 3.4, I'm inclined to favour Stefan's proposal of 'C', 'U' and 'W' mapping to sequences of UCS-1, UCS-2 and UCS-4 code points respectively (with corresponding typecodes in the array module).
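A rough sketch of how a consumer might interpret those proposed codes, assuming 'C', 'U' and 'W' denote native-endian sequences of 1-, 2- and 4-byte code points. None of these codes exist in the struct module or the buffer API; the PROPOSED_UNITS table and the encode_proposed/decode_proposed helpers are hypothetical.

```python
import struct

# Hypothetical: struct code for one code unit of each proposed format.
# '=' means native byte order with standard sizes (1, 2 and 4 bytes).
PROPOSED_UNITS = {"C": "=B", "U": "=H", "W": "=I"}

def decode_proposed(fmt, raw):
    """Interpret raw buffer bytes as a str under a proposed format code."""
    unit = PROPOSED_UNITS[fmt]
    size = struct.calcsize(unit)
    if len(raw) % size:
        raise ValueError("buffer length %d is not a multiple of %d"
                         % (len(raw), size))
    # Each code unit is taken as exactly one code point (UCS, not UTF).
    return "".join(chr(cp) for (cp,) in struct.iter_unpack(unit, raw))

def encode_proposed(fmt, text):
    """Inverse of decode_proposed; raises struct.error if a code point
    does not fit in the unit (e.g. a non-BMP character under 'U')."""
    unit = PROPOSED_UNITS[fmt]
    return b"".join(struct.pack(unit, ord(ch)) for ch in text)

for fmt in "CUW":
    raw = encode_proposed(fmt, "caf\u00e9")
    print(fmt, len(raw), decode_proposed(fmt, raw))
# C 4 café
# U 8 café
# W 16 café
```

Unlike a UTF-16 decoder, the 'U' branch treats every 2-byte unit as a single code point, so surrogate pairs are not combined; that is exactly the difference between UCS-2 and UTF-16.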

Support for lowercase 'u' would then never become an official part of the buffer API, existing only as an array typecode.
