Message 330252 - Python tracker


Author	ncoghlan
Recipients	eric.snow, lukasz.langa, ncoghlan, vstinner
Date	2018-11-22.12:53:44
Message-id	<1542891224.36.0.788709270274.issue35266@psf.upfronthosting.co.za>
In-reply-to

Content
I didn't know what was possible when I wrote PEP 432 either - instead, I wrote down an initial concept for what I *wanted*, and then started exploring the code to find out the barriers to achieving that.

We know enough now to know that the original design concept isn't technically feasible, but that's OK - the general idea was just to get to a point where the startup code is better tested, easier to maintain, and easier to control in an embedding application, and everything outside that is negotiable.

The problem with the purely bottom-up approach is that we may end up with something that's better tested and easier to maintain, but find out that it hasn't actually helped us get to a point where we can make the interpreter easier for embedding applications to manage.

As far as Unicode goes, it isn't Unicode as a concept that's problematic, it's specifically the CPython Unicode type: that needs hash randomisation configured, and that means we need to have already processed the input settings that can affect the hash seed. And unlike UTF-8 mode, where there's a comparatively limited set of strings to recreate with a different decoding step, there's no escape hatch to let you cleanly recreate all previously created string objects with a different basis for their hash.
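
To make the hash seed dependency concrete, here is a minimal sketch (not taken from the CPython startup code) that asks fresh interpreters for the hash of the same string under different PYTHONHASHSEED values. The helper name str_hash_with_seed is hypothetical and exists only for illustration. PYTHONHASHSEED is the documented environment variable, and it is read while the child interpreter is starting up.

```python
import os
import subprocess
import sys


def str_hash_with_seed(text, seed):
    """Return hash(text) as computed by a fresh interpreter.

    Hypothetical helper for illustration: the child process is started
    with PYTHONHASHSEED set to `seed`, so the seed is fixed before the
    child computes the hash we ask for.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # Same seed: the hash is reproducible across interpreter runs.
    print(str_hash_with_seed("spam", 1) == str_hash_with_seed("spam", 1))
    # Different seed: the same string is expected to hash differently.
    print(str_hash_with_seed("spam", 1) == str_hash_with_seed("spam", 2))
```

The seed can only be chosen by starting a new process here, which is the point above: once a str object exists, the seed it was hashed with is already baked in, so the settings that influence the seed have to be processed first.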

History
Date	User	Action	Args
2018-11-22 12:53:44	ncoghlan	set	recipients: + ncoghlan, vstinner, lukasz.langa, eric.snow
2018-11-22 12:53:44	ncoghlan	set	messageid: <1542891224.36.0.788709270274.issue35266@psf.upfronthosting.co.za>
2018-11-22 12:53:44	ncoghlan	link	issue35266 messages
2018-11-22 12:53:44	ncoghlan	create