Message 106154 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	asvetlov
Recipients	Arfrever, asvetlov, pitrou, vstinner
Date	2010-05-20.14:10:26
SpamBayes Score	3.0127818e-09
Marked as misclassified	No
Message-id	<1274364629.87.0.401242238544.issue8611@psf.upfronthosting.co.za>
In-reply-to

Content
After looking in #4352 deep I figured out what true separation of filesystem default encoding and utf8 python namespace is really too complicated. For example import call stack chain converts module name from utf-8 to filesystem in import.c:find_module. After that converted name used by PyImport_ExecCodeModule* as utf-8 name while actually it has filesystem encoding. That problem cannot be solved by "five-line patch" and Martin von Loevis suggested me to stop potentially dangerous big import.c changes in python 3.1 beta. I like importlib way (with maybe C implementation as next step) in terms of "true way" reorganization of python import machinery, but unfortunatelly Cannon has no time for that. From my perspective only big refactoring can solve encoding issues (and we can use excellent io implementation to open utf-8 named files in Windows using native unicode functions). We need to split 'module names' from 'filesystem pathes' clean. Maybe pure python importing is not easy - not sure. But reorganizing of current 'import spaghetti' is required. importlib (and PEP 302) introduced a nice way to do that. I like to be volunteer for this task and I feel enough knowledge to implement and test cover at least windows and linux (MacOs is not big problem also). But I need a mentor (Petrou, Cannon - you are welcome) to make it done, done clear and stable, done in resonable time period.

After looking in #4352 deep I figured out what true separation of filesystem default encoding and utf8 python namespace is really too complicated.
For example import call stack chain converts module name from utf-8 to  filesystem in import.c:find_module. After that converted name used by PyImport_ExecCodeModule* as utf-8 name while actually it has filesystem encoding. That problem cannot be solved by "five-line patch" and Martin von Loevis suggested me to stop potentially dangerous big import.c changes in python 3.1 beta. 
I like importlib way (with maybe C implementation as next step) in terms of "true way" reorganization of python import machinery, but unfortunatelly Cannon has no time for that. From my perspective only big refactoring can solve encoding issues (and we can use excellent io implementation to open utf-8 named files in Windows using native unicode functions). We need to split 'module names' from 'filesystem pathes' clean. 
Maybe pure python importing is not easy - not sure. But reorganizing of current 'import spaghetti' is required. importlib (and PEP 302) introduced a nice way to do that.
I like to be volunteer for this task and I feel enough knowledge to implement and test cover at least windows and linux (MacOs is not big problem also). But I need a mentor (Petrou, Cannon - you are welcome) to make it done, done clear and stable, done in resonable time period.

History
Date	User	Action	Args
2010-05-20 14:10:30	asvetlov	set	recipients: + asvetlov, pitrou, vstinner, Arfrever
2010-05-20 14:10:29	asvetlov	set	messageid: <1274364629.87.0.401242238544.issue8611@psf.upfronthosting.co.za>
2010-05-20 14:10:28	asvetlov	link	issue8611 messages
2010-05-20 14:10:26	asvetlov	create