classification
Title: zipimport to import from non-ascii pathname on Windows
Type: behavior Stage: resolved
Components: Unicode, Windows Versions: Python 3.6, Python 3.5, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amswap, eric.snow, ezio.melotti, serhiy.storchaka, steve.dower, superluser, tim.golden, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2015-01-26 22:43 by amswap, last changed 2018-09-19 06:55 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
errorlog.txt amswap, 2015-01-26 22:43 Error log
zipimport_test.py amswap, 2015-01-26 22:45 Test script to reproduce the issue.
zipimport_fix.patch amswap, 2015-01-26 22:49 Possible fix
zipimport_fix_withtest.patch amswap, 2015-01-27 19:08 Possible fix with unit test
Messages (9)
msg234786 - (view) Author: Swapneel Ambre (amswap) * Date: 2015-01-26 22:43
On Windows, zipimport module APIs such as get_filename fail when the full path of the file contains non-ASCII characters. The error is:

UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character ( Full output attached in errorlog.txt ).

The issue is that the compile_source function in Modules/zipimport.c calls PyUnicode_EncodeFSDefault on the pathname. On Windows, the default filesystem encoding is 'mbcs', which cannot encode characters outside the active ANSI code page.

This has already been fixed in the import machinery in Python 3 (see http://bugs.python.org/issue13758 and http://bugs.python.org/issue11619). The solution is to pass the pathname as Unicode directly to the compiler.
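The encoding failure described above can be illustrated without a Windows machine. This is only a sketch: the 'mbcs' codec exists only on Windows, so cp1252 is used here as an assumed stand-in for an ANSI code page, and the path is the example one from this report.

```python
# cp1252 is an ASSUMED stand-in for the Windows ANSI code page ('mbcs').
path = "C:\\的\\product_name\\example.egg"

try:
    # Strict encoding, similar in spirit to encoding the pathname to bytes.
    path.encode("cp1252")
    encodable = True
except UnicodeEncodeError as exc:
    encodable = False
    print("cannot encode:", exc.reason)

# A Unicode-capable encoding round-trips the same path without loss.
utf8_bytes = path.encode("utf-8")
```

Keeping the pathname as a Unicode string avoids this step entirely, which is the approach the linked issues took.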
msg234787 - (view) Author: Swapneel Ambre (amswap) * Date: 2015-01-26 22:45
I am attaching the test script I have used to reproduce the issue.
msg234789 - (view) Author: Swapneel Ambre (amswap) * Date: 2015-01-26 22:49
I have tried to fix this by calling Py_CompileStringObject instead of Py_CompileString, thus avoiding the need to encode the pathname. Please see zipimport_fix.patch for the possible fix.
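A rough Python-level analogue of that idea, not the C patch itself: the built-in compile() accepts the filename as a str, so a non-ASCII pathname never has to be encoded to bytes. The source text and path below are hypothetical.

```python
# Hypothetical module source and a hypothetical path inside an egg.
source = "answer = 6 * 7\n"
filename = "C:\\的\\product_name\\example.egg\\mod.py"

# The filename is passed as a Unicode string; no bytes encoding is needed.
code = compile(source, filename, "exec")

namespace = {}
exec(code, namespace)
```

The resulting code object keeps the Unicode filename in code.co_filename, which is what tracebacks later display.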
msg234790 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-26 22:52
> Please see zipimport_fix.patch for the possible fix.

The solution looks good. Can you please try to convert zipimport_test.py to a patch for test_zipimport.py and combine it with zipimport_fix.patch to create a complete patch?

You should also sign the contributor agreement:
https://www.python.org/psf/contrib/contrib-form/
msg234792 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-26 22:55
I don't understand the issue: does it only concern the name of the ZIP file? Or also paths inside the ZIP?

In both cases, the workaround is to use only ASCII names.

I spent a lot of time on supporting any Unicode name, everywhere in Python. I didn't expect that people have such different and crazy use cases :-)
msg234794 - (view) Author: Swapneel Ambre (amswap) * Date: 2015-01-26 23:12
Sorry, I was not very clear about the use case.

The name of the zipfile or of any parent directory could contain non-ASCII characters. Consider a use case where you want to ship a product together with a third-party module packaged as an egg file (say example.egg). You don't have control over where the product files get installed. Someone could install them under, say, C:\的\product_name. So both your product (exe or Python files) and the egg file are installed under a path with non-ASCII characters in it. Any import statement trying to import modules from the egg file will fail with UnicodeEncodeError, as zipimport will try to use PyUnicode_EncodeFSDefault with the 'mbcs' encoding on Windows.

I hope the use case is clearer now. I do agree that it is a corner case scenario and using ASCII names is a better option :-)

I will create a complete patch and sign contributor agreement.
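The use case can be sketched roughly as follows. This is not the attached zipimport_test.py; the directory layout, the module name hello_mod and its contents are made up for illustration, and the archive is created under a temporary directory rather than C:\.

```python
import os
import tempfile
import zipfile
import zipimport

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical install location with a non-ASCII directory name.
    install_dir = os.path.join(tmp, "的", "product_name")
    os.makedirs(install_dir)
    egg_path = os.path.join(install_dir, "example.egg")

    # Build a tiny archive holding one source-only module (made-up name).
    with zipfile.ZipFile(egg_path, "w") as zf:
        zf.writestr("hello_mod.py", "GREETING = 'hello'\n")

    importer = zipimport.zipimporter(egg_path)
    filename = importer.get_filename("hello_mod")
    # get_code compiles the source, which is where the report saw the error.
    code = importer.get_code("hello_mod")

    namespace = {}
    exec(code, namespace)
    greeting = namespace["GREETING"]
```

On an affected Windows build, the get_code call is the step expected to raise UnicodeEncodeError; on a fixed interpreter the module compiles and runs normally.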
msg234798 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-26 23:40
> I do agree that it is a corner case scenario and using ASCII names is a better option :-)

Since the patch is short, I see no problem with fixing this issue.
msg234840 - (view) Author: Swapneel Ambre (amswap) * Date: 2015-01-27 19:08
Attaching a combined patch. I updated the testUnencodable test case in test_zipimport.py and verified that it fails without my fix and passes with it.
msg325711 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-19 06:55
Thank you for your patch, Swapneel, but this issue was fixed in issue25758.
History
Date User Action Args
2018-09-19 06:55:14 serhiy.storchaka set status: open -> closed; nosy: + serhiy.storchaka; messages: + msg325711; resolution: fixed; stage: resolved
2015-08-05 15:54:28 eric.snow set type: crash -> behavior; versions: + Python 3.5, Python 3.6
2015-08-05 15:52:43 eric.snow set nosy: + eric.snow, superluser
2015-01-27 19:08:06 amswap set files: + zipimport_fix_withtest.patch; messages: + msg234840
2015-01-26 23:40:00 vstinner set messages: + msg234798
2015-01-26 23:12:33 amswap set messages: + msg234794
2015-01-26 22:55:39 vstinner set messages: + msg234792
2015-01-26 22:52:13 vstinner set messages: + msg234790
2015-01-26 22:49:34 amswap set files: + zipimport_fix.patch; keywords: + patch; messages: + msg234789
2015-01-26 22:45:34 amswap set files: + zipimport_test.py; messages: + msg234787
2015-01-26 22:43:56 amswap create