classification
Title: Unicode filename gets crippled on Windows when drag and drop
Type: behavior Stage: resolved
Components: Unicode, Windows Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: steve.dower Nosy List: Decorater, Drekin, abarry, eryksun, ezio.melotti, paul.moore, python-dev, steve.dower, terry.reedy, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2016-07-08 15:38 by Drekin, last changed 2016-07-24 04:00 by steve.dower. This issue is now closed.

Files
File name Uploaded Description Edit
27469_1.patch steve.dower, 2016-07-16 23:19
27469_2.patch steve.dower, 2016-07-18 05:07
Messages (26)
msg269991 - (view) Author: Adam Bartoš (Drekin) * Date: 2016-07-08 15:38
When a Python script is run by drag-and-dropping another file on it in Windows explorer, the other file's path becomes sys.argv[1]. However, when the path contains a Unicode characters (e.g. α), it gets crippled – it is replaced by ordinary question mark.

Maybe this is not Python's fault. Calling directly GetCommandLineW already contains that question mark. I'm just curious whether the original path even can be reconstructed.
msg270009 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-07-08 21:27
Nothing can be done about this from Python. It's a bug in how Explorer handles the dropped filename. 

Note that it's not simply replacing Unicode characters with question marks. It's using a best-fit ANSI encoding. For example, codepage 1252 maps "Ā" to "A". If there's no defined best-fit mapping, most codepages default to using "?" as the replacement character when encoding via WideCharToMultiByte. When decoding via MultiByteToWideChar, some codepages (e.g. 932), use katakana middle dot (U+30FB) as the default instead of a question mark.

For example, here's the commandline of py.exe when I drop a file named "Ā.txt" on a script. Note that "Ā" becomes "A":

    0:000> ?? @$peb->ProcessParameters->CommandLine
    struct _UNICODE_STRING
     ""C:\Windows\py.exe" "C:\Temp\test.py" C:\Temp\A.txt "
       +0x000 Length           : 0x68
       +0x002 MaximumLength    : 0x6a
       +0x004 Buffer           : 0x00771d50  ""C:\Windows\py.exe" "C:\Temp\test.py" C:\Temp\A.txt "

It's bizarre that it encodes the filename as ANSI just to decode it later when it calls CreateProcess. But Explorer probably still has a lot old code from back when it had to run on both Windows NT and DOS-based Windows 9x. This is probably a vestige of some workaround.

It isn't a problem if you ShellExecuteEx the Python script. For example, I ran "C:\Temp\test.py C:\Temp\Ā.txt" in the command prompt, and here's the resulting command line:

    0:000> ?? @$peb->ProcessParameters->CommandLine
    struct _UNICODE_STRING
     ""C:\Windows\py.exe" "C:\Temp\test.py"  C:\Temp\Ā.txt"
       +0x000 Length           : 0x68
       +0x002 MaximumLength    : 0x6a
       +0x004 Buffer           : 0x00981cf8  ""C:\Windows\py.exe" "C:\Temp\test.py"  C:\Temp\Ā.txt"

Explorer actually handles drag and drop correctly when dropping the file on another window. So as a (clunky) workaround, you can drag the script icon into a command prompt or Win+R run dialog, and then drag the target file. The shell should add quotes where required.
msg270016 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-07-09 03:13
On second thought, it occurred to me that the problem isn't in Explorer or shell32, but in the drop handler. It turns out the problem is that the drop handler calls the ANSI API DragQueryFileA instead of DragQueryFileW. To see this I attached a debugger to Explorer and set a breakpoint on shell32!DragQueryFileA:

    Breakpoint 0 hit
    SHELL32!DragQueryFileA:
    00007ffd`bed1a5b0 4883ec38        sub     rsp,38h
    0:056> k 3
    Child-SP          RetAddr           Call Site
    00000000`0ee3e0f8 00007ffd`b2c95b7a SHELL32!DragQueryFileA
    00000000`0ee3e100 00007ffd`bedade2e wshext!CWSHExtension::Drop+0x7a
    00000000`0ee3e300 00007ffd`bed9d547 SHELL32!CDVDropTarget::_PerformDrop+0x14a

Note that the shell is deferring to the drop handler for the file type, which in our case is implemented by wshext.dll. 

The first DragQueryFileA call is to query the number of files (i.e. iFile == 0xFFFFFFFF):

    0:056> r rdx
    rdx=00000000ffffffff
    0:056> g

The 2nd call gets the ANSI encoded filename:

    Breakpoint 0 hit
    SHELL32!DragQueryFileA:
    00007ffd`bed1a5b0 4883ec38        sub     rsp,38h
    0:056> r r8
    r8=000000000ee3e1a0
    0:056> pt; da ee3e1a0
    00000000`0ee3e1a0  "C:\Temp\A.txt"

The drop handler is set in the Python.File ProgId, which is defined in the installer in Tools/msi/launcher/launcher_reg.wxs, which configures the following registry entry:

    C:\>reg query HKLM\Software\Classes\Python.File\shellex\DropHandler

    HKEY_LOCAL_MACHINE\Software\Classes\Python.File\shellex\DropHandler
        (Default)    REG_SZ    {60254CA5-953B-11CF-8C96-00AA00B8708C}

As we previously saw in the stack trace, it's implemented by wshext.dll:

    C:\>reg query HKLM\Software\Classes\CLSID\{60254CA5-953B-11CF-8C96-00AA00B8708C} /s

    HKEY_LOCAL_MACHINE\Software\Classes\CLSID\{60254CA5-953B-11CF-8C96-00AA00B8708C}
        (Default)    REG_SZ    Shell Extension For Windows Script Host

    HKEY_LOCAL_MACHINE\Software\Classes\CLSID\{60254CA5-953B-11CF-8C96-00AA00B8708C}\InProcServer32
        (Default)    REG_SZ    C:\Windows\System32\wshext.dll
        ThreadingModel    REG_SZ    Apartment


I thought I could fix this easily by switching to the drop handler that batfile and exefile use:

    C:\>reg add HKLM\Software\Classes\Python.File\shellex\DropHandler /ve /d ^
    More? {86C86720-42A0-1069-A2E8-08002B30309D}
    Value  exists, overwrite(Yes/No)? y
    The operation completed successfully.

This gets the Unicode filename right, but when I move "Ā.txt" to a directory under "Program Files", I see that it uses short filenames in the path. 

    ['C:\\PROGRA~1\\Python35\\Ā.txt']

Is that acceptable?
msg270029 - (view) Author: Adam Bartoš (Drekin) * Date: 2016-07-09 08:50
Thank you very much for the analysis. So Python Windows installers may be changed to set the other drop handler. If the short paths are problem, they may be converted to long ones when initializing `sys.argv`.
msg270030 - (view) Author: Adam Bartoš (Drekin) * Date: 2016-07-09 08:51
Also, what versions of Windows does this affect? I have 64bit Vista, so maybe this is fixed in say Windows 10.
msg270034 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-07-09 10:50
Yes, a different drop handler solves the problem. It doesn't have to be the exefile handler that's built into shell32.dll. Another handler could be used that preserves Unicode filenames and long paths.

I tested in Windows 10.0.10586.
msg270038 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-09 12:59
Try removing the handler completely and see what the default behavior is.

Otherwise, I'll do some research and figure out the right one.
msg270040 - (view) Author: Adam Bartoš (Drekin) * Date: 2016-07-09 13:04
Without a handler the drop feature is disabled.
msg270042 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-09 13:50
Okay thanks. I'll see if I can track down the right one on Monday.
msg270050 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-09 16:09
Got some digging done today and it looks like we're probably best to write our own drop handler. It can probably be embedded into the py.exe launcher, which will keep all the registration in the one file.

At worst, we should switch to {86C86720-42A0-1069-A2E8-08002B30309D} (same as EXE files - converts long/Unicode paths to short filenames).
msg270138 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-07-10 22:38
The WSH drop handler was added to resolve issue 1656675. It was chosen because it supports long filenames. OTOH, the EXE drop handler probably uses short filenames because the command line is limited to 32,768 characters. (Actually, it only uses the short name for a path element if it's longer than 12 characters or contains whitespace, which was why my example still had non-ASCII "Ā" in the filename.)

If Python.File switches to the EXE drop handler, then I think PySys_SetArgv should automatically expand short filenames. A command-line option could be added to disable automatic expansion. It would be nice to implement this change in 2.7 as well since the WSH drop handler has been used since 2.6. 

The alternative is to distribute a DLL with the launcher that implements the IPersistFile and IDropTarget interfaces [1], with support for both long filenames and Unicode. It has to be a shell extension, so AFAIK it can't be hosted in a local server process, if that's what Steve meant by embedding it in py.exe.

[1]: https://msdn.microsoft.com/en-us/library/cc144165
msg270191 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-11 17:26
I don't want Python to get into the business of changing the command line.

I've started looking into building the shell extension so we can pass through arguments correctly. Once we have this, there are likely other features we can add to it (maybe extract __author__/__version__ and display it in tooltips? Certainly some of the requests for IDLE could be accommodated).
msg270593 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-16 23:19
Attached my first pass at the shell extension.

As this has the potential to crash Windows Explorer, I want to be _really_ thorough, so all reviews and feedback welcomed.
msg270617 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-07-17 08:05
Thanks, Steve. I manually added this shell extension as the drop handler for Python.File. It's working with non-ANSI filenames, e.g. "αβψδ εφγη ιξκλ μνοπ ρστθ ωχυζ.txt" in a Western locale. Also, I was able to drop 939 files from the System32 directory, with a total command-line length of 32766 characters. 

However, I ran into heap corruption that crashed Explorer when dropping files from "C:\Program Files (x86)\Windows Kits\10\Include\10.0.10586.0\um". I used gflags to enable the page heap for Explorer and tracked the problem down to FilenameListCchLength[W|A]. It wasn't allocating space for the double quotes for paths with spaces in them. So I made the following changes:

FilenameListCchLengthW:

    length += oneLength + (wcschr(pszSource, L' ') ? 3 : 1);

FilenameListCchLengthA:

    length += oneLength + (strchr(pszSource, ' ') ? 3 : 1);

After this I was able to drop 421 files from the above-mentioned Include directory, with a total command-line length of 32737 characters.
msg270685 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-18 03:52
Thanks, I've merged that change into my patch.

I doubt anyone else is going to build and test this, but just in case I'll let it set for a couple of days before merging. Perhaps someone will at least look at the code.
msg270689 - (view) Author: Decorater (Decorater) * Date: 2016-07-18 04:12
Oh and when the shell extension is done could you make a custom icon for pyd's and have the desciption of "Python C Compiled DLL" to easy selcting them when you have no icons for any *.lib, *.exp and *.pdb's Would ease up comping pyd files as well.

On further note it seems that python (when told to read from python36.zip seems to not read any pyd's that are in the zip as well) I would like for it to be able too if someone wants to embed PyNacl with _sodium.cp36-win32.pyd or _sodium.cp36-win_amd64.pyd files in it.

(Also readign pyd's in the zips could also simplify things like a ImportError on stuff that is actually there.
msg270690 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-18 04:25
A custom icon for .pyd files doesn't require a shell extension - it only really requires the icon.

I don't want to use the same as .pyc, since that has the implication that it can be deleted safely, or .py, since that can be double-clicked. So we'd need something new.

The rest of the request requires exposing new public APIs in Windows so that we can load executables from arbitrary memory locations. That's not something I can fix here :)
msg270696 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-07-18 04:56
Steve, will you be uploading a new patch? The current patch doesn't include "pyshellext.vcxproj" in the build, since that was accidentally committed and then removed. 

When you call ShellExecute, I suggest passing NULL for lpOperation, to use the default verb. If no default is defined the shell uses "open" anyway.

Decorater, there's a memory importer for extension modules in py2exe, based on Joachim Bauch's MemoryModule.c:

https://sourceforge.net/p/py2exe/svn/HEAD/tree/trunk/py2exe-3/source
msg270697 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-18 05:07
Sure, added a new patch with all of your suggestions. I also added pyshellext.vcxproj to pcbuild.sln.

(And my final word on the magic used to load DLLs from memory is that I'm not willing to maintain that code within CPython itself, so it won't be going into the core unless someone steps up with a 10+ year promise to look after it.)
msg270698 - (view) Author: Decorater (Decorater) * Date: 2016-07-18 05:11
I actually like py2exe's memory loader.
msg271082 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-07-23 15:02
New changeset 3005fc6cff8a by Steve Dower in branch '3.5':
Issue #27469: Adds a shell extension to the launcher so that drag and drop works correctly.
https://hg.python.org/cpython/rev/3005fc6cff8a

New changeset f56adb9800fa by Steve Dower in branch 'default':
Issue #27469: Adds a shell extension to the launcher so that drag and drop works correctly.
https://hg.python.org/cpython/rev/f56adb9800fa
msg271083 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-23 15:04
Fixed. This should get out into the wild first with 3.6.0a4, and then 3.5.3[rc1]
msg271098 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-23 18:21
> Certainly some of the requests for IDLE could be accommodated).

Steve, what did you have in mind here?
msg271103 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-23 19:29
The "Edit in IDLE" submenu we implemented could be moved from a collection of registry keys into the shell extension, which would also let us detect all Python installations, as right now we only have 3.5 and later (we could even filter out those that are missing IDLE for whatever reason). We'd also be able to fully control launching it, so we could set the working directory properly and any environment variables that may be needed/helpful.

I believe we can also do other tricks like treating Shift+Click differently from a regular click, if that was any value (run in IDLE and leave the interactive session open, perhaps?).
msg271108 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-23 20:21
Having "Edit in IDLE' work uniformly for all installed versions would be great.

> set the working directory properly

This needs some discussion and coordination.  I am thinking of patching IDLE to switch to $HOME when cwd is the python install dir, but this would very likely only be done for 3.6 as part of re-writing the startup code.

> and any environment variables that may be needed/helpful.
Can't think of anything at the moment, but this has never been an option.

> treating Shift+Click differently from a regular click, if that was any value (run in IDLE and leave the interactive session open, perhaps?).

IDLE's "-r file.py" runs the file and leaves Shell open.  I believe this is meant as a substitute for setting $IDLESTARTUP or $PYTHONSTARTUP and using -s.  For development, one should open in the editor and hit f5.   There might be a controversy whether to run in IDLE or 'python -i'.  If control or alt click could be distinguished, both would be possible.

Are you going to open issues, or should I?
msg271125 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-24 04:00
I just created issue27603 for any work that goes into adapting the IDLE context menu. Pretty good chance I'll get to look at it for 3.6.
History
Date User Action Args
2016-07-24 04:00:59steve.dowersetmessages: + msg271125
2016-07-23 20:21:19terry.reedysetmessages: + msg271108
2016-07-23 19:29:36steve.dowersetmessages: + msg271103
2016-07-23 18:21:29terry.reedysetnosy: + terry.reedy
messages: + msg271098
2016-07-23 15:04:44steve.dowersetstatus: open -> closed
resolution: fixed
messages: + msg271083

stage: patch review -> resolved
2016-07-23 15:02:27python-devsetnosy: + python-dev
messages: + msg271082
2016-07-18 19:40:30vstinnersetnosy: - vstinner
2016-07-18 05:11:24Decoratersetmessages: + msg270698
2016-07-18 05:07:54steve.dowersetfiles: + 27469_2.patch

messages: + msg270697
stage: patch review
2016-07-18 04:56:38eryksunsetmessages: + msg270696
2016-07-18 04:25:57steve.dowersetmessages: + msg270690
2016-07-18 04:12:10Decoratersetnosy: + Decorater
messages: + msg270689
2016-07-18 03:52:31steve.dowersetmessages: + msg270685
2016-07-17 08:05:29eryksunsetmessages: + msg270617
2016-07-16 23:19:51steve.dowersetfiles: + 27469_1.patch
keywords: + patch
messages: + msg270593
2016-07-11 17:26:11steve.dowersetmessages: + msg270191
2016-07-10 22:38:31eryksunsetmessages: + msg270138
2016-07-09 16:09:55steve.dowersetmessages: + msg270050
2016-07-09 13:50:52steve.dowersetassignee: steve.dower
messages: + msg270042
2016-07-09 13:04:59Drekinsetmessages: + msg270040
2016-07-09 12:59:51steve.dowersetmessages: + msg270038
2016-07-09 10:50:21eryksunsetmessages: + msg270034
2016-07-09 08:51:18Drekinsetmessages: + msg270030
2016-07-09 08:50:07Drekinsetmessages: + msg270029
2016-07-09 03:13:11eryksunsetstatus: closed -> open
resolution: third party -> (no value)
messages: + msg270016

stage: resolved -> (no value)
2016-07-08 21:27:23eryksunsetstatus: open -> closed

nosy: + eryksun
messages: + msg270009

resolution: third party
stage: resolved
2016-07-08 15:40:59abarrysetnosy: + abarry
2016-07-08 15:38:52Drekincreate