classification
Title: Add new PyRun_xxx() functions to not encode the filename
Type: enhancement
Stage: test needed
Components: Interpreter Core
Versions: Python 3.5
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, Drekin, eric.snow, georg.brandl, larry, ncoghlan, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2013-11-07 11:47 by vstinner, last changed 2015-10-02 21:09 by vstinner. This issue is now closed.

Files
File name Uploaded Description
pyrun_object.patch vstinner, 2013-11-07 11:47
pyrun_object-2.patch vstinner, 2013-11-07 22:42
Messages (32)
msg202326 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 11:47
The changeset af822a6c9faf of issue #19512 added the function PyRun_InteractiveOneObject(). By the way, I forgot to document this function; this issue is also a reminder to do that. The purpose of the new function is to avoid creating temporary Unicode strings and making useless calls to the Unicode encoder/decoder.

I propose to generalize the change to other PyRun_xxx() functions. The attached patch adds the following functions:

- PyRun_AnyFileObject()
- PyRun_SimpleFileObject()
- PyRun_InteractiveLoopObject()
- PyRun_FileObject()

On Windows, these changes should make it possible to pass an unencodable filename on the command line (e.g. a Japanese script name on an English setup).

TODO: I should document all these new functions.
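
To illustrate the Windows case described above, here is a minimal Python sketch of what "unencodable" means. The script name and the choice of cp1252 (a Western European code page) and cp932 (a Japanese code page) are assumptions made for the example, not details taken from the patch.

```python
def can_encode(filename, encoding):
    """Return True if filename survives a strict encode to the given encoding."""
    try:
        filename.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


script = "スクリプト.py"  # hypothetical Japanese script name

# The same Unicode filename may or may not be representable,
# depending on which legacy code page the system uses.
print(can_encode(script, "cp1252"))  # Western European code page
print(can_encode(script, "cp932"))   # Japanese code page
print(can_encode(script, "utf-8"))
```

Keeping the filename as a Unicode object end to end sidesteps the question of which code page is active.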
msg202329 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-07 12:32
> On Windows, these changes should make it possible to pass an unencodable filename on the command line (e.g. a Japanese script name on an English setup).

Doesn't the surrogateescape error handler solve this issue?
msg202335 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 13:01
2013/11/7 Serhiy Storchaka <report@bugs.python.org>:
>> On Windows, these changes should make it possible to pass an unencodable filename on the command line (e.g. a Japanese script name on an English setup).
>
> Doesn't the surrogateescape error handler solve this issue?

surrogateescape is very specific to UNIX, or more generally systems
using bytes filenames. Windows native type for filename is Unicode. To
support any Unicode filename on Windows, you must never encode a
filename.

surrogateescape avoids decoding errors; here the problem is an
encoding error.

For example, "abé" cannot be encoded to ASCII. "abé".encode("ascii",
"surrogateescape") doesn't help here.
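
The asymmetry described above can be checked with a short Python sketch. The helper name try_encode and the sample byte string are assumptions made for illustration.

```python
def try_encode(text, encoding, errors="strict"):
    """Encode text, returning None instead of raising on an encoding error."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None


# Decoding direction: surrogateescape maps the undecodable byte 0xE9
# to the lone surrogate U+DCE9, so no error is raised.
raw = b"ab\xe9"
decoded = raw.decode("ascii", "surrogateescape")

# Encoding that escaped string back restores the original bytes.
roundtrip = decoded.encode("ascii", "surrogateescape")

# Encoding direction with a real character: U+00E9 is not an escaped
# surrogate, so surrogateescape cannot help and the encode fails.
failed = try_encode("abé", "ascii", "surrogateescape")

print(repr(decoded), roundtrip, failed)
```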
msg202338 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-07 13:31
I added some comments on Rietveld.

Please do not commit without documentation and tests.
msg202392 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 22:42
Updated patch addressing some remarks of Serhiy and adding documentation.
msg202393 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 22:43
> Updated patch addressing some remarks of Serhiy and adding documentation.

Oh, and it also adds a unit test. I didn't run the unit test on Windows yet.
msg202397 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-11-08 00:05
PEP 432 relates pretty closely here.
msg202398 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-08 00:07
> PEP 432 relates pretty closely here.

What is the relation between this issue and PEP 432?
msg202399 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-11-08 00:27
PEP 432 is all about the PyRun_* API and especially relates to refactoring it with the goal of improving extensibility and maintainability.  I'm sure Nick could expound, but the PEP is a response to the cruft that has accumulated over the years in Python/pythonrun.c.  The result of that organic growth makes it harder than necessary to do things like adding new commandline options.  While I haven't looked closely at the new function you added, I expect PEP 432 would have simplified things or even removed the need for a new function.
msg202411 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-11-08 09:45
PEP 432 doesn't really touch the PyRun_* APIs - it's all about refactoring
Py_Initialize so you can use most of the C API during the latter parts of
the configuration process (e.g. setting up the path for the import system).

pythonrun.c is just a monstrous beast that covers the entire interpreter
lifecycle from initialisation through script execution through to
termination.
msg203447 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-19 23:57
> Updated patch addressing some remarks of Serhiy and adding documentation.

Anyone for a new review?
msg203464 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-20 07:45
PyRun_FileObject() looks misleading, because it works with FILE*, not with a file object.
msg203474 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-20 13:38
> PyRun_FileObject() looks misleading, because it works with FILE*, not with a file object.

I simply replaced the current suffix with Object(). Only filename is converted from char* to PyObject*. Do you have a better suggestion for the new name?
msg203476 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-20 13:48
No, I don't have a better suggestion. But I'm afraid that one day you will want to extend the PyRun_File*() functions to work with a general Python file object (perhaps there is already such an issue), and then you will run into a problem.
msg203480 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-11-20 14:13
Perhaps we could use the suffix "Unicode" rather than "Object"? These don't work with arbitrary objects, they expect a unicode string.

PyRun_InteractiveOneObject would be updated to use the new suffix as well.

That would both be clearer for the user, and address Serhiy's concern about the possible ambiguity: PyRun_FileUnicode still isn't crystal clear, but it's clearer than PyRun_FileObject.
msg203481 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-20 14:17
FYI I already added a bunch of new functions with Object suffix when I replaced char* with PyObject*.

Example:

http://hg.python.org/cpython/rev/df2fdd42b375
http://bugs.python.org/issue11619
msg203489 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-11-20 15:03
Hmm, reading more of those and I think Serhiy is definitely right -
Object is the wrong suffix. Unicode isn't right either, since the main
problem is that ambiguity around *which* parameter is a Python Unicode
object. The API names that end in *StringObject or *FileObject don't
give the right idea at all.

The shortest accurate suffix I can come up with at the moment is the
verbose "WithUnicodeFilename":

    PyParser_ParseStringObject vs
    PyParser_ParseStringWithUnicodeFilename

Other possibilities:

    PyParser_ParseStringUnicode # Huh?
    PyParser_ParseStringDecodedFilename # Slight fib on Windows, but mostly accurate
    PyParser_ParseStringAnyFilename

Inserting an underscore before the suffix is another option (although
I don't think it much matters either way).
msg203490 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-20 15:11
> FYI I already added a bunch of new functions with Object suffix when I replaced char* with PyObject*.

Most of them were added in 3.4. Unfortunately several functions were added earlier (e.g. PyImport_ExecCodeModuleObject, PyErr_SetFromErrnoWithFilenameObject).
msg203592 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-21 09:09
So, which suffix should be used?
msg203593 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-21 09:18
"*Unicode" suffix in existing functions means Py_UNICODE* argument.

Maybe "*Ex2"? It can't be misinterpreted but looks ugly.
msg203608 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-21 10:36
> "*Unicode" suffix in existing functions means Py_UNICODE* argument.

Yes, this is why I chose Object() suffix. Are you still opposed to
"Object" suffix?

(Yes, "*Ex2" is really ugly.)
msg203618 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-11-21 12:04
How about "ExName"?

This patch:
    PyRun_AnyFileExName
    PyRun_SimpleFileExName
    PyRun_InteractiveOneExName
    PyRun_InteractiveLoopExName
    PyRun_FileExName

Previous patch:
    Py_CompileStringExName
    PyAST_FromNodeExName
    PyAST_CompileExName
    PyFuture_FromASTExName
    PyParser_ParseFileExName
    PyParser_ParseStringExName
    PyErr_SyntaxLocationExName
    PyErr_ProgramTextExName
    PyParser_ASTFromStringExName
    PyParser_ASTFromFileExName


- "Ex" has precedent as indicating a largely functionally equivalent API with a different signature
- "Name" strongly suggests that we're tinkering with the filename (since these APIs don't accept any other name)
- "ExName" is the same length as "Object" but far more explicit

Thoughts?
msg206391 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-16 23:22
Sorry, but because of the bikeshedding, I'm no longer interested in working on this issue. Don't hesitate to rework my patch if you want to fix the bug ("On Windows, these changes should make it possible to pass an unencodable filename on the command line").
msg206396 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-17 02:50
Just getting this on Larry's radar and summarising the current position.

The original problem: using "char *" to pass filenames around doesn't work properly on Windows, we need to use Unicode objects.

The solution: parallel APIs that accept PyObject * rather than char * for the filename parameters.

The new problem: both Serhiy and I find the *Object() suffix currently used for those "filename as Unicode object instead of C string" parallel APIs to be ambiguous and confusing. However, the problem the parallel APIs solve is real, and reverting or excessively modifying any of the work Victor has already done would be silly.

That means we're now in a situation where we have to either:

* accept *Object as the suffix for all of these APIs indefinitely, even though it's ambiguous and confusing
* choose a new suffix and use that for the APIs already added in 3.4 and add compatibility aliases for the older APIs to make them consistent
* change the public API additions already made for 3.4 to new private APIs by adding an underscore prefix, and then reconsider the public API naming question for 3.5
* accept *Object as the suffix for the moment, but aim to replace it with something more descriptive in Python 3.5

Neither Serhiy nor I are comfortable with the first option, and making a decision in haste for the second option doesn't seem like a good idea. Option 3 seems like far too much work to make things less useful (a capability that works, but has an ambiguous and confusing name, is better than a capability that isn't provided at all).

That leaves option number 4: don't change anything further now, but revisit it for 3.5, including changing the preferred name of the existing APIs.

I like that approach, so I'm assigning to myself to take a closer look at how some of the suggestions above read in the docs once 3.4 is out the door.
msg206449 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-12-17 14:38
So all the PyRun_*Object functions are new in 3.4, and none of them are documented yet?

Option 4 is silly--I don't think we should ship them as public APIs in 3.4 if we're planning to rename them.  I prefer the previous options.

p.s. fwiw I hate "ExName".
msg206453 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-17 14:55
> So all the PyRun_*Object functions are new in 3.4, and none of them are documented yet?

Not all. Only the following functions are new in 3.4:

Parser/parsetok.c:PyParser_ParseStringObject
Parser/parsetok.c:PyParser_ParseFileObject
Python/future.c:PyFuture_FromASTObject
Python/symtable.c:PySymtable_BuildObject
Python/compile.c:PyAST_CompileObject
Python/_warnings.c:PyErr_WarnExplicitObject
Python/ast.c:PyAST_FromNodeObject
Python/errors.c:PyErr_SyntaxLocationObject
Python/errors.c:PyErr_ProgramTextObject
Python/pythonrun.c:PyRun_InteractiveOneObject
Python/pythonrun.c:Py_CompileStringObject
Python/pythonrun.c:Py_SymtableStringObject
Python/pythonrun.c:PyParser_ASTFromStringObject
Python/pythonrun.c:PyParser_ASTFromFileObject

The following functions already existed in 3.3:

Objects/moduleobject.c:PyModule_NewObject
Objects/moduleobject.c:PyModule_GetNameObject
Objects/moduleobject.c:PyModule_GetFilenameObject
Objects/abstract.c:PyObject_CallObject
Objects/bytesobject.c:PyBytes_FromObject
Objects/fileobject.c:PyFile_WriteObject
Objects/memoryobject.c:PyMemoryView_FromObject
Objects/longobject.c:PyLong_FromUnicodeObject
Objects/weakrefobject.c:PyWeakref_GetObject
Objects/exceptions.c:PyUnicodeEncodeError_GetObject
Objects/exceptions.c:PyUnicodeDecodeError_GetObject
Objects/exceptions.c:PyUnicodeTranslateError_GetObject
Objects/unicodeobject.c:PyUnicode_FromObject
Objects/unicodeobject.c:PyUnicode_FromEncodedObject
Objects/unicodeobject.c:PyUnicode_AsDecodedObject
Objects/unicodeobject.c:PyUnicode_AsEncodedObject
Objects/bytearrayobject.c:PyByteArray_FromObject
Python/sysmodule.c:PySys_GetObject
Python/sysmodule.c:PySys_SetObject
Python/errors.c:PyErr_SetObject
Python/errors.c:PyErr_SetFromErrnoWithFilenameObject
Python/import.c:_PyImport_FixupExtensionObject
Python/import.c:_PyImport_FindExtensionObject
Python/import.c:PyImport_AddModuleObject
Python/import.c:PyImport_ExecCodeModuleObject
Python/import.c:PyImport_ImportFrozenModuleObject
Python/import.c:PyImport_ImportModuleLevelObject
Python/modsupport.c:PyModule_AddObject
Python/pyarena.c:PyArena_AddPyObject
msg206456 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-12-17 14:58
Are all the functions that use "Object" to indicate "Unicode object instead of string" new in 3.4?  Of those, how many are undocumented?
msg206460 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-17 15:16
> Are all the functions that use "Object" to indicate "Unicode object instead
> of string" new in 3.4?  Of those, how many are undocumented?

The following 5 functions work with PyObject* filenames and have Object-less
variants which work with char * filenames:

Python/errors.c:PyErr_SetFromErrnoWithFilenameObject
Python/import.c:PyImport_AddModuleObject
Python/import.c:PyImport_ExecCodeModuleObject
Python/import.c:PyImport_ImportFrozenModuleObject
Python/import.c:PyImport_ImportModuleLevelObject

The private functions _PyImport_FixupExtensionObject and _PyImport_FindExtensionObject
have no Object-less variants.

All other *Object functions are unrelated.
msg206462 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-12-17 15:33
Are those five functions new in 3.4 and undocumented?
msg206464 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-12-17 15:34
Are we proposing renaming any functions that are either
a) not new in 3.4, or
b) were documented as of 3.4 beta 1?
msg206466 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-17 15:45
> Are those five functions new in 3.4 and undocumented?

PyErr_SetFromErrnoWithFilenameObject exists even in 2.7. The other 4
PyImport_*Object functions were all added in 3.3 (see issue3080). All 5
functions are documented.

14 new functions were added in 3.4.
msg247988 - (view) Author: Adam Bartoš (Drekin) * Date: 2015-08-04 12:20
I'm not sure this is the right issue. The support for Unicode filenames is not (at least on Windows) ideal.

Let α.py be a Python script with invalid syntax.

> py α.py
  File "<encoding error>", line 2
    as as compile error
     ^
SyntaxError: invalid syntax

On the other hand, if run.py does something like

import sys
import tokenize
import __main__

path = sys.argv[1]
with tokenize.open(path) as f:
    source = f.read()
code = compile(source, path, "exec")
exec(code, __main__.__dict__)

we get 
> py run.py α.py
  File "Python Unicode\\u03b1.py", line 2
    as as compile error
     ^
SyntaxError: invalid syntax

(or 'File "Python Unicode\α.py", line 2' depending on whether sys.stdout can encode the string).

So the "<encoding error>" in the first example is unfortunate, as it is easy to get a better result even with a simple pure Python approach.
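
The pure Python behaviour described above can be reproduced without any file on disk. This is only a sketch: the helper names syntax_error_filename and printable are hypothetical, and the backslashreplace fallback is an assumption about how a stream that cannot encode the name ends up showing \u03b1.

```python
def syntax_error_filename(source, filename):
    """Compile source and return the filename recorded on the SyntaxError, if any."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.filename
    return None


def printable(text, encoding="ascii"):
    """Render text for a stream that can only encode the given encoding."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


# compile() takes the filename as a str, so the non-ASCII name is kept as-is.
name = syntax_error_filename("x = 1\nas as compile error\n", "α.py")

# If the output stream cannot encode the name, escape it instead of losing it.
print(printable(name))
```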
History
Date User Action Args
2015-10-02 21:09:44 vstinner set status: open -> closed; resolution: out of date
2015-08-04 12:20:07 Drekin set nosy: + Drekin; messages: + msg247988
2015-06-28 03:03:46 ncoghlan set assignee: ncoghlan ->
2013-12-17 15:45:44 serhiy.storchaka set messages: + msg206466
2013-12-17 15:34:42 larry set messages: + msg206464
2013-12-17 15:33:16 larry set messages: + msg206462
2013-12-17 15:16:12 serhiy.storchaka set messages: + msg206460
2013-12-17 14:58:23 larry set messages: + msg206456
2013-12-17 14:55:48 serhiy.storchaka set messages: + msg206453
2013-12-17 14:38:11 larry set messages: + msg206449
2013-12-17 02:54:09 ncoghlan set priority: normal
2013-12-17 02:50:48 ncoghlan set priority: normal -> (no value); nosy: + larry; versions: + Python 3.5, - Python 3.4; messages: + msg206396; assignee: ncoghlan
2013-12-16 23:22:32 vstinner set nosy: - vstinner
2013-12-16 23:22:20 vstinner set nosy: georg.brandl, ncoghlan, vstinner, Arfrever, eric.snow, serhiy.storchaka; messages: + msg206391
2013-11-21 12:04:49 ncoghlan set messages: + msg203618
2013-11-21 10:36:47 vstinner set messages: + msg203608
2013-11-21 09:18:38 serhiy.storchaka set messages: + msg203593
2013-11-21 09:09:13 vstinner set messages: + msg203592
2013-11-20 15:11:51 serhiy.storchaka set messages: + msg203490
2013-11-20 15:03:34 ncoghlan set messages: + msg203489
2013-11-20 14:17:03 vstinner set messages: + msg203481
2013-11-20 14:13:57 ncoghlan set messages: + msg203480
2013-11-20 13:48:25 serhiy.storchaka set messages: + msg203476
2013-11-20 13:38:59 vstinner set messages: + msg203474
2013-11-20 07:45:57 serhiy.storchaka set messages: + msg203464
2013-11-19 23:57:54 vstinner set messages: + msg203447
2013-11-08 09:45:35 ncoghlan set messages: + msg202411
2013-11-08 00:27:12 eric.snow set messages: + msg202399
2013-11-08 00:07:30 vstinner set messages: + msg202398
2013-11-08 00:05:16 eric.snow set nosy: + eric.snow, ncoghlan; messages: + msg202397
2013-11-07 22:43:22 vstinner set messages: + msg202393
2013-11-07 22:42:52 vstinner set files: + pyrun_object-2.patch; messages: + msg202392
2013-11-07 16:48:24 Arfrever set nosy: + Arfrever
2013-11-07 13:31:46 serhiy.storchaka set messages: + msg202338
2013-11-07 13:02:54 vstinner set nosy: + georg.brandl
2013-11-07 13:01:19 vstinner set messages: + msg202335
2013-11-07 12:32:41 serhiy.storchaka set messages: + msg202329
2013-11-07 12:30:46 serhiy.storchaka set type: enhancement; components: + Interpreter Core; stage: test needed
2013-11-07 11:48:00 vstinner create