Add new PyRun_xxx() functions to not encode the filename #63717

vstinner · 2013-11-07T11:48:00Z

BPO	19518
Nosy	@birkenfeld, @ncoghlan, @larryhastings, @ericsnowcurrently, @serhiy-storchaka
Files	pyrun_object.patch pyrun_object-2.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2015-10-02.21:09:44.761>
created_at = <Date 2013-11-07.11:47:59.991>
labels = ['interpreter-core', 'type-feature']
title = 'Add new PyRun_xxx() functions to not encode the filename'
updated_at = <Date 2015-10-02.21:09:44.761>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2015-10-02.21:09:44.761>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2015-10-02.21:09:44.761>
closer = 'vstinner'
components = ['Interpreter Core']
creation = <Date 2013-11-07.11:47:59.991>
creator = 'vstinner'
dependencies = []
files = ['32524', '32538']
hgrepos = []
issue_num = 19518
keywords = ['patch']
message_count = 32.0
messages = ['202326', '202329', '202335', '202338', '202392', '202393', '202397', '202398', '202399', '202411', '203447', '203464', '203474', '203476', '203480', '203481', '203489', '203490', '203592', '203593', '203608', '203618', '206391', '206396', '206449', '206453', '206456', '206460', '206462', '206464', '206466', '247988']
nosy_count = 7.0
nosy_names = ['georg.brandl', 'ncoghlan', 'larry', 'Arfrever', 'eric.snow', 'serhiy.storchaka', 'Drekin']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'test needed'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue19518'
versions = ['Python 3.5']

vstinner · 2013-11-07T11:47:58Z

The changeset af822a6c9faf of the issue bpo-19512 added the function PyRun_InteractiveOneObject(). By the way, I forgot to document this function. This issue is also a reminder for that. The purpose of the new function is to avoid creation of temporary Unicode strings and useless call to Unicode encoder/decoder.

I propose to generalize the change to other PyRun_xxx() functions. Attached patch adds the following functions:

PyRun_AnyFileObject()
PyRun_SimpleFileObject()
PyRun_InteractiveLoopObject()
PyRun_FileObject()

On Windows, these changes should allow to pass an unencodable filename on the command line (ex: japanese script name on an english setup).

TODO: I should document all these new functions.

serhiy-storchaka · 2013-11-07T12:32:41Z

On Windows, these changes should allow to pass an unencodable filename on the command line (ex: japanese script name on an english setup).

Doesn't the surrogateescape error handler solve this issue?

vstinner · 2013-11-07T13:01:19Z

2013/11/7 Serhiy Storchaka <report@bugs.python.org>:

> On Windows, these changes should allow to pass an unencodable filename on the command line (ex: japanese script name on an english setup).

Doesn't the surrogateescape error handler solve this issue?

surrogateescape is very specific to UNIX, or more generally systems
using bytes filenames. Windows native type for filename is Unicode. To
support any Unicode filename on Windows, you must never encode a
filename.

surrogateescape avoids decoding errors, here is the problem is an
encoding error.

For example, "abé" cannot be encoded to ASCII. "abé".encode("ascii",
"surrogateescape") doesn't help here.

serhiy-storchaka · 2013-11-07T13:31:47Z

I added some comments on Rietveld.

Please do not commit without documentation and tests.

vstinner · 2013-11-07T22:42:52Z

Updated patch addressing some remarks of Serhiy and adding documentation.

vstinner · 2013-11-07T22:43:23Z

Updated patch addressing some remarks of Serhiy and adding documentation.

Oh, and it adds also an unit test. I didn't run the unit test on Windows yet.

ericsnowcurrently · 2013-11-08T00:05:17Z

PEP-432 relates pretty closely here.

vstinner · 2013-11-08T00:07:31Z

PEP-432 relates pretty closely here.

What is the relation between this issue and the PEP-432?

ericsnowcurrently · 2013-11-08T00:27:13Z

PEP-432 is all about the PyRun_* API and especially relates to refactoring it with the goal of improving extensibility and maintainability. I'm sure Nick could expound, but the PEP is a response to the cruft that has accumulated over the years in Python/pythonrun.c. The result of that organic growth makes it harder than necessary to do things like adding new commandline options. While I haven't looked closely at the new function you added, I expect PEP-432 would have simplified things or even removed the need for a new function.

ncoghlan · 2013-11-08T09:45:35Z

PEP-432 doesn't really touch the PyRun_* APIs - it's all about refactoring
Py_Initialize so you can use most of the C API during the latter parts of
the configuration process (e.g. setting up the path for the import system).

pythonrun.c is just a monstrous beast that covers the entire interpreter
lifecycle from initialisation through script execution through to
termination.

vstinner · 2013-11-19T23:57:54Z

Updated patch addressing some remarks of Serhiy and adding documentation.

Anyone for a new review?

serhiy-storchaka · 2013-11-20T07:45:58Z

PyRun_FileObject() looks misleading, because it works with FILE*, not with a file object.

vstinner · 2013-11-20T13:39:00Z

PyRun_FileObject() looks misleading, because it works with FILE*, not with a file object.

I simply replaced the current suffix with Object(). Only filename is converted from char* to PyObject*. Do you have a better suggestion for the new name?

serhiy-storchaka · 2013-11-20T13:48:25Z

No I have not a better suggestion. But I afraid that one day you will wanted to extend PyRun_File*() function to work with a general Python file object (perhaps there is such issue already) and then you will encountered a problem.

ncoghlan · 2013-11-20T14:13:58Z

Perhaps we could we use the suffix "Unicode" rather than "Object"? These don't work with arbitrary objects, they expect a unicode string.

PyRun_InteractiveOneObject would be updated to use the new suffix as well.

That would both be clearer for the user, and address Serhiy's concern about the possible ambiguity: PyRun_FileUnicode still isn't crystal clear, but it's clearer than PyRun_FileObject.

vstinner · 2013-11-20T14:17:03Z

FYI I already added a bunch of new functions with Object suffix when I replaced char* with PyObject*.

Example:

http://hg.python.org/cpython/rev/df2fdd42b375
http://bugs.python.org/issue11619

ncoghlan · 2013-11-20T15:03:34Z

Hmm, reading more of those and I think Serhiy is definitely right -
Object is the wrong suffix. Unicode isn't right either, since the main
problem is that ambiguity around *which* parameter is a Python Unicode
object. The API names that end in *StringObject or *FileObject don't
give the right idea at all.

The shortest accurate suffix I can come up with at the moment is the
verbose "WithUnicodeFilename":

PyParser_ParseStringObject vs
PyParser_ParseStringWithUnicodeFilename

Other possibilities:

PyParser_ParseStringUnicode # Huh?
PyParser_ParseStringDecodedFilename # Slight fib on Windows, but

mostly accurate
PyParser_ParseStringAnyFilename

Inserting an underscore before the suffix is another option (although
I don't think it much matters either way).

serhiy-storchaka · 2013-11-20T15:11:51Z

FYI I already added a bunch of new functions with Object suffix when I replaced char* with PyObject*.

Most of them were added in 3.4. Unfortunately several functions were added earlier (e.g. PyImport_ExecCodeModuleObject, PyErr_SetFromErrnoWithFilenameObject).

vstinner · 2013-11-21T09:09:13Z

So, which suffix should be used?

serhiy-storchaka · 2013-11-21T09:18:38Z

"*Unicode" suffix in existing functions means Py_UNICODE* argument.

May be "*Ex2"? It can't be misinterpreted but looks ugly.

vstinner · 2013-11-21T10:36:48Z

"*Unicode" suffix in existing functions means Py_UNICODE* argument.

Yes, this is why I chose Object() suffix. Are you still opposed to
"Object" suffix?

(Yes, "*Ex2" is really ugly.)

ncoghlan · 2013-11-21T12:04:49Z

How about "ExName"?

This patch:
PyRun_AnyFileExName
PyRun_SimpleFileExName
PyRun_InteractiveOneExName
PyRun_InteractiveLoopExName
PyRun_FileExName

Previous patch:
Py_CompileStringExName
PyAST_FromNodeExName
PyAST_CompileExName
PyFuture_FromASTExName
PyParser_ParseFileExName
PyParser_ParseStringExName
PyErr_SyntaxLocationExName
PyErr_ProgramTextExName
PyParser_ASTFromStringExName
PyParser_ASTFromFileExName

"Ex" has precedent as indicating a largely functionally equivalent API with a different signature
"Name" suggests strongly that we're tinkering with the filename (since this APIs don't accept another name)
"ExName" is the same length as "Object" but far more explicit

Thoughts?

vstinner · 2013-12-16T23:22:20Z

Sorry, but because of the bikeshedding, I'm not more interested to work on this issue. Don't hesitate to re-work my patch if you want to fix the bug ("On Windows, these changes should allow to pass an unencodable filename on the command line").

ncoghlan · 2013-12-17T02:50:46Z

Just getting this on Larry's radar and summarising the current position.

The original problem: using "char *" to pass filenames around doesn't work properly on Windows, we need to use Unicode objects.

The solution: parallel APIs that accept PyObject * rather than char * for the filename parameters.

The new problem: both Serhiy and I find the *Object() suffix currently used for those "filename as Unicode object instead of C string" parallel APIs to be ambiguous and confusing. However, the problem the parallel APIs solve is real, and reverting or excessively modifying any of the work Victor has already done would be silly.

That means we're now in a situation where we have to either:

accept *Object as the suffix for all of these APIs indefinitely, even though it's ambiguous and confusing
choose a new suffix and use that for the APIs already added in 3.4 and add compatibility aliases for the older APIs to make them consistent
change the public API additions already made for 3.4 to new private APIs by adding an underscore prefix, and then reconsider the public API naming question for 3.5
accept *Object as the suffix for the moment, but aim to replace it with something more descriptive in Python 3.5

Neither Serhiy nor I are comfortable with the first option, and making a decision in haste for the second option doesn't seem like a good idea. Option 3 seems like far too much work to make things less useful (a capability that works, but has an ambiguous and confusing name, is better than a capability that isn't provided at all)

That leaves option number 4: don't change anything further now, but revisit it for 3.5, including changing the preferred name of the existing APIs.

I like that approach, so I'm assigning to myself to take a closer look at how some of the suggestions above read in the docs once 3.4 is out the door.

larryhastings · 2013-12-17T14:38:11Z

So all the PyRun_*Object functions are new in 3.4, and none of them are documented yet?

Option 4 is silly--I don't think we should ship them as public APIs in 3.4 if we're planning to rename them. I prefer the previous options.

p.s. fwiw I hate "ExName".

serhiy-storchaka · 2013-12-17T14:55:48Z

So all the PyRun_*Object functions are new in 3.4, and none of them are documented yet?

Not all. Only following functions are new in 3.4:

Parser/parsetok.c:PyParser_ParseStringObject
Parser/parsetok.c:PyParser_ParseFileObject
Python/future.c:PyFuture_FromASTObject
Python/symtable.c:PySymtable_BuildObject
Python/compile.c:PyAST_CompileObject
Python/_warnings.c:PyErr_WarnExplicitObject
Python/ast.c:PyAST_FromNodeObject
Python/errors.c:PyErr_SyntaxLocationObject
Python/errors.c:PyErr_ProgramTextObject
Python/pythonrun.c:PyRun_InteractiveOneObject
Python/pythonrun.c:Py_CompileStringObject
Python/pythonrun.c:Py_SymtableStringObject
Python/pythonrun.c:PyParser_ASTFromStringObject
Python/pythonrun.c:PyParser_ASTFromFileObject

Following functions existed in 3.3:

Objects/moduleobject.c:PyModule_NewObject
Objects/moduleobject.c:PyModule_GetNameObject
Objects/moduleobject.c:PyModule_GetFilenameObject
Objects/abstract.c:PyObject_CallObject
Objects/bytesobject.c:PyBytes_FromObject
Objects/fileobject.c:PyFile_WriteObject
Objects/memoryobject.c:PyMemoryView_FromObject
Objects/longobject.c:PyLong_FromUnicodeObject
Objects/weakrefobject.c:PyWeakref_GetObject
Objects/exceptions.c:PyUnicodeEncodeError_GetObject
Objects/exceptions.c:PyUnicodeDecodeError_GetObject
Objects/exceptions.c:PyUnicodeTranslateError_GetObject
Objects/unicodeobject.c:PyUnicode_FromObject
Objects/unicodeobject.c:PyUnicode_FromEncodedObject
Objects/unicodeobject.c:PyUnicode_AsDecodedObject
Objects/unicodeobject.c:PyUnicode_AsEncodedObject
Objects/bytearrayobject.c:PyByteArray_FromObject
Python/sysmodule.c:PySys_GetObject
Python/sysmodule.c:PySys_SetObject
Python/errors.c:PyErr_SetObject
Python/errors.c:PyErr_SetFromErrnoWithFilenameObject
Python/import.c:_PyImport_FixupExtensionObject
Python/import.c:_PyImport_FindExtensionObject
Python/import.c:PyImport_AddModuleObject
Python/import.c:PyImport_ExecCodeModuleObject
Python/import.c:PyImport_ImportFrozenModuleObject
Python/import.c:PyImport_ImportModuleLevelObject
Python/modsupport.c:PyModule_AddObject
Python/pyarena.c:PyArena_AddPyObject

larryhastings · 2013-12-17T14:58:23Z

Are all the functions that use "Object" to indicate "Unicode object instead of string" new in 3.4? Of those, how many are undocumented?

serhiy-storchaka · 2013-12-17T15:16:13Z

Are all the functions that use "Object" to indicate "Unicode object instead
of string" new in 3.4? Of those, how many are undocumented?

Following 5 functions work with PyObject* filenames and have Object-less
variants which works with char * filenames:

Python/errors.c:PyErr_SetFromErrnoWithFilenameObject
Python/import.c:PyImport_AddModuleObject
Python/import.c:PyImport_ExecCodeModuleObject
Python/import.c:PyImport_ImportFrozenModuleObject
Python/import.c:PyImport_ImportModuleLevelObject

Private _PyImport_FixupExtensionObject and _PyImport_FindExtensionObject have
no Object-less variants.

All other *Object functions are unrelated.

larryhastings · 2013-12-17T15:33:17Z

Are those five functions new in 3.4 and undocumented?

larryhastings · 2013-12-17T15:34:43Z

Are we proposing renaming any functions that are either
a) not new in 3.4, or
b) were documented as of 3.4 beta 1?

serhiy-storchaka · 2013-12-17T15:45:44Z

Are those five functions new in 3.4 and undocumented?

PyErr_SetFromErrnoWithFilenameObject exists even in 2.7. Other 4
PyImport_*Object functions all added in 3.3 (see bpo-3080). All 5 functions
are documented.

14 new functions were added in 3.4.

Drekin · 2015-08-04T12:20:08Z

I'm not sure this is the right issue. The support for Unicode filenames is not (at least on Windows) ideal.

Let α.py be a Python script with invalid syntax.

py α.py
File "<encoding error>", line 2
as as compile error
^
SyntaxError: invalid syntax

On the other hand, if run.py is does something like

path = sys.argv[1]
with tokenize.open(path) as f:
    source = f.read()
code = compile(source, path, "exec")
exec(code, __main__.__dict__)

we get

py run.py α.py
File "Python Unicode\\u03b1.py", line 2
as as compile error
^
SyntaxError: invalid syntax

(or 'File "Python Unicode\α.py", line 2' depending on whether sys.stdout can encode the string).

So the "<encoding error>" in the first example is unfortunate as it is easy to get better result even by a simple pure Python approach.

serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Nov 7, 2013

ncoghlan self-assigned this Dec 17, 2013

ncoghlan removed their assignment Jun 28, 2015

vstinner closed this as completed Oct 2, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new PyRun_xxx() functions to not encode the filename #63717

Add new PyRun_xxx() functions to not encode the filename #63717

vstinner commented Nov 7, 2013

vstinner commented Nov 7, 2013

serhiy-storchaka commented Nov 7, 2013

vstinner commented Nov 7, 2013

serhiy-storchaka commented Nov 7, 2013

vstinner commented Nov 7, 2013

vstinner commented Nov 7, 2013

ericsnowcurrently commented Nov 8, 2013

vstinner commented Nov 8, 2013

ericsnowcurrently commented Nov 8, 2013

ncoghlan commented Nov 8, 2013

vstinner commented Nov 19, 2013

serhiy-storchaka commented Nov 20, 2013

vstinner commented Nov 20, 2013

serhiy-storchaka commented Nov 20, 2013

ncoghlan commented Nov 20, 2013

vstinner commented Nov 20, 2013

ncoghlan commented Nov 20, 2013

serhiy-storchaka commented Nov 20, 2013

vstinner commented Nov 21, 2013

serhiy-storchaka commented Nov 21, 2013

vstinner commented Nov 21, 2013

ncoghlan commented Nov 21, 2013

vstinner commented Dec 16, 2013

ncoghlan commented Dec 17, 2013

larryhastings commented Dec 17, 2013

serhiy-storchaka commented Dec 17, 2013

larryhastings commented Dec 17, 2013

serhiy-storchaka commented Dec 17, 2013

larryhastings commented Dec 17, 2013

larryhastings commented Dec 17, 2013

serhiy-storchaka commented Dec 17, 2013

Drekin mannequin commented Aug 4, 2015

Add new PyRun_xxx() functions to not encode the filename #63717

Add new PyRun_xxx() functions to not encode the filename #63717

Comments

vstinner commented Nov 7, 2013

vstinner commented Nov 7, 2013

serhiy-storchaka commented Nov 7, 2013

vstinner commented Nov 7, 2013

serhiy-storchaka commented Nov 7, 2013

vstinner commented Nov 7, 2013

vstinner commented Nov 7, 2013

ericsnowcurrently commented Nov 8, 2013

vstinner commented Nov 8, 2013

ericsnowcurrently commented Nov 8, 2013

ncoghlan commented Nov 8, 2013

vstinner commented Nov 19, 2013

serhiy-storchaka commented Nov 20, 2013

vstinner commented Nov 20, 2013

serhiy-storchaka commented Nov 20, 2013

ncoghlan commented Nov 20, 2013

vstinner commented Nov 20, 2013

ncoghlan commented Nov 20, 2013

serhiy-storchaka commented Nov 20, 2013

vstinner commented Nov 21, 2013

serhiy-storchaka commented Nov 21, 2013

vstinner commented Nov 21, 2013

ncoghlan commented Nov 21, 2013

vstinner commented Dec 16, 2013

ncoghlan commented Dec 17, 2013

larryhastings commented Dec 17, 2013

serhiy-storchaka commented Dec 17, 2013

larryhastings commented Dec 17, 2013

serhiy-storchaka commented Dec 17, 2013

larryhastings commented Dec 17, 2013

larryhastings commented Dec 17, 2013

serhiy-storchaka commented Dec 17, 2013

Drekin mannequin commented Aug 4, 2015