msg202326 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-07 11:47 |
Changeset af822a6c9faf from issue #19512 added the function PyRun_InteractiveOneObject(). By the way, I forgot to document this function; this issue is also a reminder to do that. The purpose of the new function is to avoid creating temporary Unicode strings and making useless calls to the Unicode encoder/decoder.
I propose to generalize the change to the other PyRun_xxx() functions. The attached patch adds the following functions:
- PyRun_AnyFileObject()
- PyRun_SimpleFileObject()
- PyRun_InteractiveLoopObject()
- PyRun_FileObject()
On Windows, these changes should make it possible to pass an unencodable filename on the command line (e.g. a Japanese script name on an English setup).
TODO: I should document all these new functions.
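To illustrate the Windows case, here is a minimal Python sketch of why a filename squeezed through a code page cannot carry every name. It assumes cp1252 as the ANSI code page of an English setup and cp932 for a Japanese one; the script name and the helper `encodable` are made up for this example and are not part of the patch.

```python
# Hypothetical script name; any name outside the ANSI code page behaves the same.
script_name = "スクリプト.py"


def encodable(name, codepage):
    """Return True if `name` can be encoded to `codepage` with strict errors."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


print(encodable(script_name, "cp932"))   # assumed Japanese ANSI code page
print(encodable(script_name, "cp1252"))  # assumed English ANSI code page
```

Keeping the filename as a Unicode object from end to end avoids the encoding step altogether, which is the point of the proposed functions.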
|
msg202329 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-11-07 12:32 |
> On Windows, these changes should make it possible to pass an unencodable filename on the command line (e.g. a Japanese script name on an English setup).
Doesn't the surrogateescape error handler solve this issue?
|
msg202335 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-07 13:01 |
2013/11/7 Serhiy Storchaka <report@bugs.python.org>:
>> On Windows, these changes should make it possible to pass an unencodable filename on the command line (e.g. a Japanese script name on an English setup).
>
> Doesn't the surrogateescape error handler solve this issue?
surrogateescape is very specific to UNIX, or more generally to systems
using bytes filenames. The native Windows type for filenames is Unicode. To
support any Unicode filename on Windows, you must never encode a
filename.
surrogateescape avoids decoding errors; here the problem is an
encoding error.
For example, "abé" cannot be encoded to ASCII. "abé".encode("ascii",
"surrogateescape") doesn't help here.
|
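As an illustration of the distinction drawn in msg202335, the following Python sketch contrasts the two directions. It only uses the standard `str.encode` and `bytes.decode` methods; the variable names are chosen for the example.

```python
# Encoding: "é" is a real character with no ASCII byte. surrogateescape only
# knows how to encode lone surrogates U+DC80..U+DCFF, so this still fails.
try:
    "abé".encode("ascii", "surrogateescape")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Decoding: the undecodable byte 0xE9 is carried through as U+DCE9 ...
decoded = b"ab\xe9".decode("ascii", "surrogateescape")
# ... and encoding with the same handler restores the original bytes.
round_trip = decoded.encode("ascii", "surrogateescape")

print(encode_failed, ascii(decoded), round_trip)
```

The handler is a round-trip mechanism for bytes that could not be decoded. It does nothing for a genuine character that the target encoding lacks.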
msg202338 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-11-07 13:31 |
I added some comments on Rietveld.
Please do not commit without documentation and tests.
|
msg202392 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-07 22:42 |
Updated patch addressing some remarks of Serhiy and adding documentation.
|
msg202393 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-07 22:43 |
> Updated patch addressing some remarks of Serhiy and adding documentation.
Oh, and it also adds a unit test. I haven't run the unit test on Windows yet.
|
msg202397 - (view) |
Author: Eric Snow (eric.snow) * |
Date: 2013-11-08 00:05 |
PEP 432 relates pretty closely here.
|
msg202398 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-08 00:07 |
> PEP 432 relates pretty closely here.
What is the relation between this issue and PEP 432?
|
msg202399 - (view) |
Author: Eric Snow (eric.snow) * |
Date: 2013-11-08 00:27 |
PEP 432 is all about the PyRun_* API and especially relates to refactoring it with the goal of improving extensibility and maintainability. I'm sure Nick could expound, but the PEP is a response to the cruft that has accumulated over the years in Python/pythonrun.c. The result of that organic growth makes it harder than necessary to do things like adding new commandline options. While I haven't looked closely at the new function you added, I expect PEP 432 would have simplified things or even removed the need for a new function.
|
msg202411 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-11-08 09:45 |
PEP 432 doesn't really touch the PyRun_* APIs - it's all about refactoring
Py_Initialize so you can use most of the C API during the latter parts of
the configuration process (e.g. setting up the path for the import system).
pythonrun.c is just a monstrous beast that covers the entire interpreter
lifecycle from initialisation through script execution through to
termination.
|
msg203447 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-19 23:57 |
> Updated patch addressing some remarks of Serhiy and adding documentation.
Anyone for a new review?
|
msg203464 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-11-20 07:45 |
PyRun_FileObject() looks misleading, because it works with FILE*, not with a file object.
|
msg203474 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-20 13:38 |
> PyRun_FileObject() looks misleading, because it works with FILE*, not with a file object.
I simply replaced the current suffix with Object(). Only the filename is converted from char* to PyObject*. Do you have a better suggestion for the new name?
|
msg203476 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-11-20 13:48 |
No, I don't have a better suggestion. But I'm afraid that one day you will want to extend the PyRun_File*() functions to work with a general Python file object (perhaps there is such an issue already), and then you will run into a problem.
|
msg203480 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-11-20 14:13 |
Perhaps we could use the suffix "Unicode" rather than "Object"? These don't work with arbitrary objects, they expect a unicode string.
PyRun_InteractiveOneObject would be updated to use the new suffix as well.
That would both be clearer for the user, and address Serhiy's concern about the possible ambiguity: PyRun_FileUnicode still isn't crystal clear, but it's clearer than PyRun_FileObject.
|
msg203481 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-20 14:17 |
FYI I already added a bunch of new functions with Object suffix when I replaced char* with PyObject*.
Example:
http://hg.python.org/cpython/rev/df2fdd42b375
http://bugs.python.org/issue11619
|
msg203489 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-11-20 15:03 |
Hmm, having read more of those, I think Serhiy is definitely right -
Object is the wrong suffix. Unicode isn't right either, since the main
problem is the ambiguity around *which* parameter is a Python Unicode
object. The API names that end in *StringObject or *FileObject don't
give the right idea at all.
The shortest accurate suffix I can come up with at the moment is the
verbose "WithUnicodeFilename":
PyParser_ParseStringObject vs
PyParser_ParseStringWithUnicodeFilename
Other possibilities:
PyParser_ParseStringUnicode # Huh?
PyParser_ParseStringDecodedFilename # Slight fib on Windows, but mostly accurate
PyParser_ParseStringAnyFilename
Inserting an underscore before the suffix is another option (although
I don't think it much matters either way).
|
msg203490 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-11-20 15:11 |
> FYI I already added a bunch of new functions with Object suffix when I replaced char* with PyObject*.
Most of them were added in 3.4. Unfortunately several functions were added earlier (e.g. PyImport_ExecCodeModuleObject, PyErr_SetFromErrnoWithFilenameObject).
|
msg203592 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-21 09:09 |
So, which suffix should be used?
|
msg203593 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-11-21 09:18 |
"*Unicode" suffix in existing functions means Py_UNICODE* argument.
Maybe "*Ex2"? It can't be misinterpreted, but it looks ugly.
|
msg203608 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-21 10:36 |
> "*Unicode" suffix in existing functions means Py_UNICODE* argument.
Yes, this is why I chose the Object() suffix. Are you still opposed to
the "Object" suffix?
(Yes, "*Ex2" is really ugly.)
|
msg203618 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-11-21 12:04 |
How about "ExName"?
This patch:
PyRun_AnyFileExName
PyRun_SimpleFileExName
PyRun_InteractiveOneExName
PyRun_InteractiveLoopExName
PyRun_FileExName
Previous patch:
Py_CompileStringExName
PyAST_FromNodeExName
PyAST_CompileExName
PyFuture_FromASTExName
PyParser_ParseFileExName
PyParser_ParseStringExName
PyErr_SyntaxLocationExName
PyErr_ProgramTextExName
PyParser_ASTFromStringExName
PyParser_ASTFromFileExName
- "Ex" has precedent as indicating a largely functionally equivalent API with a different signature
- "Name" suggests strongly that we're tinkering with the filename (since these APIs don't accept another name)
- "ExName" is the same length as "Object" but far more explicit
Thoughts?
|
msg206391 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-12-16 23:22 |
Sorry, but because of the bikeshedding, I'm no longer interested in working on this issue. Don't hesitate to rework my patch if you want to fix the bug ("On Windows, these changes should allow to pass an unencodable filename on the command line").
|
msg206396 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-12-17 02:50 |
Just getting this on Larry's radar and summarising the current position.
The original problem: using "char *" to pass filenames around doesn't work properly on Windows, we need to use Unicode objects.
The solution: parallel APIs that accept PyObject * rather than char * for the filename parameters.
The new problem: both Serhiy and I find the *Object() suffix currently used for those "filename as Unicode object instead of C string" parallel APIs to be ambiguous and confusing. However, the problem the parallel APIs solve is real, and reverting or excessively modifying any of the work Victor has already done would be silly.
That means we're now in a situation where we have to either:
* accept *Object as the suffix for all of these APIs indefinitely, even though it's ambiguous and confusing
* choose a new suffix and use that for the APIs already added in 3.4 and add compatibility aliases for the older APIs to make them consistent
* change the public API additions already made for 3.4 to new private APIs by adding an underscore prefix, and then reconsider the public API naming question for 3.5
* accept *Object as the suffix for the moment, but aim to replace it with something more descriptive in Python 3.5
Neither Serhiy nor I are comfortable with the first option, and making a decision in haste for the second option doesn't seem like a good idea. Option 3 seems like far too much work to make things less useful (a capability that works, but has an ambiguous and confusing name, is better than a capability that isn't provided at all).
That leaves option number 4: don't change anything further now, but revisit it for 3.5, including changing the preferred name of the existing APIs.
I like that approach, so I'm assigning to myself to take a closer look at how some of the suggestions above read in the docs once 3.4 is out the door.
|
msg206449 - (view) |
Author: Larry Hastings (larry) * |
Date: 2013-12-17 14:38 |
So all the PyRun_*Object functions are new in 3.4, and none of them are documented yet?
Option 4 is silly--I don't think we should ship them as public APIs in 3.4 if we're planning to rename them. I prefer the previous options.
p.s. fwiw I hate "ExName".
|
msg206453 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-17 14:55 |
> So all the PyRun_*Object functions are new in 3.4, and none of them are documented yet?
Not all. Only the following functions are new in 3.4:
Parser/parsetok.c:PyParser_ParseStringObject
Parser/parsetok.c:PyParser_ParseFileObject
Python/future.c:PyFuture_FromASTObject
Python/symtable.c:PySymtable_BuildObject
Python/compile.c:PyAST_CompileObject
Python/_warnings.c:PyErr_WarnExplicitObject
Python/ast.c:PyAST_FromNodeObject
Python/errors.c:PyErr_SyntaxLocationObject
Python/errors.c:PyErr_ProgramTextObject
Python/pythonrun.c:PyRun_InteractiveOneObject
Python/pythonrun.c:Py_CompileStringObject
Python/pythonrun.c:Py_SymtableStringObject
Python/pythonrun.c:PyParser_ASTFromStringObject
Python/pythonrun.c:PyParser_ASTFromFileObject
The following functions existed in 3.3:
Objects/moduleobject.c:PyModule_NewObject
Objects/moduleobject.c:PyModule_GetNameObject
Objects/moduleobject.c:PyModule_GetFilenameObject
Objects/abstract.c:PyObject_CallObject
Objects/bytesobject.c:PyBytes_FromObject
Objects/fileobject.c:PyFile_WriteObject
Objects/memoryobject.c:PyMemoryView_FromObject
Objects/longobject.c:PyLong_FromUnicodeObject
Objects/weakrefobject.c:PyWeakref_GetObject
Objects/exceptions.c:PyUnicodeEncodeError_GetObject
Objects/exceptions.c:PyUnicodeDecodeError_GetObject
Objects/exceptions.c:PyUnicodeTranslateError_GetObject
Objects/unicodeobject.c:PyUnicode_FromObject
Objects/unicodeobject.c:PyUnicode_FromEncodedObject
Objects/unicodeobject.c:PyUnicode_AsDecodedObject
Objects/unicodeobject.c:PyUnicode_AsEncodedObject
Objects/bytearrayobject.c:PyByteArray_FromObject
Python/sysmodule.c:PySys_GetObject
Python/sysmodule.c:PySys_SetObject
Python/errors.c:PyErr_SetObject
Python/errors.c:PyErr_SetFromErrnoWithFilenameObject
Python/import.c:_PyImport_FixupExtensionObject
Python/import.c:_PyImport_FindExtensionObject
Python/import.c:PyImport_AddModuleObject
Python/import.c:PyImport_ExecCodeModuleObject
Python/import.c:PyImport_ImportFrozenModuleObject
Python/import.c:PyImport_ImportModuleLevelObject
Python/modsupport.c:PyModule_AddObject
Python/pyarena.c:PyArena_AddPyObject
|
msg206456 - (view) |
Author: Larry Hastings (larry) * |
Date: 2013-12-17 14:58 |
Are all the functions that use "Object" to indicate "Unicode object instead of string" new in 3.4? Of those, how many are undocumented?
|
msg206460 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-17 15:16 |
> Are all the functions that use "Object" to indicate "Unicode object instead
> of string" new in 3.4? Of those, how many are undocumented?
The following 5 functions work with PyObject* filenames and have Object-less
variants which work with char * filenames:
Python/errors.c:PyErr_SetFromErrnoWithFilenameObject
Python/import.c:PyImport_AddModuleObject
Python/import.c:PyImport_ExecCodeModuleObject
Python/import.c:PyImport_ImportFrozenModuleObject
Python/import.c:PyImport_ImportModuleLevelObject
Private _PyImport_FixupExtensionObject and _PyImport_FindExtensionObject have
no Object-less variants.
All other *Object functions are unrelated.
|
msg206462 - (view) |
Author: Larry Hastings (larry) * |
Date: 2013-12-17 15:33 |
Are those five functions new in 3.4 and undocumented?
|
msg206464 - (view) |
Author: Larry Hastings (larry) * |
Date: 2013-12-17 15:34 |
Are we proposing renaming any functions that are either
a) not new in 3.4, or
b) were documented as of 3.4 beta 1?
|
msg206466 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-17 15:45 |
> Are those five functions new in 3.4 and undocumented?
PyErr_SetFromErrnoWithFilenameObject exists even in 2.7. The other 4
PyImport_*Object functions were all added in 3.3 (see issue3080). All 5 functions
are documented.
14 new functions were added in 3.4.
|
msg247988 - (view) |
Author: Adam Bartoš (Drekin) * |
Date: 2015-08-04 12:20 |
I'm not sure this is the right issue. The support for Unicode filenames is not (at least on Windows) ideal.
Let α.py be a Python script with invalid syntax.
> py α.py
File "<encoding error>", line 2
as as compile error
^
SyntaxError: invalid syntax
On the other hand, if run.py does something like
import sys
import tokenize
import __main__

path = sys.argv[1]
with tokenize.open(path) as f:
    source = f.read()
code = compile(source, path, "exec")
exec(code, __main__.__dict__)
we get
> py run.py α.py
File "Python Unicode\\u03b1.py", line 2
as as compile error
^
SyntaxError: invalid syntax
(or 'File "Python Unicode\α.py", line 2' depending on whether sys.stdout can encode the string).
So the "<encoding error>" in the first example is unfortunate, as it is easy to get a better result even with a simple pure Python approach.
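The pure Python behaviour described above can be sketched without touching the filesystem. This is only an illustration: the helper `describe_syntax_error` and the sample source are made up for the example, and the filename is used purely as a label passed to the built-in `compile()`.

```python
def describe_syntax_error(source, filename):
    """Compile `source`; return (filename, lineno) of the SyntaxError, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        # The filename is kept as a str on the exception; it is never encoded.
        return exc.filename, exc.lineno
    return None


# Hypothetical script contents, modelled on the example in this message.
bad_source = "x = 1\nas as compile error\n"
result = describe_syntax_error(bad_source, "α.py")
print(result)
```

Because `compile()` receives the filename as a Unicode string, the name survives intact on the exception, whatever the console can or cannot display.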
|
Date | User | Action | Args
2022-04-11 14:57:53 | admin | set | github: 63717
2015-10-02 21:09:44 | vstinner | set | status: open -> closed; resolution: out of date
2015-08-04 12:20:07 | Drekin | set | nosy: + Drekin; messages: + msg247988
2015-06-28 03:03:46 | ncoghlan | set | assignee: ncoghlan ->
2013-12-17 15:45:44 | serhiy.storchaka | set | messages: + msg206466
2013-12-17 15:34:42 | larry | set | messages: + msg206464
2013-12-17 15:33:16 | larry | set | messages: + msg206462
2013-12-17 15:16:12 | serhiy.storchaka | set | messages: + msg206460
2013-12-17 14:58:23 | larry | set | messages: + msg206456
2013-12-17 14:55:48 | serhiy.storchaka | set | messages: + msg206453
2013-12-17 14:38:11 | larry | set | messages: + msg206449
2013-12-17 02:54:09 | ncoghlan | set | priority: normal
2013-12-17 02:50:48 | ncoghlan | set | priority: normal -> (no value); nosy: + larry; versions: + Python 3.5, - Python 3.4; messages: + msg206396; assignee: ncoghlan
2013-12-16 23:22:32 | vstinner | set | nosy: - vstinner
2013-12-16 23:22:20 | vstinner | set | nosy: georg.brandl, ncoghlan, vstinner, Arfrever, eric.snow, serhiy.storchaka; messages: + msg206391
2013-11-21 12:04:49 | ncoghlan | set | messages: + msg203618
2013-11-21 10:36:47 | vstinner | set | messages: + msg203608
2013-11-21 09:18:38 | serhiy.storchaka | set | messages: + msg203593
2013-11-21 09:09:13 | vstinner | set | messages: + msg203592
2013-11-20 15:11:51 | serhiy.storchaka | set | messages: + msg203490
2013-11-20 15:03:34 | ncoghlan | set | messages: + msg203489
2013-11-20 14:17:03 | vstinner | set | messages: + msg203481
2013-11-20 14:13:57 | ncoghlan | set | messages: + msg203480
2013-11-20 13:48:25 | serhiy.storchaka | set | messages: + msg203476
2013-11-20 13:38:59 | vstinner | set | messages: + msg203474
2013-11-20 07:45:57 | serhiy.storchaka | set | messages: + msg203464
2013-11-19 23:57:54 | vstinner | set | messages: + msg203447
2013-11-08 09:45:35 | ncoghlan | set | messages: + msg202411
2013-11-08 00:27:12 | eric.snow | set | messages: + msg202399
2013-11-08 00:07:30 | vstinner | set | messages: + msg202398
2013-11-08 00:05:16 | eric.snow | set | nosy: + eric.snow, ncoghlan; messages: + msg202397
2013-11-07 22:43:22 | vstinner | set | messages: + msg202393
2013-11-07 22:42:52 | vstinner | set | files: + pyrun_object-2.patch; messages: + msg202392
2013-11-07 16:48:24 | Arfrever | set | nosy: + Arfrever
2013-11-07 13:31:46 | serhiy.storchaka | set | messages: + msg202338
2013-11-07 13:02:54 | vstinner | set | nosy: + georg.brandl
2013-11-07 13:01:19 | vstinner | set | messages: + msg202335
2013-11-07 12:32:41 | serhiy.storchaka | set | messages: + msg202329
2013-11-07 12:30:46 | serhiy.storchaka | set | type: enhancement; components: + Interpreter Core; stage: test needed
2013-11-07 11:48:00 | vstinner | create |