Add new PyRun_xxx() functions to not encode the filename #63717
The changeset af822a6c9faf of the issue bpo-19512 added the function PyRun_InteractiveOneObject(). By the way, I forgot to document this function; this issue is also a reminder for that. The purpose of the new function is to avoid creating temporary Unicode strings and making useless calls to the Unicode encoder/decoder. I propose to generalize the change to the other PyRun_xxx() functions. The attached patch adds the following functions:
On Windows, these changes should allow passing an unencodable filename on the command line (e.g. a Japanese script name on an English setup). TODO: I should document all these new functions. |
Doesn't the surrogateescape error handler solve this issue? |
2013/11/7 Serhiy Storchaka <report@bugs.python.org>:
surrogateescape is very specific to UNIX, or more generally systems surrogateescape avoids decoding errors, here is the problem is an For example, "abé" cannot be encoded to ASCII. "abé".encode("ascii", |
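The distinction drawn in the reply above (surrogateescape helps with undecodable bytes, not with unencodable characters) can be illustrated with a minimal Python sketch. The helper names `roundtrip_undecodable` and `can_encode` are hypothetical and exist only for this illustration; they are not part of the patch or of any CPython API.

```python
def roundtrip_undecodable(raw: bytes) -> bytes:
    """Decode bytes that are invalid ASCII, then encode them back.

    surrogateescape maps each undecodable byte to a lone surrogate
    (U+DC80..U+DCFF) on decoding and restores the byte on encoding.
    """
    text = raw.decode("ascii", "surrogateescape")
    return text.encode("ascii", "surrogateescape")


def can_encode(text: str, encoding: str) -> bool:
    """Return True if text can be encoded, even with surrogateescape."""
    try:
        text.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        # A real character such as U+00E9 is not an escaped byte, so the
        # error handler has nothing to restore and the error propagates.
        return False
    return True


if __name__ == "__main__":
    print(roundtrip_undecodable(b"ab\xe9"))
    print(can_encode("ab\u00e9", "ascii"))
```

In this sketch the byte string survives the round trip unchanged, while the string "abé" is still rejected by the ASCII encoder, which is the situation an unencodable filename runs into.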
I added some comments on Rietveld. Please do not commit without documentation and tests. |
Updated patch addressing some remarks of Serhiy and adding documentation. |
Oh, and it also adds a unit test. I didn't run the unit test on Windows yet. |
PEP-432 relates pretty closely here. |
What is the relation between this issue and the PEP-432? |
PEP-432 is all about the PyRun_* API and especially relates to refactoring it with the goal of improving extensibility and maintainability. I'm sure Nick could expound, but the PEP is a response to the cruft that has accumulated over the years in Python/pythonrun.c. The result of that organic growth makes it harder than necessary to do things like adding new commandline options. While I haven't looked closely at the new function you added, I expect PEP-432 would have simplified things or even removed the need for a new function. |
PEP-432 doesn't really touch the PyRun_* APIs - it's all about refactoring pythonrun.c is just a monstrous beast that covers the entire interpreter |
Anyone for a new review? |
PyRun_FileObject() looks misleading, because it works with FILE*, not with a file object. |
I simply replaced the current suffix with Object(). Only filename is converted from char* to PyObject*. Do you have a better suggestion for the new name? |
No, I don't have a better suggestion. But I'm afraid that one day you will want to extend the PyRun_File*() functions to work with a general Python file object (perhaps there is already such an issue), and then you will run into a problem. |
Perhaps we could use the suffix "Unicode" rather than "Object"? These don't work with arbitrary objects, they expect a unicode string. PyRun_InteractiveOneObject would be updated to use the new suffix as well. That would both be clearer for the user, and address Serhiy's concern about the possible ambiguity: PyRun_FileUnicode still isn't crystal clear, but it's clearer than PyRun_FileObject. |
FYI I already added a bunch of new functions with Object suffix when I replaced char* with PyObject*. Example: http://hg.python.org/cpython/rev/df2fdd42b375 |
Hmm, reading more of those and I think Serhiy is definitely right - The shortest accurate suffix I can come up with at the moment is the
Other possibilities:
mostly accurate Inserting an underscore before the suffix is another option (although |
Most of them were added in 3.4. Unfortunately several functions were added earlier (e.g. PyImport_ExecCodeModuleObject, PyErr_SetFromErrnoWithFilenameObject). |
So, which suffix should be used? |
The "*Unicode" suffix in existing functions means a Py_UNICODE* argument. Maybe "*Ex2"? It can't be misinterpreted but looks ugly. |
Yes, this is why I chose Object() suffix. Are you still opposed to (Yes, "*Ex2" is really ugly.) |
How about "ExName"? This patch: Previous patch:
Thoughts? |
Sorry, but because of the bikeshedding, I'm no longer interested in working on this issue. Don't hesitate to rework my patch if you want to fix the bug ("On Windows, these changes should allow passing an unencodable filename on the command line"). |
Just getting this on Larry's radar and summarising the current position. The original problem: using "char *" to pass filenames around doesn't work properly on Windows, we need to use Unicode objects. The solution: parallel APIs that accept PyObject * rather than char * for the filename parameters. The new problem: both Serhiy and I find the *Object() suffix currently used for those "filename as Unicode object instead of C string" parallel APIs to be ambiguous and confusing. However, the problem the parallel APIs solve is real, and reverting or excessively modifying any of the work Victor has already done would be silly. That means we're now in a situation where we have to either:
Neither Serhiy nor I are comfortable with the first option, and making a decision in haste for the second option doesn't seem like a good idea. Option 3 seems like far too much work to make things less useful (a capability that works, but has an ambiguous and confusing name, is better than a capability that isn't provided at all). That leaves option number 4: don't change anything further now, but revisit it for 3.5, including changing the preferred name of the existing APIs. I like that approach, so I'm assigning to myself to take a closer look at how some of the suggestions above read in the docs once 3.4 is out the door. |
So all the PyRun_*Object functions are new in 3.4, and none of them are documented yet? Option 4 is silly--I don't think we should ship them as public APIs in 3.4 if we're planning to rename them. I prefer the previous options. p.s. fwiw I hate "ExName". |
Not all. Only the following functions are new in 3.4: Parser/parsetok.c:PyParser_ParseStringObject The following functions existed in 3.3: Objects/moduleobject.c:PyModule_NewObject |
Are all the functions that use "Object" to indicate "Unicode object instead of string" new in 3.4? Of those, how many are undocumented? |
The following 5 functions work with PyObject* filenames and have Object-less Python/errors.c:PyErr_SetFromErrnoWithFilenameObject Private _PyImport_FixupExtensionObject and _PyImport_FindExtensionObject have All other *Object functions are unrelated. |
Are those five functions new in 3.4 and undocumented? |
Are we proposing renaming any functions that are either |
PyErr_SetFromErrnoWithFilenameObject exists even in 2.7. Other 4 14 new functions were added in 3.4. |
I'm not sure this is the right issue. The support for Unicode filenames is not (at least on Windows) ideal. Let α.py be a Python script with invalid syntax.
On the other hand, if run.py does something like

    import __main__
    import sys
    import tokenize

    path = sys.argv[1]
    with tokenize.open(path) as f:
        source = f.read()
    code = compile(source, path, "exec")
    exec(code, __main__.__dict__)

we get
(or 'File "Python Unicode\α.py", line 2' depending on whether sys.stdout can encode the string). So the "<encoding error>" in the first example is unfortunate, as it is easy to get a better result even with a simple pure Python approach. |
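As a rough sketch of the point made in the comment above, compile() keeps a non-ASCII filename intact as a str, so it is still available on the resulting SyntaxError. The helper name `syntax_error_filename` and the sample source are assumptions made up for this illustration; printing through ascii() is just one way to stay safe when sys.stdout cannot encode the name.

```python
def syntax_error_filename(source: str, filename: str):
    """Compile source and return the filename recorded on a SyntaxError.

    Returns None when the source compiles cleanly.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        # The filename is carried as a str, never encoded to bytes,
        # so no "<encoding error>" placeholder is needed here.
        return exc.filename
    return None


if __name__ == "__main__":
    name = syntax_error_filename("x = (\n", "\u03b1.py")
    # ascii() escapes non-ASCII characters, so this print cannot fail
    # even if sys.stdout uses an encoding that lacks the letter alpha.
    print(ascii(name))
```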