classification
Title: Deprecate calling Py_Main() after Py_Initialize()? Add Py_InitializeFromArgv()?
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: inada.naoki, ncoghlan, steve.dower, vstinner
Priority: normal Keywords:

Created on 2019-03-06 01:08 by vstinner, last changed 2019-05-27 15:28 by vstinner. This issue is now closed.

Messages (9)
msg337256 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 01:08
If Py_Main() is called after Py_Initialize(), the configuration read by Py_Main() is mostly ignored, and only the configuration read and written by Py_Initialize() is kept. Only sys.argv and the internal "path configuration" are updated. Problem: in this case, "core_config" is copied into PyInterpreterState.core_config anyway, creating an inconsistency.

Technically, Py_Main() could be smarter and only partially update PyInterpreterState.core_config, but... is it really worth it?

Py_Main() can get many options from the command line arguments. For example, if "-X dev" is passed on the command line, the memory allocator should be "debug". Problem: Py_Initialize() already allocated a lot of memory, and it is no longer possible to change the memory allocator.

I propose that we start emitting a deprecation warning when Py_Main() is called after Py_Initialize(); calling Py_Main() alone is just fine.

See bpo-34008: "Do we support calling Py_Main() after Py_Initialize()?". I had to fix a regression in Python 3.7 to fix the application called "fontforge".

Pseudo-code of fontforge:

Py_Initialize()
for file in files:
   PyRun_SimpleFileEx(file)
Py_Main(argc, argv)
Py_Finalize()

https://github.com/fontforge/fontforge/blob/cec4a984abb41419bf92fc58e5de0170404f0303/fontforge/python.c

Maybe we need to add a new "initialization" API which accepts (argc, argv)? I implemented such functions in bpo-36142, but the added functions are private:

* _PyPreConfig_ReadFromArgv()
* _PyCoreConfig_ReadFromArgv()

This issue has been discussed at:
https://discuss.python.org/t/adding-char-based-apis-for-unix/916/22
msg337257 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 01:10
PySys_SetArgvEx() can be called before Py_Initialize(), but arguments passed to this function are not parsed.

https://docs.python.org/dev/c-api/init.html#c.PySys_SetArgvEx
msg337258 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 01:13
INADA-san proposed to make the existing _Py_UnixMain() function public. (Currently, the function is private.)
https://discuss.python.org/t/adding-char-based-apis-for-unix/916
msg337259 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 01:19
See also bpo-36202: Calling Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake.
msg337310 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-06 14:54
I like having the functions you added to parse argv into config, and I like that they are separate from setting sys.argv.

Might it be better to go the other way and deprecate calling Main *without* Initialize? That's easy to fix in Programs/python.c, and eventually Main will just be the "standard" startup sequence based on argv and environ, right?
msg337313 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-06 14:59
RE making UnixMain public, I'd rather the core runtime require a known encoding, rather than trying to detect it. We should move the call into the detection logic into Programs/python.c so that embedders have to opt-in to detection (many embedding scenarios will prefer to do their own encoding).

Provided it's Unicode, I don't care whether it's a char or wchar API. Windows can always reliably convert to UTF-8, so if Linux needs some extra help here by using char then that's fine.
msg337319 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 16:27
> RE making UnixMain public, I'd rather the core runtime require a known encoding, rather than trying to detect it. We should move the call into the detection logic into Programs/python.c so that embedders have to opt-in to detection (many embedding scenarios will prefer to do their own encoding).

Unix is a very complex beast and Python makes it worse by adding more options (PEP 538 and PEP 540). Py_UnixMain() works "as expected": it uses the LC_CTYPE locale encoding.

If you want to force the usage of UTF-8, you can opt-in for UTF-8 mode: call putenv("PYTHONUTF8=1") before Py_UnixMain() for example.

You cannot pass an encoding to Py_UnixMain() because the implementation of Python relies heavily on the LC_CTYPE locale: see the Py_DecodeLocale() and Py_EncodeLocale() functions. Anyway, Python must use the locale encoding to avoid mojibake. Python must use the codec from the C library, mbstowcs() and wcstombs(), to be able to load its own codecs. Python has a few codecs implemented in C, like ASCII, UTF-8 and Latin1, but locales are far more diverse than that. For example, ISO-8859-15 is used for "euro" locale variants. Example:

$ LANG=fr_FR.iso885915@euro python3 -c 'import sys; print(sys.getfilesystemencoding())'
iso8859-15

Python has an ISO-8859-15 codec, but it's implemented in pure Python. Python uses importlib to load the codec, but how does Python decode and encode filenames to import Lib/encodings/iso8859_15.py? That's why mbstowcs()/wcstombs() and Py_DecodeLocale()/Py_EncodeLocale() come into the game :-) Enjoy:

PyObject*
PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
{
    PyInterpreterState *interp = _PyInterpreterState_GET_UNSAFE();
    const _PyCoreConfig *config = &interp->core_config;
#if defined(__APPLE__)
    return PyUnicode_DecodeUTF8Stateful(s, size, config->filesystem_errors, NULL);
#else
    /* Bootstrap check: if the filesystem codec is implemented in Python, we
       cannot use it to encode and decode filenames before it is loaded. Load
       the Python codec requires to encode at least its own filename. Use the C
       implementation of the locale codec until the codec registry is
       initialized and the Python codec is loaded. See initfsencoding(). */
    if (interp->fscodec_initialized) {
        return PyUnicode_Decode(s, size,
                                config->filesystem_encoding,
                                config->filesystem_errors);
    }
    else {
        return unicode_decode_locale(s, size,
                                     config->filesystem_errors, 0);
    }
#endif
}
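
To make the bootstrap step concrete, here is a minimal sketch of decoding a byte string with the C library's locale codec, which is the kind of fallback the function above relies on before the Python codec is loaded. The helper name decode_with_c_locale_codec() is hypothetical and is not a CPython API. It assumes a POSIX-like mbstowcs() that accepts a NULL destination to count characters. Unlike Py_DecodeLocale(), it simply fails on undecodable bytes instead of using the surrogateescape error handler.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not a CPython API): decode a byte string from the
   current LC_CTYPE locale encoding using the C library codec.
   Returns a malloc()'ed wide string that the caller must free(),
   or NULL on an undecodable byte sequence or a memory error. */
wchar_t *
decode_with_c_locale_codec(const char *s)
{
    /* First pass: with a NULL destination, mbstowcs() only counts the
       wide characters needed (terminating null character excluded). */
    size_t len = mbstowcs(NULL, s, 0);
    if (len == (size_t)-1) {
        return NULL;  /* invalid byte sequence for this locale */
    }

    wchar_t *w = malloc((len + 1) * sizeof(wchar_t));
    if (w == NULL) {
        return NULL;
    }

    /* Second pass: perform the conversion, including the terminator. */
    if (mbstowcs(w, s, len + 1) == (size_t)-1) {
        free(w);
        return NULL;
    }
    return w;
}
```

A caller would normally run setlocale(LC_CTYPE, "") first so that the user's locale (for example fr_FR.iso885915@euro) is used instead of the default "C" locale.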
msg337338 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-06 18:07
> If you want to force the usage of UTF-8, you can opt-in for UTF-8 mode: call putenv("PYTHONUTF8=1") before Py_UnixMain() for example.

I'm not talking about forcing UTF-8, I'm talking about *assuming* it (and letting "someone else" worry about forcing it).

As I understand it, UTF-8 mode is about overriding the environment's apparent encoding and saying "skip our detection logic and always encode/decode via UTF-8". That is part of the encoding detection logic.

Our embedding APIs currently accept "whatever" and try to figure out the encoding on the inside. I'm proposing that they should accept "UTF-8" and the caller has to figure out the encoding (maybe with our helper functions).

That way embedders can just worry about UTF-8 consistently, instead of having to work around our workarounds for encoding detection.
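
As an illustration of what "the caller has to figure out the encoding" could mean in practice, here is a rough sketch of the embedder's side: text that has already been decoded to wide characters by whatever means the embedder prefers is re-encoded as UTF-8 before being handed to a UTF-8-only API. Both function names are hypothetical and not part of CPython. The sketch assumes that wchar_t holds a full Unicode code point (true with glibc, not on Windows, where wchar_t is UTF-16).

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   "out" must have room for 4 bytes. Returns the number of bytes written,
   or 0 if "cp" is a surrogate or is beyond U+10FFFF. */
size_t
utf8_encode_code_point(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;  /* surrogates cannot be encoded in UTF-8 */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: convert a wide string to a malloc()'ed,
   null-terminated UTF-8 string. Returns NULL on an invalid code point
   or a memory error. Assumes one wchar_t per code point. */
char *
wide_to_utf8(const wchar_t *w)
{
    size_t len = wcslen(w);
    /* Worst case: 4 bytes per code point, plus the terminator. */
    unsigned char *buf = malloc(len * 4 + 1);
    if (buf == NULL) {
        return NULL;
    }

    size_t pos = 0;
    for (size_t i = 0; i < len; i++) {
        size_t n = utf8_encode_code_point((unsigned long)w[i], buf + pos);
        if (n == 0) {
            free(buf);
            return NULL;
        }
        pos += n;
    }
    buf[pos] = '\0';
    return (char *)buf;
}
```

With something like this on the embedder's side, the runtime would only ever see UTF-8, and the choice of how to decode the original bytes (locale, fixed encoding, or something else) stays with the caller.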
msg343645 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-27 15:28
I added Py_RunMain() in bpo-36763, which implements PEP 587. IMHO it's the right solution for code like fontforge (see the pseudo-code in my first message).

I'm no longer sure that we should deprecate calling Py_Main() after Py_Initialize(). Backward compatibility matters more here, no? Maybe the best we can do is to mention Py_RunMain() in Py_Main() documentation.

> PySys_SetArgvEx() can be called before Py_Initialize(), but arguments passed to this function are not parsed.

With the "Python Configuration" of PEP 587, PyConfig.argv is now parsed as Python command line arguments.

> INADA-san proposed to make the existing _Py_UnixMain() function public. (Currently, the function is private.)

PEP 587 exposes this function under the name Py_BytesMain().

--

I am closing the issue.
History
Date User Action Args
2019-05-27 15:28:13 vstinner set status: open -> closed; resolution: fixed; messages: + msg343645; stage: resolved
2019-03-06 18:07:14 steve.dower set messages: + msg337338
2019-03-06 16:27:44 vstinner set messages: + msg337319
2019-03-06 14:59:38 steve.dower set messages: + msg337313
2019-03-06 14:54:16 steve.dower set messages: + msg337310
2019-03-06 01:19:10 vstinner set messages: + msg337259
2019-03-06 01:13:46 vstinner set nosy: + inada.naoki; messages: + msg337258
2019-03-06 01:10:37 vstinner set messages: + msg337257
2019-03-06 01:08:45 vstinner create