Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Py3K cannot run as python -S #45608

Closed
brettcannon opened this issue Oct 11, 2007 · 48 comments
Closed

Py3K cannot run as python -S #45608

brettcannon opened this issue Oct 11, 2007 · 48 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@brettcannon
Copy link
Member

BPO 1267
Nosy @gvanrossum, @brettcannon, @tiran
Files
  • py3k_initio.patch
  • py3k_initio2.patch
  • py3k_initstdio_impfindmodule.patch
  • py3k_initstdio_impfindmodule2.patch
  • py3k_initstdio_impfindmodule3.patch
  • py3k_test_issue1267.patch
  • py3k_filefromfd.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/brettcannon'
    closed_at = <Date 2007-10-22.00:10:52.238>
    created_at = <Date 2007-10-11.23:02:39.612>
    labels = ['interpreter-core']
    title = 'Py3K cannot run as ``python -S``'
    updated_at = <Date 2008-01-06.22:29:45.547>
    user = 'https://github.com/brettcannon'

    bugs.python.org fields:

    activity = <Date 2008-01-06.22:29:45.547>
    actor = 'admin'
    assignee = 'brett.cannon'
    closed = True
    closed_date = <Date 2007-10-22.00:10:52.238>
    closer = 'gvanrossum'
    components = ['Interpreter Core']
    creation = <Date 2007-10-11.23:02:39.612>
    creator = 'brett.cannon'
    dependencies = []
    files = ['8545', '8551', '8553', '8554', '8555', '8575', '8585']
    hgrepos = []
    issue_num = 1267
    keywords = ['patch']
    message_count = 48.0
    messages = ['56350', '56465', '56484', '56488', '56493', '56494', '56499', '56502', '56503', '56504', '56564', '56565', '56574', '56579', '56581', '56582', '56583', '56584', '56585', '56586', '56587', '56588', '56589', '56590', '56591', '56592', '56593', '56594', '56595', '56597', '56598', '56599', '56600', '56604', '56605', '56607', '56608', '56609', '56610', '56611', '56612', '56613', '56622', '56623', '56625', '56626', '56631', '56633']
    nosy_count = 3.0
    nosy_names = ['gvanrossum', 'brett.cannon', 'christian.heimes']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1267'
    versions = ['Python 3.0']

    @brettcannon
    Copy link
    Member Author

    If Py3K is executed without importing site, it fails horribly. This is
    because site.py sets __builtins__.open, sys.stdout, sys.stderr, and
    sys.stdin.

    The setting of sys.stderr is especially bad as exception printing
    requires sys.stderr, otherwise it reports that sys.stderr was lost.
    This prevents debugging any exceptions thrown when site.py isn't/can't
    be imported (e.g., trying to debug bootstrapping importlib).

    @brettcannon brettcannon added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Oct 11, 2007
    @tiran
    Copy link
    Member

    tiran commented Oct 15, 2007

    I've some code around which sets sys.stdin, out and err in C code. The
    code is far from perfect and I haven't checked it for reference leaks
    yet. I like to get your comment on the style and error catching.

    @brettcannon brettcannon self-assigned this Oct 15, 2007
    @tiran
    Copy link
    Member

    tiran commented Oct 16, 2007

    I've carefully checked and tested the initstdio() method. I'm sure that
    I've catched every edged case. The unit tests pass w/o complains.

    I've also added a PyErr_Display() call to Py_FatalError(). It's still
    hard to understand an error in io.py but at least the dependency on
    site.py is removed.

    @gvanrossum
    Copy link
    Member

    I've carefully checked and tested the initstdio() method. I'm sure that
    I've catched every edged case. The unit tests pass w/o complains.

    I've also added a PyErr_Display() call to Py_FatalError(). It's still
    hard to understand an error in io.py but at least the dependency on
    site.py is removed.

    Very cool!

    Some final suggestions:

    + Py_FatalError("Py_Initialize: can't initialize sys standard"
    + "streams");

    Break this differently:

    Py_FatalError(
        "Py_........");
    

    Don't call open() with keyword arg for newline="\r"; open() takes
    positional args too. This is done specifically to simplify life for C
    code calling it. :-) Perhaps one of the PyFunction_Call(..) variants
    makes it easier to call it without having to explicitly construct the
    tuple for the argument list. (All this because you're leaking the
    value of PyString_FromString("\n"). :-)

    I would change the error handling to avoid the 'finally' label, like this:

            if (0) {
      error:
                    status = -1;
                    Py_XDECREF(args);
            }

    I would add a comment "see initstdio() in pythonrun.c" to the
    OpenWrapper class, which otherwise looks a bit mysterious (as it's not
    used anywhere in the Python code).

    Thanks for doing this!!!

    @tiran
    Copy link
    Member

    tiran commented Oct 16, 2007

    Guido van Rossum wrote:

    Don't call open() with keyword arg for newline="\r"; open() takes
    positional args too. This is done specifically to simplify life for C
    code calling it. :-) Perhaps one of the PyFunction_Call(..) variants
    makes it easier to call it without having to explicitly construct the
    tuple for the argument list. (All this because you're leaking the
    value of PyString_FromString("\n"). :-)

    I need a way to open a file with a specific encoding for
    imp.find_module(), too. I found PyFile_FromFile(). It uses io.open but
    it doesn't accept the additional arguments. I like to add another
    function PyFile_FromFileEx() to the API.

    PyObject *
    PyFile_FromFileEx(FILE *fp, char *name, char *mode, int (*close)(FILE
    *), int buffering, char *encoding, char *newline)

    Some people may also miss the PyFile_FromString() function but that's a
    different story. With the my new function I can create sys.stdin with:

    PyFile_FromFileEx(stdin, "<stdin>", "r", fclose, -1, NULL, "\n")

    I kinda like it :)

    I would change the error handling to avoid the 'finally' label, like this:

        if (0) {
    

    error:
    status = -1;
    Py_XDECREF(args);
    }

    Wow, that's tricky. :)

    I would add a comment "see initstdio() in pythonrun.c" to the
    OpenWrapper class, which otherwise looks a bit mysterious (as it's not
    used anywhere in the Python code).

    That sounds like a good idea!

    The code for const char *PyTokenizer_FindEncoding(FILE *fp) and
    imp.find_module() is also finished and works.

    Christian

    @tiran
    Copy link
    Member

    tiran commented Oct 16, 2007

    The patch is a combined patch for the imp.find_module() problem and
    initstdio. Both patches depend on the new PyFile_FromFileEx() function.
    I hope you don't mind.

    @tiran
    Copy link
    Member

    tiran commented Oct 16, 2007

    Changes since last patch:

    • removed unused import of open in initstdio()
    • fixed infinite loop in PyTokenizer_FindEncoding() by checking
      tok-done == E_OK

    @tiran
    Copy link
    Member

    tiran commented Oct 16, 2007

    Christian Heimes wrote:

    • removed unused import of open in initstdio()
    • fixed infinite loop in PyTokenizer_FindEncoding() by checking
      tok-done == E_OK

    I found another bug in Python/import.c:call_find_method. The function
    mustn't set an encoding of ftp->mode contains 'b' for binary.

    		if (strchr(fdp->mode, 'b') == NULL) {
    			/* Python text file, get encoding from tokenizer */
    			encoding = PyTokenizer_FindEncoding(fp);
    			encoding = (encoding != NULL) ? encoding :
    				   PyUnicode_GetDefaultEncoding();
    		}

    Christian

    @gvanrossum
    Copy link
    Member

    Does this mean I should hold off reviewing the patch?

    On 10/16/07, Christian Heimes <report@bugs.python.org> wrote:

    Christian Heimes added the comment:

    Christian Heimes wrote:
    > * removed unused import of open in initstdio()
    > * fixed infinite loop in PyTokenizer_FindEncoding() by checking
    > tok-done == E_OK

    I found another bug in Python/import.c:call_find_method. The function
    mustn't set an encoding of ftp->mode contains 'b' for binary.

                if (strchr(fdp-\>mode, 'b') == NULL) {
                        /* Python text file, get encoding from tokenizer \*/
                        encoding = PyTokenizer_FindEncoding(fp);
                        encoding = (encoding != NULL) ? encoding :
                                   PyUnicode_GetDefaultEncoding();
                }
    

    Christian


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1267\>


    @tiran
    Copy link
    Member

    tiran commented Oct 16, 2007

    Update since last patch

    • removed unnecessary const from const char*
      PyTokenizer_FindEncoding(FILE *fp)
    • Fixed bug in find_module whith binary files

    Please review the patch.

    @gvanrossum
    Copy link
    Member

    I will take this.

    @gvanrossum gvanrossum assigned gvanrossum and unassigned brettcannon Oct 19, 2007
    @brettcannon
    Copy link
    Member Author

    Thanks, Guido.

    @gvanrossum
    Copy link
    Member

    Committed revision 58553 (with minor tweaks only).

    Thanks!

    @gvanrossum
    Copy link
    Member

    FWIW, on Mac I get six failures (all these pass on Linux):

    test_cmd_line test_inspect test_modulefinder test_pyclbr test_quopri test_runpy

    @brettcannon
    Copy link
    Member Author

    Yeah, I just noticed this as well. is it definitely this change?

    On 10/19/07, Guido van Rossum <report@bugs.python.org> wrote:

    Guido van Rossum added the comment:

    FWIW, on Mac I get six failures (all these pass on Linux):

    test_cmd_line test_inspect test_modulefinder test_pyclbr test_quopri test_runpy


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1267\>


    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    I don't have a Mac at my disposal any more. :(

    Can you attach the output of the failing tests to the bug report? Maybe
    I can be of assistance.

    @brettcannon
    Copy link
    Member Author

    runpy is failing because pkgutil is failing because it is giving
    compile() part of a source file instead of the entire thing::

    Traceback (most recent call last):
      File "/Users/drifty/Dev/python/3.x/pristine/Lib/runpy.py", line 97,
    in _run_module_as_main
        loader, code, fname = _get_module_details(mod_name)
      File "/Users/drifty/Dev/python/3.x/pristine/Lib/runpy.py", line 82,
    in _get_module_details
        code = loader.get_code(mod_name)
      File "/Users/drifty/Dev/python/3.x/pristine/Lib/pkgutil.py", line
    275, in get_code
        self.code = compile(source, self.filename, 'exec')
      File "/Users/drifty/Dev/python/3.x/pristine/Lib/tokenize.py", line 2
        "UR'''": single3prog, 'UR"""': double3prog,
        ^
    IndentationError: unexpected indent

    That bad line is the first line in the 'source' variable being passed
    to compile(). So somewhere the beginning of the source file is being
    chopped up.

    On 10/19/07, Christian Heimes <report@bugs.python.org> wrote:

    Christian Heimes added the comment:

    I don't have a Mac at my disposal any more. :(

    Can you attach the output of the failing tests to the bug report? Maybe
    I can be of assistance.


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1267\>


    @brettcannon
    Copy link
    Member Author

    It looks like the file object returned by imp.find_module() has its read
    position WAY too far forward (at least on OS X).

    Re-opening.

    @brettcannon brettcannon reopened this Oct 20, 2007
    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    > runpy is failing because pkgutil is failing because it is giving
    > compile() part of a source file instead of the entire thing::
    > 
    > Traceback (most recent call last):
    >   File "/Users/drifty/Dev/python/3.x/pristine/Lib/runpy.py", line 97,
    > in _run_module_as_main
    >     loader, code, fname = _get_module_details(mod_name)
    >   File "/Users/drifty/Dev/python/3.x/pristine/Lib/runpy.py", line 82,
    > in _get_module_details
    >     code = loader.get_code(mod_name)
    >   File "/Users/drifty/Dev/python/3.x/pristine/Lib/pkgutil.py", line
    > 275, in get_code
    >     self.code = compile(source, self.filename, 'exec')
    >   File "/Users/drifty/Dev/python/3.x/pristine/Lib/tokenize.py", line 2
    >     "UR'''": single3prog, 'UR"""': double3prog,
    >     ^
    > IndentationError: unexpected indent
    > 
    > That bad line is the first line in the 'source' variable being passed
    > to compile().  So somewhere the beginning of the source file is being
    > chopped up.

    Could you please comment out the PyTokenizer_FindEncoding(fp) call in
    Python/import.c to check if it is related to it? That's the only change
    (I can think of) that may be related to the problem. I always rewind the
    fp in the function but it may not work on Mac.

    Christian

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    Brett Cannon wrote:

    Brett Cannon added the comment:

    It looks like the file object returned by imp.find_module() has its read
    position WAY too far forward (at least on OS X).

    That's strange. It should never read more than 2 lines of a file. I
    don't understand how it could happen.

    char *
    PyTokenizer_FindEncoding(FILE *fp) {
            struct tok_state *tok;
            char *p_start=NULL, *p_end=NULL;
    
            if ((tok = PyTokenizer_FromFile(fp, NULL, NULL, NULL)) == NULL) {
                    rewind(fp);
                    return NULL;
            }
            while(((tok->lineno <= 2) && (tok->done == E_OK))) {
                    PyTokenizer_Get(tok, &p_start, &p_end);
            }
    
            rewind(fp);
            return tok->encoding;
    }

    @brettcannon
    Copy link
    Member Author

    PyTokenizer_FindEncoding() is the culprit. If I comment it out in
    Python/import.c:call_find_module(), and thus just use the system
    encoding, runpy works again.

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    It's unrelated to the problem but Parser/tokenizer.c:1619 has a minor bug.

        while(((tok-\>lineno \<= 2) && (tok-\>done == E_OK))) {
                PyTokenizer_Get(tok, &p_start, &p_end);
        }
    

    (tok->lineno < 2) is sufficient. See line 516

    Christian

    @brettcannon
    Copy link
    Member Author

    OK, for some reason, when PyTokenizer_FindEncoding() is called and the
    resulting file is big enough, it starts off with a seek position
    (according to TextIOWrapper.tell()) of 4096. That happens to be exactly
    half the size of the buffer used by io.

    But I am not sure what the magical size is as creating a file of 4095,
    4096, and 4097 bytes does not trigger this bug.

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    I've made a short unit tests which tests a large file with and w/o --
    coding: -
    -. It passes on Linux.

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    Brett Cannon wrote:

    Brett Cannon added the comment:

    OK, for some reason, when PyTokenizer_FindEncoding() is called and the
    resulting file is big enough, it starts off with a seek position
    (according to TextIOWrapper.tell()) of 4096. That happens to be exactly
    half the size of the buffer used by io.

    But I am not sure what the magical size is as creating a file of 4095,
    4096, and 4097 bytes does not trigger this bug.

    Maybe rewind() doesn't do what is should do? Could you replace the
    rewind() calls with fseek(fd, 0, 0)?

    w/o the rewind() calls in PyTokenizer_FindEncoding I'm getting an error
    in my new test:

    Traceback (most recent call last):
      File "Lib/test/test_imp.py", line 66, in <module>
        test_main()
      File "Lib/test/test_imp.py", line 62, in test_main
        ImportTests,
      File "/home/heimes/dev/python/py3k/Lib/test/test_support.py", line
    541, in run_unittest
        _run_suite(suite)
      File "/home/heimes/dev/python/py3k/Lib/test/test_support.py", line
    524, in _run_suite
        raise TestFailed(err)
    test.test_support.TestFailed: Traceback (most recent call last):
      File "Lib/test/test_imp.py", line 50, in test_issue1267
        self.assertEqual(fd.tell(), 0)
    AssertionError: 4096 != 0

    The number is looking familiar, isn't it? :) Is it the default buffer
    size on Unix?

    Christian

    @brettcannon
    Copy link
    Member Author

    I checked in your < 2 change and plugged a memory leak since you
    were not freeing the struct tok_state. Checked in r58555.

    @brettcannon
    Copy link
    Member Author

    Nope, didn't do it. I also checked using gdb a few minutes ago and
    ftell() was reporting a position of 0 all they way back to
    PyObject_MethodCall().

    @brettcannon
    Copy link
    Member Author

    OK, I think I might have a solution, and it is really stupid. I will
    run the unit test suite to see if it really fixes everything.

    @gvanrossum
    Copy link
    Member

    OK, I think I might have a solution, and it is really stupid. I will
    run the unit test suite to see if it really fixes everything.

    Here's keeping my fingers crossed.

    I do note that the new code still leaks the encoding string which is
    malloc'ed in PyTokenizer_FindEncoding but never freed by
    call_find_module.

    BTW, the test_inspect failure I saw was shallow; I'd renamed the
    parent directory and the failure was due to the wrong filename being
    baked into the .pyc file. The other 5 failures are all related to the
    issue you are chasing here.

    @brettcannon
    Copy link
    Member Author

    58557 as the fix. Yes, it was stupid on OS X's part and I was lucky to
    just think of the possible solution.

    Basically OS X does not like it when you do stuff with a file pointer
    but then poke around with the file descriptor. That means that when
    PyTokenizer_FindEncoding() seeked on the file pointer it was not being
    picked up by the the file descriptor that the file pointer represented.

    This suggests that perhaps we should standardize on file pointers or
    file descriptors in Python to prevent something like this from happening
    again.

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    Brett Cannon wrote:

    This suggests that perhaps we should standardize on file pointers or
    file descriptors in Python to prevent something like this from happening
    again.

    rewind() it used couple of times in the Python code. Have you checked if
    the other calls are safe?

    Thx for finding the bug :) Great work!

    Christian

    @brettcannon
    Copy link
    Member Author

    On 10/19/07, Christian Heimes <report@bugs.python.org> wrote:

    Christian Heimes added the comment:

    Brett Cannon wrote:
    > This suggests that perhaps we should standardize on file pointers or
    > file descriptors in Python to prevent something like this from happening
    > again.

    rewind() it used couple of times in the Python code. Have you checked if
    the other calls are safe?

    No. I am still rather frustrated that was the problem.

    The only reason I feel safe with this solution is that the file
    pointer is not directly used except by _fileio to get the file
    descriptor. I would not trust this if the file pointer and file
    descriptor had intertwined uses. But I would trust rewind() if the
    file pointer is used the entire time.

    Thx for finding the bug :) Great work!

    Thanks. I just wish this whole ordeal had not been necessary (filed a
    bug report with Apple in hopes that this can be prevented for someone
    else). I can see why some people prefer to hack on PyPy, IronPython,
    or Jython instead of dealing with the joys of C. =)

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    Brett Cannon wrote:

    Thanks. I just wish this whole ordeal had not been necessary (filed a
    bug report with Apple in hopes that this can be prevented for someone
    else). I can see why some people prefer to hack on PyPy, IronPython,
    or Jython instead of dealing with the joys of C. =)

    The bug is rather strange. I would have imagined that fseek() and
    rewind() are using the file descriptor internally. It's more than weird
    that an operation on the file handler doesn't affect the file
    descriptor. I wonder how they were able to manage it ...

    IronPython might be fun but I've hacked PythonDotNet. You get the worst
    from the C, .NET/C# and Mono together. Debugging is fun when you have to
    do multiple turns with two debuggers and one of the debuggers is partly
    broken. :[

    @gvanrossum
    Copy link
    Member

    Thanks for persevering!!!

    The dangers of switching between fileno(fp) and fp are actually well
    documented in the C and/or POSIX standards. The problem is caused in
    PyFile_FromFileEx() -- it creates a Python file object from the file
    descriptor. The fix actually only works because we're not using the
    FILE struct once PyTokenizer_FindEncoding() is called. I think it
    would be better to move the lseek() into call_find_module() so the
    FILE abstraction is not broken by PyTokenizer_FindEncoding().

    I think there's still a bug or two lurking in this area: first, each
    time you call imp.find_module() you leak a FILE object; second, the
    encoding allocated in PyTokenizer_FindEncoding() is leaked.

    You're right that a lot of this could be avoided if we used file
    descriptors consistently. It seems find_module() itself doesn't read
    the file; it just needs to know that it's possible to open the file.

    Rewriting everywhere that uses PyFile_FromFile[Ex] to use file
    descriptors doesn't seem too hard; there are only a few places.

    @gvanrossum
    Copy link
    Member

    Sorry, I was wrong about the encoding leak; you fixed that. The
    FILE/fd issue is real though.

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    Guido van Rossum wrote:

    You're right that a lot of this could be avoided if we used file
    descriptors consistently. It seems find_module() itself doesn't read
    the file; it just needs to know that it's possible to open the file.

    Rewriting everywhere that uses PyFile_FromFile[Ex] to use file
    descriptors doesn't seem too hard; there are only a few places.

    If I understand you right you want to alter the interface of
    PyFile_FromFile and PyFile_FromFileEx from

    PyFile_FromFileEx(FILE *fp, char *name, char *mode, int (close)(FILE),
    int buffering, char *encoding, char *newline)

    to

    PyFile_FromFileEx(int fd, char *name, char *mode, int (close)(FILE),
    int buffering, char *encoding, char *newline)

    ?

    I could write a patch and remove int (close)(FILE), too. It's no
    longer used by the functions.

    Christian

    @brettcannon
    Copy link
    Member Author

    On 10/20/07, Christian Heimes <report@bugs.python.org> wrote:

    Christian Heimes added the comment:

    Guido van Rossum wrote:
    > You're right that a lot of this could be avoided if we used file
    > descriptors consistently. It seems find_module() itself doesn't read
    > the file; it just needs to know that it's possible to open the file.
    >
    > Rewriting everywhere that uses PyFile_FromFile[Ex] to use file
    > descriptors doesn't seem too hard; there are only a few places.

    If I understand you right you want to alter the interface of
    PyFile_FromFile and PyFile_FromFileEx from

    PyFile_FromFileEx(FILE *fp, char *name, char *mode, int (close)(FILE),
    int buffering, char *encoding, char *newline)

    to

    PyFile_FromFileEx(int fd, char *name, char *mode, int (close)(FILE),
    int buffering, char *encoding, char *newline)

    ?

    I could write a patch and remove int (close)(FILE), too. It's no
    longer used by the functions.

    Yes, I think you got it.

    -Brett

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    Yes, I think you got it.

    Guido, Brett, how much of the code should I change? import.c is mainly
    using file pointers. Should I replace all operations on FILE with
    operations on a file descriptor?

    Christian

    @gvanrossum
    Copy link
    Member

    Guido, Brett, how much of the code should I change? import.c is mainly
    using file pointers. Should I replace all operations on FILE with
    operations on a file descriptor?

    I think find_module() should return a file descriptor (fd), not a
    FILE*, but most of the rest of the code should call fdopen() on that
    fd. Only call_find_module() should use the fd to turn it into a Python
    file object. Then the amount of change should be pretty minimal.

    I recommend changing the name of the API used to turn a fd into file
    object while we're at it; that's one of the few guidelines we have for
    C API changes.

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    Guido van Rossum wrote:

    I think find_module() should return a file descriptor (fd), not a
    FILE*, but most of the rest of the code should call fdopen() on that
    fd. Only call_find_module() should use the fd to turn it into a Python
    file object. Then the amount of change should be pretty minimal.

    K, I understand.

    I recommend changing the name of the API used to turn a fd into file
    object while we're at it; that's one of the few guidelines we have for
    C API changes.

    Is PyFile_FromFd and PyFile_FromFdEx fine with you or do you prefer a
    different name like PyFile_FromFD and PyFile_FromFDEx or
    PyFile_FromFileDescriptor?

    I've another question. The os.tmpfile method is using tmpfile() which
    returns a file pointer. Do I have to dup its file number and explictely
    close the file pointer as shown below or is a simple fileno() enough? I
    haven't found sample code how to use a file descriptor after the file
    pointer is closed.

    static PyObject *
    posix_tmpfile(PyObject *self, PyObject *noargs)
    {
        FILE *fp;
        int fd;
        fp = tmpfile();
        if (fp == NULL)
            return posix_error();
        fd = dup(fileno(fp));
        if (fd == -1)
            return posix_error();
        fclose(fp);
        return PyFile_FromFD(fd, "<tmpfile>", "w+b");
    }

    @tiran
    Copy link
    Member

    tiran commented Oct 20, 2007

    The dangers of switching between fileno(fp) and fp are actually well
    documented in the C and/or POSIX standards. The problem is caused in
    PyFile_FromFileEx() -- it creates a Python file object from the file
    descriptor. The fix actually only works because we're not using the
    FILE struct once PyTokenizer_FindEncoding() is called. I think it
    would be better to move the lseek() into call_find_module() so the
    FILE abstraction is not broken by PyTokenizer_FindEncoding().

    FYI:

    http://www.gnu.org/software/libc/manual/html_mono/libc.html#Stream_002fDescriptor-Precautions
    http://www.gnu.org/software/libc/manual/html_mono/libc.html#Cleaning-Streams

    Christian

    @gvanrossum
    Copy link
    Member

    Is PyFile_FromFd and PyFile_FromFdEx fine with you or do you prefer a
    different name like PyFile_FromFD and PyFile_FromFDEx or
    PyFile_FromFileDescriptor?

    Let's have a single function PyFile_FromFd(fd, name, mode, buffering,
    encoding, newline).

    I've another question. The os.tmpfile method is using tmpfile() which
    returns a file pointer. Do I have to dup its file number and explictely
    close the file pointer as shown below or is a simple fileno() enough?

    You should dup it; if you don't dup it, the fd will be closed when you
    call fclose(), rendering it useless; or when you don't call fclose(),
    you leak a FILE struct each time.

    @tiran
    Copy link
    Member

    tiran commented Oct 21, 2007

    I think find_module() should return a file descriptor (fd), not a
    FILE*, but most of the rest of the code should call fdopen() on that
    fd. Only call_find_module() should use the fd to turn it into a Python
    file object. Then the amount of change should be pretty minimal.

    I'd have to touch most functions in import.c and related files to change
    find_module() to use a file descriptor. It's a PITA and I don't think
    it's worse the effort for now.

    The new patch adds PyFile_FromFd and removes the other PyFile_FromFile*
    functions. It also changes some methods to use a file descriptor and
    some documentation. Two minor changes aren't related to the bug but they
    nagged me.

    @gvanrossum
    Copy link
    Member

    Do you have access to Windows? I believe it doesn't have dup(). :-(

    @tiran
    Copy link
    Member

    tiran commented Oct 21, 2007

    Guido van Rossum wrote:

    Guido van Rossum added the comment:

    Do you have access to Windows? I believe it doesn't have dup(). :-(

    I've an old laptop with Win2k at my disposal but it has no VS yet. Can
    you point me to a set of instruction how to install VS 2003? I've just
    VS .NET 2005 available.

    @gvanrossum
    Copy link
    Member

    > Do you have access to Windows? I believe it doesn't have dup(). :-(

    I've an old laptop with Win2k at my disposal but it has no VS yet. Can
    you point me to a set of instruction how to install VS 2003? I've just
    VS .NET 2005 available.

    Sorry, I'm as green as you when it comes to these. :-(

    But I believe I was mistaken about dup() not existing on Windows;
    dup() can't be used to duplicate sockets, but that's irrelevant here.
    So the dup()-based solution is fine.

    Alas, my family won't let me use the computer for more than a minute
    at a time today, so I won't be able to review any code... :-)

    @gvanrossum gvanrossum reopened this Oct 21, 2007
    @gvanrossum
    Copy link
    Member

    OK, checked in.

    You might want to compare what I checked in to your patch; I did a few
    style cleanups. I also moved the lseek() call into import.c, where it
    seems more appropriate.

    Committed revision 58587.

    @tiran
    Copy link
    Member

    tiran commented Oct 22, 2007

    Guido van Rossum wrote:

    You might want to compare what I checked in to your patch; I did a few
    style cleanups. I also moved the lseek() call into import.c, where it
    seems more appropriate.

    Ah I see that you prefer to keep assignment and check against NULL/-1 on
    two separate lines.

    I had the lseek() in PyTokenizer_FindEncoding() because I prefer
    functions that restore their environment. I find it less surprising when
    it restores the position of the file descriptor.

    By the way I got Windows, VS 2003 and several SDKs installed in VMWare
    today. It's annoying and it takes hours. Most unit tests are passing.
    http://wiki.python.org/moin/Building_Python_with_the_free_MS_C_Toolkit

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants