This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Py3K cannot run as ``python -S``
Type: Stage:
Components: Interpreter Core Versions: Python 3.0
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: brett.cannon Nosy List: brett.cannon, christian.heimes, gvanrossum
Priority: normal Keywords: patch

Created on 2007-10-11 23:02 by brett.cannon, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
py3k_initio.patch christian.heimes, 2007-10-15 20:28
py3k_initio2.patch christian.heimes, 2007-10-16 02:11
py3k_initstdio_impfindmodule.patch christian.heimes, 2007-10-16 05:40
py3k_initstdio_impfindmodule2.patch christian.heimes, 2007-10-16 17:55
py3k_initstdio_impfindmodule3.patch christian.heimes, 2007-10-16 18:51
py3k_test_issue1267.patch christian.heimes, 2007-10-20 02:28
py3k_filefromfd.patch christian.heimes, 2007-10-21 04:40
Messages (48)
msg56350 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-11 23:02
If Py3K is executed without importing site, it fails horribly.  This is
because site.py sets __builtins__.open, sys.stdout, sys.stderr, and
sys.stdin.

The setting of sys.stderr is especially bad as exception printing
requires sys.stderr, otherwise it reports that sys.stderr was lost. 
This prevents debugging any exceptions thrown when site.py isn't/can't
be imported (e.g., trying to debug bootstrapping importlib).
msg56465 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-15 20:28
I've some code around which sets sys.stdin, out and err in C code. The
code is far from perfect and I haven't checked it for reference leaks
yet. I like to get your comment on the style and error catching.
msg56484 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-16 02:11
I've carefully checked and tested the initstdio() method. I'm sure that
I've catched every edged case. The unit tests pass w/o complains.

I've also added a PyErr_Display() call to Py_FatalError(). It's still
hard to understand an error in io.py but at least the dependency on
site.py is removed.
msg56488 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-16 03:47
> I've carefully checked and tested the initstdio() method. I'm sure that
> I've catched every edged case. The unit tests pass w/o complains.
>
> I've also added a PyErr_Display() call to Py_FatalError(). It's still
> hard to understand an error in io.py but at least the dependency on
> site.py is removed.

Very cool!

Some final suggestions:

+               Py_FatalError("Py_Initialize: can't initialize sys standard"
+                               "streams");

Break this differently:

    Py_FatalError(
        "Py_........");

Don't call open() with keyword arg for newline="\r"; open() takes
positional args too. This is done specifically to simplify life for C
code calling it. :-) Perhaps one of the PyFunction_Call(..) variants
makes it easier to call it without having to explicitly construct the
tuple for the argument list. (All this because you're leaking the
value of PyString_FromString("\n"). :-)

I would change the error handling to avoid the 'finally' label, like this:

        if (0) {
  error:
                status = -1;
                Py_XDECREF(args);
        }

I would add a comment "see initstdio() in pythonrun.c" to the
OpenWrapper class, which otherwise looks a bit mysterious (as it's not
used anywhere in the Python code).

Thanks for doing this!!!
msg56493 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-16 05:27
Guido van Rossum wrote:
> Don't call open() with keyword arg for newline="\r"; open() takes
> positional args too. This is done specifically to simplify life for C
> code calling it. :-) Perhaps one of the PyFunction_Call(..) variants
> makes it easier to call it without having to explicitly construct the
> tuple for the argument list. (All this because you're leaking the
> value of PyString_FromString("\n"). :-)

I need a way to open a file with a specific encoding for
imp.find_module(), too. I found PyFile_FromFile(). It uses io.open but
it doesn't accept the additional arguments. I like to add another
function PyFile_FromFileEx() to the API.

PyObject *
PyFile_FromFileEx(FILE *fp, char *name, char *mode, int (*close)(FILE
*), int buffering, char *encoding, char *newline)

Some people may also miss the PyFile_FromString() function but that's a
different story. With the my new function I can create sys.stdin with:

PyFile_FromFileEx(stdin, "<stdin>", "r", fclose, -1, NULL, "\n")

I kinda like it :)

> I would change the error handling to avoid the 'finally' label, like this:
> 
>         if (0) {
>   error:
>                 status = -1;
>                 Py_XDECREF(args);
>         }

Wow, that's tricky. :)

> I would add a comment "see initstdio() in pythonrun.c" to the
> OpenWrapper class, which otherwise looks a bit mysterious (as it's not
> used anywhere in the Python code).

That sounds like a good idea!

The code for const char *PyTokenizer_FindEncoding(FILE *fp) and
imp.find_module() is also finished and works.

Christian
msg56494 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-16 05:40
The patch is a combined patch for the imp.find_module() problem and
initstdio. Both patches depend on the new PyFile_FromFileEx() function.
I hope you don't mind.
msg56499 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-16 17:55
Changes since last patch:

 * removed unused import of open in initstdio()
 * fixed infinite loop in PyTokenizer_FindEncoding() by checking
tok-done == E_OK
msg56502 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-16 18:33
Christian Heimes wrote:
>  * removed unused import of open in initstdio()
>  * fixed infinite loop in PyTokenizer_FindEncoding() by checking
> tok-done == E_OK

I found another bug in Python/import.c:call_find_method. The function
mustn't set an encoding of ftp->mode contains 'b' for binary.

		if (strchr(fdp->mode, 'b') == NULL) {
			/* Python text file, get encoding from tokenizer */
			encoding = PyTokenizer_FindEncoding(fp);
			encoding = (encoding != NULL) ? encoding :
				   PyUnicode_GetDefaultEncoding();
		}

Christian
msg56503 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-16 18:43
Does this mean I should hold off reviewing the patch?

On 10/16/07, Christian Heimes <report@bugs.python.org> wrote:
>
> Christian Heimes added the comment:
>
> Christian Heimes wrote:
> >  * removed unused import of open in initstdio()
> >  * fixed infinite loop in PyTokenizer_FindEncoding() by checking
> > tok-done == E_OK
>
> I found another bug in Python/import.c:call_find_method. The function
> mustn't set an encoding of ftp->mode contains 'b' for binary.
>
>                 if (strchr(fdp->mode, 'b') == NULL) {
>                         /* Python text file, get encoding from tokenizer */
>                         encoding = PyTokenizer_FindEncoding(fp);
>                         encoding = (encoding != NULL) ? encoding :
>                                    PyUnicode_GetDefaultEncoding();
>                 }
>
> Christian
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1267>
> __________________________________
>
msg56504 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-16 18:51
Update since last patch

 * removed unnecessary const from const char*
PyTokenizer_FindEncoding(FILE *fp)
 * Fixed bug in find_module whith binary files

Please review the patch.
msg56564 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-19 21:02
I will take this.
msg56565 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-19 21:14
Thanks, Guido.
msg56574 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-19 23:17
Committed revision 58553 (with minor tweaks only).

Thanks!
msg56579 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-20 00:30
FWIW, on Mac I get six failures (all these pass on Linux):

test_cmd_line test_inspect test_modulefinder test_pyclbr test_quopri test_runpy
msg56581 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 01:00
Yeah, I just noticed this as well.  is it definitely this change?

On 10/19/07, Guido van Rossum <report@bugs.python.org> wrote:
>
> Guido van Rossum added the comment:
>
> FWIW, on Mac I get six failures (all these pass on Linux):
>
> test_cmd_line test_inspect test_modulefinder test_pyclbr test_quopri test_runpy
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1267>
> __________________________________
>
msg56582 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 01:17
I don't have a Mac at my disposal any more. :(

Can you attach the output of the failing tests to the bug report? Maybe
I can be of assistance.
msg56583 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 01:20
runpy is failing because pkgutil is failing because it is giving
compile() part of a source file instead of the entire thing::

Traceback (most recent call last):
  File "/Users/drifty/Dev/python/3.x/pristine/Lib/runpy.py", line 97,
in _run_module_as_main
    loader, code, fname = _get_module_details(mod_name)
  File "/Users/drifty/Dev/python/3.x/pristine/Lib/runpy.py", line 82,
in _get_module_details
    code = loader.get_code(mod_name)
  File "/Users/drifty/Dev/python/3.x/pristine/Lib/pkgutil.py", line
275, in get_code
    self.code = compile(source, self.filename, 'exec')
  File "/Users/drifty/Dev/python/3.x/pristine/Lib/tokenize.py", line 2
    "UR'''": single3prog, 'UR"""': double3prog,
    ^
IndentationError: unexpected indent

That bad line is the first line in the 'source' variable being passed
to compile().  So somewhere the beginning of the source file is being
chopped up.

On 10/19/07, Christian Heimes <report@bugs.python.org> wrote:
>
> Christian Heimes added the comment:
>
> I don't have a Mac at my disposal any more. :(
>
> Can you attach the output of the failing tests to the bug report? Maybe
> I can be of assistance.
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1267>
> __________________________________
>
msg56584 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 01:30
It looks like the file object returned by imp.find_module() has its read
position WAY too far forward (at least on OS X).

Re-opening.
msg56585 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 01:32
> runpy is failing because pkgutil is failing because it is giving
> compile() part of a source file instead of the entire thing::
> 
> Traceback (most recent call last):
>   File "/Users/drifty/Dev/python/3.x/pristine/Lib/runpy.py", line 97,
> in _run_module_as_main
>     loader, code, fname = _get_module_details(mod_name)
>   File "/Users/drifty/Dev/python/3.x/pristine/Lib/runpy.py", line 82,
> in _get_module_details
>     code = loader.get_code(mod_name)
>   File "/Users/drifty/Dev/python/3.x/pristine/Lib/pkgutil.py", line
> 275, in get_code
>     self.code = compile(source, self.filename, 'exec')
>   File "/Users/drifty/Dev/python/3.x/pristine/Lib/tokenize.py", line 2
>     "UR'''": single3prog, 'UR"""': double3prog,
>     ^
> IndentationError: unexpected indent
> 
> That bad line is the first line in the 'source' variable being passed
> to compile().  So somewhere the beginning of the source file is being
> chopped up.

Could you please comment out the PyTokenizer_FindEncoding(fp) call in
Python/import.c to check if it is related to it? That's the only change
(I can think of) that may be related to the problem. I always rewind the
fp in the function but it may not work on Mac.

Christian
msg56586 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 01:37
Brett Cannon wrote:
> Brett Cannon added the comment:
> 
> It looks like the file object returned by imp.find_module() has its read
> position WAY too far forward (at least on OS X).

That's strange. It should never read more than 2 lines of a file. I
don't understand how it could happen.

char *
PyTokenizer_FindEncoding(FILE *fp) {
        struct tok_state *tok;
        char *p_start=NULL, *p_end=NULL;

        if ((tok = PyTokenizer_FromFile(fp, NULL, NULL, NULL)) == NULL) {
                rewind(fp);
                return NULL;
        }
        while(((tok->lineno <= 2) && (tok->done == E_OK))) {
                PyTokenizer_Get(tok, &p_start, &p_end);
        }

        rewind(fp);
        return tok->encoding;
}
msg56587 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 01:38
PyTokenizer_FindEncoding() is the culprit.  If I comment it out in
Python/import.c:call_find_module(), and thus just use the system
encoding, runpy works again.
msg56588 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 01:46
It's unrelated to the problem but Parser/tokenizer.c:1619 has a minor bug.

>         while(((tok->lineno <= 2) && (tok->done == E_OK))) {
>                 PyTokenizer_Get(tok, &p_start, &p_end);
>         }

(tok->lineno < 2) is sufficient. See line 516

Christian
msg56589 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 02:17
OK, for some reason, when PyTokenizer_FindEncoding() is called and the
resulting file is big enough, it starts off with a seek position
(according to TextIOWrapper.tell()) of 4096.  That happens to be exactly
half the size of the buffer used by io.

But I am not sure what the magical size is as creating a file of 4095,
4096, and 4097 bytes does not trigger this bug.
msg56590 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 02:28
I've made a short unit tests which tests a large file with and w/o -*-
coding: -*-. It passes on Linux.
msg56591 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 02:33
Brett Cannon wrote:
> Brett Cannon added the comment:
> 
> OK, for some reason, when PyTokenizer_FindEncoding() is called and the
> resulting file is big enough, it starts off with a seek position
> (according to TextIOWrapper.tell()) of 4096.  That happens to be exactly
> half the size of the buffer used by io.
> 
> But I am not sure what the magical size is as creating a file of 4095,
> 4096, and 4097 bytes does not trigger this bug.

Maybe rewind() doesn't do what is should do? Could you replace the
rewind() calls with fseek(fd, 0, 0)?

w/o the rewind() calls in PyTokenizer_FindEncoding I'm getting an error
in my new test:

Traceback (most recent call last):
  File "Lib/test/test_imp.py", line 66, in <module>
    test_main()
  File "Lib/test/test_imp.py", line 62, in test_main
    ImportTests,
  File "/home/heimes/dev/python/py3k/Lib/test/test_support.py", line
541, in run_unittest
    _run_suite(suite)
  File "/home/heimes/dev/python/py3k/Lib/test/test_support.py", line
524, in _run_suite
    raise TestFailed(err)
test.test_support.TestFailed: Traceback (most recent call last):
  File "Lib/test/test_imp.py", line 50, in test_issue1267
    self.assertEqual(fd.tell(), 0)
AssertionError: 4096 != 0

The number is looking familiar, isn't it? :) Is it the default buffer
size on Unix?

Christian
msg56592 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 02:54
I checked in your ``< 2`` change and plugged a memory leak since you
were not freeing the struct tok_state.  Checked in r58555.
msg56593 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 02:58
Nope, didn't do it.  I also checked using gdb a few minutes ago and
ftell() was reporting a position of 0 all they way back to
PyObject_MethodCall().
msg56594 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 03:18
OK, I think I might have a solution, and it is really stupid.  I will
run the unit test suite to see if it really fixes everything.
msg56595 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-20 03:26
> OK, I think I might have a solution, and it is really stupid.  I will
> run the unit test suite to see if it really fixes everything.

Here's keeping my fingers crossed.

I do note that the new code still leaks the encoding string which is
malloc'ed in PyTokenizer_FindEncoding but never freed by
call_find_module.

BTW, the test_inspect failure I saw was shallow; I'd renamed the
parent directory and the failure was due to the wrong filename being
baked into the .pyc file. The other 5 failures are all related to the
issue you are chasing here.
msg56597 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 03:49
58557 as the fix.  Yes, it was stupid on OS X's part and I was lucky to
just think of the possible solution.

Basically OS X does not like it when you do stuff with a file pointer
but then poke around with the file descriptor.  That means that when
PyTokenizer_FindEncoding() seeked on the file pointer it was not being
picked up by the the file descriptor that the file pointer represented.

This suggests that perhaps we should standardize on file pointers or
file descriptors in Python to prevent something like this from happening
again.
msg56598 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 03:59
Brett Cannon wrote:
> This suggests that perhaps we should standardize on file pointers or
> file descriptors in Python to prevent something like this from happening
> again.

rewind() it used couple of times in the Python code. Have you checked if
the other calls are safe?

Thx for finding the bug :) Great work!

Christian
msg56599 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 04:06
On 10/19/07, Christian Heimes <report@bugs.python.org> wrote:
>
> Christian Heimes added the comment:
>
> Brett Cannon wrote:
> > This suggests that perhaps we should standardize on file pointers or
> > file descriptors in Python to prevent something like this from happening
> > again.
>
> rewind() it used couple of times in the Python code. Have you checked if
> the other calls are safe?
>

No.  I am still rather frustrated that was the problem.

The only reason I feel safe with this solution is that the file
pointer is not directly used except by _fileio to get the file
descriptor.  I would not trust this if the file pointer and file
descriptor had intertwined uses.  But I would trust rewind() if the
file pointer is used the entire time.

> Thx for finding the bug :) Great work!

Thanks.  I just wish this whole ordeal had not been necessary (filed a
bug report with Apple in hopes that this can be prevented for someone
else).  I can see why some people prefer to hack on PyPy, IronPython,
or Jython instead of dealing with the joys of C.  =)
msg56600 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 04:22
Brett Cannon wrote:
> Thanks.  I just wish this whole ordeal had not been necessary (filed a
> bug report with Apple in hopes that this can be prevented for someone
> else).  I can see why some people prefer to hack on PyPy, IronPython,
> or Jython instead of dealing with the joys of C.  =)

The bug is rather strange. I would have imagined that fseek() and
rewind() are using the file descriptor internally. It's more than weird
that an operation on the file handler doesn't affect the file
descriptor. I wonder how they were able to manage it ...

IronPython might be fun but I've hacked PythonDotNet. You get the worst
from the C, .NET/C# and Mono together. Debugging is fun when you have to
do multiple turns with two debuggers and one of the debuggers is partly
broken. :[
msg56604 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-20 15:36
Thanks for persevering!!!

The dangers of switching between fileno(fp) and fp are actually well
documented in the C and/or POSIX standards. The problem is caused in
PyFile_FromFileEx() -- it creates a Python file object from the file
descriptor. The fix actually only works because we're not using the
FILE struct once PyTokenizer_FindEncoding() is called. I think it
would be better to move the lseek() into call_find_module() so the
FILE abstraction is not broken by PyTokenizer_FindEncoding().

I think there's still a bug or two lurking in this area: first, each
time you call imp.find_module() you leak a FILE object; second, the
encoding allocated in PyTokenizer_FindEncoding() is leaked.

You're right that a lot of this could be avoided if we used file
descriptors consistently. It seems find_module() itself doesn't read
the file; it just needs to know that it's possible to open the file.

Rewriting everywhere that uses PyFile_FromFile[Ex] to use file
descriptors doesn't seem too hard; there are only a few places.
msg56605 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-20 15:39
Sorry, I was wrong about the encoding leak; you fixed that. The
FILE/fd issue is real though.
msg56607 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 18:00
Guido van Rossum wrote:
> You're right that a lot of this could be avoided if we used file
> descriptors consistently. It seems find_module() itself doesn't read
> the file; it just needs to know that it's possible to open the file.
> 
> Rewriting everywhere that uses PyFile_FromFile[Ex] to use file
> descriptors doesn't seem too hard; there are only a few places.

If I understand you right you want to alter the interface of
PyFile_FromFile and PyFile_FromFileEx from

PyFile_FromFileEx(FILE *fp, char *name, char *mode, int (*close)(FILE*),
int buffering, char *encoding, char *newline)

to

PyFile_FromFileEx(int fd, char *name, char *mode, int (*close)(FILE*),
int buffering, char *encoding, char *newline)

?

I could write a patch and remove int (*close)(FILE*), too. It's no
longer used by the functions.

Christian
msg56608 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-10-20 18:50
On 10/20/07, Christian Heimes <report@bugs.python.org> wrote:
>
> Christian Heimes added the comment:
>
> Guido van Rossum wrote:
> > You're right that a lot of this could be avoided if we used file
> > descriptors consistently. It seems find_module() itself doesn't read
> > the file; it just needs to know that it's possible to open the file.
> >
> > Rewriting everywhere that uses PyFile_FromFile[Ex] to use file
> > descriptors doesn't seem too hard; there are only a few places.
>
> If I understand you right you want to alter the interface of
> PyFile_FromFile and PyFile_FromFileEx from
>
> PyFile_FromFileEx(FILE *fp, char *name, char *mode, int (*close)(FILE*),
> int buffering, char *encoding, char *newline)
>
> to
>
> PyFile_FromFileEx(int fd, char *name, char *mode, int (*close)(FILE*),
> int buffering, char *encoding, char *newline)
>
> ?
>
> I could write a patch and remove int (*close)(FILE*), too. It's no
> longer used by the functions.

Yes, I think you got it.

-Brett
msg56609 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 19:42
> Yes, I think you got it.

Guido, Brett, how much of the code should I change? import.c is mainly
using file pointers. Should I replace all operations on FILE with
operations on a file descriptor?

Christian
msg56610 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-20 19:59
> Guido, Brett, how much of the code should I change? import.c is mainly
> using file pointers. Should I replace all operations on FILE with
> operations on a file descriptor?

I think find_module() should return a file descriptor (fd), not a
FILE*, but most of the rest of the code should call fdopen() on that
fd. Only call_find_module() should use the fd to turn it into a Python
file object. Then the amount of change should be pretty minimal.

I recommend changing the name of the API used to turn a fd into file
object while we're at it; that's one of the few guidelines we have for
C API changes.
msg56611 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 20:17
Guido van Rossum wrote:
> I think find_module() should return a file descriptor (fd), not a
> FILE*, but most of the rest of the code should call fdopen() on that
> fd. Only call_find_module() should use the fd to turn it into a Python
> file object. Then the amount of change should be pretty minimal.

K, I understand.

> I recommend changing the name of the API used to turn a fd into file
> object while we're at it; that's one of the few guidelines we have for
> C API changes.

Is PyFile_FromFd and PyFile_FromFdEx fine with you or do you prefer a
different name like PyFile_FromFD and PyFile_FromFDEx or
PyFile_FromFileDescriptor?

I've another question. The os.tmpfile method is using tmpfile() which
returns a file pointer. Do I have to dup its file number and explictely
close the file pointer as shown below or is a simple fileno() enough? I
haven't found sample code how to use a file descriptor after the file
pointer is closed.

static PyObject *
posix_tmpfile(PyObject *self, PyObject *noargs)
{
    FILE *fp;
    int fd;

    fp = tmpfile();
    if (fp == NULL)
        return posix_error();
    fd = dup(fileno(fp));
    if (fd == -1)
        return posix_error();
    fclose(fp);
    return PyFile_FromFD(fd, "<tmpfile>", "w+b");
}
msg56612 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-20 21:12
> The dangers of switching between fileno(fp) and fp are actually well
> documented in the C and/or POSIX standards. The problem is caused in
> PyFile_FromFileEx() -- it creates a Python file object from the file
> descriptor. The fix actually only works because we're not using the
> FILE struct once PyTokenizer_FindEncoding() is called. I think it
> would be better to move the lseek() into call_find_module() so the
> FILE abstraction is not broken by PyTokenizer_FindEncoding().

FYI:

http://www.gnu.org/software/libc/manual/html_mono/libc.html#Stream_002fDescriptor-Precautions
http://www.gnu.org/software/libc/manual/html_mono/libc.html#Cleaning-Streams

Christian
msg56613 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-20 22:36
> Is PyFile_FromFd and PyFile_FromFdEx fine with you or do you prefer a
> different name like PyFile_FromFD and PyFile_FromFDEx or
> PyFile_FromFileDescriptor?

Let's have a single function PyFile_FromFd(fd, name, mode, buffering,
encoding, newline).

> I've another question. The os.tmpfile method is using tmpfile() which
> returns a file pointer. Do I have to dup its file number and explictely
> close the file pointer as shown below or is a simple fileno() enough?

You should dup it; if you don't dup it, the fd will be closed when you
call fclose(), rendering it useless; or when you don't call fclose(),
you leak a FILE struct each time.
msg56622 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-21 04:40
> I think find_module() should return a file descriptor (fd), not a
> FILE*, but most of the rest of the code should call fdopen() on that
> fd. Only call_find_module() should use the fd to turn it into a Python
> file object. Then the amount of change should be pretty minimal.

I'd have to touch most functions in import.c and related files to change
find_module() to use a file descriptor. It's a PITA and I don't think
it's worse the effort for now.

The new patch adds PyFile_FromFd and removes the other PyFile_FromFile*
functions. It also changes some methods to use a file descriptor and
some documentation. Two minor changes aren't related to the bug but they
nagged me.
msg56623 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-21 16:00
Do you have access to Windows? I believe it doesn't have dup(). :-(
msg56625 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-21 18:09
Guido van Rossum wrote:
> Guido van Rossum added the comment:
> 
> Do you have access to Windows? I believe it doesn't have dup(). :-(

I've an old laptop with Win2k at my disposal but it has no VS yet. Can
you point me to a set of instruction how to install VS 2003? I've just
VS .NET 2005 available.
msg56626 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-21 18:46
> > Do you have access to Windows? I believe it doesn't have dup(). :-(
>
> I've an old laptop with Win2k at my disposal but it has no VS yet. Can
> you point me to a set of instruction how to install VS 2003? I've just
> VS .NET 2005 available.

Sorry, I'm as green as you when it comes to these. :-(

But I believe I was mistaken about dup() not existing on Windows;
dup() can't be used to duplicate sockets, but that's irrelevant here.
So the dup()-based solution is fine.

Alas, my family won't let me use the computer for more than a minute
at a time today, so I won't be able to review any code... :-)
msg56631 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-22 00:10
OK, checked in.

You might want to compare what I checked in to your patch; I did a few
style cleanups.  I also moved the lseek() call into import.c, where it
seems more appropriate.

Committed revision 58587.
msg56633 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-22 00:59
Guido van Rossum wrote:
> You might want to compare what I checked in to your patch; I did a few
> style cleanups.  I also moved the lseek() call into import.c, where it
> seems more appropriate.

Ah I see that you prefer to keep assignment and check against NULL/-1 on
two separate lines.

I had the lseek() in PyTokenizer_FindEncoding() because I prefer
functions that restore their environment. I find it less surprising when
it restores the position of the file descriptor.

By the way I got Windows, VS 2003 and several SDKs installed in VMWare
today. It's annoying and it takes hours. Most unit tests are passing.
http://wiki.python.org/moin/Building_Python_with_the_free_MS_C_Toolkit
History
Date User Action Args
2022-04-11 14:56:27adminsetgithub: 45608
2008-01-06 22:29:45adminsetkeywords: - py3k
versions: Python 3.0
2007-10-22 00:59:21christian.heimessetmessages: + msg56633
2007-10-22 00:10:52gvanrossumsetstatus: open -> closed
resolution: accepted
messages: + msg56631
2007-10-21 23:28:03gvanrossumsetstatus: closed -> open
resolution: accepted -> (no value)
2007-10-21 18:46:33gvanrossumsetmessages: + msg56626
2007-10-21 18:09:26christian.heimessetmessages: + msg56625
2007-10-21 16:00:25gvanrossumsetmessages: + msg56623
2007-10-21 04:40:46christian.heimessetfiles: + py3k_filefromfd.patch
messages: + msg56622
2007-10-20 22:36:16gvanrossumsetmessages: + msg56613
2007-10-20 21:12:30christian.heimessetmessages: + msg56612
2007-10-20 20:17:08christian.heimessetmessages: + msg56611
2007-10-20 19:59:25gvanrossumsetmessages: + msg56610
2007-10-20 19:42:22christian.heimessetmessages: + msg56609
2007-10-20 18:50:25brett.cannonsetmessages: + msg56608
2007-10-20 18:00:04christian.heimessetmessages: + msg56607
2007-10-20 15:39:09gvanrossumsetmessages: + msg56605
2007-10-20 15:36:07gvanrossumsetmessages: + msg56604
2007-10-20 04:22:20christian.heimessetmessages: + msg56600
2007-10-20 04:06:24brett.cannonsetmessages: + msg56599
2007-10-20 03:59:50christian.heimessetmessages: + msg56598
2007-10-20 03:49:36brett.cannonsetstatus: open -> closed
assignee: gvanrossum -> brett.cannon
messages: + msg56597
2007-10-20 03:26:30gvanrossumsetmessages: + msg56595
2007-10-20 03:18:43brett.cannonsetmessages: + msg56594
2007-10-20 02:58:28brett.cannonsetmessages: + msg56593
2007-10-20 02:54:42brett.cannonsetmessages: + msg56592
2007-10-20 02:33:56christian.heimessetmessages: + msg56591
2007-10-20 02:28:04christian.heimessetfiles: + py3k_test_issue1267.patch
messages: + msg56590
2007-10-20 02:17:04brett.cannonsetmessages: + msg56589
2007-10-20 01:46:01christian.heimessetmessages: + msg56588
2007-10-20 01:38:54brett.cannonsetmessages: + msg56587
2007-10-20 01:37:14christian.heimessetmessages: + msg56586
2007-10-20 01:32:57christian.heimessetmessages: + msg56585
2007-10-20 01:30:30brett.cannonsetstatus: closed -> open
messages: + msg56584
2007-10-20 01:20:37brett.cannonsetmessages: + msg56583
2007-10-20 01:17:07christian.heimessetmessages: + msg56582
2007-10-20 01:00:32brett.cannonsetmessages: + msg56581
2007-10-20 00:30:49gvanrossumsetmessages: + msg56579
2007-10-19 23:17:12gvanrossumsetstatus: open -> closed
resolution: accepted
messages: + msg56574
2007-10-19 21:14:20brett.cannonsetmessages: + msg56565
2007-10-19 21:02:20gvanrossumsetassignee: brett.cannon -> gvanrossum
messages: + msg56564
2007-10-16 18:51:18christian.heimessetfiles: + py3k_initstdio_impfindmodule3.patch
messages: + msg56504
2007-10-16 18:43:27gvanrossumsetmessages: + msg56503
2007-10-16 18:33:21christian.heimessetmessages: + msg56502
2007-10-16 17:55:23christian.heimessetfiles: + py3k_initstdio_impfindmodule2.patch
messages: + msg56499
2007-10-16 05:40:39christian.heimessetfiles: + py3k_initstdio_impfindmodule.patch
messages: + msg56494
2007-10-16 05:27:02christian.heimessetmessages: + msg56493
2007-10-16 03:47:59gvanrossumsetmessages: + msg56488
2007-10-16 02:11:25christian.heimessetfiles: + py3k_initio2.patch
messages: + msg56484
2007-10-15 20:31:54brett.cannonsetkeywords: + patch
assignee: brett.cannon
2007-10-15 20:28:51christian.heimessetfiles: + py3k_initio.patch
nosy: + christian.heimes
messages: + msg56465
2007-10-14 01:03:18gvanrossumsetnosy: + gvanrossum
2007-10-11 23:02:39brett.cannoncreate