Title: Parsing a simple script eats all of your memory
msg55758 - Author: Viktor Ferenczi (complex) Date: 2007-09-09 00:35
Read the WARNING below, then run the attached script with Python3.0a2.
It will eat all of your memory.

WARNING: Keep a process killing tool or an extra command line at your
fingertips, since this script could render your machine unusable in
about 10-20 seconds depending on your memory and CPU speed!!! YOU ARE

OS: Ubuntu Feisty, up-to-date
Python: Python3.0a1, built from sources,
configured with: --prefix=/usr/local
msg55759 - Author: Viktor Ferenczi (complex) Date: 2007-09-09 00:45
Confirmed on Windows:

OS: Windows XP SP2 ENG
Python: Python3.0a1 MSI installer, default installation
msg55760 - Author: Viktor Ferenczi (complex) Date: 2007-09-09 00:50
Works fine (does nothing) with Python 2.4.4 and Python 2.5.1 under Windows, so this bug must be caused by some new code in Python3.0a1. The bug depends on the contents of the docstring: there is another strange behavior if you write the word "this" somewhere in the docstring. It looks as if the docstring is somehow being parsed as source code, which confuses the new parser.
msg55761 - Author: Viktor Ferenczi (complex) Date: 2007-09-09 00:57
Erratum: in the first line of my original post I meant Python3.0a1, not
3.0a2, of course.
msg55768 - Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2007-09-09 21:56
Confirmed that this happens on Mac OS X with a fresh build of py3k from svn.
msg55773 - Author: Stefan Sonnenberg-Carstens (pythonmeister) Date: 2007-09-10 03:05
Same under Linux with Python 3.0a1.
Eats all cpu + memory
msg55932 - Author: Alexey Suda-Chen (alexeychen) Date: 2007-09-15 22:22
--- tokenizer.c	(revision 58161)
+++ tokenizer.c	(working copy)
@@ -402,6 +402,8 @@
 	if (allocated) {
+  Py_XDECREF(tok->decoding_buffer);
+  tok->decoding_buffer = 0;
 	return s;
msg55934 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-09-16 00:17
Note the patch is inlined in a message.
msg55943 - Author: Alexey Suda-Chen (alexeychen) Date: 2007-09-16 15:11
Oops, I see there are two bugs. Previously I had fixed only multiline
strings.

I think the fix should be:

Index: tokenizer.c
--- tokenizer.c	(revision 58161)
+++ tokenizer.c	(working copy)
@@ -395,6 +395,7 @@
 			goto error;
 		buflen = size;
 	memcpy(s, buf, buflen);
 	s[buflen] = '\0';
 	if (buflen == 0) /* EOF */
@@ -402,6 +403,12 @@
 	if (allocated) {
+  if ( bufobj == tok->decoding_buffer ){
+    Py_XDECREF(tok->decoding_buffer);
+    tok->decoding_buffer = 0;
+  }
 	return s;
msg55995 - Author: Sean Reifschneider (jafo) * (Python committer) Date: 2007-09-18 13:11
Confirmed the problem (it used 4.5GB before I killed it), and confirmed
that the second patch resolves it. I'm uploading the inline patch as an
attachment, with the directory name in it as well (from svn diff).

Bumping the priority to high because the side effect can cause all sorts
of problems on a system, including other processes being killed.
msg56083 - Author: Neil Schemenauer (nas) Date: 2007-09-21 21:32
It looks to me like fp_readl is no longer working as designed and the
patch is not really the right fix.  The use of "decoding_buffer" is
tricky and I think the conversion to bytes screwed it up.  It might be
clearer to have a separate "decoding_overflow" struct member that's used
for overflow rather than overloading "decoding_buffer".
msg57475 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-14 00:40
The issue isn't fixed yet. The script is still eating precious memory.
msg57477 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-11-14 00:54
Amaury, can you have a look at this?  I think it's a bug in tok_nextc()
in tokenizer.c.
msg57478 - Author: Viktor Ferenczi (complex) Date: 2007-11-14 02:36
This bug prevents me and many others from doing preliminary testing on
Py3k, which slows down its development. This bug _really_ hurts. I have a
completely developed new module for Py3k that cannot be released because
of this bug, since its unit tests trigger it and would crash the user's
machine.

Sadly, I don't have enough free time or readily available in-depth
knowledge to fix this myself, especially since the first attempt was not
perfect, which suggests that this is not a bug that can be fixed by
correcting a typo somewhere... :-)
msg57480 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-14 03:14
I've already raised the priority to draw more attention to this bug.

So far I haven't been able to solve the bug, but I've narrowed the issue
down to a short test case:

# -*- coding: ascii -*-

The problem manifests itself only with the combination of the ascii
encoding and triple quotes spanning two or more lines. Neither a different
encoding nor a string on a single line has the same problem.

# -*- coding: ascii -*-
""" """

# -*- coding: latin.1 -*-

# -*- coding: ascii -*-
""" """

# -*- coding: ascii -*-
File "", line 5
SyntaxError: EOL while scanning single-quoted string

The last example does compile with Python 2.5. Note also the
wrong line number: the file has only three (!) lines.
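To re-run cases like these without risking the machine (see the warning at the top of the thread), a minimal Python 3 sketch could execute each file in a child interpreter with a timeout and, where the `resource` module is available, an address-space cap. The file contents are assumptions: the exact test files are not shown in full here, so `CASES` is reconstructed from the description (an ascii cookie plus a triple-quoted string across two lines, versus a single-line string or a latin-1 cookie). The names `CASES`, `MAX_BYTES` and `run_source`, the 10-second timeout and the 1 GiB cap are all illustrative choices.

```python
import os
import subprocess
import sys
import tempfile

try:
    import resource  # POSIX only
except ImportError:  # e.g. Windows
    resource = None

# Reconstructed from the description above, not the original files: only the
# first case combines an ascii cookie with a string spanning two lines.
CASES = {
    "ascii, multi-line string": '# -*- coding: ascii -*-\n"""\n"""\n',
    "ascii, single-line string": '# -*- coding: ascii -*-\n""" """\n',
    "latin-1, multi-line string": '# -*- coding: latin-1 -*-\n"""\n"""\n',
}

MAX_BYTES = 1 << 30  # 1 GiB; an arbitrary cap


def _cap_address_space():
    """Runs in the child before exec: cap its virtual address space so a
    runaway process fails to allocate instead of exhausting the machine.
    Enforcement of RLIMIT_AS varies by platform."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    limit = MAX_BYTES if hard == resource.RLIM_INFINITY else min(MAX_BYTES, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))


def run_source(source, timeout=10.0):
    """Run `source` as a script in a fresh interpreter.

    Returns (returncode, stderr); returncode is None if the child was
    killed after `timeout` seconds.
    """
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(source.encode("ascii"))
        try:
            proc = subprocess.run(
                [sys.executable, path],
                capture_output=True,
                text=True,
                timeout=timeout,
                preexec_fn=_cap_address_space if resource is not None else None,
            )
        except subprocess.TimeoutExpired:  # run() has already killed the child
            return None, "timed out"
        return proc.returncode, proc.stderr
    finally:
        os.remove(path)


if __name__ == "__main__":
    for name, src in CASES.items():
        rc, err = run_source(src)
        print(f"{name}: returncode={rc} {err.strip()}")
```

On a current interpreter all three cases should exit with status 0; against a build affected by this bug, the first case would be expected to hit the timeout or the memory cap instead of taking the machine down.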

During my debugging session I saw an infinite loop at tokenizer.c:1429:

	/* String */
	if (c == '\'' || c == '"') {
		for (;;) {
msg57483 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-11-14 05:04
Is this also broken in the 3.0a1 release? If not, it might be useful
to try to find the most recent rev where it's not broken.
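Finding the most recent working revision amounts to a bisection over Subversion's integer revision numbers. A minimal sketch, assuming a single regression (every revision after the first broken one is also broken); `last_good_revision` and the `is_broken` callback are hypothetical, and in practice `is_broken` would build the given revision and run the reproducer under a timeout and memory cap. It needs roughly log2(bad - good) calls to `is_broken`.

```python
from typing import Callable


def last_good_revision(good: int, bad: int, is_broken: Callable[[int], bool]) -> int:
    """Return the newest revision in [good, bad) for which is_broken() is False.

    Assumes `good` is known to work, `bad` is known to be broken, and the
    breakage is monotonic: once a revision is broken, all later ones are too.
    """
    if good >= bad:
        raise ValueError("good must be an earlier revision than bad")
    if is_broken(good) or not is_broken(bad):
        raise ValueError("need a working lower bound and a broken upper bound")
    # Invariant: `good` works and `bad` is broken.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_broken(mid):
            bad = mid
        else:
            good = mid
    return good


if __name__ == "__main__":
    # Stand-in predicate: pretend the regression landed in revision 157.
    print(last_good_revision(100, 200, lambda rev: rev >= 157))  # 156
```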
msg57486 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2007-11-14 10:27
fp_readl is indeed broken in several ways:
- decoding_buffer should be reset to NULL when all data has been read
(buflen <= size).
- the (buflen > size) case will cause an error on the next pass, since
the function cannot handle PyBytesObject.

IOW, the function is always wrong ;-)

I have a correction ready (jafo's patch already addresses the first
case), but cannot access svn here. I will try to provide a patch + test
cases later tonight.
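A toy Python model may make the buffering issue concrete. It is not CPython's fp_readl: the class `LineReader`, its `overflow` attribute and the `readl(size)` method are hypothetical, and the model simply follows Neil's suggestion of keeping leftover bytes in a dedicated overflow buffer rather than overloading the input buffer. `readl` returns at most `size` bytes of the next decoded line and clears the overflow once it has been consumed, which corresponds to Amaury's first point. In this model, skipping that reset would make the reader hand back stale bytes instead of advancing through the file.

```python
import io


class LineReader:
    """Toy model of a tokenizer line reader with a separate overflow buffer.

    Hypothetical illustration only; this is not CPython's fp_readl.
    """

    def __init__(self, stream, encoding="ascii"):
        self.stream = stream        # binary stream, e.g. io.BytesIO
        self.encoding = encoding    # declared source encoding
        self.overflow = None        # leftover UTF-8 bytes of a long line

    def readl(self, size):
        """Return at most `size` bytes of UTF-8 text; b"" means EOF."""
        if size <= 0:
            raise ValueError("size must be positive")
        if self.overflow is None:
            raw = self.stream.readline()
            if not raw:
                return b""
            # Decode with the declared encoding, then re-encode as UTF-8.
            data = raw.decode(self.encoding).encode("utf-8")
        else:
            data = self.overflow
        chunk, rest = data[:size], data[size:]
        # Reset the overflow once it is fully consumed; otherwise the next
        # call would return stale bytes instead of reading new input.
        self.overflow = rest or None
        return chunk


if __name__ == "__main__":
    src = b'# -*- coding: ascii -*-\n"""\n"""\n'
    reader = LineReader(io.BytesIO(src))
    pieces = []
    while True:
        piece = reader.readl(8)  # a small size forces the overflow path
        if not piece:
            break
        pieces.append(piece)
    print(b"".join(pieces) == src)  # True
```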
msg57487 - Author: Viktor Ferenczi (complex) Date: 2007-11-14 12:40
In response to Guido:

According to pythonmeister's post (2007-09-10):

"Same under Linux with Python 3.0a1.
Eats all cpu + memory"

I found the bug with this version:

fviktor@rigel:~$ python3.0 --version
Python 3.0a1

AFAIK it is the latest alpha release.
I did not try the SVN trunk, but it is
very likely still buggy, since this
issue has not been closed yet.

Viktor (complex)
msg57574 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2007-11-15 23:21
Corrected in revision 59001, with a modified patch.
