classification
Title: Crash in MIMEText on FreeBSD
Type: behavior Stage:
Components: Library (Lib), Unicode Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: loewis, rpetrov, surkamp, vstinner
Priority: normal Keywords:

Created on 2008-10-22 18:40 by surkamp, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test_MIMEText.tar.bz2 surkamp, 2008-10-22 18:40 Test case
Messages (13)
msg75097 - (view) Author: Sérgio Surkamp (surkamp) Date: 2008-10-22 18:40
If you try to create a MIMEText object from a very large string (the
test case includes a 40 MB string), the program eats all the CPU with
high memory usage, or raises a MemoryError. Sometimes it just
deadlocks when using _charset = "iso-8859-1".

Use the submitted file and the script to reproduce the problem.

** On Linux it is very slow, but it works ** - the problem occurs on a
FreeBSD installation.
msg75112 - (view) Author: Roumen Petrov (rpetrov) * Date: 2008-10-22 21:46
I don't think that the test works on Linux without a MemoryError. What
happens if you set user limits on Linux?
If you enable core files on Linux, does the test really crash and dump
core, or does it just raise an exception and exit without a core dump?
msg75142 - (view) Author: Sérgio Surkamp (surkamp) Date: 2008-10-23 12:45
Testing on Linux:

$ ulimit -m 128000
$ ulimit -v 196000
$ python test_MIMEText.py
[...]
Traceback (most recent call last):
  File "test_MIMEText.py", line 23, in <module>
    txt = MIMEText(buffer, _subtype="plain", _charset="iso-8859-1")
  File "/usr/lib/python2.5/email/mime/text.py", line 30, in __init__
    self.set_payload(_text, _charset)
  File "/usr/lib/python2.5/email/message.py", line 220, in set_payload
    self.set_charset(charset)
  File "/usr/lib/python2.5/email/message.py", line 262, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "/usr/lib/python2.5/email/charset.py", line 386, in body_encode
    return email.quoprimime.body_encode(s)
  File "/usr/lib/python2.5/email/quoprimime.py", line 198, in encode
    body = fix_eols(body)
  File "/usr/lib/python2.5/email/utils.py", line 77, in fix_eols
    s = re.sub(r'(?<!\r)\n', CRLF, s)
  File "/usr/lib/python2.5/re.py", line 150, in sub
    return _compile(pattern, 0).sub(repl, string, count)
MemoryError

OK. Setting a "low" ulimit for memory and virtual memory raises a
MemoryError on Linux too.

Checking the same limits on FreeBSD, they are set to unlimited, so the
problem should not occur there.
msg75158 - (view) Author: Roumen Petrov (rpetrov) * Date: 2008-10-24 10:21
What about the data segment and stack size limits?
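
The limits asked about here can also be read from inside the Python
process. The following is a minimal sketch for a modern Python 3 on a
Unix-like system; current_limits and describe are hypothetical helper
names, and not every platform defines every constant, hence the getattr
guard.

```python
import resource

# Limits discussed in this thread: address space, data segment, stack, RSS.
LIMIT_NAMES = ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_STACK", "RLIMIT_RSS")


def current_limits():
    """Return {name: (soft, hard)} for the limits this platform exposes."""
    limits = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)
        if const is None:
            continue  # this platform does not define the limit
        limits[name] = resource.getrlimit(const)
    return limits


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)
```

Printing describe(soft) and describe(hard) for each entry of
current_limits() inside the affected program shows the limits that
process actually runs with, which may differ from the interactive shell.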
msg75160 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-24 11:27
Your example works here on:
 - Linux, i386, 2 GB of memory, Python 2.5
 - FreeBSD in Qemu, i386, 512 MB of memory, Python 2.5

> The program just eat all the CPU and with high memory usage or raise 
a MemoryError

Yes, it takes one minute or more to finish. If there is not enough 
memory, Python raises a MemoryError. The behaviour is correct: Python 
doesn't crash, it's just slow.

Your text file is ~40 MB. Python may allocate multiple objects bigger
than 40 MB to create the email content. The algorithm should be
changed to work on a stream (processing small chunks, e.g. 4 KB)
instead of manipulating the full text in memory (+40,000 KB).
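
A rough sketch of what stream-based processing could look like, written
for a modern Python 3 rather than the 2.5 discussed here. It encodes one
line at a time with binascii.b2a_qp, so only a single line is held in
memory. iter_qp_lines is a hypothetical name, and this is not how the
email package is implemented.

```python
import binascii
import io


def iter_qp_lines(stream, charset="iso-8859-1"):
    """Yield quoted-printable encoded lines (bytes), one input line at a time."""
    for line in stream:
        raw = line.rstrip("\r\n").encode(charset)
        # Appending CRLF normalises the line ending; b2a_qp then also uses
        # CRLF for any soft line breaks it inserts into long lines.
        yield binascii.b2a_qp(raw + b"\r\n")
```

A caller could write each yielded chunk straight to a file or socket
instead of joining the chunks into one large string.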

Why do you try to send 40 MB by email? Use FTP or another protocol :-p 
Or use another encoding (base64) to attach the text to the email.
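
For the base64 suggestion, a minimal sketch on a modern Python 3 (3.5 or
later, where MIMEText accepts a Charset instance) might look like this.
make_base64_text is a hypothetical helper; the iso-8859-1 charset
defaults to quoted-printable for the body, so the body encoding is
overridden explicitly.

```python
from email.charset import BASE64, Charset
from email.mime.text import MIMEText


def make_base64_text(text, charset_name="iso-8859-1"):
    """Build a text/plain part whose body is base64 encoded."""
    charset = Charset(charset_name)
    # Override the default body encoding (quoted-printable for iso-8859-1).
    charset.body_encoding = BASE64
    return MIMEText(text, _subtype="plain", _charset=charset)
```

Base64 inflates the payload by roughly one third, but it avoids the
per-character escaping work of quoted-printable.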
msg75164 - (view) Author: Sérgio Surkamp (surkamp) Date: 2008-10-24 12:48
> Your text file is ~40 MB. Python may allocate multiple objects bigger
> than 40 MB to create the email content. The algorithm should be
> changed to work on a stream (process small chunks, e.g. 4 KB) instead
> of manipulating the full text in memory (+40,000 KB).

The original text block is about 5 to 9 MB; it is a server-generated
report from pflogsumm. When it reached our mailing list processing
program (written by someone else in Python), the program froze while
building the MIMEText object. Actually no MemoryError is raised, just a
sudden freeze of the running thread.

Unfortunately the submitted test script does not show the same behavior;
maybe something else is freezing the software instead of raising the
MemoryError. I have checked for try: ... except ...: pass blocks that
could hide the problem, but found nothing.

I have already limited the message size in Postfix, but the strange
thing is why this happens on FreeBSD and not on Linux.
msg75165 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-24 13:03
> The original text block is about 5 to 9 Mbytes (...), it freezes 
> building the MIMEText object. Actually no MemoryError isn't raised,
> just a sudden freeze of the running thread.

Can you give more details about the freeze?
 - FreeBSD version?
 - CPU, memory?
 - Full Python version?

During the "freeze", does the process use 0% or 100% of the CPU time? You
can use the strace program to trace Python activity during the freeze.

You might try my clone of strace, strace.py, which works on FreeBSD without
the Linux emulation (but on FreeBSD, only i386 is supported):
   http://python-ptrace.hachoir.org/trac
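
As a Python-level complement to strace, a modern Python 3 offers the
faulthandler module, which did not exist for the Python 2.5 in this
report. The sketch below captures the current stack of every thread as
text; dump_all_threads is a hypothetical helper name.

```python
import faulthandler
import tempfile


def dump_all_threads():
    """Return the current Python-level stack of every thread as text."""
    with tempfile.TemporaryFile(mode="w+") as fh:
        # faulthandler writes directly to the file descriptor of fh.
        faulthandler.dump_traceback(file=fh, all_threads=True)
        fh.seek(0)
        return fh.read()
```

On Unix, faulthandler.register(signal.SIGUSR1) installs the same dump as
a signal handler, so a process that looks frozen can be asked from the
outside where each thread currently is.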

> Unfortunately the test script submited does not do the same behavior,
> maybe some other things are freezing the software instead of raise the
> MemoryError.

Can you try to isolate the bug? Remove some code, disable functions, etc.
msg75166 - (view) Author: Sérgio Surkamp (surkamp) Date: 2008-10-24 13:17
- FreeBSD version?

FreeBSD 7.0-RELEASE

 - CPU, memory?

CPU: 2 x Pentium III 1.133 GHz
Memory: 512 Mbytes

 - Full Python version?

Python 2.5.2 (r252:60911, Oct  2 2008, 10:03:50) 
[GCC 4.2.1 20070719  [FreeBSD]] on freebsd7

> On "freeze", the process uses 0% or 100% of the CPU time? You can use
the strace program to trace Python activity during the freeze.

Usually 100%. But I have seen it use more (both CPUs), which I think
means more than one thread "froze".

I will download your trace program and do some tests with it. I'll try
to collect some information using GDB too.
msg75167 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-24 13:24
> Usually 100%. But saw it with more (using both CPU's), I think that mean
> more then one thread "freezed".

Does the program finish its job after 10 minutes or 1 hour? Using all the CPU 
doesn't mean that Python is frozen, it's the opposite: Python is working hard 
to compute the result :-)
msg75168 - (view) Author: Sérgio Surkamp (surkamp) Date: 2008-10-24 13:39
When I first saw the problem, the email system queue had been stopped
for about 2 days (weekend) :-(

The email system controls the number of open threads, so it wasn't
opening new threads and was issuing many warnings about it in the logs.

Anyway, I have already installed the ptrace tool and I'll start
debugging when I come back from lunch.
msg75297 - (view) Author: Sérgio Surkamp (surkamp) Date: 2008-10-28 17:44
OK. Something is very wrong with our code too. I dumped the text that is
causing the "freeze" and ran it through the test case scripts. It was
slow, but it worked. It seems that our application is eating too much
memory on the server (about 60 MB for a 2.4 MB message), so it is
obviously an application bug/leak.
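
One hedged way to put numbers on this kind of memory growth in a modern
Python 3 is the tracemalloc module (not available in Python 2.5). The
sketch below reports the peak number of bytes allocated by Python while
a callable runs; peak_bytes_while is a hypothetical helper name.

```python
import tracemalloc
from email.mime.text import MIMEText


def peak_bytes_while(func, *args, **kwargs):
    """Run func and return (result, peak bytes allocated by Python meanwhile)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```

Calling peak_bytes_while(MIMEText, text, "plain", "iso-8859-1") on the
dumped message and comparing the peak with len(text) would show how much
of the growth comes from building the MIMEText object itself and how
much from the rest of the application.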

Unfortunately I can't submit the files for performance testing, because
they may contain confidential information.

As far as I can see in GDB, the Python process is in a loop inside
these functions:

#0  0x2825798e in memcpy () from /lib/libc.so.7
#1  0x080a4607 in PyUnicodeUCS4_Concat ()
#2  0x080aec8d in PyEval_EvalFrameEx ()
#3  0x080b2c49 in PyEval_EvalCodeEx ()
#4  0x080b111a in PyEval_EvalFrameEx ()
#5  0x080b2c49 in PyEval_EvalCodeEx ()
#6  0x080b111a in PyEval_EvalFrameEx ()
#7  0x080b1f65 in PyEval_EvalFrameEx ()
#8  0x080b2c49 in PyEval_EvalCodeEx ()
#9  0x080b111a in PyEval_EvalFrameEx ()
#10 0x080b2c49 in PyEval_EvalCodeEx ()
#11 0x080eebd6 in PyClassMethod_New ()
#12 0x08059ef7 in PyObject_Call ()
#13 0x0805f341 in PyClass_IsSubclass ()
#14 0x08059ef7 in PyObject_Call ()
#15 0x080ac86c in PyEval_CallObjectWithKeywords ()
#16 0x080629d6 in PyInstance_New ()
#17 0x08059ef7 in PyObject_Call ()
#18 0x080af2bb in PyEval_EvalFrameEx ()
#19 0x080b2c49 in PyEval_EvalCodeEx ()
#20 0x080b111a in PyEval_EvalFrameEx ()
#21 0x080b1f65 in PyEval_EvalFrameEx ()
#22 0x080b1f65 in PyEval_EvalFrameEx ()
#23 0x080b1f65 in PyEval_EvalFrameEx ()
#24 0x080b2c49 in PyEval_EvalCodeEx ()
#25 0x080eec4e in PyClassMethod_New ()
#26 0x08059ef7 in PyObject_Call ()
#27 0x0805f341 in PyClass_IsSubclass ()
#28 0x08059ef7 in PyObject_Call ()
#29 0x080ac86c in PyEval_CallObjectWithKeywords ()
#30 0x080d4b58 in initthread ()
#31 0x28175acf in pthread_getprio () from /lib/libthr.so.3
#32 0x00000000 in ?? ()

Every memcpy call takes a long time to complete, but that seems to be an
artifact of debugging, as GDB eats 80% to 95% of the CPU and Python just
1% or 2%.

How does Python's charset conversion work internally? Does it duplicate
the original string on every character substitution?
If so, wouldn't it be better to count the substitutions, calculate the
amount of memory needed, and make just one allocation for the new
string? Then copy the unmodified characters from the original to the
new string and change the other characters as needed?
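
The two-pass idea in this question can be sketched in Python 3 for the
simplest substitution seen in the earlier traceback, turning a bare \n
into \r\n. This is only an illustration of the proposal:
fix_eols_prealloc is a hypothetical name, it works on bytes, and it is
not how the email package or CPython's string concatenation is
implemented.

```python
CR, LF = 13, 10  # byte values of "\r" and "\n"


def fix_eols_prealloc(data):
    """Convert bare LF to CRLF using a single output allocation."""
    # Pass 1: count the substitutions (LF not preceded by CR).
    bare = sum(
        1
        for i, b in enumerate(data)
        if b == LF and (i == 0 or data[i - 1] != CR)
    )
    # One allocation of the final size.
    out = bytearray(len(data) + bare)
    # Pass 2: copy bytes, inserting CR before each bare LF.
    j = 0
    for i, b in enumerate(data):
        if b == LF and (i == 0 or data[i - 1] != CR):
            out[j] = CR
            j += 1
        out[j] = b
        j += 1
    return bytes(out)
```

In pure Python this loop is much slower than re.sub or bytes.replace,
which run in C; the sketch only shows the allocation pattern, which
avoids creating a new copy of the string for each substitution.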
msg77496 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-10 08:32
IIUC, no patch has been proposed, so retargeting it to later versions.
msg127182 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-01-27 12:34
> Something is very wrong with our code too. I have dumped the text
> that's cousing the "freeze" and run it using the test case scripts.
> It worked slow, but worked.

I retried test_MIMEText.tar.bz2 on FreeBSD 8.0 with 640 MB of memory: the program takes ~5 minutes, but it doesn't fail (no memory error or crash).

I suppose that the crash cannot be reproduced by the test_MIMEText.tar.bz2 example, only with the full program. Because I don't have access to the full program, I am unable to reproduce the bug, and because there has been no activity on this issue for 2 years, I am closing it.

If you have more information (especially a short script to reproduce the crash), reopen the issue or create a new issue (maybe more specific? eg. patch MIMEText to use less memory).
History
Date User Action Args
2022-04-11 14:56:40  admin  set  github: 48427
2011-01-27 12:34:20  vstinner  set  status: open -> closed; nosy: + vstinner; messages: + msg127182; resolution: not a bug
2009-01-17 01:40:39  vstinner  set  nosy: - vstinner
2008-12-10 08:32:42  loewis  set  nosy: + loewis; messages: + msg77496; versions: + Python 2.7, - Python 2.5, Python 2.5.3
2008-10-28 17:44:44  surkamp  set  messages: + msg75297
2008-10-24 13:39:47  surkamp  set  messages: + msg75168
2008-10-24 13:24:27  vstinner  set  messages: + msg75167
2008-10-24 13:17:03  surkamp  set  messages: + msg75166
2008-10-24 13:03:23  vstinner  set  messages: + msg75165
2008-10-24 12:48:15  surkamp  set  messages: + msg75164
2008-10-24 11:27:20  vstinner  set  nosy: + vstinner; messages: + msg75160
2008-10-24 10:21:04  rpetrov  set  messages: + msg75158
2008-10-23 12:45:06  surkamp  set  messages: + msg75142
2008-10-22 21:46:14  rpetrov  set  nosy: + rpetrov; messages: + msg75112
2008-10-22 18:40:38  surkamp  create