Message 75297 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	surkamp
Recipients	rpetrov, surkamp, vstinner
Date	2008-10-28.17:44:43
SpamBayes Score	2.946332e-11
Marked as misclassified	No
Message-id	<1225215885.65.0.645170802593.issue4177@psf.upfronthosting.co.za>
In-reply-to

Content
Ok. Something is very wrong with our code too. I have dumped the text that's cousing the "freeze" and run it using the test case scripts. It worked slow, but worked. It seems that our application is eating too many memory from server (about 60Mbytes for a 2.4Mbytes message), so its obviously a application bug/leak. Unfortunately I cant submit the files for performance test, becose they may contain confidential information. As long as I can see on GDB, the python process is in a loop inside this functions: #0 0x2825798e in memcpy () from /lib/libc.so.7 #1 0x080a4607 in PyUnicodeUCS4_Concat () #2 0x080aec8d in PyEval_EvalFrameEx () #3 0x080b2c49 in PyEval_EvalCodeEx () #4 0x080b111a in PyEval_EvalFrameEx () #5 0x080b2c49 in PyEval_EvalCodeEx () #6 0x080b111a in PyEval_EvalFrameEx () #7 0x080b1f65 in PyEval_EvalFrameEx () #8 0x080b2c49 in PyEval_EvalCodeEx () #9 0x080b111a in PyEval_EvalFrameEx () #10 0x080b2c49 in PyEval_EvalCodeEx () #11 0x080eebd6 in PyClassMethod_New () #12 0x08059ef7 in PyObject_Call () #13 0x0805f341 in PyClass_IsSubclass () #14 0x08059ef7 in PyObject_Call () #15 0x080ac86c in PyEval_CallObjectWithKeywords () #16 0x080629d6 in PyInstance_New () #17 0x08059ef7 in PyObject_Call () #18 0x080af2bb in PyEval_EvalFrameEx () #19 0x080b2c49 in PyEval_EvalCodeEx () #20 0x080b111a in PyEval_EvalFrameEx () #21 0x080b1f65 in PyEval_EvalFrameEx () #22 0x080b1f65 in PyEval_EvalFrameEx () #23 0x080b1f65 in PyEval_EvalFrameEx () #24 0x080b2c49 in PyEval_EvalCodeEx () #25 0x080eec4e in PyClassMethod_New () #26 0x08059ef7 in PyObject_Call () #27 0x0805f341 in PyClass_IsSubclass () #28 0x08059ef7 in PyObject_Call () #29 0x080ac86c in PyEval_CallObjectWithKeywords () #30 0x080d4b58 in initthread () #31 0x28175acf in pthread_getprio () from /lib/libthr.so.3 #32 0x00000000 in ?? () Every memcpy call take a lot to complete, but it seems a problem with GDB debugging as it eats 80% to 95% of the CPU and python just 1% or 2%. How python charset conversion works from inside? It duplicates the original string every character substitution? If this is the case, shouldn't be better to count the substituitions, calculate the amount of needed memory and make just one allocation for the new string? Then copy the unmodified characters from the original to the new string and change other chars as needed?

Ok. Something is very wrong with our code too. I have dumped the text
that's cousing the "freeze" and run it using the test case scripts. It
worked slow, but worked. It seems that our application is eating too
many memory from server (about 60Mbytes for a 2.4Mbytes message), so its
obviously a application bug/leak.

Unfortunately I cant submit the files for performance test, becose they
may contain confidential information.

As long as I can see on GDB, the python process is in a loop inside this
functions:

#0  0x2825798e in memcpy () from /lib/libc.so.7
#1  0x080a4607 in PyUnicodeUCS4_Concat ()
#2  0x080aec8d in PyEval_EvalFrameEx ()
#3  0x080b2c49 in PyEval_EvalCodeEx ()
#4  0x080b111a in PyEval_EvalFrameEx ()
#5  0x080b2c49 in PyEval_EvalCodeEx ()
#6  0x080b111a in PyEval_EvalFrameEx ()
#7  0x080b1f65 in PyEval_EvalFrameEx ()
#8  0x080b2c49 in PyEval_EvalCodeEx ()
#9  0x080b111a in PyEval_EvalFrameEx ()
#10 0x080b2c49 in PyEval_EvalCodeEx ()
#11 0x080eebd6 in PyClassMethod_New ()
#12 0x08059ef7 in PyObject_Call ()
#13 0x0805f341 in PyClass_IsSubclass ()
#14 0x08059ef7 in PyObject_Call ()
#15 0x080ac86c in PyEval_CallObjectWithKeywords ()
#16 0x080629d6 in PyInstance_New ()
#17 0x08059ef7 in PyObject_Call ()
#18 0x080af2bb in PyEval_EvalFrameEx ()
#19 0x080b2c49 in PyEval_EvalCodeEx ()
#20 0x080b111a in PyEval_EvalFrameEx ()
#21 0x080b1f65 in PyEval_EvalFrameEx ()
#22 0x080b1f65 in PyEval_EvalFrameEx ()
#23 0x080b1f65 in PyEval_EvalFrameEx ()
#24 0x080b2c49 in PyEval_EvalCodeEx ()
#25 0x080eec4e in PyClassMethod_New ()
#26 0x08059ef7 in PyObject_Call ()
#27 0x0805f341 in PyClass_IsSubclass ()
#28 0x08059ef7 in PyObject_Call ()
#29 0x080ac86c in PyEval_CallObjectWithKeywords ()
#30 0x080d4b58 in initthread ()
#31 0x28175acf in pthread_getprio () from /lib/libthr.so.3
#32 0x00000000 in ?? ()

Every memcpy call take a lot to complete, but it seems a problem with
GDB debugging as it eats 80% to 95% of the CPU and python just 1% or 2%.

How python charset conversion works from inside? It duplicates the
original string every character substitution?
If this is the case, shouldn't be better to count the substituitions,
calculate the amount of needed memory and make just one allocation for
the new string? Then copy the unmodified characters from the original to
the new string and change other chars as needed?

History
Date	User	Action	Args
2008-10-28 17:44:46	surkamp	set	recipients: + surkamp, vstinner, rpetrov
2008-10-28 17:44:45	surkamp	set	messageid: <1225215885.65.0.645170802593.issue4177@psf.upfronthosting.co.za>
2008-10-28 17:44:44	surkamp	link	issue4177 messages
2008-10-28 17:44:43	surkamp	create