Message 51678 - Python tracker



Author	larry
Date	2007-01-14.10:42:55

Content

Thanks for taking the time!

> - Style: you set your tab stops to 4 spaces.  That is an absolute
> no-no!

Sorry about that; I'll fix it if I resubmit.


> - Segfault in test_array. It seems that it's receiving a unicode
> slice object and treating it like a "classic" unicode object.

I tested on Windows and Linux, and I haven't seen that behavior.

Which test_array, by the way?  In Lib/test, or Lib/ctypes/test?
I'm having trouble with most of the DLL extensions on Windows;
they complain that the module uses the incompatible python26.dll
or python26_d.dll.  So I haven't tested ctypes/test_array.py
on Windows, but I have tested the other three permutations of
Linux vs Windows and Lib/test/test_array vs
Lib/ctypes/test/test_array.

Can you give me a stack trace to the segfault?  With that I bet I
can fix it even without a reproducible test case.


> - I got it to come to a grinding halt with the following worst-case
> scenario:
> 
>   a = []
>   while True:
>       x = u"x"*1000000
>       x = x[30:60]  # Short slice of long string
>       a.append(x)
> 
> If you can't do better than that, I'll have to reject it.
> 
> PS I used your combined patch, if it matters.

It matters.  The combined patch has "lazy slices"; the other
patch does not.


When you say "grind to a halt" I'm not sure what you mean.
Was it thrashing?  How much CPU was it using?

When I ran that test, my Windows computer got to 1035 iterations
then threw a MemoryError.  My Linux box behaved the same, except
it got to 1605 iterations.


Adding a call to .simplify() on the slice defeats this worst-case
scenario:

a = []
while True:
    x = u"x"*1000000
    x = x[30:60].simplify()  # Short slice of long string
    a.append(x)

.simplify() forces lazy strings to render themselves.  With that
change, this test will run until the cows come home.  Is that
acceptable?
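
To illustrate why the unsimplified loop exhausts memory, here is a minimal pure-Python sketch of a lazy slice. It is a model built on assumptions, not the patch's C code: the class name `LazySlice`, its attributes, and `retained_chars()` are hypothetical, and only the method name `simplify()` comes from the message above.

```python
class LazySlice:
    """Hypothetical model of a lazy slice: it records (base, start, stop)
    instead of copying characters when the slice is taken."""

    def __init__(self, base, start, stop):
        self.base = base        # strong reference: keeps the whole base string alive
        self.start = start
        self.stop = stop
        self._rendered = None   # filled in once the slice is rendered

    def simplify(self):
        """Render the slice: copy its characters and release the base string."""
        if self._rendered is None:
            self._rendered = self.base[self.start:self.stop]
            self.base = None    # the large string can now be freed
        return self._rendered

    def retained_chars(self):
        """Number of characters this object currently keeps alive."""
        if self.base is not None:
            return len(self.base)
        return len(self._rendered)


big = "x" * 1000000
lazy = LazySlice(big, 30, 60)
held_before = lazy.retained_chars()   # the full base string is still referenced
small = lazy.simplify()
held_after = lazy.retained_chars()    # only the rendered slice remains
```

In this model each unrendered slice pins its full base string, so appending such slices to a list retains every large string; rendering first retains only the short copy.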


Failing that, is there any sort of last-ditch garbage collection
pass that gets called when a memory allocation fails but before
it returns NULL?  If so, I could hook into that and try to render
some slices.  (I don't see such a pass, but maybe I missed it.)

Failing that, I could add garbage-collect-and-retry-once logic to
memory allocation myself, either just for unicodeobject.c or as a
global change.  But I'd be shocked if you were interested in that
approach; if Python doesn't have such a thing by now, you probably
don't want it.
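
The garbage-collect-and-retry-once idea can be sketched in Python as follows. This only illustrates the control flow and is not code from either patch: `alloc_with_retry`, `allocate`, and `release_memory` are hypothetical names, and a real version would presumably be written in C around the allocator.

```python
import gc


def alloc_with_retry(allocate, release_memory=gc.collect):
    """Call allocate(); if it raises MemoryError, free memory and retry once.

    allocate       -- hypothetical zero-argument callable that performs the allocation
    release_memory -- hypothetical hook, e.g. render pending lazy slices and
                      collect garbage; defaults to gc.collect
    """
    try:
        return allocate()
    except MemoryError:
        release_memory()
        # Retry exactly once; a second MemoryError propagates to the caller.
        return allocate()
```

Retrying only once keeps the failure path bounded: if releasing memory does not help, the caller still sees the original kind of error.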

And failing that, "lazy slices" are probably toast.  It always was
a tradeoff of speed for worst-case memory use, and I always knew
it might not fly.  If that's the case, please take a look at the
other patch, and in the meantime I'll see if anyone can come up with
other ways to mitigate the worst-case scenario.

History
Date	User	Action	Args
2007-08-23 15:56:04	admin	link	issue1629305 messages
2007-08-23 15:56:04	admin	create