
Author dmtr
Recipients dmtr, eric.araujo, ezio.melotti, mark.dickinson, pitrou, rhettinger
Date 2010-08-06.01:40:19
SpamBayes Score 1.0796695e-07
Marked as misclassified No
Message-id <1281058823.56.0.268082505436.issue9520@psf.upfronthosting.co.za>
In-reply-to
Content
No, I'm not simply running out of system memory: 8 GB, x64, Linux, and in my test cases I've only seen ~25% of memory utilized. Good idea, though; I'll try playing with the cyclic garbage collector.
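The cyclic-GC experiment could look something like this minimal sketch; the workload (a dict of string keys) and the sizes are illustrative, not taken from the issue:

```python
import gc

# Sketch: disable the cyclic garbage collector while building a large
# dict of strings, so repeated full collections don't run mid-build.
# 100_000 keys is an arbitrary, illustrative size.
gc.disable()
try:
    words = {}
    for i in range(100_000):
        words[str(i)] = 0
finally:
    gc.enable()
```

Comparing peak memory with and without `gc.disable()` (e.g. via `/proc/self/status` or `tracemalloc`) would show whether the collector is involved at all.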

It is harder than I thought to make a solid synthetic test case for this issue. The trouble is that you need to be able to generate data (e.g. 100,000,000 words, 5,000,000 unique) with a distribution close to the real-life scenario (e.g. word lengths, frequencies, and uniqueness in English text). If somebody has a good idea on how to do this nicely, it would be very welcome.
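One possible approach, sketched below under the assumption that English word frequencies are roughly Zipf-distributed: build a random vocabulary with varying word lengths, then sample from it with rank-weighted probabilities. The sizes, the exponent `s`, and the helper names are all illustrative, not from the issue:

```python
import random
import string

def make_vocab(n_unique, seed=0):
    # n_unique pseudo-random "words" of 2-12 lowercase letters,
    # loosely mimicking natural word-length variation.
    rng = random.Random(seed)
    return [
        "".join(rng.choice(string.ascii_lowercase)
                for _ in range(rng.randint(2, 12)))
        for _ in range(n_unique)
    ]

def zipf_stream(vocab, n_words, s=1.1, seed=1):
    # Sample n_words from vocab with Zipf-like weights 1/rank**s,
    # so a handful of words dominate, as in real text.
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) ** s for rank in range(len(vocab))]
    return rng.choices(vocab, weights=weights, k=n_words)

# Small illustrative run; scale n_unique/n_words up for a real test.
counts = {}
for w in zipf_stream(make_vocab(5_000), 100_000):
    counts[w] = counts.get(w, 0) + 1
```

Scaling this to 100,000,000 words / 5,000,000 unique would need a streaming sampler rather than one big `choices()` call, but the distribution shape would be the same.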

My best shot so far is in the attachment.
History
Date User Action Args
2010-08-06 01:40:24  dmtr  set  recipients: + dmtr, rhettinger, mark.dickinson, pitrou, ezio.melotti, eric.araujo
2010-08-06 01:40:23  dmtr  set  messageid: <1281058823.56.0.268082505436.issue9520@psf.upfronthosting.co.za>
2010-08-06 01:40:21  dmtr  link  issue9520 messages
2010-08-06 01:40:20  dmtr  create