
Author dmtr
Recipients dmtr, eric.araujo, ezio.melotti, mark.dickinson, pitrou, rhettinger
Date 2010-08-06.01:40:19
SpamBayes Score 1.0796695e-07
Marked as misclassified No
Message-id <1281058823.56.0.268082505436.issue9520@psf.upfronthosting.co.za>
In-reply-to
Content
No, I'm not simply running out of system memory: 8 GB, x64, Linux, and in my test cases I've only seen ~25% of memory utilized. Good idea, though; I'll try playing with the cyclic garbage collector.
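The cyclic-GC experiment could look something like this minimal sketch; the workload (a dict of string keys) and the sizes are illustrative, not taken from the issue:

```python
import gc

# Sketch: disable the cyclic garbage collector while building a large
# dict of strings, so repeated full collections don't run mid-build.
# 100_000 keys is an arbitrary, illustrative size.
gc.disable()
try:
    words = {}
    for i in range(100_000):
        words[str(i)] = 0
finally:
    gc.enable()
```

Comparing peak memory with and without `gc.disable()` (e.g. via `/proc/self/status` or `tracemalloc`) would show whether the collector is involved at all.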

It is harder than I thought to make a solid synthetic test case for this issue. The trouble is that you need to be able to generate data (e.g. 100,000,000 words, 5,000,000 unique) with a distribution close to the real-life scenario (e.g. word lengths, frequencies, and uniqueness in English text). If somebody has a good idea on how to do this nicely, it would be very welcome.
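One possible approach, sketched below under the assumption that English word frequencies are roughly Zipf-distributed: build a random vocabulary with varying word lengths, then sample from it with rank-weighted probabilities. The sizes, the exponent `s`, and the helper names are all illustrative, not from the issue:

```python
import random
import string

def make_vocab(n_unique, seed=0):
    # n_unique pseudo-random "words" of 2-12 lowercase letters,
    # loosely mimicking natural word-length variation.
    rng = random.Random(seed)
    return [
        "".join(rng.choice(string.ascii_lowercase)
                for _ in range(rng.randint(2, 12)))
        for _ in range(n_unique)
    ]

def zipf_stream(vocab, n_words, s=1.1, seed=1):
    # Sample n_words from vocab with Zipf-like weights 1/rank**s,
    # so a handful of words dominate, as in real text.
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) ** s for rank in range(len(vocab))]
    return rng.choices(vocab, weights=weights, k=n_words)

# Small illustrative run; scale n_unique/n_words up for a real test.
counts = {}
for w in zipf_stream(make_vocab(5_000), 100_000):
    counts[w] = counts.get(w, 0) + 1
```

Scaling this to 100,000,000 words / 5,000,000 unique would need a streaming sampler rather than one big `choices()` call, but the distribution shape would be the same.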

My best shot so far is in the attachment.
History
Date User Action Args
2010-08-06 01:40:24  dmtr  set  recipients: + dmtr, rhettinger, mark.dickinson, pitrou, ezio.melotti, eric.araujo
2010-08-06 01:40:23  dmtr  set  messageid: <1281058823.56.0.268082505436.issue9520@psf.upfronthosting.co.za>
2010-08-06 01:40:21  dmtr  link  issue9520 messages
2010-08-06 01:40:20  dmtr  create