classification
Title: Limit the max size of PyUnicodeObject->defenc?
Type:
Stage:
Components: Interpreter Core
Versions: Python 3.0
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: christian.heimes, gvanrossum, lemburg, pitrou
Priority: normal
Keywords:

Created on 2007-12-18 12:13 by christian.heimes, last changed 2008-01-22 23:49 by gvanrossum. This issue is now closed.

Messages (6)
msg58744 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2007-12-18 12:13
I think that the cached default-encoded version of the unicode object
should be limited in size. It's probably a bad idea to cache 100 MB of
data. For large strings and unicode objects, the user should do
explicit caching if required.
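
A minimal sketch of what "explicit caching" in user code could look like, assuming the caller wants to memoize encoded forms itself. The helper name `encoded` and the cache size of 32 are hypothetical choices for illustration, not anything proposed in this issue.

```python
from functools import lru_cache


@lru_cache(maxsize=32)  # assumption: a small, bounded, user-controlled cache
def encoded(text: str, encoding: str = "utf-8") -> bytes:
    """Return the encoded form of text, memoized per argument tuple."""
    return text.encode(encoding)


big = "x" * 1_000_000
first = encoded(big)    # computed and stored in the cache
second = encoded(big)   # served from the cache, same bytes object

# The caller decides when the cached bytes are released:
# encoded.cache_clear()
```

With this approach the lifetime of the large encoded copy is under the caller's control instead of being tied to the unicode object.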
msg58756 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2007-12-18 19:14
I don't see a patch. And I think you cannot do this without compromising
correctness, since _PyUnicode_AsDefaultEncodedString() returns the
cached value without incrementing its refcount.  (The only refcount that
keeps it alive is the cache entry.)
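
The ownership problem can be illustrated with a toy model. This is a sketch in Python with hand-rolled reference counts, not CPython code: `ToyObject`, `ToyUnicode`, `as_default_encoded` and `drop_cache` are all hypothetical names, and `drop_cache` stands in for whatever a size limit would do to the cache entry.

```python
class ToyObject:
    """Stand-in for a C object with a manual reference count."""

    def __init__(self, payload):
        self.payload = payload
        self.refcnt = 1       # the creator (here: the cache slot) owns one reference
        self.freed = False


def decref(obj):
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True      # models the memory being released
        obj.payload = None


class ToyUnicode:
    def __init__(self, text):
        self.text = text
        self.defenc = None    # cached encoded form, owned by this object


def as_default_encoded(u):
    """Return a borrowed reference: the caller's use is not counted."""
    if u.defenc is None:
        u.defenc = ToyObject(u.text.encode("utf-8"))
    return u.defenc


def drop_cache(u):
    """Hypothetical eviction, e.g. triggered by a size limit."""
    decref(u.defenc)
    u.defenc = None


u = ToyUnicode("spam")
borrowed = as_default_encoded(u)   # caller holds a borrowed reference
drop_cache(u)                      # the only counted reference goes away
# 'borrowed' now refers to an object that the model considers freed.
```

In the model, the caller's `borrowed` value outlives the only reference that was keeping it alive, which is the correctness hazard described above.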
msg61547 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-01-22 23:02
The default-encoded version is generated lazily, and only from a couple
of places (if my grepping through the py3k sources is to be believed).
So we can:
 * choose not to care, as the conversion looks rather rare
 * incref the return value of _PyUnicode_AsDefaultEncodedString(), and
convert the 20 or so places in which that function is used to properly
decref the value when done
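
The second option can be sketched with the same kind of toy model. This is Python with hand-rolled counts, not CPython code; `as_default_encoded_new_ref` and `default_encoded` are hypothetical names. It only covers the easy case where the value is used within a single scope.

```python
from contextlib import contextmanager


class ToyBytes:
    """Stand-in for a refcounted C object (toy model)."""

    def __init__(self, data):
        self.data = data
        self.refcnt = 1           # reference held by the cache slot


class ToyUnicode:
    def __init__(self, text):
        self.text = text
        self.defenc = None


def as_default_encoded_new_ref(u):
    """Hypothetical variant that returns a new (owned) reference."""
    if u.defenc is None:
        u.defenc = ToyBytes(u.text.encode("utf-8"))
    u.defenc.refcnt += 1          # incref on behalf of the caller
    return u.defenc


@contextmanager
def default_encoded(u):
    """Pair the incref with a decref, as every call site would have to."""
    enc = as_default_encoded_new_ref(u)
    try:
        yield enc
    finally:
        enc.refcnt -= 1           # decref when the caller is done


u = ToyUnicode("spam")
with default_encoded(u) as enc:
    inside = enc.refcnt           # cache slot + caller
after = u.defenc.refcnt           # cache slot only
```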
msg61548 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2008-01-22 23:05
> * choose not to care, as the conversion looks rather rare

Yes.

> * incref the return value of _PyUnicode_AsDefaultEncodedString(),
> and convert the 20 or so places in which that function is used to
> properly decref the value when done

No. I suspect you'll find it quite difficult to pick a place to do
the decref in some cases.
msg61549 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2008-01-22 23:41
For Py3k you can get rid of the cached default encoded version of the
Unicode object altogether:

This was only needed to make the Unicode/string auto-coercion mechanism
efficient in Python 2.x. In Py3k, you'll only do such conversions at the
IO-boundaries and explicitly, so caching the converted value is no
longer necessary.
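
A small sketch of explicit conversion at an I/O boundary in Python 3, using only the standard `io` module. The sample text and variable names are illustrative.

```python
import io

text = "naïve café"

# Encode explicitly when the text leaves the program...
raw = text.encode("utf-8")
buffer = io.BytesIO()
buffer.write(raw)

# ...and decode explicitly when bytes come back in.
round_trip = buffer.getvalue().decode("utf-8")

# A text-mode wrapper performs the same conversion at the boundary.
wrapped = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
wrapped.write(text)
wrapped.flush()
wrapped_bytes = wrapped.buffer.getvalue()

# There is no implicit str/bytes coercion to make efficient any more:
try:
    text + raw
    coerced = True
except TypeError:
    coerced = False
```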
msg61550 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2008-01-22 23:49
You wish.  In practice (unfortunately) it's still used quite a bit.  It
would be a good project to eradicate the need, but I see it as low priority.
History
Date                 User              Action  Args
2008-01-22 23:49:21  gvanrossum        set     messages: + msg61550
2008-01-22 23:41:31  lemburg           set     nosy: + lemburg; messages: + msg61549
2008-01-22 23:05:23  gvanrossum        set     status: open -> closed; messages: + msg61548
2008-01-22 23:02:11  pitrou            set     nosy: + pitrou; messages: + msg61547
2008-01-06 22:29:44  admin             set     keywords: - py3k; versions: Python 3.0
2007-12-18 19:14:12  gvanrossum        set     priority: high -> normal; keywords: - patch; resolution: rejected; messages: + msg58756; nosy: + gvanrossum
2007-12-18 12:13:40  christian.heimes  create