Message 203371 - Python tracker


Author	vstinner
Recipients	serhiy.storchaka, vstinner
Date	2013-11-19.13:00:56
Message-id	<1384866058.1.0.245171525218.issue19653@psf.upfronthosting.co.za>

Content

The _PyUnicodeWriter API avoids creating temporary Unicode strings and performs very well when building Unicode strings in the PEP 393 compact representation.

The attached patch adds a _PyObject_ReprWriter() function to avoid creating temporary Unicode strings when calling repr(obj) on containers such as tuple, list, or dict.

I did something similar for str%args and str.format(args).

To avoid the following chain of exact-type checks, we could add something to PyTypeObject, perhaps a new tp_repr_writer field.

+    if (PyLong_CheckExact(v)) {
+        return _PyLong_FormatWriter(writer, v, 10, 0);
+    }
+    if (PyUnicode_CheckExact(v)) {
+        return _PyUnicode_ReprWriter(writer, v);
+    }
+    if (PyList_CheckExact(v)) {
+        return _PyList_ReprWriter(writer, v);
+    }
+    if (PyTuple_CheckExact(v)) {
+        return _PyTuple_ReprWriter(writer, v);
+    }
+    if (PyDict_CheckExact(v)) {
+        return _PyDict_ReprWriter(writer, v);
+    }
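
The type-slot idea can be illustrated outside CPython. What follows is a minimal sketch in plain C, not the attached patch: ToyWriter, ToyType, ToyObject, and the repr_writer member are hypothetical stand-ins for _PyUnicodeWriter, PyTypeObject, PyObject, and the proposed tp_repr_writer field. It only shows how one indirect call through the type could replace the chain of exact-type checks.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for _PyUnicodeWriter: a fixed-size output buffer. */
typedef struct {
    char buf[128];
    size_t len;
} ToyWriter;

typedef struct ToyObject ToyObject;

/* Hypothetical stand-in for PyTypeObject carrying the proposed slot. */
typedef struct {
    const char *name;
    /* Returns 0 on success, -1 on error; NULL means "not supported". */
    int (*repr_writer)(ToyWriter *writer, const ToyObject *obj);
} ToyType;

struct ToyObject {
    const ToyType *type;
    long ival;          /* payload of a toy int */
    const char *sval;   /* payload of a toy str */
};

/* Append n bytes of s to the writer, keeping the buffer NUL-terminated. */
static int
toy_write(ToyWriter *w, const char *s, size_t n)
{
    if (w->len + n >= sizeof(w->buf))
        return -1;
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    w->buf[w->len] = '\0';
    return 0;
}

/* Format the integer directly into the writer, with no temporary string. */
static int
toy_int_repr_writer(ToyWriter *w, const ToyObject *obj)
{
    size_t room = sizeof(w->buf) - w->len;
    int n = snprintf(w->buf + w->len, room, "%ld", obj->ival);
    if (n < 0 || (size_t)n >= room)
        return -1;
    w->len += (size_t)n;
    return 0;
}

/* Quote the string; escaping is omitted to keep the sketch short. */
static int
toy_str_repr_writer(ToyWriter *w, const ToyObject *obj)
{
    if (toy_write(w, "'", 1) < 0)
        return -1;
    if (toy_write(w, obj->sval, strlen(obj->sval)) < 0)
        return -1;
    return toy_write(w, "'", 1);
}

static const ToyType ToyInt_Type = {"int", toy_int_repr_writer};
static const ToyType ToyStr_Type = {"str", toy_str_repr_writer};

/* One indirect call through the type replaces the if-chain. */
static int
toy_object_repr_writer(ToyWriter *w, const ToyObject *obj)
{
    if (obj->type->repr_writer == NULL)
        return -1;  /* a real implementation would fall back to repr() */
    return obj->type->repr_writer(w, obj);
}
```

With this layout, supporting another type means filling in its slot rather than adding one more branch to the generic function.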

For example, repr(list(range(10))) ('[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]') should allocate only one buffer of 37 bytes and then shrink it to 30 bytes.
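
The allocate-once-then-shrink pattern can be sketched in plain C. This is an illustration under stated assumptions, not the patch: BufWriter and its functions are hypothetical names, and the message does not say how 37 is derived. The sketch assumes a minimum-length estimate of one character per item plus separators and brackets (30 for ten items) and a 25% overallocation on growth (30 + 30/4 = 37).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable writer; not the real _PyUnicodeWriter layout. */
typedef struct {
    char *buf;
    size_t len;     /* characters written so far */
    size_t alloc;   /* character capacity, excluding the trailing NUL */
} BufWriter;

/* Ensure room for `extra` more characters; overallocate by 25% on growth
   (the 25% factor is an assumption of this sketch). */
static int
bufwriter_reserve(BufWriter *w, size_t extra)
{
    size_t needed = w->len + extra;
    if (needed <= w->alloc)
        return 0;
    size_t new_alloc = needed + needed / 4;
    char *p = realloc(w->buf, new_alloc + 1);
    if (p == NULL)
        return -1;
    w->buf = p;
    w->alloc = new_alloc;
    return 0;
}

static int
bufwriter_write_str(BufWriter *w, const char *s)
{
    size_t n = strlen(s);
    if (bufwriter_reserve(w, n) < 0)
        return -1;
    memcpy(w->buf + w->len, s, n + 1);  /* copies the NUL as well */
    w->len += n;
    return 0;
}

static int
bufwriter_write_long(BufWriter *w, long v)
{
    int n = snprintf(NULL, 0, "%ld", v);  /* measure only, writes nothing */
    if (n < 0 || bufwriter_reserve(w, (size_t)n) < 0)
        return -1;
    snprintf(w->buf + w->len, (size_t)n + 1, "%ld", v);
    w->len += (size_t)n;
    return 0;
}

/* Shrink the buffer to the exact size; the caller owns the result. */
static char *
bufwriter_finish(BufWriter *w)
{
    char *p = realloc(w->buf, w->len + 1);
    if (p == NULL)
        p = w->buf;  /* shrinking failed: keep the larger block */
    w->buf = NULL;
    w->len = w->alloc = 0;
    return p;
}

/* Build "[a, b, c]" in a single buffer; returns NULL on error. */
static char *
repr_long_list(const long *items, size_t n)
{
    BufWriter w = {NULL, 0, 0};
    /* Minimum length: brackets, one character per item, ", " separators. */
    size_t min_len = 2 + n + (n > 1 ? 2 * (n - 1) : 0);
    if (bufwriter_reserve(&w, min_len) < 0)
        goto error;
    if (bufwriter_write_str(&w, "[") < 0)
        goto error;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && bufwriter_write_str(&w, ", ") < 0)
            goto error;
        if (bufwriter_write_long(&w, items[i]) < 0)
            goto error;
    }
    if (bufwriter_write_str(&w, "]") < 0)
        goto error;
    return bufwriter_finish(&w);
error:
    free(w.buf);
    return NULL;
}
```

For the ten single-digit items above, the initial reservation asks for 30 characters and gets a capacity of 37, every later write fits, and bufwriter_finish() trims the block to the 30 characters actually used (plus the C string terminator, which a Python string does not need).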

I guess that benchmarks are required to justify such changes.

History
Date	User	Action	Args
2013-11-19 13:00:58	vstinner	set	recipients: + vstinner, serhiy.storchaka
2013-11-19 13:00:58	vstinner	set	messageid: <1384866058.1.0.245171525218.issue19653@psf.upfronthosting.co.za>
2013-11-19 13:00:58	vstinner	link	issue19653 messages
2013-11-19 13:00:57	vstinner	create