Message 128381 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	eric.smith, ezio.melotti, lemburg, mark.dickinson, ron_adam, vstinner, ysj.ray
Date	2011-02-11.12:48:14
SpamBayes Score	0.0
Marked as misclassified	No
Message-id	<1297428495.84.0.702163452654.issue7330@psf.upfronthosting.co.za>
In-reply-to

Content
It looks like your patch fixes #10829: you should add tests for that, you can just reuse the tests of my patch (attached to #10829). --- unicode_format() looks suboptimal. + memset(buffer, ' ', width); + width_unicode = PyUnicode_FromStringAndSize(buffer, width); You should avoid this byte string (buffer) and use memset() on the Unicode string directly. Something like: Py_UNICODE u; Py_ssize_t i; width_unicode = PyUnicode_FromUnicode(NULL, width); u = PyUnicode_AS_UNICODE(width_unicode); for(i=0; i < width; i++) { u = (Py_UNICODE)' '; u++; } You should also avoid the creation of a temporary unicode object (it can be slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() does already truncate the string because you can pass an arbitrary length. --- I don't like "unicode_format" function name: it sounds like "str.format()" in Python. A suggestion: "unicode_format_align" --- With your patch, "%.200s" truncates the input string to 200 characters, but I think that it should truncate to 200 bytes, as printf does. --- - n += PyUnicode_GET_SIZE(str); + n += width > PyUnicode_GET_SIZE(str) ? width : PyUnicode_GET_SIZE(str); I don't like this change because I hate having to compute manually strings length. It should that it would be easier if you format directly strings with width and precision at step 3, instead of doing it at step 4: so you can just read the length of the formatted string, and it avoids having to handle width/precision in two steps (which may be inconsistent :-/). --- Your patch implements %.100s (and %.100U): we might decide what to do with #10833 before commiting your patch. --- In my opinion, the patch is a little bit too big. We may first commit the fix on the code parsing the width and precision: fix #10829? --- Can you add tests for "%.s"? I would like to know if "%.s" is different than "%s" :-) --- - "must be a sequence, not %200s", + "must be a sequence, not %.200s", Hum, I think that they are many other places where such fix should be done. Nobody noticed this typo before because %.200s nor %200s were implemented (#10833). --- Finally, do you really need to implement %200s, %2.5s and %.100s? I don't know, but I would be ok to commit the patch if you fix it for all of my remarks :-)

It looks like your patch fixes #10829: you should add tests for that, you can just reuse the tests of my patch (attached to #10829).

---

unicode_format() looks suboptimal.

+    memset(buffer, ' ', width);
+    width_unicode = PyUnicode_FromStringAndSize(buffer, width);

You should avoid this byte string (buffer) and use memset() on the Unicode string directly. Something like:

Py_UNICODE *u;
Py_ssize_t i;
width_unicode = PyUnicode_FromUnicode(NULL, width);
u = PyUnicode_AS_UNICODE(width_unicode);
for(i=0; i < width; i++) {
  *u = (Py_UNICODE)' ';
  u++;
}

You should also avoid the creation of a temporary unicode object (it can be slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() does already truncate the string because you can pass an arbitrary length.

---

I don't like "unicode_format" function name: it sounds like "str.format()" in Python. A suggestion: "unicode_format_align"

---

With your patch, "%.200s" truncates the input string to 200 *characters*, but I think that it should truncate to 200 *bytes*, as printf does.

---

-                n += PyUnicode_GET_SIZE(str);
+                n += width > PyUnicode_GET_SIZE(str) ? width : PyUnicode_GET_SIZE(str);

I don't like this change because I hate having to compute manually strings length. It should that it would be easier if you format directly strings with width and precision at step 3, instead of doing it at step 4: so you can just read the length of the formatted string, and it avoids having to handle width/precision in two steps (which may be inconsistent :-/).

---

Your patch implements %.100s (and %.100U): we might decide what to do with #10833 before commiting your patch.

---

In my opinion, the patch is a little bit too big. We may first commit the fix on the code parsing the width and precision: fix #10829?

---

Can you add tests for "%.s"? I would like to know if "%.s" is different than "%s" :-)

---

-                                 "must be a sequence, not %200s",
+                                 "must be a sequence, not %.200s",

Hum, I think that they are many other places where such fix should be done. Nobody noticed this typo before because %.200s nor %200s were implemented (#10833).


---

Finally, do you really need to implement %200s, %2.5s and %.100s? I don't know, but I would be ok to commit the patch if you fix it for all of my remarks :-)

History
Date	User	Action	Args
2011-02-11 12:48:15	vstinner	set	recipients: + vstinner, lemburg, mark.dickinson, eric.smith, ron_adam, ezio.melotti, ysj.ray
2011-02-11 12:48:15	vstinner	set	messageid: <1297428495.84.0.702163452654.issue7330@psf.upfronthosting.co.za>
2011-02-11 12:48:14	vstinner	link	issue7330 messages
2011-02-11 12:48:14	vstinner	create