Author vstinner
Recipients eric.smith, ezio.melotti, lemburg, mark.dickinson, ron_adam, vstinner, ysj.ray
Date 2011-02-11.12:48:14
SpamBayes Score 0.0
Marked as misclassified No
Message-id <1297428495.84.0.702163452654.issue7330@psf.upfronthosting.co.za>
In-reply-to
Content
It looks like your patch fixes #10829: you should add tests for that, you can just reuse the tests of my patch (attached to #10829).

---

unicode_format() looks suboptimal.

+    memset(buffer, ' ', width);
+    width_unicode = PyUnicode_FromStringAndSize(buffer, width);

You should avoid this byte string (buffer) and use memset() on the Unicode string directly. Something like:

Py_UNICODE *u;
Py_ssize_t i;
width_unicode = PyUnicode_FromUnicode(NULL, width);
u = PyUnicode_AS_UNICODE(width_unicode);
for(i=0; i < width; i++) {
  *u = (Py_UNICODE)' ';
  u++;
}

You should also avoid the creation of a temporary unicode object (it can be slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() does already truncate the string because you can pass an arbitrary length.

---

I don't like "unicode_format" function name: it sounds like "str.format()" in Python. A suggestion: "unicode_format_align"

---

With your patch, "%.200s" truncates the input string to 200 *characters*, but I think that it should truncate to 200 *bytes*, as printf does.

---

-                n += PyUnicode_GET_SIZE(str);
+                n += width > PyUnicode_GET_SIZE(str) ? width : PyUnicode_GET_SIZE(str);

I don't like this change because I hate having to compute manually strings length. It should that it would be easier if you format directly strings with width and precision at step 3, instead of doing it at step 4: so you can just read the length of the formatted string, and it avoids having to handle width/precision in two steps (which may be inconsistent :-/).

---

Your patch implements %.100s (and %.100U): we might decide what to do with #10833 before commiting your patch.

---

In my opinion, the patch is a little bit too big. We may first commit the fix on the code parsing the width and precision: fix #10829?

---

Can you add tests for "%.s"? I would like to know if "%.s" is different than "%s" :-)

---

-                                 "must be a sequence, not %200s",
+                                 "must be a sequence, not %.200s",

Hum, I think that they are many other places where such fix should be done. Nobody noticed this typo before because %.200s nor %200s were implemented (#10833).


---

Finally, do you really need to implement %200s, %2.5s and %.100s? I don't know, but I would be ok to commit the patch if you fix it for all of my remarks :-)
History
Date User Action Args
2011-02-11 12:48:15vstinnersetrecipients: + vstinner, lemburg, mark.dickinson, eric.smith, ron_adam, ezio.melotti, ysj.ray
2011-02-11 12:48:15vstinnersetmessageid: <1297428495.84.0.702163452654.issue7330@psf.upfronthosting.co.za>
2011-02-11 12:48:14vstinnerlinkissue7330 messages
2011-02-11 12:48:14vstinnercreate