Message 128725 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	ysj.ray
Recipients	eric.smith, ezio.melotti, lemburg, mark.dickinson, ron_adam, vstinner, ysj.ray
Date	2011-02-17.14:07:11
SpamBayes Score	3.330669e-16
Marked as misclassified	No
Message-id	<1297951632.53.0.32370830023.issue7330@psf.upfronthosting.co.za>
In-reply-to

Content
Thanks hyapo! > It looks like your patch fixes #10829: you should add tests for that, you can just reuse the tests of my patch (attached to #10829). Sorry, but I think my patch doesn't fix #10829. It seems link another issue. And by applying my patch and add tests from #10829's patch, the tests cannot passed. Or did I missed something? > You should also avoid the creation of a temporary unicode object (it can be slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() does already truncate the string because you can pass an arbitrary length. In order to use Py_UNICODE_COPY, I have to create a unicode object with required length first. I feel this have the same cost as using PySequence_GetSlice(). If I understand correctly? > With your patch, "%.200s" truncates the input string to 200 characters, but I think that it should truncate to 200 bytes, as printf does. Sorry, I don't understand. The result of PyUnicode_FromFormatV() is a unicode object. Then how to truncate to 200 bytes? I think the %s formatter just indicate that the argument is c-style chars, the result is always unicode string, and the width and precision formatters are to applied after converting c-style chars to string. > I don't like this change because I hate having to compute manually strings length. It should that it would be easier if you format directly strings with width and precision at step 3, instead of doing it at step 4: so you can just read the length of the formatted string, and it avoids having to handle width/precision in two steps (which may be inconsistent :-/). Do you mean combine step 3 and step 4 together? Currently step 3 is just to compute the biggest width value and step 4 is to compute exact width and do the real format work. Only by doing real format we can get the exact width of a string. So I have to compute each width twice in both step 3 and step 4. Is combining the two steps in to one a good idea? > In my opinion, the patch is a little bit too big. We may first commit the fix on the code parsing the width and precision: fix #10829? Again, I guess #10829 need another its own patch to fix. > Can you add tests for "%.s"? I would like to know if "%.s" is different than "%s" :-) Err, '%.s' causes unexpected result both with and without my patch. Maybe it's still another bug?

Thanks hyapo! 

> It looks like your patch fixes #10829: you should add tests for that, you can just reuse the tests of my patch (attached to #10829).

Sorry, but I think my patch doesn't fix #10829. It seems link another issue. And by applying my patch and add tests from #10829's patch, the tests cannot passed. Or did I missed something?


> You should also avoid the creation of a temporary unicode object (it can be slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() does already truncate the string because you can pass an arbitrary length.

In order to use Py_UNICODE_COPY, I have to create a unicode object with required length first. I feel this have the same cost as using PySequence_GetSlice(). If I understand correctly?


> With your patch, "%.200s" truncates the input string to 200 *characters*, but I think that it should truncate to 200 *bytes*, as printf does.

Sorry, I don't understand. The result of PyUnicode_FromFormatV() is a unicode object. Then how to truncate to 200 *bytes*? I think the %s formatter just indicate that the argument is c-style chars, the result is always unicode string, and the width and precision formatters are to applied after converting c-style chars to string. 


> I don't like this change because I hate having to compute manually strings length. It should that it would be easier if you format directly strings with width and precision at step 3, instead of doing it at step 4: so you can just read the length of the formatted string, and it avoids having to handle width/precision in two steps (which may be inconsistent :-/).

Do you mean combine step 3 and step 4 together? Currently step 3 is just to compute the biggest width value and step 4 is to compute exact width and do the real format work. Only by doing real format we can get the exact width of a string. So I have to compute each width twice in both step 3 and step 4. Is combining the two steps in to one a good idea?


> In my opinion, the patch is a little bit too big. We may first commit the fix on the code parsing the width and precision: fix #10829?

Again, I guess #10829 need another its own patch to fix. 


> Can you add tests for "%.s"? I would like to know if "%.s" is different than "%s" :-)

Err, '%.s' causes unexpected result both with and without my patch. Maybe it's still another bug?

History
Date	User	Action	Args
2011-02-17 14:07:12	ysj.ray	set	recipients: + ysj.ray, lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti
2011-02-17 14:07:12	ysj.ray	set	messageid: <1297951632.53.0.32370830023.issue7330@psf.upfronthosting.co.za>
2011-02-17 14:07:11	ysj.ray	link	issue7330 messages
2011-02-17 14:07:11	ysj.ray	create