Message 128933 - Python tracker

Author	ysj.ray
Recipients	eric.smith, ezio.melotti, lemburg, mark.dickinson, ron_adam, vstinner, ysj.ray
Date	2011-02-21.03:18:04
Message-id	<1298258285.71.0.550412991899.issue7330@psf.upfronthosting.co.za>

Content

> > > With your patch, "%.200s" truncates the input string to 200 *characters*, but I think that it should truncate to 200 *bytes*, as printf does.
> > 
> > Sorry, I don't understand. The result of PyUnicode_FromFormatV() is a unicode object. Then how to truncate to 200 *bytes*?

> You can truncate the input char* on the call to PyUnicode_DecodeUTF8:
> pass a size smaller than strlen(s).


Now I wonder how we should treat the precision specifier of '%s'. First of all, PyUnicode_FromFormat() should behave like C printf(). In C printf(), the precision of %s specifies a maximum length for the displayed result: if the result would be longer than that value, it is truncated. That means the precision is applied to the final result. Python's PyUnicode_FromFormat(), however, produces unicode strings, so the width and precision should be applied to the final unicode string result. The formatting is split into two stages: one converts each parameter to a unicode string, and the other applies the width and precision to it. So I doubt that we should apply the precision at the conversion stage, that is, in the call to PyUnicode_DecodeUTF8(). In my opinion the precision should be applied not to the input chars but to the output unicode characters.
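
To make the two options concrete, here is a minimal Python sketch of the difference. It is only an illustration, not the C implementation in the patch: the helper names are hypothetical, and dropping a multi-byte sequence that gets cut in half (errors="ignore") is an assumption about how byte truncation might be handled.

```python
def truncate_input_bytes(raw: bytes, precision: int) -> str:
    """Apply the precision to the input: cut the UTF-8 bytes, then decode.

    Assumption: a multi-byte sequence split by the cut is silently dropped.
    """
    return raw[:precision].decode("utf-8", errors="ignore")


def truncate_output_chars(raw: bytes, precision: int) -> str:
    """Apply the precision to the output: decode first, then cut characters."""
    return raw.decode("utf-8")[:precision]


# "é" takes two bytes in UTF-8, so this sample is 6 bytes but 5 characters.
sample = "héllo".encode("utf-8")

by_bytes = truncate_input_bytes(sample, 2)   # cut falls in the middle of "é"
by_chars = truncate_output_chars(sample, 2)  # keeps two whole characters

print(repr(by_bytes), repr(by_chars))
```

With a precision of 2, the byte-based version keeps only "h" because the cut lands inside "é", while the character-based version keeps "hé". For pure ASCII input the two give the same result.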

I hope I didn't misunderstand something.

So haypo, what's your opinion?

History
Date	User	Action	Args
2011-02-21 03:18:05	ysj.ray	set	recipients: + ysj.ray, lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti
2011-02-21 03:18:05	ysj.ray	set	messageid: <1298258285.71.0.550412991899.issue7330@psf.upfronthosting.co.za>
2011-02-21 03:18:04	ysj.ray	link	issue7330 messages
2011-02-21 03:18:04	ysj.ray	create