Author ysj.ray
Recipients eric.smith, ezio.melotti, lemburg, mark.dickinson, ron_adam, ysj.ray
Date 2010-08-01.09:27:48
Message-id <1280654873.82.0.481991797255.issue7330@psf.upfronthosting.co.za>
In-reply-to
Content
Here is the patch. It adds support for width and precision in PyUnicode_FromFormat() for the types %s, %S, %R, %V, %U and %A, and it also fixes what I believe are two bugs:


1. According to the PyUnicode_FromFormat() doc: http://docs.python.org/dev/py3k/c-api/unicode.html?highlight=pyunicode_fromformat#PyUnicode_FromFormat, "%A" should produce the result of ascii(). In the existing code, however, I can only find the part that calls ascii(object) and calculates the space needed for it, not the part that appends the ascii() output to the result. My simple test also shows that %A doesn't work. Here is the test function:
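static PyObject *
getstr(PyObject *self, PyObject *args)
{
    const char *s = "hello world";
    PyObject *unicode = PyUnicode_FromString(s);
    PyObject *result;

    if (unicode == NULL)
        return NULL;
    result = PyUnicode_FromFormat("%A", unicode);
    Py_DECREF(unicode);  /* the format call does not take over our reference */
    return result;
}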
This should return the result of calling ascii() on the object named *unicode*, that is, a unicode object containing 'hello world' (ascii() of a string includes the quotes). Instead it returns a unicode object containing "%A". This can be fixed by adding the following line:
                   case 'A':
in step 4.
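To illustrate why a missing case label makes the specifier come back verbatim, here is a minimal standalone sketch. It is not the CPython code: toy_format() and append() are hypothetical names, the handle_A flag merely simulates whether a `case 'A':` label exists, and it assumes that the output step's default branch copies the rest of the format string literally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to out without overflowing; *n is the current length. */
static void
append(char *out, size_t *n, size_t outsize, const char *s)
{
    while (*s != '\0' && *n + 1 < outsize)
        out[(*n)++] = *s++;
    out[*n] = '\0';
}

/* Toy stand-in for the output step of a printf-style formatter.
   Every specifier consumes the same string argument `arg`.
   handle_A simulates the presence of a `case 'A':` label. */
static void
toy_format(const char *fmt, const char *arg, int handle_A,
           char *out, size_t outsize)
{
    size_t n = 0;
    const char *f;

    assert(fmt != NULL && arg != NULL && out != NULL && outsize > 0);
    out[0] = '\0';
    for (f = fmt; *f != '\0'; f++) {
        if (*f != '%') {
            char one[2] = { *f, '\0' };
            append(out, &n, outsize, one);
            continue;
        }
        const char *p = f++;          /* p remembers where the '%' is */
        switch (*f) {
        case 'S':
        case 'R':
            append(out, &n, outsize, arg);
            break;
        case 'A':
            if (handle_A) {
                append(out, &n, outsize, arg);
                break;
            }
            /* fall through: behaves as if the label did not exist */
        default:
            /* Unknown specifier: copy the rest of the format as is
               (assumed behaviour) and stop. */
            append(out, &n, outsize, p);
            return;
        }
    }
}
```

With handle_A set to 0, formatting "%A" yields the literal "%A", which matches the behaviour observed in the test function; with handle_A set to 1 the argument text is emitted instead.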


2. The second bug: here is a piece of code from PyUnicode_FromFormatV() in Objects/unicodeobject.c:

797          if (*f == '%') {
798  #ifdef HAVE_LONG_LONG
799              int longlongflag = 0;
800  #endif
801              const char* p = f;
802              width = 0;
803              while (ISDIGIT((unsigned)*f))
804                  width = (width*10) + *f++ - '0';


Here the variable *width* cannot be calculated correctly: the while loop never executes, because at this point *f is always '%'. So width is always 0. This does not currently cause an error, because the following code ensures that width is at least MAX_LONG_CHARS (or MAX_LONG_LONG_CHARS when longlongflag is set):

834        case 'd': case 'u': case 'i': case 'x':
835            (void) va_arg(count, int);
836  #ifdef HAVE_LONG_LONG
837            if (longlongflag) {
838               if (width < MAX_LONG_LONG_CHARS)
839                    width = MAX_LONG_LONG_CHARS;
840            }
841            else
842  #endif
843                /* MAX_LONG_CHARS is enough to hold a 64-bit integer,
844                 including sign.  Decimal takes the most space.  This
845                 isn't enough for octal.  If a width is specified we
846                 need more (which we allocate later). */
847                if (width < MAX_LONG_CHARS)
848                    width = MAX_LONG_CHARS;

(Currently width and precision apply only to the integer types %d, %u, %i and %x, not to the string and object types %s, %S, %R, %A, %U and %V.)

To fix, the following line:
801              const char* p = f;
should be:
801              const char* p = f++;
just like the similar loop in step 4. Another line:
                 f--;
should be added after the width is calculated, to move the character pointer back by one.
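The effect of the two changes can be sketched as follows. scan_spec() is a hypothetical helper, not part of the patch, and it assumes that the code after the width loop advances f with a pre-increment until it reaches the conversion letter; under that assumption the extra f-- is what keeps the conversion letter from being skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parse the width of one format spec that
   starts at f (which must point at '%') and return a pointer to
   the conversion letter. */
static const char *
scan_spec(const char *f, int *width)
{
    assert(f != NULL && *f == '%' && width != NULL);

    const char *p = f++;   /* p keeps the '%', f moves past it */
    (void)p;               /* the real code would use p later on */

    *width = 0;
    while (isdigit((unsigned char)*f))
        *width = (*width * 10) + *f++ - '0';

    f--;                   /* step back so the pre-increment below
                              re-reads the first char after the width */

    /* Assumed follow-up scan: skip ahead to the conversion letter. */
    while (*++f && *f != '%' && !isalpha((unsigned char)*f))
        ;
    return f;
}
```

For "%10d" this yields a width of 10 and a pointer to 'd'; for "%s" it yields a width of 0 and a pointer to 's'. Without the f-- step, the "%s" case would move past the 's' before the follow-up scan examines it.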


My patch fixes both problems. I hope somebody can take a look at it.