Message 108521 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	lemburg
Recipients	eric.smith, lemburg, loewis, pitrou, skrah
Date	2010-06-24.15:25:02
SpamBayes Score	2.327905e-06
Marked as misclassified	No
Message-id	<4C2378CD.10805@egenix.com>
In-reply-to	<20100624144713.GA1188@yoda.bytereef.org>

Content
Stefan Krah wrote: > > Stefan Krah <stefan-usenet@bytereef.org> added the comment: > > Antoine Pitrou <report@bugs.python.org> wrote: >>> Thus, >>> ((unsigned char)((c) & 0xff)) and ((unsigned char)(c)) should produce >>> the same results. >> >> If it's the case, it's probably optimized away by the compiler anyway. > > Yes, the asm output is the same. I was more concerned about readability. > > Historically, Py_CHARMASK started as ((c) & 0xff), hence the name. In r34974 > UCHAR_MAX = 2*CHAR_BIT-1 = 255 was enforced. In r61936 the cast was added. > >>> There is no reason not to do the cast when __CHAR_UNSIGNED__ is >>> defined (it will be a no-op). >> >> Agreed. It also reduces the opportunity for bugs :) > > Ok, let's say we use ((unsigned char)((c) & 0xff)) also for __CHAR_UNSIGNED__. > > What should the comment say about the intended argument? I've examined all > occurrences in the tree and almost always Py_CHARMASK is used with either > char arguments or int arguments that are directly derived from a char, so > no EOF issues arise. Why not just make the casts in those cases explicit instead of using Py_CHARMASK ? > Exceptions: > =========== > > Index: Objects/unicodeobject.c > =================================================================== > --- Objects/unicodeobject.c (revision 82192) > +++ Objects/unicodeobject.c (working copy) > @@ -8417,6 +8417,7 @@ > else if (c >= '0' && c <= '9') { > prec = c - '0'; > while (--fmtcnt >= 0) { > + / XXX: c and fmt are Py_UNICODE / > c = Py_CHARMASK(fmt++); If both are Py_UNICODE, why do you need the cast at all ? Note that the above use also appears to be wrong, since if fmt is Py_UNICODE, the following if() will also match if the higher order bytes of the Py_UNICODE value are non-0. > if (c < '0' \|\| c > '9') > break; > > Index: Modules/_json.c > =================================================================== > --- Modules/_json.c (revision 82192) > +++ Modules/_json.c (working copy) > @@ -603,6 +603,7 @@ > } > } > else { > + /* XXX: c is Py_UNICODE */ > char c_char = Py_CHARMASK(c); In this case you should get a compiler warning due to the different signedness of the arguments. Same comment as above: the high order bytes of c are lost in this conversion. Why can't _json.c use one of the existing Unicode conversion APIs ? > chunk = PyString_FromStringAndSize(&c_char, 1); > if (chunk == NULL) { > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue9036> > _______________________________________ > _______________________________________________ > Python-bugs-list mailing list > Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/mal%40egenix.com

Stefan Krah wrote:
> 
> Stefan Krah <stefan-usenet@bytereef.org> added the comment:
> 
> Antoine Pitrou <report@bugs.python.org> wrote:
>>> Thus,
>>> ((unsigned char)((c) & 0xff)) and ((unsigned char)(c)) should produce
>>> the same results.
>>
>> If it's the case, it's probably optimized away by the compiler anyway.
> 
> Yes, the asm output is the same. I was more concerned about readability.
> 
> Historically, Py_CHARMASK started as ((c) & 0xff), hence the name. In r34974
> UCHAR_MAX = 2**CHAR_BIT-1 = 255 was enforced. In r61936 the cast was added.
> 
>>> There is no reason not to do the cast when __CHAR_UNSIGNED__ is
>>> defined (it will be a no-op).
>>
>> Agreed. It also reduces the opportunity for bugs :)
> 
> Ok, let's say we use ((unsigned char)((c) & 0xff)) also for __CHAR_UNSIGNED__.
> 
> What should the comment say about the intended argument? I've examined all
> occurrences in the tree and almost always Py_CHARMASK is used with either
> char arguments or int arguments that are directly derived from a char, so
> no EOF issues arise.

Why not just make the casts in those cases explicit instead of
using Py_CHARMASK ?

> Exceptions:
> ===========
> 
> Index: Objects/unicodeobject.c
> ===================================================================
> --- Objects/unicodeobject.c     (revision 82192)
> +++ Objects/unicodeobject.c     (working copy)
> @@ -8417,6 +8417,7 @@
>                  else if (c >= '0' && c <= '9') {
>                      prec = c - '0';
>                      while (--fmtcnt >= 0) {
> +                        /* XXX: c and *fmt are Py_UNICODE */
>                          c = Py_CHARMASK(*fmt++);

If both are Py_UNICODE, why do you need the cast at all ?

Note that the above use also appears to be wrong, since if *fmt
is Py_UNICODE, the following if() will also match if the higher
order bytes of the Py_UNICODE value are non-0.

>                          if (c < '0' || c > '9')
>                              break;
> 
> Index: Modules/_json.c
> ===================================================================
> --- Modules/_json.c     (revision 82192)
> +++ Modules/_json.c     (working copy)
> @@ -603,6 +603,7 @@
>              }
>          }
>          else {
> +            /* XXX: c is Py_UNICODE */
>              char c_char = Py_CHARMASK(c);

In this case you should get a compiler warning due to the
different signedness of the arguments.

Same comment as above: the high order bytes of c are lost
in this conversion.

Why can't _json.c use one of the existing Unicode conversion
APIs ?

>              chunk = PyString_FromStringAndSize(&c_char, 1);
>              if (chunk == NULL) {
> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue9036>
> _______________________________________
> _______________________________________________
> Python-bugs-list mailing list
> Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/mal%40egenix.com

History
Date	User	Action	Args
2010-06-24 15:25:05	lemburg	set	recipients: + lemburg, loewis, pitrou, eric.smith, skrah
2010-06-24 15:25:04	lemburg	link	issue9036 messages
2010-06-24 15:25:02	lemburg	create