This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PEP 3137: make bytesobject.c methods
Type: Stage:
Components: Versions: Python 3.0
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: gregory.p.smith, gvanrossum
Priority: normal Keywords: patch

Created on 2007-10-11 08:45 by gregory.p.smith, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
bytes-pep3137-methods-04b.diff.txt gregory.p.smith, 2007-10-16 01:05
Messages (12)
msg56341 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-10-11 08:45
This makes all existing bytesobject.c methods use the buffer API rather
than explicitly requiring bytes objects as input.  It also fixes input
to append() and remove() that was not strict enough and improves a few
unit tests in that area.

NOTE: this patch likely depends on http://bugs.python.org/issue1260
removing the buffer API from the unicode type in order for all unit
tests to pass (i only tested it with that applied since thats where
we're headed).
msg56351 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-10-12 00:56
Patch updated.  It now implements the is*() methods for PyBytes.  It
moves common code into a shared bytes_ctype.c and .h file so that
stringobject.c and bytesobject.c can share as much as possible.  Unit
tests are updated as needed for new method coverage and any behavior
changes.

still TODO: adding the missing methods (listed in a comment in the
patch).  I'm working on it.
msg56352 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-12 02:42
> Patch updated.  It now implements the is*() methods for PyBytes.  It
> moves common code into a shared bytes_ctype.c and .h file so that
> stringobject.c and bytesobject.c can share as much as possible.

Did you move this into the stringlib subdirectory? That's more for
sharing between PyString and PyUnicode, but I think there are more
opportunities for sharing still, and PyString/PyBytes sharing makes
sense separately.
msg56354 - (view) Author: Gregory P. Smith (gps) Date: 2007-10-12 03:20
> > Patch updated.  It now implements the is*() methods for PyBytes.  It
> > moves common code into a shared bytes_ctype.c and .h file so that
> > stringobject.c and bytesobject.c can share as much as possible.
>
> Did you move this into the stringlib subdirectory? That's more for
> sharing between PyString and PyUnicode, but I think there are more
> opportunities for sharing still, and PyString/PyBytes sharing makes
> sense separately.

Good idea, I haven't done that yet. At the moment it lives in
Include/bytes_ctype.h and Object/bytes_ctype.c directly.  stringlib is a
good place for it and is something I pondered but hadn't gotten to.  I'll do
that as I implement the remaining missing PyBytes_ methods to be in the next
update to this patch.

-gps
msg56466 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-15 22:02
Is it worth my time to review this yet?
msg56468 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-10-15 22:36
here's the latest update.  this takes care of all methods listed in the
PEP i believe.  please review.

i'm currently working porting some test case code for more methods from
string_tests.py to buffer_tests.py to be shared between PyString and
PyBytes similar to the ones already in the patch that way.
msg56474 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-10-15 23:48
Same diff 03 here but it also adds unit test coverage for the new
PyBytes buffer methods.
msg56477 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-16 00:19
Very impressive!

(Apologies if these lines are occasionally out of order.)

+extern PyObject* _bytes_isspace(const char *cptr, const Py_ssize_t len);

IMO all these functions should have names starting with _Py or _Py_,
since they are visible to the linker.

Also, why 'const Py_ssize_t'? It's a scalar!

+/* these store their len sized answer in the given preallocated *result
arg */

Mind using a Capital first letter?

+extern const char isspace__doc__[];

This needs a _Py or _Py_ prefix too.

+extern const unsigned int ctype_table[256];

Now this is no longer static, it also needs a _Py or _Py_ prefix.

+extern const unsigned char ctype_tolower[256];
+extern const unsigned char ctype_toupper[256];

Ditto (the macros are fine though, since they are only visible to code
that #includes this file, which is only a few files).

+Return True if all characters in S are whitespace\n\

Shouldn't that be bytes instead of characters (and consistently
throughout)?  And probably use B for the variable name instead of S.

+/* ------- GPS --------- */

What's this?  Your initials? :-)  I don't think it's needed.  I'm
guessing you 

 /* The nullbytes are used by the stringlib during partition.
  * If partition is removed from bytes, nullbytes and its helper

This comment block refers to something that ain't gonna happen
(partition being removed).  IMO the whole comment block can go, it's a
red herring.

+                /* XXX: this depends on a signed integer overflow to < 0 */
+                /* C compilers, including gcc, do -NOT- guarantee this. */

(And repeated twice more.)  Wouldn't it be better to write this in a way
that doesn't require this XXX comment?  ISTR that we already had one
area where we had to fight with gcc because it had proved to itself that
something could never be negative, even though it could be due to
overflow.  The GCC authors won. :-(

+/* TODO(gps):

Don't you mean XXX(gps)? :-)

+ * These methods still need implementing (porting over from
stringobject.c):
+ *
+ * IN PROGRESS:
+ * .capitalize(), .title(), .upper(), .lower(), .center(), .zfill(),
+ * .expandtabs(), .ljust(), .rjust(), .swapcase(), .splitlines(), 
+ */
+

Hmmm... Aren't these done now?

+    /* XXX(gps): the docstring above says any iterable int will do but the
+     * bytes_setslice code only accepts something supporting PEP 3118.
+     * Is a list or tuple of 0 <= ints <= 255 also supposed to work? */

Yes, it is, and it's a bug that it currently doesn't.
msg56479 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-10-16 00:53
Ok, I believe I've fixed everything you commented on, accidentally
included in initials used as a search placeholder included. :)

> +                /* XXX: this depends on a signed integer overflow to
< >0 */
> +                /* C compilers, including gcc, do -NOT- guarantee
>this. */

> (And repeated twice more.)  Wouldn't it be better to write this in a way
> that doesn't require this XXX comment?  ISTR that we already had one
> area where we had to fight with gcc because it had proved to itself that
> something could never be negative, even though it could be due to
> overflow.  The GCC authors won. :-(

This code was copied as is from stringobject.c, I just added the XXX
notes upon reading it.  There is an existing unittest to try and make
sure the code has compiled properly with these tests (its not
comprehensive, it'll only test one of these three test cases). 
Regardless, yes, it needs fixing.  I didn't want to take that on that as
part of this patch.
msg56480 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-10-16 01:05
04b is the same as 04, i just fixed the docstrings that i had missed in
stringlib/transmogrify.h to use 'B' instead of 'S' and say they return a
"modified copy of B" instead of a "string"

wording could probably be improved further if anyone has any ideas.
msg56487 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-16 03:27
Good! Check it in before I change my mind. :-)

The words can be tweaked later.

> 04b is the same as 04, i just fixed the docstrings that i had missed in
> stringlib/transmogrify.h to use 'B' instead of 'S' and say they return a
> "modified copy of B" instead of a "string"
>
> wording could probably be improved further if anyone has any ideas.
msg56495 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2007-10-16 06:33
Committed revision 58493
History
Date User Action Args
2022-04-11 14:56:27adminsetgithub: 45602
2008-01-06 22:29:45adminsetkeywords: - py3k
versions: Python 3.0
2007-10-16 06:33:39gregory.p.smithsetfiles: - bytes-pep3137-methods-04.diff.txt
2007-10-16 06:33:22gregory.p.smithsetstatus: open -> closed
resolution: accepted
messages: + msg56495
2007-10-16 03:27:03gvanrossumsetmessages: + msg56487
2007-10-16 01:05:07gregory.p.smithsetfiles: + bytes-pep3137-methods-04b.diff.txt
messages: + msg56480
2007-10-16 00:53:53gregory.p.smithsetfiles: - bytes-pep3137-methods-03-with-tests.diff.txt
2007-10-16 00:53:50gregory.p.smithsetfiles: - bytes-pep3137-methods-03.diff.txt
2007-10-16 00:53:39gregory.p.smithsetfiles: + bytes-pep3137-methods-04.diff.txt
messages: + msg56479
2007-10-16 00:19:34gvanrossumsetmessages: + msg56477
2007-10-15 23:49:00gregory.p.smithsetfiles: + bytes-pep3137-methods-03-with-tests.diff.txt
messages: + msg56474
2007-10-15 22:36:36gregory.p.smithsetfiles: - bytes-pep3137-methods-02.diff.txt
2007-10-15 22:36:28gregory.p.smithsetfiles: + bytes-pep3137-methods-03.diff.txt
messages: + msg56468
2007-10-15 22:02:16gvanrossumsetmessages: + msg56466
2007-10-12 03:35:50gregory.p.smithsetnosy: - gps
2007-10-12 03:21:41gregory.p.smithsetfiles: - unnamed
2007-10-12 03:20:59gpssetfiles: + unnamed
nosy: + gps
messages: + msg56354
2007-10-12 02:42:33gvanrossumsetmessages: + msg56352
2007-10-12 00:57:05gregory.p.smithsetfiles: - bytes-methods-use-buffer-api-01.diff.txt
2007-10-12 00:56:55gregory.p.smithsetfiles: + bytes-pep3137-methods-02.diff.txt
messages: + msg56351
title: PEP 3137: make bytesobject.c methods use PEP 3118 buffer API -> PEP 3137: make bytesobject.c methods
2007-10-11 16:52:28gvanrossumsetassignee: gvanrossum
nosy: + gvanrossum
2007-10-11 08:45:43gregory.p.smithcreate