msg142660 - (view) |
Author: John O'Connor (jcon) |
Date: 2011-08-22 02:58 |
bytearray() is using a less efficient implementation of split() than its similar friends bytes and unicode.
I added a couple extra checks to return early in {bytes,unicode} split().
- if length is 0
- if left strip removed all bytes
Looking at just how similar these 3 implementations are I feel that we could also unify/generalize them into one generic helper implementation and leave the object marshaling up to the type specific methods.
|
msg142882 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-08-24 13:32 |
+ if (len == 0) {
+ if (PyByteArray_CheckExact(self)) {
+ Py_INCREF(self);
+ return (PyObject *)self;
+ }
+ return PyByteArray_FromStringAndSize(NULL, 0);
+ }
This looks like a dubious micro-optimization. If len == 0,
all loops will exit early anyway (same for similar snippets in bytesobject.c and unicodeobject.c).
+ if (i == 0 && j == len && PyByteArray_CheckExact(self)) {
+ Py_INCREF(self);
+ return (PyObject *)self;
+ }
bytearray objects are mutable, so you can't return the same object without breaking expected semantics. e.g.:
>>> a = bytearray()
>>> b = a.strip()
>>> b.append(49)
>>> a
bytearray(b'')
>>> b
bytearray(b'1')
> diff -r b5ccdf7c032a Python/bltinmodule.c
> Binary file Python/bltinmodule.c has changed
Uh, what's this? bltinmodule.c shouldn't be considered a binary file.
It probably means you added a NUL byte in it by mistake.
> Looking at just how similar these 3 implementations are I feel that we
> could also unify/generalize them into one generic helper implementation
> and leave the object marshaling up to the type specific methods.
You could put it in Objects/stringlib.
|
msg142941 - (view) |
Author: John O'Connor (jcon) |
Date: 2011-08-24 22:53 |
You are right about those lines in particular; 'dubious' as you say.
Though, the point overall was that the general implementation was noticeably
faster (regardless of those smaller changes).
However, I do think that the single check for len == 0 saves time
particularly
from do_xstrip where we call _getbuffer().
I see. I must have been trigger happy.
> > diff -r b5ccdf7c032a Python/bltinmodule.c
> > Binary file Python/bltinmodule.c has changed
>
> Uh, what's this? bltinmodule.c shouldn't be considered a binary file.
> It probably means you added a NUL byte in it by mistake.
>
> I didn't touch that file at all. I'm not sure what that is. :-P
> Looking at just how similar these 3 implementations are I feel that we
> > could also unify/generalize them into one generic helper implementation
> > and leave the object marshaling up to the type specific methods.
>
> You could put it in Objects/stringlib.
>
I figured. I will when I am able to get to it.
- John
|
msg142942 - (view) |
Author: John O'Connor (jcon) |
Date: 2011-08-24 22:57 |
Antoine Pitrou <pitrou@free.fr> added the comment:
> This looks like a dubious micro-optimization. If len == 0,
> all loops will exit early anyway (same for similar snippets in bytesobject.c and unicodeobject.c).
You are right about those lines in particular; 'dubious' as you say.
Though, the point overall was that the general implementation was noticeably
faster (regardless of those smaller changes).
However, I do think that the single check for len == 0 saves time particularly
from do_xstrip where we call _getbuffer().
> bytearray objects are mutable, so you can't return the same object without breaking expected semantics. e.g.:
I see. I must have been trigger happy.
> diff -r b5ccdf7c032a Python/bltinmodule.c
> Binary file Python/bltinmodule.c has changed
> Uh, what's this? bltinmodule.c shouldn't be considered a binary file.
> It probably means you added a NUL byte in it by mistake.
I didn't touch that file at all. I'm not sure what that is. :-P
> You could put it in Objects/stringlib.
I figured. I will when I am able to get to it.
|
msg143822 - (view) |
Author: John O'Connor (jcon) |
Date: 2011-09-10 01:05 |
Moved {l,r,}strip to stringlib as well as the bloom filters from unicodeobject. I think it cleans things up nicely. Now there is one implementation for all 3 types. This patch also improves performance for bytearray and slightly for bytes. I see no delta for unicodeobject.
Added a test case for the mistake I made before (returning inc ref for mutable type).
grep -R couldn't find any other references to _PyUnicode_XStrip so I removed it. Though, someone with better knowledge of this should confirm.
|
msg144752 - (view) |
Author: John O'Connor (jcon) |
Date: 2011-10-01 22:41 |
The patch no longer applies cleanly. Is there enough interest in this to justify rebasing?
|
msg144871 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-10-04 10:40 |
> The patch no longer applies cleanly. Is there enough interest in this to justify rebasing?
Yes, I think it's worth it.
|
msg145379 - (view) |
Author: John O'Connor (jcon) |
Date: 2011-10-12 00:38 |
New patch.
Please double check the removal of _PyUnicode_XStrip.
|
msg145445 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-10-13 00:20 |
> Please double check the removal of _PyUnicode_XStrip.
I think it's ok. It is private, undocumented, and the only references
returned by Google Code Search are in Python itself.
I'll review the patch later, thank you.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:20 | admin | set | github: 57016 |
2015-04-14 20:21:27 | msoucy | set | nosy:
+ msoucy
|
2013-07-05 23:23:35 | christian.heimes | set | versions:
+ Python 3.4 |
2011-10-13 00:20:03 | pitrou | set | messages:
+ msg145445 title: Move strip() to stringlib -> Optimization/refactoring for {bytearray, bytes, unicode}.strip() |
2011-10-12 00:42:24 | jcon | set | title: Optimization/refactoring for {bytearray, bytes, unicode}.strip() -> Move strip() to stringlib |
2011-10-12 00:39:00 | jcon | set | files:
+ stringlib_strip2.patch
messages:
+ msg145379 |
2011-10-04 10:40:42 | pitrou | set | messages:
+ msg144871 title: Optimization/refactoring for {bytearray,bytes,unicode}.strip() -> Optimization/refactoring for {bytearray, bytes, unicode}.strip() |
2011-10-01 22:41:32 | jcon | set | messages:
+ msg144752 title: Optimizations for {bytearray,bytes,unicode}.strip() -> Optimization/refactoring for {bytearray,bytes,unicode}.strip() |
2011-09-10 01:06:30 | jcon | set | files:
- strip_perf.patch |
2011-09-10 01:05:41 | jcon | set | files:
+ stringlib_strip.patch
messages:
+ msg143822 |
2011-08-24 22:57:39 | jcon | set | messages:
+ msg142942 |
2011-08-24 22:56:08 | jcon | set | files:
- unnamed |
2011-08-24 22:53:52 | jcon | set | files:
+ unnamed
messages:
+ msg142941 |
2011-08-24 13:32:12 | pitrou | set | messages:
+ msg142882 |
2011-08-22 02:59:00 | jcon | create | |