
Author vstinner
Recipients BreamoreBoy, ezio.melotti, kushal.das, loewis, serhiy.storchaka, thomaslee, vstinner
Date 2012-10-11.20:38:24
Message-id <CAMpsgwb-cGFdxLM8b_riX+4_ZU1U_4e2WhEtXJnkUwcpA91WbQ@mail.gmail.com>
In-reply-to <201210111136.10301.storchaka@gmail.com>
Content
> You can hybridize them. First just compare chars and if not match then use
> memcmp(). This speed up the case of repeated chars.

Oh, your patch is simple and it's amazingly fast! I compared unicode on
Python 2.7, 3.2, 3.4 and 3.4 patched, and bytes on 2.7. With your patch,
Python 3.4 is the fastest implementation in most cases.
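
As a rough illustration of the hybrid idea quoted above, here is a minimal
sketch for a plain byte buffer. It is not the actual patch: the function name
replace_byte_hybrid and its signature are hypothetical, and it assumes that
memchr() is the library call used for the fallback search (the quote says
memcmp()). The next char is compared inline first, which is cheap when the
char is repeated, and the library call is only made when that comparison fails.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not CPython's real code.
 * Replace up to maxcount occurrences of `from` with `to` in buf[0..len).
 * Returns the number of replacements made. */
static size_t
replace_byte_hybrid(unsigned char *buf, size_t len,
                    unsigned char from, unsigned char to, size_t maxcount)
{
    unsigned char *s = buf;
    unsigned char *end = buf + len;
    size_t count = 0;

    while (s < end && count < maxcount) {
        if (*s != from) {
            /* Inline comparison failed: let memchr() skip ahead
             * (assumption: memchr() is the intended search primitive). */
            s = (unsigned char *)memchr(s, from, (size_t)(end - s));
            if (s == NULL)
                break;
        }
        /* Here *s == from: replace it and move to the next char. */
        *s++ = to;
        count++;
    }
    return count;
}
```

With runs of repeated chars, the inline comparison succeeds on every step and
memchr() is never called; with rare matches, almost all the scanning is done
inside memchr().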

Common platform:
CPU model: Intel(R) Core(TM) i5 CPU 661 @ 3.33GHz
Bits: int=32, long=32, long long=64, pointer=32
Platform: Linux-3.2.0-31-generic-pae-i686-with-debian-wheezy-sid

Platform of campaign 2.7-bytes:
Python unicode implementation: UTF-16
Python version: 2.7.3+ (2.7:19d37c8d1882+, Oct 9 2012, 14:37:36) [GCC 4.6.3]
CFLAGS: -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11 -0700"
Date: 2012-10-11 14:41:49

Platform of campaign 2.7-unicode:
Python unicode implementation: UTF-16
Python version: 2.7.3+ (2.7:19d37c8d1882+, Oct 9 2012, 14:37:36) [GCC 4.6.3]
CFLAGS: -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11 -0700"
Date: 2012-10-11 14:42:55

Platform of campaign 3.2-wide:
Python unicode implementation: UCS-4
Python version: 3.2.3+ (3.2:f7615ee43318, Sep 27 2012, 15:00:15) [GCC 4.6.3]
CFLAGS: -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11 -0700"
Date: 2012-10-11 14:41:30

Platform of campaign 3.4:
Python unicode implementation: PEP 393
Python version: 3.4.0a0 (default:ad51ed93377c, Oct 11 2012, 14:40:51) [GCC 4.6.3]
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11 -0700"
Date: 2012-10-11 14:40:52

Platform of campaign 3.4-patch:
Date: 2012-10-11 14:40:25
Python version: 3.4.0a0 (default:ad51ed93377c+, Oct 11 2012, 14:33:04) [GCC 4.6.3]
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=ad51ed93377c+ tag=tip branch=default date="2012-10-11 00:11 -0700"
Python unicode implementation: PEP 393

----------------+-----------------+-----------------+-----------------+-----------------+----------------
Tests           | 2.7-bytes       | 2.7-unicode     | 3.2-wide        | 3.4             | 3.4-patch
----------------+-----------------+-----------------+-----------------+-----------------+----------------
all             | 7.83 ms (+552%) | 2.05 ms (+71%)  | 3.45 ms (+188%) | 15 ms (+1152%)  | 1.2 ms (*)
replace 50%     | 4.14 ms (+135%) | 1.76 ms (*)     | 3.17 ms (+81%)  | 7.76 ms (+342%) | 4.18 ms (+138%)
replace 10%     | 1.21 ms (*)     | 1.52 ms (+26%)  | 3.01 ms (+150%) | 2.01 ms (+67%)  | 1.23 ms
replace 1%      | 490 us          | 1.55 ms (+217%) | 2.94 ms (+501%) | 589 us (+20%)   | 489 us (*)
replace 2 chars | 398 us          | 1.47 ms (+271%) | 2.89 ms (+632%) | 398 us          | 395 us (*)
----------------+-----------------+-----------------+-----------------+-----------------+----------------
Total           | 14.1 ms (+88%)  | 8.34 ms (+11%)  | 15.5 ms (+106%) | 25.8 ms (+244%) | 7.49 ms (*)
----------------+-----------------+-----------------+-----------------+-----------------+----------------


Compare 3.2, 3.4 and 3.4 patched:

----------------+-------------+-----------------+---------------
Tests           | 3.2-wide    | 3.4             | 3.4-patch
----------------+-------------+-----------------+---------------
all             | 3.45 ms (*) | 15 ms (+335%)   | 1.2 ms (-65%)
replace 50%     | 3.17 ms (*) | 7.76 ms (+145%) | 4.18 ms (+32%)
replace 10%     | 3.01 ms (*) | 2.01 ms (-33%)  | 1.23 ms (-59%)
replace 1%      | 2.94 ms (*) | 589 us (-80%)   | 489 us (-83%)
replace 2 chars | 2.89 ms (*) | 398 us (-86%)   | 395 us (-86%)
----------------+-------------+-----------------+---------------
Total           | 15.5 ms (*) | 25.8 ms (+67%)  | 7.49 ms (-52%)
----------------+-------------+-----------------+---------------

The patch should be extended to also optimize the other Unicode kinds.
History
Date User Action Args
2012-10-11 20:38:25 vstinner set recipients: + vstinner, loewis, thomaslee, ezio.melotti, BreamoreBoy, serhiy.storchaka, kushal.das
2012-10-11 20:38:25 vstinner link issue16061 messages
2012-10-11 20:38:24 vstinner create