classification
Title: Faster unicode-escape and raw-unicode-escape codecs
Type: performance Stage: patch review
Components: Interpreter Core, Unicode Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, ezio.melotti, haypo, lemburg, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: 3.3regression, patch

Created on 2012-10-26 22:48 by serhiy.storchaka, last changed 2016-09-07 14:19 by haypo. This issue is now closed.

Files
File name Uploaded Description
faster_unicode_escape.patch serhiy.storchaka, 2012-10-26 22:48
faster_unicode_escape_4.patch serhiy.storchaka, 2016-06-19 20:10
faster_unicode_escape_5.patch haypo, 2016-09-02 12:09
Messages (11)
msg173901 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-26 22:48
The proposed patch optimizes the unicode-escape and raw-unicode-escape codecs. The codecs are still slower than in 3.2, but much faster than in 3.3. Further speedup is possible with the use of stringlib, but I think that this is enough. The code is unified and simplified (251 insertions, 345 deletions).

Benchmark results (on AMD Athlon 64 X2 4600+):

Py2.7        Py3.2        Py3.3        Py3.4+patch

193 (+11%)   325 (-34%)   66 (+224%)   214    decode  unicode-escape  'A'*10000
138 (+72%)   241 (-1%)    154 (+55%)   238    decode  unicode-escape  '\x80'*10000
193 (+10%)   323 (-34%)   72 (+194%)   212    decode  unicode-escape    '\x80'+'A'*9999
160 (+59%)   273 (-7%)    169 (+51%)   255    decode  unicode-escape  '\u0100'*10000
193 (-7%)    324 (-44%)   61 (+195%)   180    decode  unicode-escape    '\u0100'+'A'*9999
138 (+67%)   242 (-5%)    135 (+71%)   231    decode  unicode-escape    '\u0100'+'\x80'*9999
160 (+59%)   274 (-7%)    169 (+51%)   255    decode  unicode-escape  '\u8000'*10000
193 (-7%)    323 (-44%)   61 (+195%)   180    decode  unicode-escape    '\u8000'+'A'*9999
138 (+67%)   242 (-5%)    135 (+71%)   231    decode  unicode-escape    '\u8000'+'\x80'*9999
160 (+60%)   276 (-7%)    169 (+51%)   256    decode  unicode-escape    '\u8000'+'\u0100'*9999
178 (+42%)   275 (-8%)    177 (+43%)   253    decode  unicode-escape  '\U00010000'*10000
192 (+30%)   323 (-23%)   61 (+310%)   250    decode  unicode-escape    '\U00010000'+'A'*9999
139 (+35%)   243 (-23%)   119 (+57%)   187    decode  unicode-escape    '\U00010000'+'\x80'*9999
161 (+38%)   273 (-19%)   150 (+48%)   222    decode  unicode-escape    '\U00010000'+'\u0100'*9999
161 (+38%)   273 (-19%)   150 (+48%)   222    decode  unicode-escape    '\U00010000'+'\u8000'*9999

558 (-62%)   427 (-50%)   82 (+161%)   214    decode  raw-unicode-escape  'A'*10000
560 (-62%)   425 (-50%)   75 (+183%)   212    decode  raw-unicode-escape  '\x80'*10000
558 (-62%)   425 (-50%)   75 (+183%)   212    decode  raw-unicode-escape    '\x80'+'A'*9999
178 (+75%)   235 (+32%)   108 (+188%)  311    decode  raw-unicode-escape  '\u0100'*10000
555 (-62%)   424 (-50%)   61 (+248%)   212    decode  raw-unicode-escape    '\u0100'+'A'*9999
559 (-62%)   424 (-50%)   61 (+248%)   212    decode  raw-unicode-escape    '\u0100'+'\x80'*9999
179 (+74%)   235 (+32%)   108 (+188%)  311    decode  raw-unicode-escape  '\u8000'*10000
555 (-62%)   424 (-50%)   61 (+248%)   212    decode  raw-unicode-escape    '\u8000'+'A'*9999
558 (-62%)   425 (-50%)   61 (+248%)   212    decode  raw-unicode-escape    '\u8000'+'\x80'*9999
178 (+75%)   235 (+32%)   108 (+188%)  311    decode  raw-unicode-escape    '\u8000'+'\u0100'*9999
200 (+18%)   249 (-5%)    132 (+79%)   236    decode  raw-unicode-escape  '\U00010000'*10000
554 (-58%)   423 (-46%)   61 (+277%)   230    decode  raw-unicode-escape    '\U00010000'+'A'*9999
558 (-59%)   424 (-46%)   61 (+277%)   230    decode  raw-unicode-escape    '\U00010000'+'\x80'*9999
178 (+46%)   235 (+11%)   100 (+160%)  260    decode  raw-unicode-escape    '\U00010000'+'\u0100'*9999
178 (+44%)   235 (+9%)    100 (+157%)  257    decode  raw-unicode-escape    '\U00010000'+'\u8000'*9999


182 (+137%)  215 (+101%)  148 (+192%)  432    encode  unicode-escape  'A'*10000
582 (-10%)   617 (-16%)   470 (+11%)   521    encode  unicode-escape  '\x80'*10000
182 (+131%)  215 (+96%)   148 (+184%)  421    encode  unicode-escape    '\x80'+'A'*9999
624 (-7%)    967 (-40%)   558 (+4%)    579    encode  unicode-escape  '\u0100'*10000
183 (-19%)   215 (-31%)   132 (+12%)   148    encode  unicode-escape    '\u0100'+'A'*9999
584 (-23%)   617 (-27%)   464 (-3%)    451    encode  unicode-escape    '\u0100'+'\x80'*9999
627 (-8%)    968 (-40%)   557 (+4%)    579    encode  unicode-escape  '\u8000'*10000
183 (-19%)   215 (-31%)   148 (+0%)    148    encode  unicode-escape    '\u8000'+'A'*9999
584 (-23%)   617 (-27%)   490 (-8%)    451    encode  unicode-escape    '\u8000'+'\x80'*9999
629 (-8%)    969 (-40%)   555 (+4%)    578    encode  unicode-escape    '\u8000'+'\u0100'*9999
931 (-39%)   939 (-39%)   602 (-5%)    572    encode  unicode-escape  '\U00010000'*10000
183 (+7%)    215 (-9%)    180 (+9%)    196    encode  unicode-escape    '\U00010000'+'A'*9999
584 (-9%)    617 (-13%)   482 (+11%)   534    encode  unicode-escape    '\U00010000'+'\x80'*9999
630 (-14%)   962 (-43%)   565 (-4%)    544    encode  unicode-escape    '\U00010000'+'\u0100'*9999
630 (-14%)   964 (-44%)   565 (-4%)    544    encode  unicode-escape    '\U00010000'+'\u8000'*9999

332 (+1459%) 330 (+1468%) 333 (+1454%) 5175   encode  raw-unicode-escape  'A'*10000
332 (+1589%) 329 (+1604%) 333 (+1584%) 5607   encode  raw-unicode-escape  '\x80'*10000
336 (+1569%) 334 (+1579%) 333 (+1584%) 5607   encode  raw-unicode-escape    '\x80'+'A'*9999
904 (-38%)   911 (-39%)   557 (+0%)    558    encode  raw-unicode-escape  '\u0100'*10000
336 (+15%)   335 (+16%)   197 (+97%)   388    encode  raw-unicode-escape    '\u0100'+'A'*9999
335 (+16%)   335 (+16%)   197 (+97%)   388    encode  raw-unicode-escape    '\u0100'+'\x80'*9999
904 (-38%)   913 (-39%)   557 (+0%)    558    encode  raw-unicode-escape  '\u8000'*10000
335 (+16%)   335 (+16%)   197 (+96%)   387    encode  raw-unicode-escape    '\u8000'+'A'*9999
335 (+16%)   335 (+16%)   196 (+98%)   388    encode  raw-unicode-escape    '\u8000'+'\x80'*9999
912 (-39%)   909 (-39%)   554 (+1%)    558    encode  raw-unicode-escape    '\u8000'+'\u0100'*9999
966 (-40%)   997 (-42%)   584 (-0%)    583    encode  raw-unicode-escape  '\U00010000'*10000
336 (-42%)   335 (-41%)   213 (-8%)    196    encode  raw-unicode-escape    '\U00010000'+'A'*9999
336 (-42%)   335 (-41%)   213 (-8%)    196    encode  raw-unicode-escape    '\U00010000'+'\x80'*9999
911 (-43%)   911 (-43%)   570 (-8%)    522    encode  raw-unicode-escape    '\U00010000'+'\u0100'*9999
911 (-43%)   913 (-43%)   570 (-8%)    522    encode  raw-unicode-escape    '\U00010000'+'\u8000'*9999
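The message does not state the units of the raw numbers. The percentages appear to give the speedup of the last column (Py3.4+patch) relative to each earlier column, rounded to the nearest integer, which implies that higher numbers are faster. A small Python check of that reading against a few rows of the table above; `speedup_percent` is a hypothetical helper, not part of the benchmark scripts:

```python
def speedup_percent(baseline, patched):
    """Percentage by which `patched` exceeds `baseline`, rounded to an integer.

    Assumes both figures are throughputs (higher is faster)."""
    return round((patched / baseline - 1) * 100)


# (baseline, patched, percentage printed in the table above)
ROWS = [
    (66, 214, 224),     # Py3.3 vs Py3.4+patch, decode unicode-escape 'A'*10000
    (193, 214, 11),     # Py2.7 vs Py3.4+patch, same row
    (617, 521, -16),    # Py3.2 vs Py3.4+patch, encode unicode-escape '\x80'*10000
    (332, 5175, 1459),  # Py2.7 vs Py3.4+patch, encode raw-unicode-escape 'A'*10000
]

for baseline, patched, printed in ROWS:
    assert speedup_percent(baseline, patched) == printed
```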
msg185869 - Author: STINNER Victor (haypo) * (Python committer) Date: 2013-04-03 00:27
The unicode-escape and raw-unicode-escape decoders now use the _PyUnicodeWriter API. Can you please compare the performance of your patch with the _PyUnicodeWriter API? The decoders overallocate the buffer.

According to a comment in the decoder, overallocating is never needed (and will be slower). Your patch does not overallocate the buffer. The decoder should probably be adjusted to disable overallocation.
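The decoder comment is not quoted here, but the bound behind it seems straightforward: every escape sequence is at least two bytes long and produces one character, and every other input byte produces exactly one character, so the decoded string can never be longer than the input. A buffer of the input length is therefore always large enough. A minimal Python check of that bound against the built-in codecs (`check_decode_bound` is a hypothetical helper):

```python
def check_decode_bound(data, codec):
    """Decode `data` and return the decoded length, checking that it never
    exceeds the input length (one input byte yields at most one character)."""
    text = data.decode(codec)
    assert len(text) <= len(data), (codec, data)
    return len(text)


SAMPLES = [
    b"A" * 10,          # plain ASCII: one byte -> one character
    b"\\x80" * 3,       # 4-byte escapes (unicode_escape only)
    b"\\u0100\\u8000",  # 6-byte escapes -> 1 character each
    b"\\U00010000",     # 10-byte escape -> 1 character
    b"\x80\xff",        # non-ASCII bytes are decoded one-to-one as Latin-1
]

for codec in ("unicode_escape", "raw_unicode_escape"):
    for data in SAMPLES:
        print(codec, data, check_decode_bound(data, codec))
```

Note that raw-unicode-escape only interprets `\u` and `\U`, so `b"\\x80"` decodes to four characters there, which still satisfies the bound.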

Can you please update your encoder patch to the latest development version?
msg268866 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-19 20:10
Victor's patch harvested most of the fruit, but there is still room for further optimization.

Benchmark results for new patch:

Py3.2        Py3.3        Py3.6        Py3.6+patch

451 (-47%)   77 (+209%)   140 (+70%)   238    decode  unicode-escape  'A'*10000
269 (-14%)   161 (+44%)   187 (+24%)   232    decode  unicode-escape  '\x80'*10000
453 (-48%)   85 (+178%)   181 (+30%)   236    decode  unicode-escape    '\x80'+'A'*9999
295 (-4%)    185 (+54%)   229 (+24%)   284    decode  unicode-escape  '\u0100'*10000
452 (-47%)   75 (+221%)   213 (+13%)   241    decode  unicode-escape    '\u0100'+'A'*9999
275 (-11%)   149 (+64%)   187 (+30%)   244    decode  unicode-escape    '\u0100'+'\x80'*9999
297 (-4%)    185 (+54%)   230 (+23%)   284    decode  unicode-escape  '\u8000'*10000
452 (-47%)   75 (+221%)   213 (+13%)   241    decode  unicode-escape    '\u8000'+'A'*9999
275 (-11%)   149 (+64%)   187 (+30%)   244    decode  unicode-escape    '\u8000'+'\x80'*9999
295 (-3%)    185 (+54%)   230 (+24%)   285    decode  unicode-escape    '\u8000'+'\u0100'*9999
318 (-29%)   203 (+11%)   220 (+2%)    225    decode  unicode-escape  '\U00010000'*10000
452 (-51%)   72 (+207%)   163 (+36%)   221    decode  unicode-escape    '\U00010000'+'A'*9999
275 (-31%)   128 (+49%)   160 (+19%)   191    decode  unicode-escape    '\U00010000'+'\x80'*9999
295 (-36%)   164 (+16%)   201 (-5%)    190    decode  unicode-escape    '\U00010000'+'\u0100'*9999
297 (-36%)   166 (+14%)   199 (-5%)    190    decode  unicode-escape    '\U00010000'+'\u8000'*9999

559 (-62%)   88 (+143%)   194 (+10%)   214    decode  raw-unicode-escape  'A'*10000
555 (-62%)   88 (+142%)   195 (+9%)    213    decode  raw-unicode-escape  '\x80'*10000
559 (-62%)   88 (+142%)   195 (+9%)    213    decode  raw-unicode-escape    '\x80'+'A'*9999
265 (+29%)   133 (+156%)  212 (+61%)   341    decode  raw-unicode-escape  '\u0100'*10000
563 (-54%)   77 (+235%)   195 (+32%)   258    decode  raw-unicode-escape    '\u0100'+'A'*9999
559 (-54%)   77 (+234%)   194 (+32%)   257    decode  raw-unicode-escape    '\u0100'+'\x80'*9999
269 (+27%)   138 (+147%)  208 (+64%)   341    decode  raw-unicode-escape  '\u8000'*10000
562 (-54%)   77 (+235%)   193 (+34%)   258    decode  raw-unicode-escape    '\u8000'+'A'*9999
559 (-54%)   77 (+234%)   194 (+32%)   257    decode  raw-unicode-escape    '\u8000'+'\x80'*9999
265 (+29%)   138 (+147%)  208 (+64%)   341    decode  raw-unicode-escape    '\u8000'+'\u0100'*9999
281 (-13%)   152 (+61%)   228 (+7%)    244    decode  raw-unicode-escape  '\U00010000'*10000
562 (-65%)   74 (+164%)   200 (-2%)    195    decode  raw-unicode-escape    '\U00010000'+'A'*9999
557 (-65%)   74 (+162%)   200 (-3%)    194    decode  raw-unicode-escape    '\U00010000'+'\x80'*9999
265 (-2%)    122 (+114%)  184 (+42%)   261    decode  raw-unicode-escape    '\U00010000'+'\u0100'*9999
269 (-3%)    122 (+113%)  185 (+41%)   260    decode  raw-unicode-escape    '\U00010000'+'\u8000'*9999


195 (+136%)  109 (+323%)  258 (+79%)   461    encode  unicode-escape  'A'*10000
673 (-23%)   522 (-1%)    254 (+103%)  516    encode  unicode-escape  '\x80'*10000
197 (+134%)  132 (+248%)  247 (+86%)   460    encode  unicode-escape    '\x80'+'A'*9999
869 (-22%)   627 (+9%)    333 (+105%)  682    encode  unicode-escape  '\u0100'*10000
197 (-19%)   124 (+28%)   158 (+1%)    159    encode  unicode-escape    '\u0100'+'A'*9999
669 (-35%)   493 (-12%)   236 (+83%)   432    encode  unicode-escape    '\u0100'+'\x80'*9999
866 (-20%)   628 (+10%)   333 (+108%)  692    encode  unicode-escape  '\u8000'*10000
197 (-19%)   125 (+27%)   158 (+1%)    159    encode  unicode-escape    '\u8000'+'A'*9999
669 (-35%)   492 (-12%)   236 (+83%)   433    encode  unicode-escape    '\u8000'+'\x80'*9999
869 (-20%)   627 (+11%)   324 (+114%)  694    encode  unicode-escape    '\u8000'+'\u0100'*9999
870 (-1%)    897 (-4%)    501 (+72%)   861    encode  unicode-escape  '\U00010000'*10000
197 (+20%)   139 (+70%)   234 (+1%)    236    encode  unicode-escape    '\U00010000'+'A'*9999
668 (-27%)   533 (-9%)    249 (+96%)   487    encode  unicode-escape    '\U00010000'+'\x80'*9999
869 (-12%)   646 (+18%)   344 (+122%)  764    encode  unicode-escape    '\U00010000'+'\u0100'*9999
864 (-12%)   643 (+19%)   344 (+122%)  762    encode  unicode-escape    '\U00010000'+'\u8000'*9999

391 (+1310%) 333 (+1556%) 575 (+859%)  5514   encode  raw-unicode-escape  'A'*10000
391 (+1229%) 334 (+1456%) 576 (+802%)  5198   encode  raw-unicode-escape  '\x80'*10000
391 (+1402%) 335 (+1653%) 579 (+914%)  5873   encode  raw-unicode-escape    '\x80'+'A'*9999
869 (-25%)   687 (-5%)    356 (+83%)   652    encode  raw-unicode-escape  '\u0100'*10000
391 (+46%)   158 (+260%)  214 (+166%)  569    encode  raw-unicode-escape    '\u0100'+'A'*9999
391 (+46%)   158 (+260%)  214 (+166%)  569    encode  raw-unicode-escape    '\u0100'+'\x80'*9999
873 (-25%)   682 (-4%)    356 (+83%)   652    encode  raw-unicode-escape  '\u8000'*10000
391 (+46%)   158 (+260%)  214 (+166%)  569    encode  raw-unicode-escape    '\u8000'+'A'*9999
391 (+46%)   157 (+262%)  214 (+166%)  569    encode  raw-unicode-escape    '\u8000'+'\x80'*9999
869 (-25%)   688 (-5%)    345 (+90%)   656    encode  raw-unicode-escape    '\u8000'+'\u0100'*9999
917 (+4%)    859 (+11%)   532 (+79%)   952    encode  raw-unicode-escape  '\U00010000'*10000
392 (-15%)   182 (+84%)   260 (+28%)   334    encode  raw-unicode-escape    '\U00010000'+'A'*9999
392 (-15%)   182 (+83%)   260 (+28%)   333    encode  raw-unicode-escape    '\U00010000'+'\x80'*9999
870 (-15%)   672 (+10%)   355 (+108%)  738    encode  raw-unicode-escape    '\U00010000'+'\u0100'*9999
871 (-16%)   672 (+9%)    355 (+106%)  730    encode  raw-unicode-escape    '\U00010000'+'\u8000'*9999
msg274233 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-02 11:38
Unicode escape encoders were modified by issue #25353 to use the _PyBytesWriter API. Sadly, I didn't benchmark my change before pushing it :-/

Your patch basically reverts my change.

> Py3.2        Py3.3        Py3.6        Py3.6+patch
> 195 (+136%)  109 (+323%)  258 (+79%)   461    encode  unicode-escape  'A'*10000
> 391 (+1310%) 333 (+1556%) 575 (+859%)  5514   encode  raw-unicode-escape  'A'*10000

I'm surprised that the revert makes raw-unicode-escape encoder so much faster. Does it mean that the _PyBytesWriter API is so inefficient?

The most efficient case for _PyBytesWriter is when you only call _PyBytesWriter_Alloc() and _PyBytesWriter_Finish() and the output string has exactly the allocated length. It should be the case when 'A'*10000 is encoded, no?
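For 'A'*10000 the output length is indeed known up front: with both codecs each 'A' encodes to exactly one byte. More generally, the exact output size can be computed in a first pass from per-character rules. The sketch below is a hypothetical `escaped_size` helper whose rules are assumed to match CPython 3's escaping (they are checked against the built-in codecs); it illustrates exact pre-sizing and is not the code of either patch:

```python
def escaped_size(s, codec):
    """Exact number of bytes `s.encode(codec)` produces, computed per character."""
    size = 0
    for ch in s:
        cp = ord(ch)
        if codec == "raw_unicode_escape":
            if cp < 0x100:
                size += 1       # copied as a single byte, backslash included
            elif cp < 0x10000:
                size += 6       # \uXXXX
            else:
                size += 10      # \UXXXXXXXX
        elif codec == "unicode_escape":
            if ch in "\\\t\n\r":
                size += 2       # \\ \t \n \r
            elif 0x20 <= cp < 0x7F:
                size += 1       # other printable ASCII
            elif cp < 0x100:
                size += 4       # \xXX
            elif cp < 0x10000:
                size += 6       # \uXXXX
            else:
                size += 10      # \UXXXXXXXX
        else:
            raise ValueError("unsupported codec: %s" % codec)
    return size


SAMPLES = ["A" * 10000, "\x80" * 10000, "\u0100" + "A" * 9999,
           "\U00010000" + "\u8000" * 9999, "tab\there\\ \x00\x7f\n\r"]

for codec in ("unicode_escape", "raw_unicode_escape"):
    for s in SAMPLES:
        assert escaped_size(s, codec) == len(s.encode(codec))
```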
msg274236 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-02 12:09
I rebased faster_unicode_escape_4.patch and made tiny changes:

* Rename WRITECHAR macro to WRITE_ASCII_CHAR()
* Add WRITE_CHAR() macro to avoid "goto writechar;"
* Drop the "store" label: use the WRITE_CHAR() macro instead, and assume that getcode() only returns valid Unicode characters (<= MAX_UNICODE)
* For \UHHHHHHHH format: since MAX_UNICODE is 0x10ffff, hardcode the first two digits as 0, and add an assertion on MAX_UNICODE value
* PEP 7: add {...} on if/else blocks
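The \UHHHHHHHH shortcut can be modelled in Python: since no code point exceeds MAX_UNICODE (0x10ffff), the top two of the eight hex digits are always 0 and only the low six need computing. This is a hypothetical sketch of the idea (`long_escape` is not a CPython function), checked against the built-in codecs:

```python
import sys

MAX_UNICODE = 0x10FFFF
HEX_DIGITS = "0123456789abcdef"

# The shortcut below is only valid because no code point exceeds 0x10FFFF.
assert sys.maxunicode == MAX_UNICODE


def long_escape(cp):
    """Return the 10-byte long escape (backslash, 'U', 8 hex digits) for cp.

    Because cp <= 0x10FFFF, the two most significant hex digits are always
    '0', so only the low six nibbles are computed."""
    assert 0x10000 <= cp <= MAX_UNICODE
    low_six = "".join(HEX_DIGITS[(cp >> shift) & 0xF] for shift in (20, 16, 12, 8, 4, 0))
    return ("\\U00" + low_six).encode("ascii")


for cp in (0x10000, 0x1F600, 0x10FFFF):
    expected = chr(cp).encode("unicode_escape")
    assert long_escape(cp) == expected == chr(cp).encode("raw_unicode_escape")
```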
msg274238 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-02 14:27
> Unicode escape encoders were modified by issue #25353 to use the
> _PyBytesWriter API. Sadly, I didn't benchmark my change before pushing it
> :-/

You can benchmark it now by checking out the revision with your patch and the one just before it. But AFAIK the performance has not changed since 3.3, and the effect of your patch is the difference between the 3.3 and 3.6 columns (very good).

I used scripts from https://bitbucket.org/storchaka/cpython-stuff/src/default/bench/.
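The scripts at that URL are not reproduced in this issue. A rough, hypothetical stand-in using the standard `timeit` module times encoding and decoding of some of the benchmark inputs; it reports calls per second, which need not match the units of the tables in this issue:

```python
import timeit

# Inputs modelled on a few rows of the benchmark tables.
CASES = {
    "'A'*10000": "A" * 10000,
    "'\\x80'+'A'*9999": "\x80" + "A" * 9999,
    "'\\u0100'*10000": "\u0100" * 10000,
    "'\\U00010000'+'\\u8000'*9999": "\U00010000" + "\u8000" * 9999,
}


def bench(codec, text, number=200):
    """Return (encodes per second, decodes per second) for `text`.

    Decoding is timed on the codec's own output for `text`. None of the
    inputs contain a backslash, so they round-trip through both codecs."""
    data = text.encode(codec)
    assert data.decode(codec) == text  # sanity check before timing
    enc_seconds = timeit.timeit(lambda: text.encode(codec), number=number)
    dec_seconds = timeit.timeit(lambda: data.decode(codec), number=number)
    return number / enc_seconds, number / dec_seconds


if __name__ == "__main__":
    for codec in ("unicode_escape", "raw_unicode_escape"):
        for label, text in CASES.items():
            enc, dec = bench(codec, text)
            print(f"{codec:20} {label:32} encode {enc:10.0f}/s  decode {dec:10.0f}/s")
```

Absolute numbers depend on the machine and build; only ratios between interpreter versions run on the same machine are meaningful.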

> Your patch basically reverts my change.
> 
> > Py3.2        Py3.3        Py3.6        Py3.6+patch
> > 195 (+136%)  109 (+323%)  258 (+79%)   461    encode  unicode-escape  'A'*10000
> > 391 (+1310%) 333 (+1556%) 575 (+859%)  5514   encode  raw-unicode-escape  'A'*10000

> I'm surprised that the revert makes raw-unicode-escape encoder so much
> faster. Does it mean that the _PyBytesWriter API is so inefficient?

I don't remember all the details, but it seems that after applying all the optimizations, _PyBytesWriter simply became unnecessary (unlike _PyUnicodeWriter, which is used for widening a buffer).
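In a decoder, the widest character is only known once the input has been scanned, so _PyUnicodeWriter may have to widen its buffer (for example from 1 to 2 bytes per character) when a wider character turns up. An encoder writes bytes, which have a single width. The sketch below is a simplified model of PEP 393's storage kinds (it ignores CPython's separate ASCII-only flag, and `storage_kind` is a hypothetical helper) showing which kind each style of benchmark input needs:

```python
def storage_kind(s):
    """Bytes per code point in CPython's compact (PEP 393) representation of s."""
    widest = max(map(ord, s), default=0)
    if widest < 0x100:
        return 1
    if widest < 0x10000:
        return 2
    return 4


assert storage_kind("A" * 10000) == 1
assert storage_kind("\x80" + "A" * 9999) == 1
assert storage_kind("\u0100" + "A" * 9999) == 2
assert storage_kind("\U00010000" + "\u8000" * 9999) == 4
# Bytes objects have no such distinction: every element is one byte,
# so an encoder's output buffer never has to be widened.
```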

The huge difference in encoding ASCII-only data is not related to using _PyBytesWriter. It is caused by reordering the checks in the inner loop.
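The thread does not show which checks were reordered. As a general illustration only, a pure-Python model of the raw-unicode-escape encoder can test the most frequent case (a code point below 0x100, copied as a single byte) first, so ASCII-only input costs one comparison per character. `raw_unicode_escape_encode` is a hypothetical name, and this is not the C code from the patch:

```python
def raw_unicode_escape_encode(s):
    """Pure-Python model of raw-unicode-escape encoding.

    The most common case is tested first, so ASCII and Latin-1 characters
    take a single comparison each."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp < 0x100:          # common case first: copy the byte as-is
            out.append(cp)
        elif cp < 0x10000:      # BMP character: \uXXXX
            out += b"\\u%04x" % cp
        else:                   # non-BMP character: \UXXXXXXXX
            out += b"\\U%08x" % cp
    return bytes(out)


for s in ["A" * 10000, "\x80" + "A" * 9999, "\u0100" * 10, "\U00010000" + "\u8000"]:
    assert raw_unicode_escape_encode(s) == s.encode("raw_unicode_escape")
```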

> * Rename WRITECHAR macro to WRITE_ASCII_CHAR()

This is not the correct name. This macro is used for writing non-ASCII characters too.

> * Add WRITE_CHAR() macro to avoid "goto writechar;"
> * Drop the "store" label: use WRITE_CHAR() macro instead,

Did you benchmark this change? I'm afraid that it inflates the executable code size and can have a negative impact on performance.
msg274239 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-02 14:33
> Did you benchmark this change? I'm afraid that it inflates the executable code size and can have a negative impact on performance.

I consider that readability (maintainability) matters more than such micro optimization.
msg274681 - Author: Roundup Robot (python-dev) Date: 2016-09-07 00:08
New changeset ad5a28ace615 by Victor Stinner in branch 'default':
Optimize unicode_escape and raw_unicode_escape
https://hg.python.org/cpython/rev/ad5a28ace615
msg274682 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-07 00:09
Since 3.6 beta 1 is almost here, I chose to push the change right now. I'm sure that it's faster; I trust your benchmarks ;-)

Thanks Serhiy for this nice enhancement.


> > * Rename WRITECHAR macro to WRITE_ASCII_CHAR()

> This is not the correct name. This macro is used for writing non-ASCII characters too.

Oh, I fixed this in the pushed change.
msg274780 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 08:51
Thanks Victor! I benchmarked your patch. There is no regression compared with my patch. In a few cases your patch is even faster!

Unpatched    Patch v.4    Patch v.5

148 (+76%)   235 (+11%)   260    decode  unicode-escape  'A'*10000
197 (+30%)   257 (+0%)    257    decode  unicode-escape  '\x80'*10000
195 (+32%)   232 (+11%)   258    decode  unicode-escape    '\x80'+'A'*9999
227 (+39%)   308 (+2%)    315    decode  unicode-escape  '\u0100'*10000
197 (+56%)   241 (+27%)   307    decode  unicode-escape    '\u0100'+'A'*9999
201 (+35%)   264 (+3%)    271    decode  unicode-escape    '\u0100'+'\x80'*9999
227 (+39%)   308 (+2%)    315    decode  unicode-escape  '\u8000'*10000
197 (+56%)   241 (+27%)   307    decode  unicode-escape    '\u8000'+'A'*9999
201 (+35%)   264 (+3%)    271    decode  unicode-escape    '\u8000'+'\x80'*9999
227 (+39%)   308 (+2%)    315    decode  unicode-escape    '\u8000'+'\u0100'*9999
200 (+26%)   245 (+2%)    251    decode  unicode-escape  '\U00010000'*10000
192 (+38%)   230 (+15%)   265    decode  unicode-escape    '\U00010000'+'A'*9999
167 (+26%)   203 (+4%)    211    decode  unicode-escape    '\U00010000'+'\x80'*9999
194 (+31%)   248 (+2%)    254    decode  unicode-escape    '\U00010000'+'\u0100'*9999
194 (+31%)   247 (+3%)    254    decode  unicode-escape    '\U00010000'+'\u8000'*9999

197 (+9%)    214 (+0%)    215    decode  raw-unicode-escape  'A'*10000
197 (+9%)    214 (+0%)    214    decode  raw-unicode-escape  '\x80'*10000
197 (+9%)    214 (+0%)    214    decode  raw-unicode-escape    '\x80'+'A'*9999
216 (+68%)   365 (-1%)    363    decode  raw-unicode-escape  '\u0100'*10000
181 (+43%)   262 (-1%)    259    decode  raw-unicode-escape    '\u0100'+'A'*9999
181 (+43%)   264 (-2%)    258    decode  raw-unicode-escape    '\u0100'+'\x80'*9999
216 (+68%)   365 (-1%)    363    decode  raw-unicode-escape  '\u8000'*10000
181 (+43%)   261 (-1%)    259    decode  raw-unicode-escape    '\u8000'+'A'*9999
181 (+43%)   263 (-2%)    258    decode  raw-unicode-escape    '\u8000'+'\x80'*9999
216 (+68%)   365 (-1%)    363    decode  raw-unicode-escape    '\u8000'+'\u0100'*9999
245 (+29%)   313 (+1%)    315    decode  raw-unicode-escape  '\U00010000'*10000
211 (+10%)   195 (+19%)   232    decode  raw-unicode-escape    '\U00010000'+'A'*9999
211 (+10%)   195 (+19%)   233    decode  raw-unicode-escape    '\U00010000'+'\x80'*9999
192 (+51%)   287 (+1%)    289    decode  raw-unicode-escape    '\U00010000'+'\u0100'*9999
192 (+51%)   287 (+1%)    289    decode  raw-unicode-escape    '\U00010000'+'\u8000'*9999


269 (+73%)   424 (+10%)   465    encode  unicode-escape  'A'*10000
266 (+108%)  591 (-6%)    553    encode  unicode-escape  '\x80'*10000
298 (+55%)   423 (+9%)    463    encode  unicode-escape    '\x80'+'A'*9999
358 (+93%)   695 (-0%)    692    encode  unicode-escape  '\u0100'*10000
190 (+13%)   215 (+0%)    215    encode  unicode-escape    '\u0100'+'A'*9999
235 (+109%)  520 (-5%)    492    encode  unicode-escape    '\u0100'+'\x80'*9999
342 (+102%)  695 (-1%)    691    encode  unicode-escape  '\u8000'*10000
190 (+13%)   215 (+0%)    215    encode  unicode-escape    '\u8000'+'A'*9999
235 (+109%)  520 (-5%)    492    encode  unicode-escape    '\u8000'+'\x80'*9999
367 (+89%)   698 (-1%)    694    encode  unicode-escape    '\u8000'+'\u0100'*9999
531 (+124%)  915 (+30%)   1190   encode  unicode-escape  '\U00010000'*10000
196 (+20%)   235 (+0%)    236    encode  unicode-escape    '\U00010000'+'A'*9999
237 (+104%)  506 (-4%)    484    encode  unicode-escape    '\U00010000'+'\x80'*9999
325 (+111%)  681 (+1%)    687    encode  unicode-escape    '\U00010000'+'\u0100'*9999
325 (+117%)  681 (+3%)    704    encode  unicode-escape    '\U00010000'+'\u8000'*9999

578 (+853%)  5672 (-3%)   5507   encode  raw-unicode-escape  'A'*10000
578 (+731%)  4761 (+1%)   4806   encode  raw-unicode-escape  '\x80'*10000
581 (+760%)  5218 (-4%)   4995   encode  raw-unicode-escape    '\x80'+'A'*9999
365 (+96%)   714 (+0%)    714    encode  raw-unicode-escape  '\u0100'*10000
226 (+72%)   389 (+0%)    389    encode  raw-unicode-escape    '\u0100'+'A'*9999
226 (+72%)   389 (+0%)    389    encode  raw-unicode-escape    '\u0100'+'\x80'*9999
373 (+91%)   715 (-0%)    714    encode  raw-unicode-escape  '\u8000'*10000
226 (+72%)   389 (+0%)    389    encode  raw-unicode-escape    '\u8000'+'A'*9999
226 (+72%)   389 (+0%)    389    encode  raw-unicode-escape    '\u8000'+'\x80'*9999
366 (+96%)   718 (+0%)    719    encode  raw-unicode-escape    '\u8000'+'\u0100'*9999
537 (+110%)  879 (+28%)   1128   encode  raw-unicode-escape  '\U00010000'*10000
214 (+37%)   293 (+0%)    294    encode  raw-unicode-escape    '\U00010000'+'A'*9999
214 (+37%)   293 (+0%)    294    encode  raw-unicode-escape    '\U00010000'+'\x80'*9999
342 (+96%)   669 (+0%)    669    encode  raw-unicode-escape    '\U00010000'+'\u0100'*9999
342 (+96%)   669 (+0%)    669    encode  raw-unicode-escape    '\U00010000'+'\u8000'*9999

Could you please add NEWS and What's New entries?
msg274815 - Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-07 14:19
Feel free to document the change. It's not my patch, it's yours :-)
History
Date User Action Args
2016-09-07 14:19:28 haypo set messages: + msg274815
2016-09-07 08:51:02 serhiy.storchaka set messages: + msg274780
2016-09-07 00:09:55 haypo set status: open -> closed
    resolution: fixed
    messages: + msg274682
2016-09-07 00:08:15 python-dev set nosy: + python-dev
    messages: + msg274681
2016-09-02 14:33:44 haypo set messages: + msg274239
2016-09-02 14:27:36 serhiy.storchaka set messages: + msg274238
2016-09-02 12:09:14 haypo set files: + faster_unicode_escape_5.patch
    messages: + msg274236
2016-09-02 11:38:39 haypo set messages: + msg274233
2016-06-19 20:10:20 serhiy.storchaka set files: + faster_unicode_escape_4.patch
    messages: + msg268866
    versions: + Python 3.6, - Python 3.4
2013-04-03 00:27:54 haypo set messages: + msg185869
2012-10-26 22:48:25 serhiy.storchaka create