Faster unicode-escape and raw-unicode-escape codecs #60538

Closed
serhiy-storchaka opened this issue Oct 26, 2012 · 11 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-unicode

Comments

@serhiy-storchaka
Member

BPO 16334
Nosy @malemburg, @pitrou, @vstinner, @benjaminp, @ezio-melotti, @serhiy-storchaka
Files
  • faster_unicode_escape.patch
  • faster_unicode_escape_4.patch
  • faster_unicode_escape_5.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-07.00:09:55.298>
    created_at = <Date 2012-10-26.22:48:25.812>
    labels = ['interpreter-core', 'expert-unicode', 'performance']
    title = 'Faster unicode-escape and raw-unicode-escape codecs'
    updated_at = <Date 2016-09-07.14:19:28.398>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2016-09-07.14:19:28.398>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-07.00:09:55.298>
    closer = 'vstinner'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2012-10-26.22:48:25.812>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['27740', '43475', '44348']
    hgrepos = []
    issue_num = 16334
    keywords = ['patch', '3.3regression']
    message_count = 11.0
    messages = ['173901', '185869', '268866', '274233', '274236', '274238', '274239', '274681', '274682', '274780', '274815']
    nosy_count = 7.0
    nosy_names = ['lemburg', 'pitrou', 'vstinner', 'benjamin.peterson', 'ezio.melotti', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue16334'
    versions = ['Python 3.6']

    @serhiy-storchaka
    Member Author

    The proposed patch optimizes the unicode-escape and raw-unicode-escape codecs. The encoders and decoders are still slower than in 3.2, but much faster than in 3.3. Further speedup is possible with the use of stringlib, but I think that this is enough. The code is unified and simplified (251 insertions, 345 deletions).
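As background (an illustration added for readers, not part of the patch), the snippet below shows how the two codecs encode one character of each kind used in the benchmark inputs. raw-unicode-escape writes any code point below U+0100 as a single Latin-1 byte and escapes only larger ones, while unicode-escape also escapes non-ASCII code points below U+0100 as `\xhh`.

```python
# One character of each kind used in the benchmark inputs.
samples = ["A", "\x80", "\u0100", "\u8000", "\U00010000"]


def show(s):
    """Return the (unicode-escape, raw-unicode-escape) encodings of s."""
    return s.encode("unicode-escape"), s.encode("raw-unicode-escape")


for s in samples:
    esc, raw = show(s)
    print(f"U+{ord(s):04X}: unicode-escape={esc!r} raw-unicode-escape={raw!r}")

# U+0041: unicode-escape=b'A' raw-unicode-escape=b'A'
# U+0080: unicode-escape=b'\\x80' raw-unicode-escape=b'\x80'
# U+0100: unicode-escape=b'\\u0100' raw-unicode-escape=b'\\u0100'
# U+8000: unicode-escape=b'\\u8000' raw-unicode-escape=b'\\u8000'
# U+10000: unicode-escape=b'\\U00010000' raw-unicode-escape=b'\\U00010000'
```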

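    Benchmark results (on AMD Athlon 64 X2 4600+). Higher numbers are faster; the percentage in parentheses is how much faster the patched build is than that column: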

    Py2.7 Py3.2 Py3.3 Py3.4+patch

    193 (+11%) 325 (-34%) 66 (+224%) 214 decode unicode-escape 'A'*10000
    138 (+72%) 241 (-1%) 154 (+55%) 238 decode unicode-escape '\x80'*10000
    193 (+10%) 323 (-34%) 72 (+194%) 212 decode unicode-escape '\x80'+'A'*9999
    160 (+59%) 273 (-7%) 169 (+51%) 255 decode unicode-escape '\u0100'*10000
    193 (-7%) 324 (-44%) 61 (+195%) 180 decode unicode-escape '\u0100'+'A'*9999
    138 (+67%) 242 (-5%) 135 (+71%) 231 decode unicode-escape '\u0100'+'\x80'*9999
    160 (+59%) 274 (-7%) 169 (+51%) 255 decode unicode-escape '\u8000'*10000
    193 (-7%) 323 (-44%) 61 (+195%) 180 decode unicode-escape '\u8000'+'A'*9999
    138 (+67%) 242 (-5%) 135 (+71%) 231 decode unicode-escape '\u8000'+'\x80'*9999
    160 (+60%) 276 (-7%) 169 (+51%) 256 decode unicode-escape '\u8000'+'\u0100'*9999
    178 (+42%) 275 (-8%) 177 (+43%) 253 decode unicode-escape '\U00010000'*10000
    192 (+30%) 323 (-23%) 61 (+310%) 250 decode unicode-escape '\U00010000'+'A'*9999
    139 (+35%) 243 (-23%) 119 (+57%) 187 decode unicode-escape '\U00010000'+'\x80'*9999
    161 (+38%) 273 (-19%) 150 (+48%) 222 decode unicode-escape '\U00010000'+'\u0100'*9999
    161 (+38%) 273 (-19%) 150 (+48%) 222 decode unicode-escape '\U00010000'+'\u8000'*9999

    558 (-62%) 427 (-50%) 82 (+161%) 214 decode raw-unicode-escape 'A'*10000
    560 (-62%) 425 (-50%) 75 (+183%) 212 decode raw-unicode-escape '\x80'*10000
    558 (-62%) 425 (-50%) 75 (+183%) 212 decode raw-unicode-escape '\x80'+'A'*9999
    178 (+75%) 235 (+32%) 108 (+188%) 311 decode raw-unicode-escape '\u0100'*10000
    555 (-62%) 424 (-50%) 61 (+248%) 212 decode raw-unicode-escape '\u0100'+'A'*9999
    559 (-62%) 424 (-50%) 61 (+248%) 212 decode raw-unicode-escape '\u0100'+'\x80'*9999
    179 (+74%) 235 (+32%) 108 (+188%) 311 decode raw-unicode-escape '\u8000'*10000
    555 (-62%) 424 (-50%) 61 (+248%) 212 decode raw-unicode-escape '\u8000'+'A'*9999
    558 (-62%) 425 (-50%) 61 (+248%) 212 decode raw-unicode-escape '\u8000'+'\x80'*9999
    178 (+75%) 235 (+32%) 108 (+188%) 311 decode raw-unicode-escape '\u8000'+'\u0100'*9999
    200 (+18%) 249 (-5%) 132 (+79%) 236 decode raw-unicode-escape '\U00010000'*10000
    554 (-58%) 423 (-46%) 61 (+277%) 230 decode raw-unicode-escape '\U00010000'+'A'*9999
    558 (-59%) 424 (-46%) 61 (+277%) 230 decode raw-unicode-escape '\U00010000'+'\x80'*9999
    178 (+46%) 235 (+11%) 100 (+160%) 260 decode raw-unicode-escape '\U00010000'+'\u0100'*9999
    178 (+44%) 235 (+9%) 100 (+157%) 257 decode raw-unicode-escape '\U00010000'+'\u8000'*9999

    182 (+137%) 215 (+101%) 148 (+192%) 432 encode unicode-escape 'A'*10000
    582 (-10%) 617 (-16%) 470 (+11%) 521 encode unicode-escape '\x80'*10000
    182 (+131%) 215 (+96%) 148 (+184%) 421 encode unicode-escape '\x80'+'A'*9999
    624 (-7%) 967 (-40%) 558 (+4%) 579 encode unicode-escape '\u0100'*10000
    183 (-19%) 215 (-31%) 132 (+12%) 148 encode unicode-escape '\u0100'+'A'*9999
    584 (-23%) 617 (-27%) 464 (-3%) 451 encode unicode-escape '\u0100'+'\x80'*9999
    627 (-8%) 968 (-40%) 557 (+4%) 579 encode unicode-escape '\u8000'*10000
    183 (-19%) 215 (-31%) 148 (+0%) 148 encode unicode-escape '\u8000'+'A'*9999
    584 (-23%) 617 (-27%) 490 (-8%) 451 encode unicode-escape '\u8000'+'\x80'*9999
    629 (-8%) 969 (-40%) 555 (+4%) 578 encode unicode-escape '\u8000'+'\u0100'*9999
    931 (-39%) 939 (-39%) 602 (-5%) 572 encode unicode-escape '\U00010000'*10000
    183 (+7%) 215 (-9%) 180 (+9%) 196 encode unicode-escape '\U00010000'+'A'*9999
    584 (-9%) 617 (-13%) 482 (+11%) 534 encode unicode-escape '\U00010000'+'\x80'*9999
    630 (-14%) 962 (-43%) 565 (-4%) 544 encode unicode-escape '\U00010000'+'\u0100'*9999
    630 (-14%) 964 (-44%) 565 (-4%) 544 encode unicode-escape '\U00010000'+'\u8000'*9999

    332 (+1459%) 330 (+1468%) 333 (+1454%) 5175 encode raw-unicode-escape 'A'*10000
    332 (+1589%) 329 (+1604%) 333 (+1584%) 5607 encode raw-unicode-escape '\x80'*10000
    336 (+1569%) 334 (+1579%) 333 (+1584%) 5607 encode raw-unicode-escape '\x80'+'A'*9999
    904 (-38%) 911 (-39%) 557 (+0%) 558 encode raw-unicode-escape '\u0100'*10000
    336 (+15%) 335 (+16%) 197 (+97%) 388 encode raw-unicode-escape '\u0100'+'A'*9999
    335 (+16%) 335 (+16%) 197 (+97%) 388 encode raw-unicode-escape '\u0100'+'\x80'*9999
    904 (-38%) 913 (-39%) 557 (+0%) 558 encode raw-unicode-escape '\u8000'*10000
    335 (+16%) 335 (+16%) 197 (+96%) 387 encode raw-unicode-escape '\u8000'+'A'*9999
    335 (+16%) 335 (+16%) 196 (+98%) 388 encode raw-unicode-escape '\u8000'+'\x80'*9999
    912 (-39%) 909 (-39%) 554 (+1%) 558 encode raw-unicode-escape '\u8000'+'\u0100'*9999
    966 (-40%) 997 (-42%) 584 (-0%) 583 encode raw-unicode-escape '\U00010000'*10000
    336 (-42%) 335 (-41%) 213 (-8%) 196 encode raw-unicode-escape '\U00010000'+'A'*9999
    336 (-42%) 335 (-41%) 213 (-8%) 196 encode raw-unicode-escape '\U00010000'+'\x80'*9999
    911 (-43%) 911 (-43%) 570 (-8%) 522 encode raw-unicode-escape '\U00010000'+'\u0100'*9999
    911 (-43%) 913 (-43%) 570 (-8%) 522 encode raw-unicode-escape '\U00010000'+'\u8000'*9999
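A rough way to collect comparable numbers on a current interpreter is a small `timeit` harness. This is a hypothetical sketch, not the benchmark script used in this issue: it covers only a few of the inputs above and reports microseconds per call (lower is faster), so its figures are not directly comparable with the tables.

```python
import timeit

# A few of the inputs from the tables above, keyed by their table labels.
INPUTS = {
    "'A'*10000": "A" * 10000,
    "'\\x80'*10000": "\x80" * 10000,
    "'\\u0100'*10000": "\u0100" * 10000,
    "'\\U00010000'+'A'*9999": "\U00010000" + "A" * 9999,
}
CODECS = ["unicode-escape", "raw-unicode-escape"]


def best_usec(func, number=100, repeat=5):
    """Best-of-`repeat` time per call, in microseconds (lower is faster)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e6


def run():
    """Time encode and decode for every (codec, input) pair."""
    results = []
    for codec in CODECS:
        for label, text in INPUTS.items():
            data = text.encode(codec)
            # The lambdas are called immediately, so capturing loop variables is safe.
            enc = best_usec(lambda: text.encode(codec))
            dec = best_usec(lambda: data.decode(codec))
            results.append((codec, label, enc, dec))
    return results


if __name__ == "__main__":
    for codec, label, enc, dec in run():
        print(f"{codec:20} {label:26} encode {enc:8.1f} us  decode {dec:8.1f} us")
```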

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode performance Performance or resource usage labels Oct 26, 2012
    @vstinner
    Member

    vstinner commented Apr 3, 2013

    The unicode-escape and raw-unicode-escape decoders now use the PyUnicodeWriter API. Can you please compare the performance of your patch with the PyUnicodeWriter API? The decoders overallocate the buffer.

    According to a comment in the decoder, overallocation is never needed (and would be slower). Your patch does not overallocate the buffer. The decoder should probably be adjusted to disable overallocation.

    Can you please update the encoder part of your patch to the latest development version?
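The observation that overallocation is never needed follows from a length bound: each input byte outside an escape decodes to one code point, and each escape sequence also decodes to exactly one code point, so the output never has more code points than the input has bytes. A minimal pure-Python sketch of that idea, assuming only the `\\`, `\xhh`, `\uhhhh` and `\Uhhhhhhhh` escapes (the real codec supports more, such as `\n` and `\N{...}`). `decode_escapes` is a hypothetical name, and the preallocated list merely stands in for the C writer's buffer:

```python
HEXDIGITS = b"0123456789abcdefABCDEF"
# Number of hex digits that follow each supported escape letter.
WIDTHS = {ord("x"): 2, ord("u"): 4, ord("U"): 8}


def decode_escapes(data: bytes) -> str:
    r"""Decode the \\, \xhh, \uhhhh and \Uhhhhhhhh escapes; other bytes are Latin-1."""
    out = [""] * len(data)  # upper bound on the output length: allocated once, never grown
    n = 0  # code points written so far
    i = 0  # read position in data
    while i < len(data):
        byte = data[i]
        if byte != 0x5C:  # not a backslash: one literal byte -> one code point
            out[n] = chr(byte)
            n += 1
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("trailing backslash")
        kind = data[i + 1]
        if kind == 0x5C:  # escaped backslash: two bytes -> one code point
            out[n] = "\\"
            n += 1
            i += 2
            continue
        width = WIDTHS.get(kind)
        if width is None:
            raise ValueError(f"unsupported escape at offset {i}")
        digits = data[i + 2:i + 2 + width]
        if len(digits) != width or any(d not in HEXDIGITS for d in digits):
            raise ValueError(f"truncated or invalid escape at offset {i}")
        code = int(digits, 16)
        if code > 0x10FFFF:
            raise ValueError(f"code point out of range at offset {i}")
        out[n] = chr(code)  # one hex escape (4 to 10 bytes) -> one code point
        n += 1
        i += 2 + width
    return "".join(out[:n])  # keep only the slots actually used
```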

    @serhiy-storchaka
    Member Author

    Victor's patch harvested most of the fruit, but there is room for further optimization.

    Benchmark results for the new patch:

    Py3.2 Py3.3 Py3.6 Py3.6+patch

    451 (-47%) 77 (+209%) 140 (+70%) 238 decode unicode-escape 'A'*10000
    269 (-14%) 161 (+44%) 187 (+24%) 232 decode unicode-escape '\x80'*10000
    453 (-48%) 85 (+178%) 181 (+30%) 236 decode unicode-escape '\x80'+'A'*9999
    295 (-4%) 185 (+54%) 229 (+24%) 284 decode unicode-escape '\u0100'*10000
    452 (-47%) 75 (+221%) 213 (+13%) 241 decode unicode-escape '\u0100'+'A'*9999
    275 (-11%) 149 (+64%) 187 (+30%) 244 decode unicode-escape '\u0100'+'\x80'*9999
    297 (-4%) 185 (+54%) 230 (+23%) 284 decode unicode-escape '\u8000'*10000
    452 (-47%) 75 (+221%) 213 (+13%) 241 decode unicode-escape '\u8000'+'A'*9999
    275 (-11%) 149 (+64%) 187 (+30%) 244 decode unicode-escape '\u8000'+'\x80'*9999
    295 (-3%) 185 (+54%) 230 (+24%) 285 decode unicode-escape '\u8000'+'\u0100'*9999
    318 (-29%) 203 (+11%) 220 (+2%) 225 decode unicode-escape '\U00010000'*10000
    452 (-51%) 72 (+207%) 163 (+36%) 221 decode unicode-escape '\U00010000'+'A'*9999
    275 (-31%) 128 (+49%) 160 (+19%) 191 decode unicode-escape '\U00010000'+'\x80'*9999
    295 (-36%) 164 (+16%) 201 (-5%) 190 decode unicode-escape '\U00010000'+'\u0100'*9999
    297 (-36%) 166 (+14%) 199 (-5%) 190 decode unicode-escape '\U00010000'+'\u8000'*9999

    559 (-62%) 88 (+143%) 194 (+10%) 214 decode raw-unicode-escape 'A'*10000
    555 (-62%) 88 (+142%) 195 (+9%) 213 decode raw-unicode-escape '\x80'*10000
    559 (-62%) 88 (+142%) 195 (+9%) 213 decode raw-unicode-escape '\x80'+'A'*9999
    265 (+29%) 133 (+156%) 212 (+61%) 341 decode raw-unicode-escape '\u0100'*10000
    563 (-54%) 77 (+235%) 195 (+32%) 258 decode raw-unicode-escape '\u0100'+'A'*9999
    559 (-54%) 77 (+234%) 194 (+32%) 257 decode raw-unicode-escape '\u0100'+'\x80'*9999
    269 (+27%) 138 (+147%) 208 (+64%) 341 decode raw-unicode-escape '\u8000'*10000
    562 (-54%) 77 (+235%) 193 (+34%) 258 decode raw-unicode-escape '\u8000'+'A'*9999
    559 (-54%) 77 (+234%) 194 (+32%) 257 decode raw-unicode-escape '\u8000'+'\x80'*9999
    265 (+29%) 138 (+147%) 208 (+64%) 341 decode raw-unicode-escape '\u8000'+'\u0100'*9999
    281 (-13%) 152 (+61%) 228 (+7%) 244 decode raw-unicode-escape '\U00010000'*10000
    562 (-65%) 74 (+164%) 200 (-2%) 195 decode raw-unicode-escape '\U00010000'+'A'*9999
    557 (-65%) 74 (+162%) 200 (-3%) 194 decode raw-unicode-escape '\U00010000'+'\x80'*9999
    265 (-2%) 122 (+114%) 184 (+42%) 261 decode raw-unicode-escape '\U00010000'+'\u0100'*9999
    269 (-3%) 122 (+113%) 185 (+41%) 260 decode raw-unicode-escape '\U00010000'+'\u8000'*9999

    195 (+136%) 109 (+323%) 258 (+79%) 461 encode unicode-escape 'A'*10000
    673 (-23%) 522 (-1%) 254 (+103%) 516 encode unicode-escape '\x80'*10000
    197 (+134%) 132 (+248%) 247 (+86%) 460 encode unicode-escape '\x80'+'A'*9999
    869 (-22%) 627 (+9%) 333 (+105%) 682 encode unicode-escape '\u0100'*10000
    197 (-19%) 124 (+28%) 158 (+1%) 159 encode unicode-escape '\u0100'+'A'*9999
    669 (-35%) 493 (-12%) 236 (+83%) 432 encode unicode-escape '\u0100'+'\x80'*9999
    866 (-20%) 628 (+10%) 333 (+108%) 692 encode unicode-escape '\u8000'*10000
    197 (-19%) 125 (+27%) 158 (+1%) 159 encode unicode-escape '\u8000'+'A'*9999
    669 (-35%) 492 (-12%) 236 (+83%) 433 encode unicode-escape '\u8000'+'\x80'*9999
    869 (-20%) 627 (+11%) 324 (+114%) 694 encode unicode-escape '\u8000'+'\u0100'*9999
    870 (-1%) 897 (-4%) 501 (+72%) 861 encode unicode-escape '\U00010000'*10000
    197 (+20%) 139 (+70%) 234 (+1%) 236 encode unicode-escape '\U00010000'+'A'*9999
    668 (-27%) 533 (-9%) 249 (+96%) 487 encode unicode-escape '\U00010000'+'\x80'*9999
    869 (-12%) 646 (+18%) 344 (+122%) 764 encode unicode-escape '\U00010000'+'\u0100'*9999
    864 (-12%) 643 (+19%) 344 (+122%) 762 encode unicode-escape '\U00010000'+'\u8000'*9999

    391 (+1310%) 333 (+1556%) 575 (+859%) 5514 encode raw-unicode-escape 'A'*10000
    391 (+1229%) 334 (+1456%) 576 (+802%) 5198 encode raw-unicode-escape '\x80'*10000
    391 (+1402%) 335 (+1653%) 579 (+914%) 5873 encode raw-unicode-escape '\x80'+'A'*9999
    869 (-25%) 687 (-5%) 356 (+83%) 652 encode raw-unicode-escape '\u0100'*10000
    391 (+46%) 158 (+260%) 214 (+166%) 569 encode raw-unicode-escape '\u0100'+'A'*9999
    391 (+46%) 158 (+260%) 214 (+166%) 569 encode raw-unicode-escape '\u0100'+'\x80'*9999
    873 (-25%) 682 (-4%) 356 (+83%) 652 encode raw-unicode-escape '\u8000'*10000
    391 (+46%) 158 (+260%) 214 (+166%) 569 encode raw-unicode-escape '\u8000'+'A'*9999
    391 (+46%) 157 (+262%) 214 (+166%) 569 encode raw-unicode-escape '\u8000'+'\x80'*9999
    869 (-25%) 688 (-5%) 345 (+90%) 656 encode raw-unicode-escape '\u8000'+'\u0100'*9999
    917 (+4%) 859 (+11%) 532 (+79%) 952 encode raw-unicode-escape '\U00010000'*10000
    392 (-15%) 182 (+84%) 260 (+28%) 334 encode raw-unicode-escape '\U00010000'+'A'*9999
    392 (-15%) 182 (+83%) 260 (+28%) 333 encode raw-unicode-escape '\U00010000'+'\x80'*9999
    870 (-15%) 672 (+10%) 355 (+108%) 738 encode raw-unicode-escape '\U00010000'+'\u0100'*9999
    871 (-16%) 672 (+9%) 355 (+106%) 730 encode raw-unicode-escape '\U00010000'+'\u8000'*9999

    @vstinner
    Member

    vstinner commented Sep 2, 2016

    Unicode escape encoders were modified by issue bpo-25353 to use the _PyBytesWriter API. Sadly, I didn't benchmark my change before pushing it :-/

    Your patch basically reverts my change.

    Py3.2 Py3.3 Py3.6 Py3.6+patch
    195 (+136%) 109 (+323%) 258 (+79%) 461 encode unicode-escape 'A'*10000
    391 (+1310%) 333 (+1556%) 575 (+859%) 5514 encode raw-unicode-escape 'A'*10000

    I'm surprised that the revert makes the raw-unicode-escape encoder so much faster. Does it mean that the _PyBytesWriter API is so inefficient?

    The most efficient case for _PyBytesWriter is when you only call _PyBytesWriter_Alloc() and _PyBytesWriter_Finish() and the output string has exactly the allocated length. It should be the case when 'A'*10000 is encoded, no?
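For raw-unicode-escape the exact output size is indeed easy to compute in advance: one byte per code point below U+0100, six bytes for `\uXXXX` and ten bytes for `\UXXXXXXXX`, so for 'A'*10000 it equals the input length. The following hypothetical two-pass sketch (function names made up for illustration; a `bytearray` stands in for the writer's buffer) computes that size, allocates once and fills the buffer without resizing. It only illustrates the allocation pattern; Serhiy's reply attributes the measured speedup to the order of checks in the inner loop instead.

```python
def raw_escape_size(s: str) -> int:
    """Exact length of s.encode('raw-unicode-escape')."""
    size = 0
    for c in s:
        ch = ord(c)
        if ch < 0x100:
            size += 1   # written as a single Latin-1 byte
        elif ch < 0x10000:
            size += 6   # \uXXXX
        else:
            size += 10  # \UXXXXXXXX
    return size


def raw_escape_encode(s: str) -> bytes:
    """Encode like raw-unicode-escape into a buffer allocated once with the exact size."""
    buf = bytearray(raw_escape_size(s))
    pos = 0
    for c in s:
        ch = ord(c)
        if ch < 0x100:
            buf[pos] = ch
            pos += 1
        elif ch < 0x10000:
            buf[pos:pos + 6] = b"\\u%04x" % ch  # same-length slice assignment: no resize
            pos += 6
        else:
            buf[pos:pos + 10] = b"\\U%08x" % ch
            pos += 10
    assert pos == len(buf)  # the buffer was filled exactly
    return bytes(buf)
```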

    @vstinner
    Member

    vstinner commented Sep 2, 2016

    I rebased faster_unicode_escape_4.patch and made tiny changes:

    • Rename WRITECHAR macro to WRITE_ASCII_CHAR()
    • Add WRITE_CHAR() macro to avoid "goto writechar;"
    • Drop the "store" label: use WRITE_CHAR() macro instead, assuming that getcode() only returns valid Unicode characters (<= MAX_UNICODE)
    • For \UHHHHHHHH format: since MAX_UNICODE is 0x10ffff, hardcode the first two digits as 0, and add an assertion on MAX_UNICODE value
    • PEP-7: add {...} on if/else blocks
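The \UHHHHHHHH shortcut in the list above relies on MAX_UNICODE being 0x10ffff: the top eight bits of any valid code point are zero, so the first two hex digits are always 0. A Python sketch of the same digit-by-digit formatting (`escape_U` is a hypothetical helper, not the C macro), checked against the built-in codec:

```python
HEX = "0123456789abcdef"
MAX_UNICODE = 0x10FFFF


def escape_U(ch: int) -> str:
    """Format a non-BMP code point as \\UHHHHHHHH, hardcoding the first two digits."""
    assert 0x10000 <= ch <= MAX_UNICODE
    # ch <= 0x10FFFF, so (ch >> 24) == 0: only the low six hex digits vary.
    digits = [HEX[(ch >> shift) & 0xF] for shift in (20, 16, 12, 8, 4, 0)]
    return "\\U00" + "".join(digits)


print(escape_U(0x10FFFF))  # \U0010ffff
```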

    @serhiy-storchaka
    Member Author

    > Unicode escape encoders were modified by issue bpo-25353 to use the
    > _PyBytesWriter API. Sadly, I didn't benchmark my change before pushing it
    > :-/

    You can benchmark it now by checking out the revision with your patch and the
    one just before it. But AFAIK performance did not otherwise change since 3.3,
    so the effect of your patch is the difference between the 3.3 and 3.6 columns
    (very good).

    I used the scripts from https://bitbucket.org/storchaka/cpython-stuff/src/default/bench/ .

    > Your patch basically reverts my change.
    >
    > Py3.2 Py3.3 Py3.6 Py3.6+patch
    > 195 (+136%) 109 (+323%) 258 (+79%) 461 encode unicode-escape 'A'*10000
    > 391 (+1310%) 333 (+1556%) 575 (+859%) 5514 encode raw-unicode-escape 'A'*10000
    >
    > I'm surprised that the revert makes the raw-unicode-escape encoder so much
    > faster. Does it mean that the _PyBytesWriter API is so inefficient?

    I don't remember all the details, but it seems that after all the optimizations
    were applied, _PyBytesWriter simply became unnecessary (unlike _PyUnicodeWriter,
    which is used for widening the buffer).

    The huge difference in encoding ASCII-only data is not related to using
    _PyBytesWriter. It is caused by reordering the checks in the inner loop.
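The thread does not spell out which checks were reordered, so the following is only an assumption about the general idea: test the most common case (printable ASCII) first, so that ASCII-only input takes the shortest path through the loop. A pure-Python sketch of a unicode-escape encoder with that ordering (`unicode_escape_encode` is a hypothetical name; Python cannot show the C-level speed difference, only the structure of the checks):

```python
def unicode_escape_encode(s: str) -> bytes:
    """Encode like the unicode-escape codec, testing the common case first."""
    out = bytearray()
    for c in s:
        ch = ord(c)
        if 0x20 <= ch < 0x7F and ch != 0x5C:
            out.append(ch)  # printable ASCII other than backslash: copied as-is
        elif ch == 0x5C:
            out += b"\\\\"  # backslash is doubled
        elif ch == 0x09:
            out += b"\\t"
        elif ch == 0x0A:
            out += b"\\n"
        elif ch == 0x0D:
            out += b"\\r"
        elif ch < 0x100:
            out += b"\\x%02x" % ch  # other controls, DEL and Latin-1
        elif ch < 0x10000:
            out += b"\\u%04x" % ch
        else:
            out += b"\\U%08x" % ch
    return bytes(out)
```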

    > • Rename WRITECHAR macro to WRITE_ASCII_CHAR()

    This is not a correct name. This macro is used for writing non-ASCII characters
    too.

    > • Add WRITE_CHAR() macro to avoid "goto writechar;"
    > • Drop the "store" label: use WRITE_CHAR() macro instead,

    Did you benchmark this change? I'm afraid that this inflates the executable code size
    and can have a negative impact on performance.

    @vstinner
    Member

    vstinner commented Sep 2, 2016

    > Did you benchmark this change? I'm afraid that this inflates the executable code size and can have a negative impact on performance.

    I consider that readability (maintainability) matters more than such micro-optimizations.

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 7, 2016

    New changeset ad5a28ace615 by Victor Stinner in branch 'default':
    Optimize unicode_escape and raw_unicode_escape
    https://hg.python.org/cpython/rev/ad5a28ace615

    @vstinner
    Member

    vstinner commented Sep 7, 2016

    Since 3.6 beta 1 is almost here, I chose to push the change right now. I'm sure that it's faster; I trust your benchmarks ;-)

    Thanks Serhiy for this nice enhancement.

    > * Rename WRITECHAR macro to WRITE_ASCII_CHAR()

    > This is not a correct name. This macro is used for writing non-ASCII characters too.

    Oh, I fixed this in the pushed change.

    @vstinner vstinner closed this as completed Sep 7, 2016
    @serhiy-storchaka
    Member Author

    Thanks Victor! I benchmarked your patch. There is no regression compared with my patch. In a few cases your patch is even faster!

    Unpatched Patch v.4 Patch v.5

    148 (+76%) 235 (+11%) 260 decode unicode-escape 'A'*10000
    197 (+30%) 257 (+0%) 257 decode unicode-escape '\x80'*10000
    195 (+32%) 232 (+11%) 258 decode unicode-escape '\x80'+'A'*9999
    227 (+39%) 308 (+2%) 315 decode unicode-escape '\u0100'*10000
    197 (+56%) 241 (+27%) 307 decode unicode-escape '\u0100'+'A'*9999
    201 (+35%) 264 (+3%) 271 decode unicode-escape '\u0100'+'\x80'*9999
    227 (+39%) 308 (+2%) 315 decode unicode-escape '\u8000'*10000
    197 (+56%) 241 (+27%) 307 decode unicode-escape '\u8000'+'A'*9999
    201 (+35%) 264 (+3%) 271 decode unicode-escape '\u8000'+'\x80'*9999
    227 (+39%) 308 (+2%) 315 decode unicode-escape '\u8000'+'\u0100'*9999
    200 (+26%) 245 (+2%) 251 decode unicode-escape '\U00010000'*10000
    192 (+38%) 230 (+15%) 265 decode unicode-escape '\U00010000'+'A'*9999
    167 (+26%) 203 (+4%) 211 decode unicode-escape '\U00010000'+'\x80'*9999
    194 (+31%) 248 (+2%) 254 decode unicode-escape '\U00010000'+'\u0100'*9999
    194 (+31%) 247 (+3%) 254 decode unicode-escape '\U00010000'+'\u8000'*9999

    197 (+9%) 214 (+0%) 215 decode raw-unicode-escape 'A'*10000
    197 (+9%) 214 (+0%) 214 decode raw-unicode-escape '\x80'*10000
    197 (+9%) 214 (+0%) 214 decode raw-unicode-escape '\x80'+'A'*9999
    216 (+68%) 365 (-1%) 363 decode raw-unicode-escape '\u0100'*10000
    181 (+43%) 262 (-1%) 259 decode raw-unicode-escape '\u0100'+'A'*9999
    181 (+43%) 264 (-2%) 258 decode raw-unicode-escape '\u0100'+'\x80'*9999
    216 (+68%) 365 (-1%) 363 decode raw-unicode-escape '\u8000'*10000
    181 (+43%) 261 (-1%) 259 decode raw-unicode-escape '\u8000'+'A'*9999
    181 (+43%) 263 (-2%) 258 decode raw-unicode-escape '\u8000'+'\x80'*9999
    216 (+68%) 365 (-1%) 363 decode raw-unicode-escape '\u8000'+'\u0100'*9999
    245 (+29%) 313 (+1%) 315 decode raw-unicode-escape '\U00010000'*10000
    211 (+10%) 195 (+19%) 232 decode raw-unicode-escape '\U00010000'+'A'*9999
    211 (+10%) 195 (+19%) 233 decode raw-unicode-escape '\U00010000'+'\x80'*9999
    192 (+51%) 287 (+1%) 289 decode raw-unicode-escape '\U00010000'+'\u0100'*9999
    192 (+51%) 287 (+1%) 289 decode raw-unicode-escape '\U00010000'+'\u8000'*9999

    269 (+73%) 424 (+10%) 465 encode unicode-escape 'A'*10000
    266 (+108%) 591 (-6%) 553 encode unicode-escape '\x80'*10000
    298 (+55%) 423 (+9%) 463 encode unicode-escape '\x80'+'A'*9999
    358 (+93%) 695 (-0%) 692 encode unicode-escape '\u0100'*10000
    190 (+13%) 215 (+0%) 215 encode unicode-escape '\u0100'+'A'*9999
    235 (+109%) 520 (-5%) 492 encode unicode-escape '\u0100'+'\x80'*9999
    342 (+102%) 695 (-1%) 691 encode unicode-escape '\u8000'*10000
    190 (+13%) 215 (+0%) 215 encode unicode-escape '\u8000'+'A'*9999
    235 (+109%) 520 (-5%) 492 encode unicode-escape '\u8000'+'\x80'*9999
    367 (+89%) 698 (-1%) 694 encode unicode-escape '\u8000'+'\u0100'*9999
    531 (+124%) 915 (+30%) 1190 encode unicode-escape '\U00010000'*10000
    196 (+20%) 235 (+0%) 236 encode unicode-escape '\U00010000'+'A'*9999
    237 (+104%) 506 (-4%) 484 encode unicode-escape '\U00010000'+'\x80'*9999
    325 (+111%) 681 (+1%) 687 encode unicode-escape '\U00010000'+'\u0100'*9999
    325 (+117%) 681 (+3%) 704 encode unicode-escape '\U00010000'+'\u8000'*9999

    578 (+853%) 5672 (-3%) 5507 encode raw-unicode-escape 'A'*10000
    578 (+731%) 4761 (+1%) 4806 encode raw-unicode-escape '\x80'*10000
    581 (+760%) 5218 (-4%) 4995 encode raw-unicode-escape '\x80'+'A'*9999
    365 (+96%) 714 (+0%) 714 encode raw-unicode-escape '\u0100'*10000
    226 (+72%) 389 (+0%) 389 encode raw-unicode-escape '\u0100'+'A'*9999
    226 (+72%) 389 (+0%) 389 encode raw-unicode-escape '\u0100'+'\x80'*9999
    373 (+91%) 715 (-0%) 714 encode raw-unicode-escape '\u8000'*10000
    226 (+72%) 389 (+0%) 389 encode raw-unicode-escape '\u8000'+'A'*9999
    226 (+72%) 389 (+0%) 389 encode raw-unicode-escape '\u8000'+'\x80'*9999
    366 (+96%) 718 (+0%) 719 encode raw-unicode-escape '\u8000'+'\u0100'*9999
    537 (+110%) 879 (+28%) 1128 encode raw-unicode-escape '\U00010000'*10000
    214 (+37%) 293 (+0%) 294 encode raw-unicode-escape '\U00010000'+'A'*9999
    214 (+37%) 293 (+0%) 294 encode raw-unicode-escape '\U00010000'+'\x80'*9999
    342 (+96%) 669 (+0%) 669 encode raw-unicode-escape '\U00010000'+'\u0100'*9999
    342 (+96%) 669 (+0%) 669 encode raw-unicode-escape '\U00010000'+'\u8000'*9999

    Could you please add NEWS and What's New entries?

    @vstinner
    Member

    vstinner commented Sep 7, 2016

    Feel free to document the change. It's not my patch, it's yours :-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022