classification
Title: Speed up ASCII decoding
Components: Interpreter Core
process
Status: closed
Resolution: accepted
Assigned To: lemburg
Nosy List: lemburg, loewis
Priority: normal
Keywords: patch

Created on 2001-04-18 05:37 by loewis, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Files
File name Uploaded Description
unicode_ascii.patch loewis, 2001-04-21 12:15
unicode_ascii.patch2 loewis, 2001-04-21 14:29 Alternative patch
Messages (8)
msg36413 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2001-04-18 05:37
In code that supports both byte and unicode strings,
mixing unicode strings with plain character constants
is frequent. E.g. both sre_compile and xmlproc look for
specific characters in an input string. Every usage of
such a character requires default decoding, which will
create a temporary Unicode object.

This patch caches Unicode objects that represent ASCII
characters. On the benchmark

import time
u = u""
t = time.time()
for i in xrange(1000000):
    u + "("  # mixing unicode and str forces default decoding of "("
print time.time() - t

it shows a 10% speed-up.
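
The caching idea described above can be modelled roughly as follows. This is only a conceptual sketch in modern Python 3, not the patch itself: the real cache lives in C inside unicodeobject.c, and the names `UStr`, `_ascii_cache` and `decode_ascii`, the 128-slot layout, and the restriction to one-character inputs are all assumptions made for illustration.

```python
class UStr:
    """Hypothetical stand-in for a C-level Unicode object."""

    def __init__(self, chars):
        self.chars = list(chars)


# One slot per ASCII code point, filled lazily (assumed layout).
_ascii_cache = [None] * 128


def decode_ascii(data: bytes) -> UStr:
    """Decode ASCII bytes, reusing a cached object for single characters."""
    if len(data) == 1 and data[0] < 128:
        cached = _ascii_cache[data[0]]
        if cached is None:
            # First request for this character: create it once and keep it.
            cached = UStr(chr(data[0]))
            _ascii_cache[data[0]] = cached
        return cached
    # Longer inputs are decoded into a fresh object every time.
    return UStr(data.decode("ascii"))


first = decode_ascii(b"(")
second = decode_ascii(b"(")
same_object = first is second  # the temporary is no longer re-created
```

In this model, repeated decoding of a constant such as "(" returns the same object instead of allocating a new temporary on each use, which is where the saving in the benchmark would come from.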
msg36414 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2001-04-18 06:04
Attached the patch.
msg36415 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-04-18 08:54
I knew this would come one day :-) 

The patch looks OK, but please also add proper init and
finalize code so that unicode_ascii[] gets cleared up
properly when the interpreter shuts down (this is important
for uses of Python in e.g. mod_snake).
msg36416 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2001-04-18 12:51
Committed as 2.83 of unicodeobject.c, with the requested
addition of init/fini code.
msg36417 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2001-04-21 12:15
Reopened, since the previous patch broke test_unicodedata.

In this version, the cache is only consulted in DecodeASCII,
since PyUnicode_FromUnicode must not share objects. It also
has the requested init/fini code.
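
Why a general constructor must not hand out shared objects can be shown with a small model. This is a sketch under stated assumptions, not the code from unicodeobject.c: `UStr`, `from_chars_shared`, `from_chars_fresh` and `upper_in_place` are hypothetical names, and the in-place caller stands in for any code that creates an object and then writes into its buffer.

```python
class UStr:
    """Hypothetical stand-in for a Unicode object with a writable buffer."""

    def __init__(self, chars):
        self.chars = list(chars)


_cache = {}


def from_chars_shared(chars):
    """Constructor that returns a shared object for single characters."""
    if len(chars) == 1:
        if chars not in _cache:
            _cache[chars] = UStr(chars)
        return _cache[chars]
    return UStr(chars)


def from_chars_fresh(chars):
    """Constructor that always returns a new, unshared object."""
    return UStr(chars)


def upper_in_place(make, ch):
    """A caller that creates an object and then modifies its buffer."""
    obj = make(ch)
    obj.chars[0] = obj.chars[0].upper()
    return obj


upper_in_place(from_chars_shared, "a")
corrupted = from_chars_shared("a").chars  # the cached "a" now holds "A"

upper_in_place(from_chars_fresh, "b")
intact = from_chars_fresh("b").chars  # a fresh object is unaffected
```

With the shared constructor, one caller's in-place write changes what every later caller sees. Either the cache is confined to a path whose results are never modified, or every modifying caller has to be fixed.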
msg36418 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2001-04-21 14:29
I've added an alternative patch, which does return shared
objects from PyUnicode_FromUnicode, and corrects the two
places where the caller of PyUnicode_FromUnicode modified
the resulting object.
msg36419 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-04-23 11:26
Thanks for the update. Digging a little deeper into the
possibilities of sharing Unicode objects I found that there
are some important issues to be taken into consideration
which require a little more work on the sharing code.

I will work on this during the week and get back to you next
week.
msg36420 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-04-23 14:44
Checked in a modified patch.
History
Date User Action Args
2022-04-10 16:03:58 admin set github: 34362
2001-04-18 05:37:32 loewis create