Message 215610 - Python tracker

Author	serhiy.storchaka
Recipients	ezio.melotti, josh.r, pitrou, python-dev, serhiy.storchaka, vstinner
Date	2014-04-05.13:57:16
Message-id	<6916527.b9FPbGfPdm@raxxla>
In-reply-to	<CAMpsgwZ26uPrE=cu5hdiex8FJ1xECvD+rtj3x2FaPuLov=ARmg@mail.gmail.com>

Content
On Saturday, 05-Apr-2014 12:49:22 you wrote:
> STINNER Victor added the comment:
> 
> Serhiy wrote:
> > fast_translate.patch works only with ASCII input string and ASCII 1:1
> > mapping. Is this actually typical case?
> I just checked the Python stdlib: as expected, all usages of
> str.translate() except for email.quoprimime use ASCII 1:1.

Because str.translate() is much slower than a series of str.replace() calls
(which are already optimized), some uses of str.translate() were rewritten to
use str.replace(). See html.escape() for an example. That is what this issue
is about.
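
To illustrate the kind of rewrite meant here, the following is a minimal sketch that escapes the same characters html.escape() handles, once with str.translate() and once with chained str.replace() calls. ESCAPE_TABLE, escape_with_translate, escape_with_replace and compare_timings are hypothetical names for this example only and are not taken from the stdlib source. The timing helper returns raw numbers and makes no claim about which variant wins on a given build.

```python
import html
import timeit

# Hypothetical table covering the characters html.escape() handles by default.
ESCAPE_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
})


def escape_with_translate(s):
    # One pass over the string, one table lookup per character.
    return s.translate(ESCAPE_TABLE)


def escape_with_replace(s):
    # "&" must be replaced first so the "&" introduced by the later
    # replacements is not escaped a second time.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    s = s.replace('"', "&quot;")
    s = s.replace("'", "&#x27;")
    return s


def compare_timings(text, number=10_000):
    # Returns (translate_seconds, replace_seconds) for the given input.
    t_translate = timeit.timeit(lambda: escape_with_translate(text), number=number)
    t_replace = timeit.timeit(lambda: escape_with_replace(text), number=number)
    return t_translate, t_replace


if __name__ == "__main__":
    sample = 'a < b && "c" > \'d\' caf\u00e9'
    # Both variants agree with html.escape() on this input.
    assert escape_with_translate(sample) == escape_with_replace(sample) == html.escape(sample)
    print(compare_timings(sample * 100))
```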

> My
> optimization is only used if the input string is ASCII, but I expect
> that most strings are just ASCII.

In most (if not all) of these cases the input string can be non-ASCII.
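
A small sketch of that situation, assuming a made-up ASCII 1:1 table: the mapping itself is pure ASCII, but the input is not, so a fast path restricted to ASCII input would not apply even though the result is well defined. The table, the sample strings and the is_ascii helper are illustrative assumptions.

```python
# ASCII 1:1 table: each listed ASCII character maps to exactly one ASCII character.
table = str.maketrans("abc", "xyz")

ascii_input = "aabbcc"
mixed_input = "aabbcc caf\u00e9"  # contains one non-ASCII character


def is_ascii(s):
    # Pure-Python check; an ASCII-only fast path would need the equivalent test.
    return all(ord(ch) < 128 for ch in s)


# The same ASCII 1:1 table is valid for both inputs, but only the first
# one would qualify for an ASCII-only optimization.
assert is_ascii(ascii_input)
assert not is_ascii(mixed_input)
assert ascii_input.translate(table) == "xxyyzz"
assert mixed_input.translate(table) == "xxyyzz zxf\u00e9"
```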

> bench_translate.py: benchmark ASCII 1:1 but also ASCII 1:1 with deletion.

Could you please provide bench_translate.py?

> It will probably require more complex "cache". You may take a look at
> charmap codec which has such more complex cache (cache with 3 levels), see
> my message msg215301.

I was going to do this in the next step. A full cache can grow up to 1114112
characters, so I planned to cache only BMP characters (a cache with 2 levels).
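
The idea of a 2-level cache limited to the BMP can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the C code of any patch: the class name BMPTranslateCache, the 256 x 256 page layout and the fallback for non-BMP characters are all choices made for this example. The figure 1114112 is 0x110000, the number of Unicode code points; the BMP covers the first 0x10000 of them.

```python
PAGE_BITS = 8
PAGE_SIZE = 1 << PAGE_BITS   # 256 entries per page
BMP_LIMIT = 0x10000          # code points below this are in the BMP
UNICODE_LIMIT = 0x110000     # 1114112 code points in total


class BMPTranslateCache:
    """Hypothetical 2-level cache in front of a str.translate() table."""

    def __init__(self, table):
        self._table = table
        # Level 1: one slot per 256-character page, allocated lazily.
        self._pages = [None] * (BMP_LIMIT >> PAGE_BITS)

    def _lookup_uncached(self, cp):
        # Mirrors str.translate() semantics: a missing key leaves the
        # character unchanged, None deletes it, an int is a code point.
        try:
            value = self._table[cp]
        except LookupError:
            return chr(cp)
        if value is None:
            return ""
        if isinstance(value, int):
            return chr(value)
        return value

    def lookup(self, cp):
        if cp >= BMP_LIMIT:
            # Non-BMP characters bypass the cache entirely.
            return self._lookup_uncached(cp)
        hi, lo = cp >> PAGE_BITS, cp & (PAGE_SIZE - 1)
        page = self._pages[hi]
        if page is None:
            # Level 2: allocate the page on first use.
            page = self._pages[hi] = [None] * PAGE_SIZE
        result = page[lo]
        if result is None:  # not cached yet; cached values are always str
            result = page[lo] = self._lookup_uncached(cp)
        return result

    def translate(self, s):
        return "".join(self.lookup(ord(ch)) for ch in s)


if __name__ == "__main__":
    table = str.maketrans({"a": "A", "b": None, "\u00e9": "e", "\U0001F600": ":)"})
    cache = BMPTranslateCache(table)
    text = "abc\u00e9\U0001F600"
    assert cache.translate(text) == text.translate(table)
```

Pages are allocated only for the ranges actually seen, so an input that stays within a few scripts touches a handful of 256-entry pages instead of a flat array covering every code point.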

You commit too fast; I can't keep up with you. ;)

History
Date	User	Action	Args
2014-04-05 13:57:16	serhiy.storchaka	set	recipients: + serhiy.storchaka, pitrou, vstinner, ezio.melotti, python-dev, josh.r
2014-04-05 13:57:16	serhiy.storchaka	link	issue21118 messages
2014-04-05 13:57:16	serhiy.storchaka	create