Author rsc
Recipients belopolsky, benjamin.peterson, donlorenzo, rsc, zanella
Date 2008-05-08.15:44:54
Message-id <20080508154749.DEBBB1E8C5C@holo.morphisms.net>
In-reply-to <d38f5330805080808x267f7131p8a69507694f6060e@mail.gmail.com>
Content
> You don't need to get so defensive.  I did not raise a performance
> problem, I was simply responding to Rafael's "AFAIK the lookup on
> dictionaries is faster than on lists" comment.  I did not say that you
> *should* rewrite your patch the way I suggested, only that you *can*
> use new language features to simplify the code.

I was responding to the entire thread more than your mail.
I'm frustrated because the only substantial discussion has
focused on the fastest way to implement set lookup in a
function whose speed likely doesn't matter.

> In any case, I am -0 on the patch.  The current documentation says:

Now these are the kinds of comments I was hoping for.
Thank you.

>    Return *string* with all non-alphanumerics backslashed; this is useful if you
>    want to match an arbitrary literal string that may have regular expression
>    metacharacters in it.

Sure; the documentation is wrong too.

> I did not see a compelling use case presented for the change.  

The usual convention in regular expressions is that escaping
a word character means you intend a special meaning, and
underscore is a word character.  Even though the current re
module does accept \_ as synonymous with _ (just as it accepts
\q as synonymous with q), it is no more correct to escape _ than
to escape q.
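
As a small illustration of the convention described above, the following
sketch assumes a current Python 3 re module (not the 2008 one). The
variable names are made up for the example. It leaves out \q, whose
handling differs between Python versions.

```python
import re

# Underscore is a word character: \w matches it.
underscore_is_word = re.fullmatch(r"\w", "_") is not None

# An escaped underscore is accepted and matches a literal underscore.
escaped_underscore_matches = re.fullmatch(r"\_", "_") is not None

# Escaping a word character normally selects a special meaning:
# \d is the digit class, so it does not match a literal "d".
escaped_d_is_literal = re.fullmatch(r"\d", "d") is not None
```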

I think it is fine to escape all non-word characters, but someone
else suggested that it would be easier when moving to larger
character sets to escape just the special ones.  I'm happy with
either version.
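
The two options can be sketched side by side. This is only an
illustration, not the patch itself: SPECIAL, WORD, escape_special and
escape_ascii_nonword are hypothetical names, and the metacharacter set
here is an assumption that may differ from the patch's _special.

```python
import re
import string

# Assumed set of metacharacters (hypothetical, not taken from the patch).
SPECIAL = frozenset(r"\.^$*+?{}[]|()")

# ASCII word characters: letters, digits and underscore.
WORD = frozenset(string.ascii_letters + string.digits + "_")


def escape_special(s):
    """Backslash only the characters listed in SPECIAL."""
    return "".join("\\" + c if c in SPECIAL else c for c in s)


def escape_ascii_nonword(s):
    """Backslash every ASCII character that is not a word character."""
    return "".join(
        "\\" + c if ord(c) < 128 and c not in WORD else c for c in s
    )
```

Both variants leave underscore alone; they differ only on characters
such as "-" or space, which the second one escapes and the first does not.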

My argument is only that Python should behave the same in 
this respect as other systems that use substantially the same
regular expressions.

> since there is no mechanism to assure that _special indeed
> contains all re metacharacters, it may present a maintenance problem
> if additional metacharacters are added in the future.

The test suite will catch these easily, since it checks that 
re.escape(c) matches c for all characters c.  But again, I'm happy
with escaping all ASCII non-word characters.
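
A check of that kind might look like the sketch below. It is not the
actual test from the test suite; unescaped_failures and its parameters
are hypothetical names, and it assumes Python 3 (re.fullmatch).

```python
import re


def unescaped_failures(escape=re.escape, limit=256):
    """Return the characters c for which escape(c) does not match c."""
    failures = []
    for i in range(limit):
        c = chr(i)
        try:
            matched = re.fullmatch(escape(c), c) is not None
        except re.error:
            # The escaped form is not even a valid pattern.
            matched = False
        if not matched:
            failures.append(c)
    return failures
```

With re.escape the list should come back empty; an escape function that
misses a metacharacter shows up as a non-empty result.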

Russ
History
Date                 User  Action  Args
2008-05-08 15:45:41  rsc   set     spambayes_score: 6.21846e-06 -> 6.21846e-06; recipients: + rsc, belopolsky, benjamin.peterson, zanella, donlorenzo
2008-05-08 15:45:38  rsc   link    issue2650 messages
2008-05-08 15:45:32  rsc   create