Message 144097 - Python tracker



Author	wombat
Recipients	Santiago.Romero, belopolsky, benjamin.peterson, cgwalters, dexen, doughellmann, eric.araujo, ezio.melotti, fperez, loewis, mark.dickinson, mcepl, nwerneck, orsenthil, r.david.murray, rhettinger, vstinner, wombat
Date	2011-09-15.19:52:00
SpamBayes Score	7.9020124e-13
Marked as misclassified	No
Message-id	<1316116321.43.0.883789811618.issue1170@psf.upfronthosting.co.za>
In-reply-to

Content

> That can be done programmatically using the unicodedata module.
> The regex module (that will hopefully be included in 3.3) is
> also able to match characters that belong to specific categories.
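A minimal sketch of the unicodedata approach quoted above. It assumes a "word character" means a Unicode letter (general category L*), a number (N*), or an underscore, mirroring shlex's ASCII default of letters, digits and "_". The helper names `is_word_char` and `build_wordchars` are hypothetical. Note that N* also includes characters such as "²" and "½", so whether to keep that category is a judgment call.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    """Return True if ch is a Unicode letter or number, or an underscore.

    Assumption: "word character" means general category L* (letters)
    or N* (numbers), plus "_", mirroring shlex's ASCII default.
    """
    if ch == "_":
        return True
    # unicodedata.category() returns a two-letter code such as 'Ll' or 'Nd'.
    return unicodedata.category(ch)[0] in ("L", "N")


def build_wordchars(last_codepoint: int = 0x024F) -> str:
    """Hypothetical helper: collect all word characters up to last_codepoint.

    The default stops at U+024F (end of Latin Extended-B). The result is a
    plain string, so it keeps shlex.wordchars a string.
    """
    return "".join(
        chr(cp) for cp in range(last_codepoint + 1) if is_word_char(chr(cp))
    )
```

Because the result is an ordinary string, it can be assigned as `lex.wordchars = build_wordchars()` without changing the attribute's type. The cost is that every accepted character must be enumerated up front, which is why a category- or regex-based membership test is attractive for the full Unicode range.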

Ezio:  Thanks.  (This is new to me, actually.)  Is this what you mean?
http://www.regular-expressions.info/unicode.html
For the purposes of patching shlex, should we use a regex instead of sets of characters (or strings) to test for membership in shlex.wordterminators?  (Or should we create a different class member?  Unfortunately, I suspect shlex.wordchars has to remain some kind of container object to maintain backwards compatibility.)
Something like that would solve the problem nicely.
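A sketch of the regex idea that keeps shlex.wordchars a container object. It uses the standard library's re module rather than the third-party regex module, because re does not support the \p{...} category syntax. In Python 3, however, \w on str patterns already matches Unicode letters, digits and underscore. `RegexWordChars` and `split_unicode` are hypothetical names. The sketch assumes that, after construction, shlex only tests `nextchar in self.wordchars`. That is how CPython's shlex currently uses the attribute, as far as I can tell, but it is not a documented guarantee.

```python
import re
import shlex


class RegexWordChars:
    """Hypothetical container whose membership test is a regex match.

    Lets shlex ask "is this a word character?" via ``ch in wordchars``
    without enumerating every Unicode letter.
    """

    def __init__(self, pattern: str = r"\w", extra: str = "") -> None:
        self._regex = re.compile(pattern)
        self._extra = extra  # additional literal characters, e.g. "-./"

    def __contains__(self, ch: object) -> bool:
        # shlex feeds one character at a time; reject anything else.
        if not isinstance(ch, str) or len(ch) != 1:
            return False
        return ch in self._extra or self._regex.fullmatch(ch) is not None


def split_unicode(text: str) -> list[str]:
    """Tokenize text with shlex, treating any Unicode \\w character as a word char."""
    lex = shlex.shlex(text, posix=True)
    # Assumption: shlex only uses `in` on wordchars after __init__.
    lex.wordchars = RegexWordChars()
    return list(lex)
```

For example, `split_unicode("αβγ δεζ")` yields the two tokens "αβγ" and "δεζ". With the default wordchars, shlex splits the same Greek words into single-character tokens ("α", "β", "γ", ...). Passing `extra="-./"` adds literal characters on top of the pattern, much like appending to the existing wordchars string.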

> Andrew: Thanks for your contribution, but your patch cannot 
> go into 2.7, as we don’t add new features in stable versions

Eric: That's fine.  I posted here only because this page is currently the top hit when searching for "shlex unicode".  If you think it's appropriate to repost my message for Python 3.4, let me know.  The issue with shlex.wordchars that I raised applies to every version of Python.  I'm not sure my solution is optimal (I like the regex idea).

History
Date	User	Action	Args
2011-09-15 19:52:01	wombat	set	recipients: + wombat, loewis, rhettinger, mark.dickinson, belopolsky, orsenthil, vstinner, dexen, benjamin.peterson, cgwalters, mcepl, ezio.melotti, eric.araujo, doughellmann, r.david.murray, nwerneck, fperez, Santiago.Romero
2011-09-15 19:52:01	wombat	set	messageid: <1316116321.43.0.883789811618.issue1170@psf.upfronthosting.co.za>
2011-09-15 19:52:00	wombat	link	issue1170 messages
2011-09-15 19:52:00	wombat	create