Message132232
How about the following patch and tests...
Per: http://msdn.microsoft.com/en-us/library/aa369212(v=vs.85).aspx
"""The Identifier data type is a text string. Identifiers may contain the
ASCII characters A-Z (a-z), digits, underscores (_), or periods (.). However, every identifier must begin with either a letter or an underscore."""
So according to the spec, colons are NOT allowed. Editing some entries in the File table of an MSI (using Orca from the MSI SDK) and running the validation confirms this.
All the following were flagged as errors:
'KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]', 'TortoisePlinkEXE]', 'Hg.Cämd'
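As a rough illustration of the quoted rule (not the patch itself), the check can be sketched as a regular expression. The function name `is_valid_identifier` and the exact pattern are assumptions made for this example, derived only from the MSDN wording above.

```python
import re

# Assumed pattern, derived from the quoted rule: the first character is an
# ASCII letter or underscore; the rest are ASCII letters, digits,
# underscores, or periods.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def is_valid_identifier(name):
    """Return True if name satisfies the quoted Identifier rule (hypothetical helper)."""
    return _IDENTIFIER_RE.fullmatch(name) is not None


# The entries that validation flagged as errors.
flagged = ['KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]',
           'TortoisePlinkEXE]', 'Hg.Cämd']

for name in flagged:
    print(repr(name), is_valid_identifier(name))
```

Under this sketch every flagged entry is rejected, including 'Hg.Cämd', because the character class lists ASCII letters explicitly.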
I also did some speed testing (just in case the non-regex approach might be slow):
Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit("re.sub(r'[^a-zA-Z_\.]', '_', 'somefilename.txt')", setup = "import re")
4.434621757767205
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit('"".join([c if c in identifier_chars else "_" for c in "somefilename.txt"])', setup)
3.3757537425069906
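Note that the regex timed above has no 0-9 in its character class, while identifier_chars includes digits, so the two expressions are not strictly equivalent. A minimal sketch of the join-based approach as a function follows; the name `sanitize_identifier` and the leading-character handling are assumptions for illustration, not necessarily what the patch does.

```python
import string

# Characters the quoted spec allows anywhere after the first position.
IDENTIFIER_CHARS = string.ascii_letters + string.digits + "._"


def sanitize_identifier(name):
    """Map name onto the allowed character set (hypothetical helper)."""
    # Replace every disallowed character with an underscore.
    cleaned = "".join(c if c in IDENTIFIER_CHARS else "_" for c in name)
    # Assumption: prefix an underscore when the result is empty or starts
    # with a digit or period, since identifiers must begin with a letter
    # or an underscore.
    if not cleaned or cleaned[0] in string.digits + ".":
        cleaned = "_" + cleaned
    return cleaned


print(sanitize_identifier("somefilename.txt"))
print(sanitize_identifier("chmFile-"))
print(sanitize_identifier("3rd:party.dll"))
```

Distinct inputs can map to the same result (for example "a-b" and "a(b"), so a caller would presumably still need to guarantee uniqueness of the generated keys.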

Date | User | Action | Args
2011-03-26 12:24:19 | markm | set | recipients: + markm, loewis, cdavid
2011-03-26 12:24:19 | markm | set | messageid: <1301142259.03.0.117975383386.issue2694@psf.upfronthosting.co.za>
2011-03-26 12:24:18 | markm | link | issue2694 messages
2011-03-26 12:24:18 | markm | create |