classification
Title: msilib file names check too strict ?
Type: enhancement Stage: needs patch
Components: Windows Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: cdavid, loewis, markm
Priority: normal Keywords: patch

Created on 2008-04-26 04:18 by cdavid, last changed 2011-03-30 05:33 by loewis. This issue is now closed.

Files
File name Uploaded Description
make_id_fix_and_test.patch markm, 2011-03-26 12:24 Patch to fix msilib.make_id() and test it
Messages (8)
msg65834 - (view) Author: Cournapeau David (cdavid) Date: 2008-04-26 04:18
Hi,

I wanted to build an MSI using the build_msi distutils command for one of
my packages, but it fails in the function make_id, at
line 177 of msilib/__init__.py, for a file named aixc++.py. The regex
indeed rejects any character which is not alphanumeric: is MSI itself
really that strict, or could this check be relaxed?
msg65842 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-26 12:19
Indeed, the primary keys in many tables must be Identifiers, see

http://msdn2.microsoft.com/en-us/library/aa369212(VS.85).aspx

make_id tries to synthesize an identifier from a file name, and fails
for your file names.
msg65845 - (view) Author: Cournapeau David (cdavid) Date: 2008-04-26 15:56
Ok, thanks for the information.

It may be good to have a more informative error, though, such as one saying
which characters are allowed when checking against the regex?
msg65846 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-26 16:02
Actually, the algorithm should be fixed to generate a valid identifier
for any input.

Would you like to work on a fix?
msg65847 - (view) Author: Cournapeau David (cdavid) Date: 2008-04-26 16:13
It's not that I don't want to work on it, but I don't know anything
about MSI, except that some Windows users of my packages request it :)
So I would need some indication of what exactly to fix.

Do I understand right that dist_msi builds a database of the files, and
that the identifiers could be named differently than the filenames
themselves, as long as they are unique ?
msg65848 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-26 16:23
> Do I understand right that dist_msi builds a database of the files, and
> that the identifiers could be named differently than the filenames
> themselves, as long as they are unique ?

Correct. As a design objective, I try to use identifiers close to the
file names, to simplify debugging of the MSI file (Microsoft itself
typically uses UUIDs instead).

In short, just make make_id generate valid identifiers. An algorithm
on top of that will make them unique in case of conflicts.

Regards,
Martin
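
The two steps described above (first produce a valid identifier, then resolve conflicts on top of that) could be sketched roughly as follows. This is only an illustration, not the code in msilib: the names sketch_make_id and make_unique are hypothetical, and the character rules assume the MSI Identifier definition (ASCII letters, digits, underscores and periods, starting with a letter or underscore).

```python
import string

# Assumption: characters permitted by the MSI Identifier rule.
VALID_CHARS = set(string.ascii_letters + string.digits + "._")
# Assumption: an identifier must start with a letter or an underscore.
VALID_FIRST = set(string.ascii_letters + "_")


def sketch_make_id(filename):
    """Hypothetical helper (not msilib.make_id): map any string to a valid identifier."""
    # Replace every disallowed character with an underscore.
    ident = "".join(c if c in VALID_CHARS else "_" for c in filename)
    # Prefix an underscore if the result is empty or starts with a digit or period.
    if not ident or ident[0] not in VALID_FIRST:
        ident = "_" + ident
    return ident


def make_unique(ident, used):
    """Hypothetical helper: append a numeric suffix until ident is not in `used`."""
    candidate = ident
    n = 1
    while candidate in used:
        candidate = "%s.%d" % (ident, n)
        n += 1
    used.add(candidate)
    return candidate


used = set()
print(make_unique(sketch_make_id("aixc++.py"), used))  # aixc__.py
print(make_unique(sketch_make_id("aixc--.py"), used))  # aixc__.py.1
print(make_unique(sketch_make_id("7zip.exe"), used))   # _7zip.exe
```

The identifiers stay close to the file names, which matches the debugging goal mentioned above; the suffix scheme is just one possible way to break ties.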
msg132232 - (view) Author: Mark Mc Mahon (markm) * Date: 2011-03-26 12:24
How about the following patch and tests...

Per: http://msdn.microsoft.com/en-us/library/aa369212(v=vs.85).aspx
"""The Identifier data type is a text string. Identifiers may contain the
ASCII characters A-Z (a-z), digits, underscores (_), or periods (.). However, every identifier must begin with either a letter or an underscore."""

So the spec would say that colons are NOT allowed. Editing some entries in the File table of an MSI (using Orca from the MSI SDK) and running the validation confirms that.

All the following were flagged as errors:
'KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]', 'TortoisePlinkEXE]', 'Hg.Cämd'
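
For illustration, the quoted rule can be written as a regular expression and applied to the strings above. This is a sketch under the assumption that the pattern below captures the quoted rule; it is not the Windows Installer validation itself, and is_valid_identifier is a hypothetical name. It assumes Python 3.

```python
import re

# Assumption: a paraphrase of the quoted Identifier rule as a regex.
# First character: letter or underscore; rest: letters, digits, "_" or ".".
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*\Z")

# The strings reported above as flagged by validation.
flagged = ['KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]',
           'TortoisePlinkEXE]', 'Hg.Cämd']


def is_valid_identifier(name):
    """Hypothetical check: True if name matches the regex above."""
    return IDENTIFIER_RE.match(name) is not None


for name in flagged:
    print(repr(name), is_valid_identifier(name))  # False for every entry

print(is_valid_identifier("Hg.Cmd"))  # True
```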

I also did some speed testing (just in case the non-regex approach might be slow):
Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit("re.sub(r'[^a-zA-Z_\.]', '_', 'somefilename.txt')", setup = "import re")
4.434621757767205
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit('"".join([c if c in identifier_chars else "_" for c in "somefilename.txt"])', setup)
3.3757537425069906
>>>
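
One caveat on the session above: the regex being timed has no digits in its character class, while identifier_chars does include them, so the two snippets do not produce the same output for names containing digits. A like-for-like comparison might look like the sketch below; the function names are hypothetical and the timings will vary by machine.

```python
import re
import string
from timeit import timeit

IDENTIFIER_CHARS = string.ascii_letters + string.digits + "._"
# Same character set as IDENTIFIER_CHARS, expressed as a negated class.
NON_IDENTIFIER_RE = re.compile(r"[^A-Za-z0-9_.]")


def sanitize_regex(name):
    """Replace disallowed characters using a precompiled regex."""
    return NON_IDENTIFIER_RE.sub("_", name)


def sanitize_join(name):
    """Replace disallowed characters using a list comprehension."""
    return "".join([c if c in IDENTIFIER_CHARS else "_" for c in name])


sample = "some file-name2.txt"
# Both approaches should agree before their speed is compared.
assert sanitize_regex(sample) == sanitize_join(sample)

for func in (sanitize_regex, sanitize_join):
    elapsed = timeit(lambda: func(sample), number=10000)
    print(func.__name__, elapsed)
```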
msg132543 - (view) Author: Mark Mc Mahon (markm) * Date: 2011-03-29 22:14
This issue has been fixed by changes made in issue7639 and issue11696.
History
Date User Action Args
2011-03-30 05:33:38 loewis set status: open -> closed; resolution: fixed
2011-03-29 22:14:07 markm set messages: + msg132543
2011-03-26 12:24:18 markm set files: + make_id_fix_and_test.patch; nosy: + markm; messages: + msg132232; keywords: + patch
2010-01-13 01:58:56 brian.curtin set priority: normal; stage: needs patch; versions: + Python 2.7, - Python 2.5
2008-04-26 16:23:53 loewis set messages: + msg65848
2008-04-26 16:13:55 cdavid set messages: + msg65847
2008-04-26 16:02:31 loewis set messages: + msg65846
2008-04-26 15:56:06 cdavid set messages: + msg65845
2008-04-26 12:19:34 loewis set nosy: + loewis; messages: + msg65842
2008-04-26 04:18:42 cdavid create