This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: MSILIB truncates last character in summary information stream
Type: behavior Stage: resolved
Components: Library (Lib), Windows Versions: Python 3.2
process
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: Nosy List: Idoa, Kevin.Phillips, berker.peksag, loewis, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2014-09-11 16:53 by Kevin.Phillips, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg226789 - (view) Author: Kevin Phillips (Kevin.Phillips) Date: 2014-09-11 16:53
I recently exploited a subtle bug with the msilib module's GetProperty method on the SummaryInformation class. When retrieving string-typed properties from the stream the last character in the string gets truncated, replaced by a null-byte.

I am using Python v3.2.5 64bit on Windows 7, and I've managed to reproduce the error with the following code snippet:

filename = "sample.msp"
patch_database = msilib.OpenDatabase(filename, msilib.MSIDBOPEN_READONLY | msilib.MSIDBOPEN_PATCHFILE)
summary_info = patch_database.GetSummaryInformation(20)
print (summary_info.GetProperty(msilib.PID_REVNUMBER))

The PID_REVNUMBER returns the patch-GUID for the Windows Installer patch file. In this example the GUID is returned properly however the character string is supposed to be delimited by curly braces - { }. Examination of the returned byte array shows that the leading curly brace is included by the final curly brace is not. Closer examination also shows that the last character in the byte array is \x00.

While it is possible, in this situation, to circumvent the bug by simply removing the trailing bytes and replacing them with a closing curly brace, this may not be so easy to work around for general character strings if the last character in the sequence is not static. As such I'd highly recommend fixing this in the source for the msilib module.
msg226791 - (view) Author: Kevin Phillips (Kevin.Phillips) Date: 2014-09-11 17:21
I should mention that I did discover some source code on GitHub, presumably for this wrapper module, and I believe I found a few questionable parts in the logic for this library that may help explain the cause of this problem:

https://github.com/python-git/python/blob/master/PC/_msi.c

Here are some snippets from the summary_getproperty function that may help:

    char sbuf[1000];
    char *sval = sbuf;
    DWORD ssize = sizeof(sval);

First, here, I believe the ssize value is not going to be initialized as one would expect at first glance. Since sval is a pointer and sizeof() returns the size of it's parameter, in this case a pointer, ssize will represent the pointer size on the target platform. In this case, on a 64bit OS, it's likely going to be set to 8, which I suspect the expected value for ssize is going to be 1000 - the size of the static array sbuff.

    status = MsiSummaryInfoGetProperty(si->h, field, &type, &ival, 
	&fval, sval, &ssize);
    if (status == ERROR_MORE_DATA) {
	sval = malloc(ssize);
        status = MsiSummaryInfoGetProperty(si->h, field, &type, &ival, 
    	    &fval, sval, &ssize);
    }

Now, in this snippet it tries to load the string value from the file, and if the input buffer is not of sufficient size it reallocates the pointer to an appropriate size. Given the fact that ssize is probably initialized to 8 (or less) it is pretty safe to assume the first call to MsiSummaryInfoGetProperty changes this value to the length of the data stored in the file. Closer examination of the documentation for this Windows API function reveals that in this case ssize will be set to the number of characters required 'not including the terminating null character'. (http://msdn.microsoft.com/en-us/library/aa370409(v=vs.85).aspx)

    result = PyString_FromStringAndSize(sval, ssize);

Then finally I noticed this function call. I suspect the problem may lie here. I'm not too familiar with the Python SDK but it sounds like when using Unicode strings - as all string are in Python 3 - you are supposed to use PyUnicode_FromStringAndSize instead. So perhaps there might be something getting lost in the translation, either via the use of this function or perhaps subsequent marshalling of the string object into the Python runtime, due to the encoding changes... not sure. It could be something as simple as one of the function calls used therein assume that the 'size' count includes a null-byte while others don't. It's hard to say without digging into the libs some more.

I hope this helps.
msg233532 - (view) Author: Ido Alus (Idoa) Date: 2015-01-06 12:35
Encountered this issue also in python 2.7.7.

last character is replaced with \x00. reproduction is exactly as stated by Kevin.Phillips
msg272067 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-08-05 22:42
Thanks for the report. This is a duplicate of issue 1104.
History
Date User Action Args
2022-04-11 14:58:07adminsetgithub: 66585
2016-08-05 22:42:29berker.peksagsetstatus: open -> closed

nosy: + berker.peksag
messages: + msg272067

resolution: duplicate
stage: resolved
2015-01-06 12:35:21Idoasetnosy: + Idoa
messages: + msg233532
2014-09-11 19:52:37pitrousetnosy: + loewis
2014-09-11 18:30:50BreamoreBoysetnosy: + tim.golden, zach.ware, steve.dower
2014-09-11 17:21:51Kevin.Phillipssetmessages: + msg226791
2014-09-11 16:53:29Kevin.Phillipscreate