classification
Title: Call CoInitializeEx on startup
Type: enhancement Stage:
Components: Windows Versions: Python 3.6
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: steve.dower Nosy List: brett.cannon, eryksun, mhammond, nnemkin, paul.moore, steve.dower, theller, tim.golden, zach.ware
Priority: low Keywords:

Created on 2016-06-29 18:19 by steve.dower, last changed 2016-07-18 03:53 by steve.dower. This issue is now closed.

Messages (20)
msg269537 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-06-29 18:19
I'd like to enable calling CoInitializeEx on Python startup for 3.6 (and into the future). See https://msdn.microsoft.com/en-us/library/windows/desktop/ms695279.aspx

This would enable us to use more advanced Windows features within Python that require COM, such as better integration with the shell or features such as AMSI (issue26137, https://msdn.microsoft.com/en-us/library/windows/desktop/dn889587.aspx). The fact that AMSI is a security feature makes it important that it be enabled by default and not be able to be disabled.

Calling CoInitializeEx has no impact on code that isn't already calling it. However, since it can only be called once per thread and you can't change the apartment type, it could break existing code that calls it directly (but only if it tries to use a different apartment type).

My proposal is to call CoInitializeEx(NULL, COINIT_MULTITHREADED) by default, with "-X:STA" to call CoInitializeEx(NULL, COINIT_APARTMENTTHREADED) instead. (Single Threaded Apartment is the commonly used acronym for COINIT_APARTMENTTHREADED.) This forces the decision onto the user rather than letting libraries do it, but since libraries may have conflicting requirements, this hardly makes things worse. It also means we can rely on COM being enabled for any features we may want to enable within Python.

I've nosied people who I expect/hope to have an opinion, so let me know what you think.
msg269542 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-06-29 18:43
Steve has also told me this would enable querying the OS for what the default web browser is.
msg269544 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-06-29 18:46
It would enable wrapping up anything from this list too, and more: https://msdn.microsoft.com/en-us/library/windows/desktop/bb774328(v=vs.85).aspx

Plenty of cool potential features in there :)
msg269548 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2016-06-29 19:22
Things I know that call CoInitialize - pywin32/pythoncom and comtypes. I assume the proposal is to call CoInitializeEx in a way that won't break those?

I'm not sure I see how this would affect the user (i.e. Python code). Brett mentions detecting the user's browser and Steve points to the shell interfaces. But would accessing those require C support (or Python interface code complex enough that using pywin32 or comtypes would be a better option)? In practice, I don't see this change having much impact on the end user.

I'm +0 on this change, regardless - it's harmless enough and offers at least some level of benefit. If it allowed user code to access COM without needing a 3rd party dependency, I'd be +1, but I don't think that's being proposed here.
msg269549 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-06-29 19:26
> If it allowed user code to access COM without needing a 3rd party dependency, I'd be +1, but I don't think that's being proposed here.

It's a prerequisite to adding features to the stdlib that access COM (whether or not COM is directly exposed to the user). But no, there's no new (useful) functionality being proposed in this particular issue.
msg269550 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2016-06-29 19:33
As it happens, all the code I use which calls CoInitialise[Ex] does so with STA. But do I understand correctly that, if you implement this, there's no way for me to select MTA? If so I would consider that a major drawback.
msg269551 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-06-29 19:45
pythoncom and comtypes use the value of sys.coinit_flags when imported, and otherwise default to calling CoInitializeEx(NULL, COINIT_APARTMENTTHREADED). Setting this value should ease problems, but something like -X:STA is still necessary. Note that the launcher allows passing arguments in a shebang.
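
A minimal sketch of the sys.coinit_flags mechanism described above. It assumes the flag values from the Windows SDK header objbase.h (COINIT_MULTITHREADED = 0x0, COINIT_APARTMENTTHREADED = 0x2). The helper name request_com_apartment is hypothetical; only the sys.coinit_flags attribute itself is what pythoncom and comtypes consult when they are imported.

```python
import sys

# COINIT values as defined in the Windows SDK (objbase.h).
COINIT_MULTITHREADED = 0x0
COINIT_APARTMENTTHREADED = 0x2


def request_com_apartment(flags):
    """Record the apartment model that pythoncom/comtypes should use.

    Hypothetical helper: it has to run before either package is first
    imported, because both read sys.coinit_flags at import time.
    """
    for name in ("pythoncom", "comtypes"):
        if name in sys.modules:
            raise RuntimeError(name + " is already imported; too late to choose")
    sys.coinit_flags = flags
    return flags


# Ask for the multithreaded apartment, then import the COM package.
request_com_apartment(COINIT_MULTITHREADED)
# import pythoncom   # on Windows, this would now use the requested flags
```

The guard against late calls reflects the point that the value is only read on import; setting it afterwards has no effect on a thread that is already initialized.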
msg269554 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-06-29 21:18
> But do I understand correctly that, if you implement this, there's no way for me to select MTA?

MTA would be the default, with no -X argument. But we could support a no-op "-X:MTA" as well.

Because of the potential for use in security features, I don't want a state where COM is not enabled at all. That's status quo and no change/discussion would be necessary :)
msg269555 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-06-29 21:19
I'm also okay to discuss whether MTA or STA should be the default, but I'll also be seeking advice from work colleagues on this who know COM really well.
msg269556 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-06-29 21:22
> pythoncom and comtypes use the value of sys.coinit_flags when imported

Good to know. Assume we'll add that as well.

Also, with respect to threading, we'd want to initialize on all new threads too. That will require a way to specify that a new thread should be STA/MTA. I'll think about this before proposing a solution (though since it's inherently an OS-specific specialization, having a separate function for this is a fairly safe way to do it).
msg269558 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2016-06-30 01:00
I've a few reservations here:

* CoInitialize will load a number of COM DLLs into the process, which isn't free and will have some memory and performance costs for programs that don't use COM. I see around 10 such DLLs loaded.

* pythoncom uses sys.coinit_flags because some COM objects simply don't work with the wrong apartment model. IOW, it is the objects you want to use on a particular thread that dictate the model you should use for that thread. Thus scripts written to interact with a particular COM object could set this flag before importing pythoncom so the correct threading model is set up. If this is done at Python startup, the script has lost the chance to influence this - insisting that Python be run with a particular set of flags for the script to work sounds painful.

* pythoncom defaults to COINIT_APARTMENTTHREADED as the apartment threading model is a special snowflake - if you need to use apartment model objects, the main thread *must* be apartment threading (even though other threads can use free threading). COM objects with a UI (eg, MSOffice, IE) typically require apartment threading. Most new objects probably allow free threading, but I think we want to be careful about defaulting to a model that might prevent common objects from being used without an obscure command-line param.

* This may well break things like pythonwin until they also grow support for the new param - but new params for GUI applications somewhat suck as people tend to start them from an icon instead of the command-line.

* Each thread that wants to use COM must also make this call. If *any* object you want to use uses apartment model threading, then you really have no choice other than to init the main thread with this model - but you are then free to spin up other threads using free-threading, and life is good. So to make this sane, you really want to expose CoInitialize to Python code so new threads can do the right thing. So in this case, I wonder why we don't just expose it and let it be called manually?

IOW, it seems the potential risks of this outweigh the cost of requiring it to be called manually in a controlled way.
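
The per-thread requirement in the last bullet can be sketched as a thread subclass that enters an apartment before running its target. This is an illustration under assumptions, not an existing API: ComThread and _ole32_hooks are hypothetical names, and the init/uninit parameters exist only so the sketch can be exercised without Windows. On Windows the default hooks call the real CoInitializeEx and CoUninitialize from ole32 via ctypes.

```python
import threading

# COINIT values as defined in the Windows SDK (objbase.h).
COINIT_MULTITHREADED = 0x0
COINIT_APARTMENTTHREADED = 0x2


def _ole32_hooks():
    """Return (init, uninit) callables backed by ole32 (Windows only)."""
    import ctypes

    ole32 = ctypes.WinDLL("ole32")
    ole32.CoInitializeEx.argtypes = [ctypes.c_void_p, ctypes.c_ulong]
    ole32.CoInitializeEx.restype = ctypes.c_long
    ole32.CoUninitialize.restype = None

    def init(flags):
        hr = ole32.CoInitializeEx(None, flags)
        if hr < 0:  # failure HRESULTs are negative as signed 32-bit values
            raise OSError("CoInitializeEx failed: 0x%08X" % (hr & 0xFFFFFFFF))

    return init, ole32.CoUninitialize


class ComThread(threading.Thread):
    """Thread that enters a COM apartment before running its target."""

    def __init__(self, flags, init=None, uninit=None, **kwargs):
        super().__init__(**kwargs)
        if init is None or uninit is None:
            init, uninit = _ole32_hooks()
        self._flags = flags
        self._init = init
        self._uninit = uninit

    def run(self):
        self._init(self._flags)
        try:
            super().run()
        finally:
            # Balance the successful initialization on this thread.
            self._uninit()
```

With something like this, the main thread could stay apartment threaded while a worker is started as ComThread(COINIT_MULTITHREADED, target=work).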
msg269566 - (view) Author: Nikita Nemkin (nnemkin) Date: 2016-06-30 09:15
COM should be initialized on demand by C modules that need it. They might need STA or MTA, it's their choice to make.

Python core (ceval and builtins) doesn't need COM and shouldn't impose COM threading models on any threads it creates. Things like -X:STA are not Python's concern at all. Python code (interpreted) can't access COM objects directly, it always goes through C modules, which know better.

Also, COM using apps are very likely to be GUI apps and need STA main thread. MTA by default makes no sense.

PS. AMSI is insane. In what world would Python interpreter send my code for analysis to who knows where and without even an option to disable it (because "security")?
msg269571 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2016-06-30 10:04
Hmm, this'll teach me to rely on my memory rather than checking :-)

It seems to me that core code that needs COM can use it by wrapping the code in CoInitializeEx(sys.coinit_flags)...CoUninitialize(). That will either work fine (I don't know where you got the idea that CoInitializeEx can only be called once per thread - AFAICT the documentation you linked to clearly states that's not the case) or it will fail because someone else has initialised the thread with an incompatible model. Either try the other model (if you don't actually care) or report the issue and stop - it's the user's code that is relying on conflicting functions, so let the user decide how to deal with it.

Do you have a use case in mind where something like the above wouldn't work?
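
A rough sketch of the wrapping pattern described above, written with ctypes for illustration (core code would do the equivalent in C). The names com_initialized and classify_hresult are hypothetical, and falling back to COINIT_MULTITHREADED when sys.coinit_flags is unset is an assumption. The HRESULT values are the documented ones: S_OK, S_FALSE (COM already initialized on this thread) and RPC_E_CHANGED_MODE (thread already initialized with a different model).

```python
import contextlib
import ctypes
import sys

COINIT_MULTITHREADED = 0x0
S_OK = 0x00000000
S_FALSE = 0x00000001
RPC_E_CHANGED_MODE = 0x80010106


def classify_hresult(hr):
    """Map a CoInitializeEx result to a short description."""
    hr &= 0xFFFFFFFF  # ctypes hands back HRESULTs as signed 32-bit ints
    if hr == S_OK:
        return "initialized"
    if hr == S_FALSE:
        return "already initialized"
    if hr == RPC_E_CHANGED_MODE:
        return "conflicting apartment"
    return "failed"


@contextlib.contextmanager
def com_initialized(flags=None):
    """Initialize COM on the current thread for the duration of a block."""
    if sys.platform != "win32":
        raise OSError("COM is only available on Windows")
    if flags is None:
        # Assumption: default to MTA when the script expressed no preference.
        flags = getattr(sys, "coinit_flags", COINIT_MULTITHREADED)
    ole32 = ctypes.WinDLL("ole32")
    ole32.CoInitializeEx.argtypes = [ctypes.c_void_p, ctypes.c_ulong]
    ole32.CoInitializeEx.restype = ctypes.c_long
    ole32.CoUninitialize.restype = None
    status = classify_hresult(ole32.CoInitializeEx(None, flags))
    if status in ("conflicting apartment", "failed"):
        raise OSError("CoInitializeEx: " + status)
    try:
        yield status
    finally:
        # Each successful call, S_FALSE included, needs a matching
        # CoUninitialize.
        ole32.CoUninitialize()
```

A caller that does not care about the model could catch the OSError for a conflicting apartment and retry with the other flag, as suggested in the message.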

BTW, I agree that AMSI sounds like a very weird thing to want to add to the Python interpreter - maybe it might make sense to have a special build "checking" interpreter that could be used in appropriate circumstances, but I'd be very uncomfortable adding something like that to the core binary or making it "enabled by default and not be able to be disabled". So that's not a good motivating use case for me...
msg269591 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-06-30 15:58
Mark:
> CoInitialize will load a number of COM DLLs into the process, which isn't free and will have some memory and performance costs for programs that don't use COM. I see around 10 such DLLs loaded.

Very good point. Most of those should already be loaded in the OS, which reduces the cost, but it's still not free. Given some other work I've been trying to do to reduce footprint, this is definitely a step backwards :(

> This may well break things like pythonwin until they also grow support for the new param 

I expect that, which is why I'm only proposing it for 3.6 onwards. While adding support for a new major version of Python should be fairly cheap, it isn't entirely free and so it's the right time to add new complications.

> I wonder why we don't just expose it and let it be called manually?

Because that prevents us adding features to the core interpreter that require COM before a user could choose to initialize it. Also, that's the current state of the world, and this is a proposal to change it - hence I don't want to start with that as an assumption since it makes this whole discussion moot :)

Nikita:
> COM should be initialized on demand by C modules that need it. They might need STA or MTA, it's their choice to make.

And if the C module is the core interpreter (see issue26137 for an example of where this would be necessary)? If we require user code to initialize COM, this whole proposal is moot as that is the current state of the world and it does not require any changes to achieve.

> COM using apps are very likely to be GUI apps and need STA main thread. MTA by default makes no sense.

Fair point. CoInitializeEx defaults to MTA, and pythonwin follows this, while IronPython defaults to STA (since COM initialization is unavoidable in their context), so the default we should use is not obvious. But I will mention that there are plenty of ways that COM is useful without writing a GUI app, so I don't think we should assume that only GUI apps are going to use it.

> PS. AMSI ...

Your PS is basically answered in issue26137 - I want to keep the discussion of the merits of AMSI separate from this issue :)

Paul:
> It seems to me that core code that needs COM can use it by wrapping the code in CoInitializeEx(sys.coinit_flags)...CoUninitialize(). 

This doesn't work when COM objects have to be kept around. In the AMSI case, the threading model is irrelevant, but you need to keep reusing the same context for each call, which means you can't keep initializing COM (I tried it - it crashes, I believe because you get a new memory allocator and/or arena each time, but I didn't diagnose the crash thoroughly).


There may be a workaround for cases where we can't keep reinitializing COM (add a background non-Python thread and do all the calls from there), but the complexity is fairly high and the performance impact is greater. Worth investigating further, since the general feeling seems to be against change.
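
The background-thread workaround can be sketched with a single-thread executor, so that every COM call lands on one long-lived thread that was initialized once. This is only an illustration under assumptions: it uses a Python thread from concurrent.futures rather than the non-Python thread suggested above, make_com_worker is a hypothetical name, and fake_init is a stand-in for a real CoInitializeEx call so the sketch runs on any platform.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_com_worker(init):
    """Create a one-thread executor whose thread runs `init` once at start."""
    return ThreadPoolExecutor(max_workers=1, initializer=init)


owner = []


def fake_init():
    # Stand-in for CoInitializeEx; records which thread "owns" COM.
    owner.append(threading.get_ident())


worker = make_com_worker(fake_init)
# Both calls run on the same initialized thread, never on the caller's.
first = worker.submit(threading.get_ident).result()
second = worker.submit(threading.get_ident).result()
worker.shutdown()
```

A real version would also need to submit a final task that calls CoUninitialize before shutting the worker down, and each call pays for a cross-thread round trip, which is the performance cost mentioned above.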
msg269595 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2016-06-30 16:25
What about instead of unconditionally calling CoInitializeEx in all cases, add a Py_EnsureCOM(flags) C API function? The flags param would be any flags that the caller must have, would default to 0, and would be combined with sys.coinit_flags before calling CoInitializeEx. If CoInitializeEx had already been called, the flags are compared with those used in that call and an error is raised in case of conflict. Otherwise, CoInitializeEx is called and the flags are returned. It could also be exposed to Python code as sys.ensure_com, though in the case of something like AMSI being enabled, sys.ensure_com would always either be a no-op or raise an error.


Disclaimer: I know exactly nothing about COM, except what I've read in this issue and the AMSI issue.  If this suggestion is unworkable, please ignore it entirely!
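
The semantics suggested above can be modelled in pure Python. This is a hypothetical model only: no such function exists, "combined" is read here as a bitwise OR, the bookkeeping is kept per thread (an assumption, since CoInitializeEx itself is per thread), and the _initialize parameter stands in for the real CoInitializeEx call.

```python
import sys
import threading

_state = threading.local()  # remembers the flags used on each thread


def ensure_com(flags=0, _initialize=lambda flags: None):
    """Model of the proposed Py_EnsureCOM / sys.ensure_com behaviour."""
    requested = flags | getattr(sys, "coinit_flags", 0)
    current = getattr(_state, "flags", None)
    if current is None:
        # First call on this thread: initialize and remember the flags.
        _initialize(requested)
        _state.flags = requested
        return requested
    if current != requested:
        raise RuntimeError(
            "COM already initialized with flags %#x, requested %#x"
            % (current, requested)
        )
    return current
```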
msg269596 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2016-06-30 16:27
> This doesn't work when COM objects have to be kept around. In the AMSI case...

OK, so that's a limitation. Is there any *other* use case for keeping COM objects (that are created by the core) around? If not, then like it or not, this is a problem for AMSI, not for a general "initialise COM" proposal.

Basically, I'm saying that it's only worth splitting this proposal out from the AMSI one if there's a benefit (to offset the costs) for code other than AMSI. And there seems to be no such use case.
msg269597 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-06-30 16:43
> What about instead of unconditionally calling CoInitializeEx in all cases, add a Py_EnsureCOM(flags) C API function?

This is essentially what CoInitializeEx does anyway - if the flags don't match the existing ones, it returns an error. So all we gain is a core function that directly calls CoInitializeEx. (I'm not saying that wouldn't be valuable, but it's also entirely suitable for a 3rd-party package since it requires 3rd-party code to call it.)


> Is there any *other* use case for keeping COM objects (that are created by the core) around?

Haven't found one yet :) IIRC, there's a decent chunk of certificate store APIs that are only available via COM, so if we wanted to enable something like signed scripts (or possibly even migrate ssl to verify using the right APIs on Windows) then we'd probably need it for that. Maybe - though doubtful - it would be feasible to only initialize COM around each connection.

Basically all of the new "cool" APIs added since Windows 8 are COM-based too, so we'd need COM if we wanted to support launching/interacting with modern-style apps (or being embedded in one), or some of the credential support. It's a big hypothetical right now, but given the current state is "we can't use COM because we don't initialize it", we're definitely missing potential opportunities.

If there was an easy way to say "we now initialize COM all the time" then the potential would open up, but since it doesn't look like there's an easy way to do that we'll have to keep on avoiding the new APIs unless there's sufficient value to add the complexity of background threads.
msg269621 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2016-06-30 19:34
I presume by "we" you mean "the core"? There's nothing to stop 3rd party code using COM APIs.

The only downside to using COM in (user) Python code at the moment is the need for a dependency on pywin32 (robust, mature, but a big dependency) or comtypes (relatively lightweight, but less mature/well used). Hardly a huge burden.
msg269637 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2016-07-01 01:11
> > This may well break things like pythonwin until they also grow support
> > for the new param 

> I expect that, which is why I'm only proposing it for 3.6 onwards. While
> adding support for a new major version of Python should be fairly cheap,
> it isn't entirely free and so it's the right time to add new complications.

My point with that is that pythonwin is a GUI app rarely started from the command line. It isn't that adding the cmdline support is difficult, more that it's difficult for users to specify it. This will be true for any GUI installed into the start menu (eg, idle).

> Nikita:
> > COM should be initialized on demand by C modules that need it. They might need STA or
> > MTA, it's their choice to make.

> And if the C module is the core interpreter (see issue26137 for an example of where
> this would be necessary)? If we require user code to initialize COM, this whole
> proposal is moot as that is the current state of the world and it does not require
> any changes to achieve.

I'd be surprised if issue26137 ended up unconditionally doing a malware scan on everything Python ever executes. Thus, I don't see why "I'd like to enable calling CoInitializeEx on Python startup for 3.6" is necessary - just attempting to initialize it immediately before that feature is invoked would be fine and may sidestep the entire issue. Instead of command-line flags to control COM initialization we should add new flags to disable these new features that require COM (and thus also implicitly control whether COM is initialized or not.)

IOW, I think it makes sense for the core to initialize COM immediately before it needs to use COM, under the assumption that executing "python" or "python myscript.py" isn't going to need to do that by default. I think initializing COM by default at process startup on the off-chance that some COM-using feature will be invoked is more problematic.
msg270686 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-07-18 03:53
Mark's argument is strong, so I'm withdrawing this proposal.

Thanks for the discussion and comments, everyone!
History
Date User Action Args
2016-07-18 03:53:42 steve.dower set status: open -> closed; resolution: rejected; messages: + msg270686
2016-07-01 12:54:17 theller set nosy: + theller
2016-07-01 01:11:03 mhammond set messages: + msg269637
2016-06-30 19:34:18 paul.moore set messages: + msg269621
2016-06-30 16:43:38 steve.dower set messages: + msg269597
2016-06-30 16:27:24 paul.moore set messages: + msg269596
2016-06-30 16:25:05 zach.ware set messages: + msg269595
2016-06-30 15:58:14 steve.dower set messages: + msg269591
2016-06-30 10:04:12 paul.moore set messages: + msg269571
2016-06-30 09:15:59 nnemkin set nosy: + nnemkin; messages: + msg269566
2016-06-30 01:00:14 mhammond set messages: + msg269558
2016-06-29 21:22:47 steve.dower set messages: + msg269556
2016-06-29 21:19:31 steve.dower set messages: + msg269555
2016-06-29 21:18:54 steve.dower set messages: + msg269554
2016-06-29 19:45:26 eryksun set messages: + msg269551
2016-06-29 19:33:19 tim.golden set messages: + msg269550
2016-06-29 19:26:16 steve.dower set messages: + msg269549
2016-06-29 19:22:57 paul.moore set messages: + msg269548
2016-06-29 18:46:55 steve.dower set messages: + msg269544
2016-06-29 18:43:15 brett.cannon set messages: + msg269542
2016-06-29 18:40:56 brett.cannon set nosy: + brett.cannon
2016-06-29 18:23:37 steve.dower link issue26137 dependencies
2016-06-29 18:19:37 steve.dower create