classification
Title: Allow ssl certificates to be specified from memory rather than files.
Type: enhancement Stage: needs patch
Components: Library (Lib), SSL Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: christian.heimes Nosy List: asvetlov, brandon-rhodes, christian.heimes, dstufft, giampaolo.rodola, haypo, jcea, jgehrcke, kristjan.jonsson, martius, njs, orsenthil, pitrou
Priority: normal Keywords: patch

Created on 2012-11-16 15:10 by kristjan.jonsson, last changed 2017-07-14 05:31 by njs.

Files
File name Uploaded Description Edit
sslpatch1.patch kristjan.jonsson, 2012-11-16 15:10
sslpatch2.patch kristjan.jonsson, 2013-01-27 02:55
ssl.patch kristjan.jonsson, 2013-06-26 22:50 review
ssl2.patch kristjan.jonsson, 2013-06-27 23:15 review
ssl2.patch kristjan.jonsson, 2013-11-14 12:11 review
python_16487_jgehrcke_01.patch jgehrcke, 2016-08-13 12:58 review
Pull Requests
URL Status Linked Edit
PR 2449 open jgehrcke, 2017-06-27 18:52
Messages (48)
msg175691 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-11-16 15:10
The _ssl module (and indeed the openssl lib) relies heaviliy on actual filesystem locations to load certificates.  A client or a server may not want to rely on physical filesystem locations to load certificates for authentication or verification.  Physical disc files are cumbersome and present a management burden in the presence of multiple processes.

This patch adds extensions to the _ssl.c file which allows certificates, keys and certification chains to be provided by file contents, rather than file name.  

The ctx.load_cert_chain and ctx.load_verify_locations take additional arguments to specify the data on this form.

the ssl.wrap_socket does not add arguments, rather the function is polymorphic in that the conents of the certfil/keyfile are examined and treated as file-data if beginning with -----BEGIN.  the ca_certs is similarly treated as a list of file contents, if it is a list, (rather than a string)

This patch is the result of work at CCP for deploying ssl clients and servers in an isolated environment without having to resort to temporary disk files.
msg175692 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-11-16 15:10
(the patch contains a local change to set the location of the 'external' dir, please disregard this.
msg175701 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-16 18:30
> (the patch contains a local change to set the location of the
> 'external' dir, please disregard this.

Can you upload a cleaned up patch then? :)
msg175709 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-11-16 19:41
Can't right now. It's only relevant for pcbuild anyway so you can test it for Unix if you want without fear. Don't worry, I always give my patches a cleanup before committing them. 
Meanwhile, I'd welcome comments on the substance.
msg175741 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-17 14:35
> Can't right now. It's only relevant for pcbuild anyway so you can test 
> it for Unix if you want without fear. Don't worry, I always give my
> patches a cleanup before committing them.

I don't want to review a patch if you tell me that it has problematic stuff in it. Please first upload a cleaned up patch.
msg175779 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-17 17:12
By the way, without looking too much at the patch, I think it would be cleaner to let the certificate functions accept file-like objects, rather than adding keydata/certdata arguments.
msg175806 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-11-17 20:24
The please don't. I'm actually growing tired of the way you seem to always latch onto every submission I make almost with hostility. I will find someone else to look this over.
msg175809 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-17 21:12
> The please don't. I'm actually growing tired of the way you seem to
> always latch onto every submission I make almost with hostility. I
> will find someone else to look this over.

I fully realize my reply was blunt. But perhaps you could try to
understand that people care about the quality of the codebase and it is
why they care about reviewing patches before letting them in. Asking for
reviews of patches that you *know* contain unneeded cruft is rude.
Disregarding their comments and saying "I'll ask someone else because I
don't want to deal with you anymore" is even worse.

This certainly happens often to you, because (IME) you are often pushy
with your patches. So, yes, you must expect a certain amount of
defensive replies when being so. Just be prepared to deal with them.
msg180734 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2013-01-27 02:06
Kristján, are you pursuing this yet?. Can we move on?
msg180736 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-01-27 02:49
I wasn't pushing it anymore because there seemed to be no interest and Antoine attitude made me lose interest in contributing this.  No one liked to comment on my approach except Antoine and I had grown tired of his persistent negativity and aloof manners.  I might as well say that here that it is this kind of feedback that has turned me away from contributing to the project on several occations.  At CCP we have a heavily patched version of python that we have modified throug time to deal with real world projects.  I (and my company) genuinely like to contribute some of those improvements back to the trunk and regularly make efforts to do so (e.g. by taking the time and effort to port to python 3) but often they are met with with almost hostile negativity.  I hope, for the sake of Python, that I am the only contributor meeting this kind of attitude.

Antoine, please take the time to read what I posted here again.  I had a working patch, used in production. I uploaded, it, but when I got back home realized that an unrelated change was part of it.  Unable to change it from there I asked politely that you disregard that "unrelated" change while looking at the _substance_ of patch.  As you know, I am a committer and will polish things before checking them in, as anyone does.
How is that rude?  

The patch still stands. I on a poor internet in a Chinese hotel and will attempt to download and edit the patch so that the irrelevant change to the PC build solution should no longer need to hinder anyone from reading it.
msg180737 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-01-27 02:55
Manually edited the .patch file  The build changes are gone but should probably go in separately in some form later.
msg190997 - (view) Author: Brandon Rhodes (brandon-rhodes) * Date: 2013-06-11 23:22
Kristján, your patch is a wonderful idea—I am about to commit production code that will have to create tens of thousands of temporary files during operation, one file each time SSL is started up on a socket, which could be avoided if something like this patch were applied. I had always assumed that it was simply a limitation of the underlying SSL library! An interface that takes a filename like this, instead of a live file-like object, seems so un-Pythonic that I assumed the only reason for it was a limitation in OpenSSL. Thank you very much for looking under the covers and discovering that this was not the case!

I do, though, feel a slight twinge when we add Even More Parameters to a Standard Library routine but in such a way that it cannot be used with an existing parameter — as here in your patch, where we gain a parameter like `certdata` that cannot be (should not be?) used at the same time as `certfile`. It seems redundant to have two names for the same parameter to the underlying library, and makes it look like the routine needs more information than it really does.

Since my own instinct was to think, ten minutes ago, "Maybe I can pass a StringIO, since it says it wants a fine!", I am very much in support of the idea of keeping only the existing parameters, but making them accept both strings (which, for compatibility, would continue to be interpreted as filenames) and file-like objects as arguments. I think this would have a great deal of symmetry with how other parts of the Standard Library work, while keeping this patch's central value of making it possible for those of us with cert-heavy code to avoid the creation of thousands of files a minute.

Again, thank you VERY much for discovering that OpenSSL can do this, and I will try to provide whatever encouragement I can as you try to shepherd this past the other committers.
msg191018 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-12 09:25
Hi there.
Thanks for your comments.  This is the kind of discussion I was hoping to have about my draft patch.

I too have reservations about adding arguments.  In the version of this that we have in house, we actually don't use a "certdata" argument, but rather this:
 - 'certfile' can be either a filename or a string containing the data.  The start of the string is examined to determine if is the latter

The reason I didn't do it this way in the patch is because:
a) it could be confusing and possibly un-pythonic to have that level of polymorphism employed
b) There are possible edge cases.  What if you had a filename that started with the magic tokens (unlikely but technically possible).

I also would prefer passing in the certificates as raw bytes data, rather than file-like objects.  there is no reason for them to be files, this is not data that is read on demand.  libssl reads the entire contents of these files in one gulp, so there is no efficiency to be gained through any sort of pipelining by providing a file-like-object..  Adding the plumbing inside the library to do python "read()" calls is completely unnecessary when the caller can simply do that.

The only possible reason would be to resolve the above ambiguity, i.e. to allow the api to try to invoke a "read()" function on the 'file' to determine if it is a file-like object.
msg191828 - (view) Author: Brandon Rhodes (brandon-rhodes) * Date: 2013-06-25 01:05
Kristján, you are certainly correct that a single-argument that can be either a filename or a cert is inappropriate; we should not be peeking inside of strings to guess what they contain.

And I think you also have a good point about Pythonic-ness when it comes to polymorphism; routines that are sensitive to object type have been going more and more out of style for the past twenty years, and for good reason.

But, having ceded those points, I still think string-or-file-like-object is the correct approach here, simply because it is the overwhelming approach of the Standard Library; everyone is used to it, and will already "know the ropes" of that approach; and it prevents noisying up the interface with redundant arguments. Were we doing this over again, we would simply not allow a filename at all, and let the user open the file if they needed to. But since a filename is allowed, it feels like the Official Standard Library Approach to also allow a file-like object.

Some examples:

zipfile.ZipFile: "Open a ZIP file, where file can be either a path to a file (a string) or a file-like object."

http://docs.python.org/2/library/zipfile#zipfile.ZipFile

binhex.hexbin: "Decode a binhex file input. input may be a filename or a file-like object supporting read() and close() methods."

http://docs.python.org/release/2.7.5/library/binhex.html#binhex.hexbin

xml.dom.minidom.parse: "filename_or_file may be either a file name, or a file-like object."

http://docs.python.org/2/library/xml.dom.minidom.html#xml.dom.minidom.parse

mailbox.Mailbox.add: "Parameter message may be a Message instance, an email.message.Message instance, a string, or a file-like object (which should be open in text mode)."

http://docs.python.org/2/library/mailbox.html#mailbox.Mailbox.add

pickletools.dis: "pickle can be a string or a file-like object."

http://docs.python.org/2/library/pickletools.html#pickletools.dis

I suggest that these precedents, along with others that I believe we could find with a more exhaustive search of the Standard Library, are sufficient to suggest that in this case the least-surprise approach is a single argument that's either a filename or file-like object. I would suggest reviewing quickly the code for the above examples and following their example for how to distinguish most cleanly between a filename and file-like object; I wonder if they call any common code to get the contents out, or each do it completely by themselves? :)

Thanks again for wanting to add this to the SSL module, it will be a *great* addition that solves an important use case!
msg191838 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-25 08:34
ok, I concede that a file-like object makes sense from a polymorphism point of view.  It makes no sense from a streaming point of view.  A caller can then wrap their data into a StringIO instance.

I'll rework the patch in the manner you suggest.
msg191847 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-06-25 11:04
Agreed, a file-like object is the way to go.
I don't think you need to write the logic in C, by the way. You can write a high-level function and defer to a low-level C func for the basic API wrapping.
msg191859 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-25 13:59
Antoine, are you suggesting that we remove the current c-level capability to use file system files (using open()) and just go with raw bytes data at the C api level, and then do the 'filename or filelike object' in Python land?
msg191860 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-06-25 14:39
> Antoine, are you suggesting that we remove the current c-level
> capability to use file system files (using open()) and just go with
> raw bytes data at the C api level, and then do the 'filename or
> filelike object' in Python land?

Yes, I think that's reasonable. Hopefully it won't break any existing uses.
msg191862 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-06-25 14:52
I didn't know about this issue and have worked on a similar feature in #18138.
msg191868 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-06-25 16:51
Ha, funny. Now it's time to reconciliate your respective patches :)
msg191922 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-26 22:50
Here is an updated patch.
We now support file-like objects.
New helper functions try to turn file arguments into either Py_buffer objects containing the read data, or PyBytesObject argument with the file system encoding of the path.
A file-like object is recognized by the successful execution of a read() method call.
docs and tests updated.
msg191923 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-06-27 00:48
Thx Kristján!

My patch maps PyUnicode to PEM encoded cert data and objects with Py_Buffer support to DER encoded cert data. Perhaps you like to you the same concept in your patch to support TextIO and BytesIO read() methods. Feel free to reuse as much of my patch as you like.

You don't check ERR_GET_LIB() in some places.

Could you please add a comment about extra_chain_cert in PySSL_CTX_use_certificate_chain_mem() and SSL_CTX_get_cert_store() in PySSL_CTX_load_verify_certs_mem()? The difference between the extra chain and the trusted store got me confused more than once. As far as I recall the trusted store is suppose to contain only trusted root CA. The extra chain is uesd to provide intermediate certs for the cert chain between a root CA and the service's cert.
msg191924 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-06-27 00:59
PS: I like to have DER cert support for #17134. I'd rather not convert DER to PEM.
msg191936 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-27 09:38
Thanks for your comments Christian.
"You don't check ERR_GET_LIB() in some places."
Do you have a particular place in mind?

About DER.  As I understand, currently _ssl only supports PEM.  If that is the case, then supporting DER should, IMHO, be a separate patch.  It will probably involve adding an argument to the functions.

Both DER and PEM files are binary formats.  the PEM is base64 encoded ascii with some strict ascii headers and footers.  As such, the encoding is quite explicit.
I am not sure that automatic conversion from unicode to ascii should be undertaken on the C level, particularly if the intention later is to support DER, which is pure binary and where "string" has no place.

I'll have a look at your patch as well, haven't gotten round to do that yet.
msg191937 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-06-27 09:54
I found two places:

if (ERR_GET_REASON(err) == X509_R_CERT_ALREADY_IN_HASH_TABLE) {
if (ERR_GET_REASON(err) == PEM_R_BAD_BASE64_DECODE)


AFAIK the _ssl module only supports PEM certs for loading. On the other hands cert data can only be retrieved as dict representation or binary DER data, e.g. getpeercert(binary_form=True) -> DER bytes. It's a bit of a puzzle to me.

It feels a bit strange to treat PEM certs as binary data, especially since the SSL module treats PEM as ASCII unicode. For example DER_cert_to_PEM_cert() accepts bytes and returns str, PEM_cert_to_DER_cert() converts str to bytes.
msg191938 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-06-27 10:00
> It feels a bit strange to treat PEM certs as binary data, especially
> since the SSL module treats PEM as ASCII unicode. For example
> DER_cert_to_PEM_cert() accepts bytes and returns str,
> PEM_cert_to_DER_cert() converts str to bytes.

I agree that PEM is logically text (i.e. unicode). Also, having the
unicode == PEM, bytes == DER distinction sounds reasonably practical.
msg191941 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-27 10:43
Ok, thanks.
The consistency argument is strong, also Antoine's suggestion to use the return type of read() as a discriminant.

also please excuse me because I am not a habitual user of Python 3 and haven't become used to the str/binary dichotomy yet.  I didn´t know, for instance, about io.BytesIO until I found to my horror that io.StringIO.read() would output unicode (i.e str).
msg191942 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-06-27 10:53
EVE Online is still using Python 2.7? You gotta hurry up or Guido will beat you with Dropbox's 3.x port. :)
msg191957 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-27 17:17
2.7 is the pinnacle of pythonic achievement.  Particularly our branch of it :)
One day we'll move, I'm sure, when there is an opportune moment.  For example, if we were to start supporting a new game, a new platform. But for now, if it ain't broke, we won't fix it.

In fact, today I was asked about python for the ps4... It might be worth taking a look at porting p3k for that and making a clean break :)
msg191970 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-27 23:15
Okay, I have updated the patch as suggested.
string mode means PEM encoding, binary mode DER encoding.
Note that DER encoding is more limited, there is no way to concatentate private keys and certificates into one file (the PEM decoder searches the file until it finds the proper data it is looking for, certificate or private key).

Also, the unittests don't test for DER encoded private key, because there is no way to generate them currently.  The conversion functions in ssl only cope with certificates.  Although adding them, changing them would be trivial.  These functions should also probably know how to split a PEM file into sections so that they can be individually converted.
msg202800 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-11-13 23:32
What's the status of this patch? I need a way to load CA certs from memory for another patch of mine.
msg202828 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-11-14 11:47
Update the latest patch to the current state of python.
msg202829 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-11-14 11:48
The patch should be valid, please try it out.
msg202832 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-11-14 12:04
Your patch looks like Benjamin's fix for issue #17828 and not like a SSL improvement. :)
msg202833 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-11-14 12:11
Haha, indeed.  what nonsense.
Here is the correct one.
msg246488 - (view) Author: Martin Richard (martius) * Date: 2015-07-09 13:01
Hi,

I would like to update this patch so it can finally land in cpython, hopefully 3.6.

tl;dr of the thread:
In a nutshell, the latest patch from Kristján Valur Jónsson updates
SSLContext.load_cert_chain(certfile, keyfile=None, password=None) and
SSLContext.load_verify_locations(cafile=None, capath=None)

so certfile, keyfile and cafile can be either a string representing a path to a file or a file-like object.

The discussion seems to favor this API (pass file-like objects) rather than using new arguments (certdata, keydata) to pass string or bytes objects.

However, Christian Heimes proposed a patch (which landed in 3.4) which adds a cadata argument to load_verify_locations().


So, what should we do?
- allow certfile, keyfile and cafile to be paths or file-like objects,
- add certdata and keydata to load_cert_chain() to be consistent with load_verify_locations(), 
- do both.

I'd go the the 2nd solution to be consistent with the API and keep things simple.
msg246489 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-07-09 13:14
I thing adding "keydata" and "certdata" makes things more complicated, on the contrary. You start having an API with many optional arguments but some of them are exclusive with each other (because you can only specify a single key and cert chain).

The "cafile", "capath", "cadata" in load_verify_locations() are cumulative, but they would be exclusive in load_cert_chain(): there's not much symmetry.
msg246492 - (view) Author: Martin Richard (martius) * Date: 2015-07-09 13:40
You are right.

And if certfile and keyfile (args of load_cert_chain()) accept file-like objects, we agree that cafile (load_verify_location()) should accept them too?
msg246494 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-07-09 13:42
Le 09/07/2015 15:40, Martin Richard a écrit :
> 
> And if certfile and keyfile (args of load_cert_chain()) accept
file-like objects, we agree that cafile (load_verify_location()) should
accept them too?

It could, but that's a separate issue. Let's stay focused on what is
necessary to solve this use case, IMO (i.e. specify non-file-backed data
as a certification chain or private key).
msg246495 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-07-09 13:43
Sorry, I didn't take time to read the whole discussion. For me, it's a good idea to accept a filename or a file object in the same parameter. Having two exclusive parameters for the same thing (ex: CA) doesn't smell like a great API.
msg246496 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2015-07-09 13:48
I'd rather introduce new types and have the function accept either a string (for path to fiel) or a X509 object and a PKey object. It's more flexible and secure. With a private key type we can properly support crypto ENGINEs and wipe memory when the object gets deallocated.
msg246555 - (view) Author: Martin Richard (martius) * Date: 2015-07-10 10:41
I'm not sure I know how to do this correctly: I lack of experience both
with openssl C API and writing python modules in C.

It may be more flexible, but unless the key is protected/crypted somehow,
one would need a string or bytes buffer to hold the key when creating the
private key object: not much secure. Don't you think that it should be
addressed in a separate issue?

2015-07-09 15:48 GMT+02:00 Christian Heimes <report@bugs.python.org>:

>
> Christian Heimes added the comment:
>
> I'd rather introduce new types and have the function accept either a
> string (for path to fiel) or a X509 object and a PKey object. It's more
> flexible and secure. With a private key type we can properly support crypto
> ENGINEs and wipe memory when the object gets deallocated.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue16487>
> _______________________________________
>
msg272585 - (view) Author: Jan-Philip Gehrcke (jgehrcke) * Date: 2016-08-13 12:58
Hello!

Like everybody in this thread I would love to see this land and have prepared a new patch, hoping that we can process this still for 3.6.

Antoine summarized the core task here very well:

> Let's stay focused on what is
> necessary to solve this use case, IMO (i.e. specify non-file-backed data
> as a certification chain or private key).

What follows is what I believe is the consensus in this thread about a minimum
viable solution (points mentioned earlier, enriched with some detail):

- The `certfile` and `keyfile` arguments to `SSLContext.load_cert_chain()` can
  both be either a file path or a file object.

- `str` or `bytes` values to both, `certfile` and `keyfile`, are interpreted
  as paths.

- The `password` behavior must stay as is, independent of `certfile`/`keyfile`
  being file object or path.

- When handing file paths to `load_cert_chain()`, the behavior must stay as is,
  in all detail.

- When handing file objects to `load_cert_chain()`,

  - use the OpenSSL API call `PEM_read_bio_PrivateKey()` for loading the key
    data (reading PEM data is what's currently supported).

  - support the case where the `certfile` file object first presents a private
    key and then (a) certificate(s), as currently documented:

    """
    Often the private key is stored in the same file as the certificate; in
    this case, only the certfile parameter to SSLContext.load_cert_chain() and
    wrap_socket() needs to be passed. If the private key is stored with the
    certificate, it should come before the first certificate in the
    certificate chain:
    """

    Ref:
      - https://docs.python.org/3/library/ssl.html#combined-key-and-certificate

  - resemble the OpenSSL API call `SSL_CTX_use_certificate_chain_file()`
    behavior for loading the certificate data (as this is what's currently
    used in the `SSLContext.load_cert_chain()` implementation in _ssl.c. There
    is no direct correspondence for this function reading the data directly
    from memory BIOs. Relevant documentation for resembling that behavior:

    """
    SSL_CTX_use_certificate_chain_file() adds the first certificate found in
    the file to the certificate store. The other certificates are added to the
    store of chain certificates using SSL_CTX_add1_chain_cert.
    """

    """
    For a longer chain, the client must send the complete chain [...]. This
    can only be accomplished by either adding the intermediate CA certificates
    into the trusted certificate store for the SSL_CTX object [...], or by
    adding the chain certificates using the SSL_CTX_add_extra_chain_cert
    function,
    """

    """
    When building its own certificate chain, an OpenSSL client/server will try
    to fill in missing certificates from CAfile/CApath, if the certificate
    chain was not explicitly specified (see SSL_CTX_add_extra_chain_cert,
    SSL_CTX_use_certificate.
    """

    Refs:
      - https://github.com/python/cpython/blob/0852878a81edd5c16776a68ce34c45cca233deae/Modules/_ssl.c#L2824
      - https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_use_certificate.html
      - https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_set_client_cert_cb.html
      - https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_load_verify_locations.html

    When inspecting the OpenSSL code for SSL_CTX_use_certificate_chain_file(),
    some of the above statements are not entirely correct.

    After inspection, to me the safest bet seems to keep the memory/BIO-based
    load_cert_chain() implementation as close as possible to

    SSL_CTX_use_certificate_chain_file() from OpenSSL 1.0.2h:

      https://fossies.org/dox/openssl-1.0.2h/ssl__rsa_8c_source.html#l00685

    use_certificate_chain_file() in OpenSSL 1.1.0-pre6:

      https://fossies.org/dox/openssl-1.1.0-pre6/ssl__rsa_8c_source.html#l00599

    Using this as a template (instead of coming up with a solution purely
    based on OpenSSL docs) ensures that this part of the current
    `SSLContext.load_cert_chain()` docs will be remain correct:

    """
    The certfile [...] in PEM format containing the certificate as well as any
    number of CA certificates needed to establish the certificate’s
    authenticity.
    """


More behavioral detail that I planned with:

- `certfile` and `keyfile`, if given both, must both be a file path or both be
  a file object. A mixture is undefined behavior and leads to an error
  depending on the specifics of the mixture.

- If `certfile` and `keyfile` are not file paths, expect them to have a read()
  method and just call it. Let a TypeError or AttributeError be handled by the
  caller (as done elsewhere in the stdlib).

- Use the existing Python MemoryBIO abstractions introduced by Geert Jansen in this
  beautiful patch: https://github.com/python/cpython/commit/08802a11fdb844a5aaec3a1dc67d8281d582d22b

  That is, prepare these BIOs in Python code and pass them to an underlying C
  function that consumes BIOs.


A summary of a few things that are in my opinion beyond a minimal
implementation (but probably useful in the future!):

- Private key deserialization using d2i_PKCS8PrivateKey_bio(). The current
  implementation only supports SSL_FILETYPE_PEM and some of the semantics of
  the load_cert_chain() method is specifically adjusted to PEM features.

- Certificate deserialization supporting formats other than PEM (the current
  implementation uses SSL_CTX_use_certificate_chain_file() which only supports
  PEM).

- New types for certificates and keys.

Supporting additional serialization formats greatly increases the complexity
of the task at hand here. We can save that for later.


An important note on backwards compatibility:

>> Antoine, are you suggesting that we remove the current c-level
>> capability to use file system files (using open()) and just go with
>> raw bytes data at the C api level, and then do the 'filename or
>> filelike object' in Python land?
>
> Yes, I think that's reasonable. Hopefully it won't break any existing uses.

I absolutely agree that Python code should be responsible for doing the
distinction between file paths and file objects. However, I do not agree with

> remove the current c-level capability to use file system files

After following the code I am rather sure that we should leave the current C
implementation (`_ssl__SSLContext_load_cert_chain_impl()` in Modules/_ssl.c)
as untouched as possible, and not try to integrate the new logic into that
very same function.

Instead, I figured, we should use Python code to delegate to one of two
different C functions after the distinction between file path and file
object has been made, whereas one of the two C functions is the one 
we used so far.

The central argument for having two separate C code paths is that old code
must proceed using OpenSSL's SSL_CTX_use_certificate_chain_file().

Hence, I opted for implementing SSLContext.load_cert_chain() in Python,
delegating the actual work

* to either the current load_cert_chain C implementation if file paths were
  given or

* to a new C implementation resembling SSL_CTX_use_certificate_chain_file(),
  after read()ing the file object(s) and creating the corresponding
  MemoryBIO object(s) containing the cert/key data.


Comments on tests:

* The diff seems radical due to restructuring. I have manually confirmed that
  the old tests pass against the new code.

* Major goal of the test refactoring was to take as much of the old test code
  as possible (now `_test_load_cert_chain_core()`) and automatically test it
  with three different input data types:

     * file paths, as before (now `test_load_cert_chain_core_filepaths()`)
     * ByteIO buffers (`test_load_cert_chain_core_byte_buffers()`)
     * StringIO buffers (`test_load_cert_chain_core_text_buffers()`)

* Also, there are new test methods specifically targeting those behavioral
  details that the current load_cert_chain() implementation and the new
  implementation do _not_ have in common. This mainly accounts for new
  (better, IMO) error messages created by the OpenSSL API calls in the
  BIO-based implementation.

* In just _one_ of the more complex tests involving actual client/server
  communication, I have replaced `SIGNED_CERTFILE` (a path) with
  `bytebuf(SIGNED_CERTFILE)`.


Now I'd appreciate your review!

Jan-Philip
msg275288 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-09 08:17
Thanks for your patch. I have left some comments (sorry for the delay).
msg277173 - (view) Author: Jan-Philip Gehrcke (jgehrcke) * Date: 2016-09-21 16:58
Thanks Christian, much appreciated. Just responded to your review.
msg281457 - (view) Author: Jan-Philip Gehrcke (jgehrcke) * Date: 2016-11-22 09:42
Christian, Senthil, would appreciate if I got another round of feedback (in the review thread) :-)
msg293908 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2017-05-18 02:30
Hi Jan-Philip, we might be close on this one. 
Could you convert your latest patch into PR against (https://github.com/python/cpython). Is will help to record you as the author as we can discuss the patch in python sprints and get this in. :-) Thanks!
msg297068 - (view) Author: Jan-Philip Gehrcke (jgehrcke) * Date: 2017-06-27 18:59
Hey Senthil and Christian!

> Could you convert your latest patch into PR against https://github.com/python/cpython

That was fun. There we go: https://github.com/python/cpython/pull/2449

I hope I was not too late with that for the 3.7 development.
History
Date User Action Args
2017-07-14 05:31:36njssetnosy: + njs
2017-06-27 18:59:34jgehrckesetmessages: + msg297068
2017-06-27 18:52:54jgehrckesetpull_requests: + pull_request2506
2017-05-18 02:30:52orsenthilsetversions: - Python 3.6
2017-05-18 02:30:41orsenthilsetmessages: + msg293908
2016-11-22 09:42:49jgehrckesetnosy: + orsenthil
messages: + msg281457
2016-09-21 16:58:48jgehrckesetmessages: + msg277173
2016-09-15 07:49:25christian.heimessetassignee: christian.heimes
components: + SSL
2016-09-09 08:17:38christian.heimessetmessages: + msg275288
versions: + Python 3.7, - Python 3.4
2016-08-13 12:58:31jgehrckesetfiles: + python_16487_jgehrcke_01.patch
nosy: + jgehrcke
messages: + msg272585

2016-06-12 11:23:08christian.heimessetassignee: christian.heimes -> (no value)
2015-07-10 10:41:22martiussetmessages: + msg246555
2015-07-09 13:48:13christian.heimessetmessages: + msg246496
2015-07-09 13:43:21hayposetmessages: + msg246495
2015-07-09 13:42:36pitrousetmessages: + msg246494
2015-07-09 13:40:22martiussetmessages: + msg246492
2015-07-09 13:14:12pitrousetmessages: + msg246489
2015-07-09 13:01:53martiussetnosy: + martius

messages: + msg246488
versions: + Python 3.6
2013-11-20 16:47:45christian.heimessetassignee: christian.heimes
2013-11-14 13:32:53hayposetnosy: + haypo
2013-11-14 12:11:55kristjan.jonssonsetfiles: - ssl2.patch
2013-11-14 12:11:37kristjan.jonssonsetfiles: + ssl2.patch

messages: + msg202833
2013-11-14 12:04:29christian.heimessetmessages: + msg202832
2013-11-14 11:48:39kristjan.jonssonsetmessages: + msg202829
2013-11-14 11:47:52kristjan.jonssonsetfiles: + ssl2.patch

messages: + msg202828
2013-11-13 23:32:50christian.heimessetmessages: + msg202800
2013-09-08 03:53:43asvetlovlinkissue18915 superseder
2013-07-31 21:53:36dstufftsetnosy: + dstufft
2013-06-27 23:15:29kristjan.jonssonsetfiles: + ssl2.patch

messages: + msg191970
2013-06-27 17:17:14kristjan.jonssonsetmessages: + msg191957
2013-06-27 10:53:38christian.heimessetmessages: + msg191942
2013-06-27 10:43:46kristjan.jonssonsetmessages: + msg191941
2013-06-27 10:00:31pitrousetmessages: + msg191938
2013-06-27 09:54:22christian.heimessetmessages: + msg191937
2013-06-27 09:38:29kristjan.jonssonsetmessages: + msg191936
2013-06-27 00:59:19christian.heimessetmessages: + msg191924
2013-06-27 00:48:02christian.heimessetmessages: + msg191923
2013-06-26 22:50:03kristjan.jonssonsetfiles: + ssl.patch

messages: + msg191922
2013-06-25 16:51:30pitrousetmessages: + msg191868
2013-06-25 14:52:29christian.heimessetmessages: + msg191862
2013-06-25 14:39:19pitrousetmessages: + msg191860
2013-06-25 14:00:58christian.heimessetnosy: + christian.heimes
2013-06-25 13:59:45kristjan.jonssonsetmessages: + msg191859
2013-06-25 11:04:08pitrousetmessages: + msg191847
stage: patch review -> needs patch
2013-06-25 08:34:42kristjan.jonssonsetmessages: + msg191838
2013-06-25 01:05:37brandon-rhodessetmessages: + msg191828
2013-06-12 09:25:11kristjan.jonssonsetmessages: + msg191018
2013-06-11 23:22:14brandon-rhodessetnosy: + brandon-rhodes
messages: + msg190997
2013-01-27 02:55:27kristjan.jonssonsetfiles: + sslpatch2.patch

messages: + msg180737
2013-01-27 02:49:41kristjan.jonssonsetmessages: + msg180736
2013-01-27 02:06:51jceasetnosy: + jcea

messages: + msg180734
title: Allow ssl certificates to be speficfied from memory rather than files. -> Allow ssl certificates to be specified from memory rather than files.
2012-11-18 20:59:27giampaolo.rodolasetnosy: + giampaolo.rodola
2012-11-17 21:12:52pitrousetmessages: + msg175809
2012-11-17 20:24:40kristjan.jonssonsetmessages: + msg175806
2012-11-17 17:12:29pitrousetmessages: + msg175779
2012-11-17 14:35:06pitrousetmessages: + msg175741
2012-11-17 14:19:53asvetlovsetnosy: + asvetlov
2012-11-16 19:41:54kristjan.jonssonsetmessages: + msg175709
2012-11-16 18:30:18pitrousetnosy: + pitrou
messages: + msg175701

components: + Library (Lib), - Extension Modules
stage: patch review
2012-11-16 15:10:39kristjan.jonssonsetmessages: + msg175692
2012-11-16 15:10:15kristjan.jonssoncreate