Message272585
Hello!
Like everybody in this thread I would love to see this land and have prepared a new patch, hoping that we can process this still for 3.6.
Antoine summarized the core task here very well:
> Let's stay focused on what is
> necessary to solve this use case, IMO (i.e. specify non-file-backed data
> as a certification chain or private key).
What follows is what I believe is the consensus in this thread about a minimum
viable solution (points mentioned earlier, enriched with some detail):
- The `certfile` and `keyfile` arguments to `SSLContext.load_cert_chain()` can
both be either a file path or a file object.
- `str` or `bytes` values to both, `certfile` and `keyfile`, are interpreted
as paths.
- The `password` behavior must stay as is, independent of `certfile`/`keyfile`
being file object or path.
- When handing file paths to `load_cert_chain()`, the behavior must stay as is,
in all detail.
- When handing file objects to `load_cert_chain()`,
- use the OpenSSL API call `PEM_read_bio_PrivateKey()` for loading the key
data (reading PEM data is what's currently supported).
- support the case where the `certfile` file object first presents a private
key and then (a) certificate(s), as currently documented:
"""
Often the private key is stored in the same file as the certificate; in
this case, only the certfile parameter to SSLContext.load_cert_chain() and
wrap_socket() needs to be passed. If the private key is stored with the
certificate, it should come before the first certificate in the
certificate chain:
"""
Ref:
- https://docs.python.org/3/library/ssl.html#combined-key-and-certificate
- resemble the OpenSSL API call `SSL_CTX_use_certificate_chain_file()`
behavior for loading the certificate data (as this is what's currently
used in the `SSLContext.load_cert_chain()` implementation in _ssl.c. There
is no direct correspondence for this function reading the data directly
from memory BIOs. Relevant documentation for resembling that behavior:
"""
SSL_CTX_use_certificate_chain_file() adds the first certificate found in
the file to the certificate store. The other certificates are added to the
store of chain certificates using SSL_CTX_add1_chain_cert.
"""
"""
For a longer chain, the client must send the complete chain [...]. This
can only be accomplished by either adding the intermediate CA certificates
into the trusted certificate store for the SSL_CTX object [...], or by
adding the chain certificates using the SSL_CTX_add_extra_chain_cert
function,
"""
"""
When building its own certificate chain, an OpenSSL client/server will try
to fill in missing certificates from CAfile/CApath, if the certificate
chain was not explicitly specified (see SSL_CTX_add_extra_chain_cert,
SSL_CTX_use_certificate.
"""
Refs:
- https://github.com/python/cpython/blob/0852878a81edd5c16776a68ce34c45cca233deae/Modules/_ssl.c#L2824
- https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_use_certificate.html
- https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_set_client_cert_cb.html
- https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_load_verify_locations.html
When inspecting the OpenSSL code for SSL_CTX_use_certificate_chain_file(),
some of the above statements are not entirely correct.
After inspection, to me the safest bet seems to keep the memory/BIO-based
load_cert_chain() implementation as close as possible to
SSL_CTX_use_certificate_chain_file() from OpenSSL 1.0.2h:
https://fossies.org/dox/openssl-1.0.2h/ssl__rsa_8c_source.html#l00685
use_certificate_chain_file() in OpenSSL 1.1.0-pre6:
https://fossies.org/dox/openssl-1.1.0-pre6/ssl__rsa_8c_source.html#l00599
Using this as a template (instead of coming up with a solution purely
based on OpenSSL docs) ensures that this part of the current
`SSLContext.load_cert_chain()` docs will be remain correct:
"""
The certfile [...] in PEM format containing the certificate as well as any
number of CA certificates needed to establish the certificate’s
authenticity.
"""
More behavioral detail that I planned with:
- `certfile` and `keyfile`, if given both, must both be a file path or both be
a file object. A mixture is undefined behavior and leads to an error
depending on the specifics of the mixture.
- If `certfile` and `keyfile` are not file paths, expect them to have a read()
method and just call it. Let a TypeError or AttributeError be handled by the
caller (as done elsewhere in the stdlib).
- Use the existing Python MemoryBIO abstractions introduced by Geert Jansen in this
beautiful patch: https://github.com/python/cpython/commit/08802a11fdb844a5aaec3a1dc67d8281d582d22b
That is, prepare these BIOs in Python code and pass them to an underlying C
function that consumes BIOs.
A summary of a few things that are in my opinion beyond a minimal
implementation (but probably useful in the future!):
- Private key deserialization using d2i_PKCS8PrivateKey_bio(). The current
implementation only supports SSL_FILETYPE_PEM and some of the semantics of
the load_cert_chain() method is specifically adjusted to PEM features.
- Certificate deserialization supporting formats other than PEM (the current
implementation uses SSL_CTX_use_certificate_chain_file() which only supports
PEM).
- New types for certificates and keys.
Supporting additional serialization formats greatly increases the complexity
of the task at hand here. We can save that for later.
An important note on backwards compatibility:
>> Antoine, are you suggesting that we remove the current c-level
>> capability to use file system files (using open()) and just go with
>> raw bytes data at the C api level, and then do the 'filename or
>> filelike object' in Python land?
>
> Yes, I think that's reasonable. Hopefully it won't break any existing uses.
I absolutely agree that Python code should be responsible for doing the
distinction between file paths and file objects. However, I do not agree with
> remove the current c-level capability to use file system files
After following the code I am rather sure that we should leave the current C
implementation (`_ssl__SSLContext_load_cert_chain_impl()` in Modules/_ssl.c)
as untouched as possible, and not try to integrate the new logic into that
very same function.
Instead, I figured, we should use Python code to delegate to one of two
different C functions after the distinction between file path and file
object has been made, whereas one of the two C functions is the one
we used so far.
The central argument for having two separate C code paths is that old code
must proceed using OpenSSL's SSL_CTX_use_certificate_chain_file().
Hence, I opted for implementing SSLContext.load_cert_chain() in Python,
delegating the actual work
* to either the current load_cert_chain C implementation if file paths were
given or
* to a new C implementation resembling SSL_CTX_use_certificate_chain_file(),
after read()ing the file object(s) and creating the corresponding
MemoryBIO object(s) containing the cert/key data.
Comments on tests:
* The diff seems radical due to restructuring. I have manually confirmed that
the old tests pass against the new code.
* Major goal of the test refactoring was to take as much of the old test code
as possible (now `_test_load_cert_chain_core()`) and automatically test it
with three different input data types:
* file paths, as before (now `test_load_cert_chain_core_filepaths()`)
* ByteIO buffers (`test_load_cert_chain_core_byte_buffers()`)
* StringIO buffers (`test_load_cert_chain_core_text_buffers()`)
* Also, there are new test methods specifically targeting those behavioral
details that the current load_cert_chain() implementation and the new
implementation do _not_ have in common. This mainly accounts for new
(better, IMO) error messages created by the OpenSSL API calls in the
BIO-based implementation.
* In just _one_ of the more complex tests involving actual client/server
communication, I have replaced `SIGNED_CERTFILE` (a path) with
`bytebuf(SIGNED_CERTFILE)`.
Now I'd appreciate your review!
Jan-Philip |
|
Date |
User |
Action |
Args |
2016-08-13 12:58:33 | jgehrcke | set | recipients:
+ jgehrcke, jcea, pitrou, kristjan.jonsson, vstinner, giampaolo.rodola, christian.heimes, asvetlov, brandon-rhodes, dstufft, martius |
2016-08-13 12:58:31 | jgehrcke | set | messageid: <1471093111.6.0.32270802929.issue16487@psf.upfronthosting.co.za> |
2016-08-13 12:58:31 | jgehrcke | link | issue16487 messages |
2016-08-13 12:58:28 | jgehrcke | create | |
|