Author jgehrcke
Recipients asvetlov, brandon-rhodes, christian.heimes, dstufft, giampaolo.rodola, haypo, jcea, jgehrcke, kristjan.jonsson, martius, pitrou
Date 2016-08-13.12:58:25
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1471093111.6.0.32270802929.issue16487@psf.upfronthosting.co.za>
In-reply-to
Content
Hello!

Like everybody in this thread I would love to see this land and have prepared a new patch, hoping that we can process this still for 3.6.

Antoine summarized the core task here very well:

> Let's stay focused on what is
> necessary to solve this use case, IMO (i.e. specify non-file-backed data
> as a certification chain or private key).

What follows is what I believe is the consensus in this thread about a minimum
viable solution (points mentioned earlier, enriched with some detail):

- The `certfile` and `keyfile` arguments to `SSLContext.load_cert_chain()` can
  both be either a file path or a file object.

- `str` or `bytes` values to both, `certfile` and `keyfile`, are interpreted
  as paths.

- The `password` behavior must stay as is, independent of `certfile`/`keyfile`
  being file object or path.

- When handing file paths to `load_cert_chain()`, the behavior must stay as is,
  in all detail.

- When handing file objects to `load_cert_chain()`,

  - use the OpenSSL API call `PEM_read_bio_PrivateKey()` for loading the key
    data (reading PEM data is what's currently supported).

  - support the case where the `certfile` file object first presents a private
    key and then (a) certificate(s), as currently documented:

    """
    Often the private key is stored in the same file as the certificate; in
    this case, only the certfile parameter to SSLContext.load_cert_chain() and
    wrap_socket() needs to be passed. If the private key is stored with the
    certificate, it should come before the first certificate in the
    certificate chain:
    """

    Ref:
      - https://docs.python.org/3/library/ssl.html#combined-key-and-certificate

  - resemble the OpenSSL API call `SSL_CTX_use_certificate_chain_file()`
    behavior for loading the certificate data (as this is what's currently
    used in the `SSLContext.load_cert_chain()` implementation in _ssl.c. There
    is no direct correspondence for this function reading the data directly
    from memory BIOs. Relevant documentation for resembling that behavior:

    """
    SSL_CTX_use_certificate_chain_file() adds the first certificate found in
    the file to the certificate store. The other certificates are added to the
    store of chain certificates using SSL_CTX_add1_chain_cert.
    """

    """
    For a longer chain, the client must send the complete chain [...]. This
    can only be accomplished by either adding the intermediate CA certificates
    into the trusted certificate store for the SSL_CTX object [...], or by
    adding the chain certificates using the SSL_CTX_add_extra_chain_cert
    function,
    """

    """
    When building its own certificate chain, an OpenSSL client/server will try
    to fill in missing certificates from CAfile/CApath, if the certificate
    chain was not explicitly specified (see SSL_CTX_add_extra_chain_cert,
    SSL_CTX_use_certificate.
    """

    Refs:
      - https://github.com/python/cpython/blob/0852878a81edd5c16776a68ce34c45cca233deae/Modules/_ssl.c#L2824
      - https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_use_certificate.html
      - https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_set_client_cert_cb.html
      - https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_load_verify_locations.html

    When inspecting the OpenSSL code for SSL_CTX_use_certificate_chain_file(),
    some of the above statements are not entirely correct.

    After inspection, to me the safest bet seems to keep the memory/BIO-based
    load_cert_chain() implementation as close as possible to

    SSL_CTX_use_certificate_chain_file() from OpenSSL 1.0.2h:

      https://fossies.org/dox/openssl-1.0.2h/ssl__rsa_8c_source.html#l00685

    use_certificate_chain_file() in OpenSSL 1.1.0-pre6:

      https://fossies.org/dox/openssl-1.1.0-pre6/ssl__rsa_8c_source.html#l00599

    Using this as a template (instead of coming up with a solution purely
    based on OpenSSL docs) ensures that this part of the current
    `SSLContext.load_cert_chain()` docs will be remain correct:

    """
    The certfile [...] in PEM format containing the certificate as well as any
    number of CA certificates needed to establish the certificate’s
    authenticity.
    """


More behavioral detail that I planned with:

- `certfile` and `keyfile`, if given both, must both be a file path or both be
  a file object. A mixture is undefined behavior and leads to an error
  depending on the specifics of the mixture.

- If `certfile` and `keyfile` are not file paths, expect them to have a read()
  method and just call it. Let a TypeError or AttributeError be handled by the
  caller (as done elsewhere in the stdlib).

- Use the existing Python MemoryBIO abstractions introduced by Geert Jansen in this
  beautiful patch: https://github.com/python/cpython/commit/08802a11fdb844a5aaec3a1dc67d8281d582d22b

  That is, prepare these BIOs in Python code and pass them to an underlying C
  function that consumes BIOs.


A summary of a few things that are in my opinion beyond a minimal
implementation (but probably useful in the future!):

- Private key deserialization using d2i_PKCS8PrivateKey_bio(). The current
  implementation only supports SSL_FILETYPE_PEM and some of the semantics of
  the load_cert_chain() method is specifically adjusted to PEM features.

- Certificate deserialization supporting formats other than PEM (the current
  implementation uses SSL_CTX_use_certificate_chain_file() which only supports
  PEM).

- New types for certificates and keys.

Supporting additional serialization formats greatly increases the complexity
of the task at hand here. We can save that for later.


An important note on backwards compatibility:

>> Antoine, are you suggesting that we remove the current c-level
>> capability to use file system files (using open()) and just go with
>> raw bytes data at the C api level, and then do the 'filename or
>> filelike object' in Python land?
>
> Yes, I think that's reasonable. Hopefully it won't break any existing uses.

I absolutely agree that Python code should be responsible for doing the
distinction between file paths and file objects. However, I do not agree with

> remove the current c-level capability to use file system files

After following the code I am rather sure that we should leave the current C
implementation (`_ssl__SSLContext_load_cert_chain_impl()` in Modules/_ssl.c)
as untouched as possible, and not try to integrate the new logic into that
very same function.

Instead, I figured, we should use Python code to delegate to one of two
different C functions after the distinction between file path and file
object has been made, whereas one of the two C functions is the one 
we used so far.

The central argument for having two separate C code paths is that old code
must proceed using OpenSSL's SSL_CTX_use_certificate_chain_file().

Hence, I opted for implementing SSLContext.load_cert_chain() in Python,
delegating the actual work

* to either the current load_cert_chain C implementation if file paths were
  given or

* to a new C implementation resembling SSL_CTX_use_certificate_chain_file(),
  after read()ing the file object(s) and creating the corresponding
  MemoryBIO object(s) containing the cert/key data.


Comments on tests:

* The diff seems radical due to restructuring. I have manually confirmed that
  the old tests pass against the new code.

* Major goal of the test refactoring was to take as much of the old test code
  as possible (now `_test_load_cert_chain_core()`) and automatically test it
  with three different input data types:

     * file paths, as before (now `test_load_cert_chain_core_filepaths()`)
     * ByteIO buffers (`test_load_cert_chain_core_byte_buffers()`)
     * StringIO buffers (`test_load_cert_chain_core_text_buffers()`)

* Also, there are new test methods specifically targeting those behavioral
  details that the current load_cert_chain() implementation and the new
  implementation do _not_ have in common. This mainly accounts for new
  (better, IMO) error messages created by the OpenSSL API calls in the
  BIO-based implementation.

* In just _one_ of the more complex tests involving actual client/server
  communication, I have replaced `SIGNED_CERTFILE` (a path) with
  `bytebuf(SIGNED_CERTFILE)`.


Now I'd appreciate your review!

Jan-Philip
History
Date User Action Args
2016-08-13 12:58:33jgehrckesetrecipients: + jgehrcke, jcea, pitrou, kristjan.jonsson, haypo, giampaolo.rodola, christian.heimes, asvetlov, brandon-rhodes, dstufft, martius
2016-08-13 12:58:31jgehrckesetmessageid: <1471093111.6.0.32270802929.issue16487@psf.upfronthosting.co.za>
2016-08-13 12:58:31jgehrckelinkissue16487 messages
2016-08-13 12:58:28jgehrckecreate