classification
Title: SSLContext.load_verify_locations cannot handle paths on Windows which cannot be encoded using mbcs
Type: behavior Stage: resolved
Components: Extension Modules, SSL, Unicode, Windows Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: christian.heimes Nosy List: Ilya.Kulakov, christian.heimes, ezio.melotti, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2016-06-20 02:32 by Ilya.Kulakov, last changed 2017-09-08 18:57 by vstinner. This issue is now closed.

Messages (17)
msg268880 - (view) Author: Ilya Kulakov (Ilya.Kulakov) * Date: 2016-06-20 02:32
On Windows 8.1 x64 with Python 3.5.1 I was able to reproduce the issue by attempting to load a file at "C:\Users\غازي\AppData\Local\Temp\_غازي_70e5wbxo\cacert.pem".

    locale.getdefaultlocale()
    > ('en_US', 'cp1252')

    locale.getpreferredencoding()
    > 'cp1252'

    sys.getfilesystemencoding()
    > 'mbcs'
    
    sys.getdefaultencoding()
    > 'utf-8'

    c = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)

    c.load_verify_locations(cafile=r"C:\Users\غازي\AppData\Local\Temp\_غازي_70e5wbxo\cacert.pem")
    > TypeError: cafile should be a valid filesystem path

    c.load_verify_locations(cafile=r"C:\Users\غازي\AppData\Local\Temp\_غازي_70e5wbxo\cacert.pem".encode(sys.getfilesystemencoding()))
    > UnicodeEncodeError: 'mbcs' codec can't encode characters in positions 0--1: invalid character

    c.load_verify_locations(cafile=r"C:\Users\غازي\AppData\Local\Temp\_غازي_70e5wbxo\cacert.pem".encode('utf-8'))
    > ok
msg268912 - (view) Author: Ilya Kulakov (Ilya.Kulakov) * Date: 2016-06-20 17:15
I believe this is a bug, because a path that is suitable for os.path (or pathlib) should be equally suitable for load_verify_locations.
msg268922 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-20 19:58
The Python ssl module is a wrapper to the OpenSSL library. For this issue, we are talking about the function SSL_CTX_load_verify_locations():
https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_load_verify_locations.html

OpenSSL expects a byte string for CAfile and CApath. On Windows, it means a string encoded to the ANSI code page.

OpenSSL doesn't seem to support paths that cannot be encoded to the ANSI code page on Windows. I suggest reporting the issue to the OpenSSL bug tracker:
https://www.openssl.org/community/#bugs

A workaround is to avoid characters not encodable to the ANSI code page, maybe by using symbolic links?
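
Another possible workaround keeps the path away from OpenSSL entirely: let Python open the file and pass only the certificate data through the `cadata` parameter of `SSLContext.load_verify_locations()`. The following is a minimal sketch, not what the ssl module does internally. The helper name `load_verify_from_path` is hypothetical, and the sketch assumes a PEM bundle that contains only ASCII text.

```python
def load_verify_from_path(context, path):
    """Load PEM CA certificates from `path` without giving the path to OpenSSL.

    Hypothetical helper for illustration. `context` is expected to be an
    ssl.SSLContext; `path` is a normal str path, which Python's own file
    I/O can open on Windows even when it is not encodable to the ANSI
    code page.
    """
    # Assumption: the bundle is pure ASCII (PEM blocks, ASCII-only comments).
    # A bundle with non-ASCII comments makes this read fail explicitly.
    with open(path, "r", encoding="ascii") as f:
        pem_data = f.read()
    # cadata accepts one or more PEM certificates as an ASCII str.
    context.load_verify_locations(cadata=pem_data)


# Usage sketch (not executed here):
#   import ssl
#   ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
#   load_verify_from_path(ctx, path_to_cacert_pem)
```

This only covers `cafile`-style bundles; a `capath` directory would need its own handling.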
msg268924 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-20 20:01
"OpenSSL doesn't seem to support paths not encodable to the ANSI code page on Windows. I suggest you to report the issue to the OpenSSL bug tracker: (...)"

For example, it was already requested 6 years ago on the openssl-users@openssl.org mailing list:
http://comments.gmane.org/gmane.comp.encryption.openssl.user/38104

--

Oh, another link is more useful:
https://stackoverflow.com/questions/2401059/openssl-with-unicode-paths

"you will have to manually load the certificate files yourself using standard OS file I/O functions that support Unicode paths, and then parse the raw data and load it into OpenSSL, such as via PEM_read_bio_X509 with sk_X509_NAME_push, PEM_read_bio_PrivateKey/d2i_PrivateKey_bio with SSL_CTX_use_PrivateKey, d2i_X509_bio/PEM_read_bio_X509 with SSL_CTX_use_certificate, etc."
msg268945 - (view) Author: Ilya Kulakov (Ilya.Kulakov) * Date: 2016-06-20 22:31
Victor, I also came across this thread, but it is rather old (we're using OpenSSL 1.0.2h). And it would only explain the problem if _neither_ of my methods had worked.
But as you can see, the last one (passing UTF-8 encoded bytes) works.
msg268946 - (view) Author: Ilya Kulakov (Ilya.Kulakov) * Date: 2016-06-20 22:39
I checked the source code of OpenSSL, specifically the `bss_file.c:file_fopen` function (https://github.com/openssl/openssl/blob/OpenSSL_1_0_2h/crypto/bio/bss_file.c#L118-L167).

As you can see, it supports UTF-8 encoded strings on Windows. Python must follow suit.
msg276528 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-15 08:08
Is this still an issue?
msg276534 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-15 08:49
> Is this still an issue?

I'm not 100% sure that SSL_CTX_load_verify_locations() accepts a path encoded to UTF-8 on Windows.


To be 100% sure, it's "simple": try a filename that is not encodable to the ANSI code page on Windows.

The best would be to have a unit test for that. You could use test.support.TESTFN_UNENCODABLE, for example.
msg276552 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-15 13:33
TESTFN_UNENCODABLE will be invalid utf8 now, as the name is chosen by attempting to encode a list of names and using the first one to fail.

No code pages have emoji in them AFAIK, so a test with one of those would do.

ISTR looking at this function though and finding that OpenSSL will decode utf8 if that's what is passed, in which case we're fine now. Or maybe I'm thinking of Tcl...
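
The selection approach described above (try a list of names, keep the first one that fails to encode) can be sketched as follows. This is an illustration only, not the actual test.support code; the function name and the candidate names are made up.

```python
def first_unencodable(candidates, encoding):
    """Return the first name that `encoding` cannot encode, or None."""
    for name in candidates:
        try:
            name.encode(encoding)
        except UnicodeEncodeError:
            # This name has no representation in the target encoding.
            return name
    return None


# Illustrative candidates: ASCII, Latin-1 range, Arabic, emoji.
CANDIDATES = ["cacert", "caf\u00e9", "\u063a\u0627\u0632\u064a", "\U0001F600"]
```

With an ANSI code page such as cp1252 some candidate fails, while with UTF-8 none of these do, which is why such a name stops being useful once the filesystem encoding is UTF-8.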
msg276590 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-15 19:01
Steve Dower added the comment:
> TESTFN_UNENCODABLE will be invalid utf8 now, as the name is chosen by attempting to encode a list of names and using the first one to fail.

Oh, maybe we should reject filenames that are not encodable to utf8, that is, filenames containing surrogate characters?
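
A check of that kind could look like the sketch below. It relies on the fact that Python's strict UTF-8 codec refuses lone surrogates; the function name is hypothetical.

```python
def is_utf8_encodable(name):
    """Return True if `name` can be encoded to strict UTF-8."""
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates (U+D800..U+DFFF) are rejected by the strict codec.
        return False
    return True
```

A name such as `"bad\udcff"` is rejected, while ordinary non-ASCII names are accepted.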
msg276849 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-17 21:09
I haven't tracked it down in 1.1, but in 1.0.2 OpenSSL handles ASCII, UTF-8 and mbcs/ANSI paths explicitly: https://github.com/openssl/openssl/blob/OpenSSL_1_0_2-stable/crypto/bio/bss_file.c#L138

So for 3.6 and later, if we're encoding the paths with fsencode(), it'll be fine, but we could also use utf-8 unconditionally.

Doing a search of the codebase though, there's only the one place that does this and everywhere else just uses fopen() without attempting to decode. I don't think we're exposing many of those publicly though.
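
As a rough Python model of that behaviour: the bytes are treated as UTF-8 when they are valid UTF-8, and as an ANSI-encoded path otherwise. This is a simplified sketch, not a translation of the C code, which also has to open the file. The function name is made up, and cp1252 stands in for the ANSI code page because that is what the original report shows.

```python
def interpret_path_bytes(raw, ansi_codepage="cp1252"):
    """Simplified model: prefer UTF-8, fall back to the ANSI code page."""
    try:
        # Pure ASCII is valid UTF-8, so it is covered by this branch too.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the bytes are in the ANSI code page.
        return raw.decode(ansi_codepage)
```

Under this model, a path encoded with fsencode() on 3.6 or explicitly with UTF-8 ends up as the same name.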
msg301670 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-09-08 02:41
3.6 uses PyUnicode_FSConverter() to convert the cafile and capath arguments, then passes PyBytes_AS_STRING() to OpenSSL. What needs to change to support non-ASCII chars on Windows?
msg301683 - (view) Author: Ilya Kulakov (Ilya.Kulakov) * Date: 2017-09-08 06:48
Christian,

If you have Windows at hand and can try a similar path, you should see the problem right away if it's still there.

I think the original problem was the unnecessary use of PyUnicode_FSConverter: it failed to encode the string to mbcs, while OpenSSL had no problem understanding a UTF-8 path on Windows.
msg301708 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-09-08 17:38
Ilya,

The exception "TypeError: cafile should be a valid filesystem path" is raised by Python, not by OpenSSL. On Python 3.5, PyUnicode_FSConverter() uses MBCS, which is CP-1252 on your system. CP-1252 cannot represent Arabic characters, so encoding your user name to bytes fails. This issue has been addressed by PEP 529 in Python 3.6. Starting with Python 3.6, the file system encoding on Windows is UTF-8, see https://www.python.org/dev/peps/pep-0529/

3.5 is in security fix-only mode. You either have to keep encoding the path as UTF-8 explicitly or update to Python 3.6. I'm closing this issue as WONTFIX for 3.5 and FIXED for >= 3.6.
msg301714 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-09-08 18:53
> I'm closing this issue as WONTFIX for 3.5

If I understood correctly, it's possible to work around the issue by encoding the filename manually to utf-8.

ssl_function(filename.encode('utf-8'))
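
Code that has to run on both 3.5 and 3.6+ could wrap this workaround in a small helper. The sketch below is hypothetical (the name `cafile_argument` and its optional parameters are made up), and it relies on OpenSSL accepting UTF-8 paths on Windows, which was observed with 1.0.2h in this issue but is not confirmed for every version.

```python
import sys


def cafile_argument(path, version_info=None, platform=None):
    """Return the value to pass as cafile/capath for a str `path`.

    version_info and platform default to the running interpreter; they
    are parameters only so that the decision can be exercised anywhere.
    """
    if version_info is None:
        version_info = sys.version_info
    if platform is None:
        platform = sys.platform
    if platform == "win32" and tuple(version_info[:2]) < (3, 6):
        # Before PEP 529 the filesystem encoding on Windows is mbcs,
        # so encode to UTF-8 explicitly.
        return path.encode("utf-8")
    # Elsewhere, let Python convert the str path itself.
    return path
```

For example, `context.load_verify_locations(cafile=cafile_argument(path))`.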
msg301715 - (view) Author: Ilya Kulakov (Ilya.Kulakov) * Date: 2017-09-08 18:55
> On Python 3.5, PyUnicode_FSConverter() uses MBCS, which is CP-1252 on your system.

Will the behavior of Python 3.6 be different? Could you point me to relevant notes or code?

> If I understood correctly, it's possible to work around the issue by encoding the filename manually to utf-8.

That's correct. I'm not sure that this is true for all versions though.
msg301716 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-09-08 18:57
>> On Python 3.5, PyUnicode_FSConverter() uses MBCS, which is CP-1252 on your system.
>
> Will the behavior of Python 3.6 be different? Could you point me to relevant notes or code?

See PEP 529.
History
Date | User | Action | Args
2017-09-08 18:57:01 | vstinner | set | messages: + msg301716
2017-09-08 18:55:10 | Ilya.Kulakov | set | messages: + msg301715
2017-09-08 18:53:30 | vstinner | set | messages: + msg301714
2017-09-08 17:38:20 | christian.heimes | set | status: open -> closed; stage: needs patch -> resolved
2017-09-08 17:38:13 | christian.heimes | set | assignee: steve.dower -> christian.heimes; resolution: fixed; messages: + msg301708
2017-09-08 06:48:24 | Ilya.Kulakov | set | messages: + msg301683
2017-09-08 02:41:50 | christian.heimes | set | assignee: christian.heimes -> steve.dower; messages: + msg301670; versions: - Python 3.5
2016-09-17 21:09:56 | steve.dower | set | messages: + msg276849
2016-09-15 19:01:54 | vstinner | set | messages: + msg276590
2016-09-15 13:33:16 | steve.dower | set | messages: + msg276552
2016-09-15 08:49:31 | vstinner | set | messages: + msg276534
2016-09-15 08:08:26 | christian.heimes | set | assignee: christian.heimes; type: behavior; components: + SSL; versions: + Python 3.6, Python 3.7; nosy: + christian.heimes; messages: + msg276528; stage: needs patch
2016-06-20 22:39:25 | Ilya.Kulakov | set | messages: + msg268946
2016-06-20 22:31:09 | Ilya.Kulakov | set | messages: + msg268945
2016-06-20 20:01:41 | vstinner | set | messages: + msg268924
2016-06-20 19:58:05 | vstinner | set | messages: + msg268922
2016-06-20 17:15:46 | Ilya.Kulakov | set | messages: + msg268912
2016-06-20 02:32:09 | Ilya.Kulakov | create |