classification
Title: Create PyUnicode_EncodeFSDefault() function
Components: Interpreter Core, Unicode
Versions: Python 3.2
process
Status: closed
Resolution: fixed
Nosy List: Arfrever, benjamin.peterson, ezio.melotti, gregory.p.smith, lemburg, loewis, pitrou, vstinner
Priority: normal
Keywords: patch

Created on 2010-05-14 16:53 by vstinner, last changed 2010-05-15 16:28 by vstinner. This issue is now closed.

Files
File name Uploaded Description
pyunicode_encodefsdefault-3.patch vstinner, 2010-05-14 16:56
Messages (6)
msg105721 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-14 16:53
PyUnicode_EncodeFSDefault() is the counterpart of PyUnicode_DecodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize(), and is similar to the new os.fsencode() function. As the patch shows, it simplifies many functions.

/* Encodes a Unicode object to Py_FileSystemDefaultEncoding with the
   "surrogateescape" error handler and returns a bytes object.

   If Py_FileSystemDefaultEncoding is not set, fall back to UTF-8.
*/

PyAPI_FUNC(PyObject*) PyUnicode_EncodeFSDefault(
    PyObject *unicode
    );
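
To make the "surrogateescape" part of this contract concrete, here is a standalone C sketch, not CPython's code, that encodes an array of code points to UTF-8 the way the surrogateescape handler behaves. A lone surrogate in U+DC80..U+DCFF is written back as the single byte 0x80..0xFF it originally stood for, and any other surrogate is still rejected. The function name encode_utf8_surrogateescape and its buffer-based interface are hypothetical and chosen only for illustration. The real function works on a Python str and returns a bytes object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of CPython: encode n code points to UTF-8
   with surrogateescape semantics. Returns the number of bytes written, or
   -1 if a code point cannot be encoded or the output buffer is too small. */
static long encode_utf8_surrogateescape(const uint32_t *cps, size_t n,
                                        unsigned char *out, size_t cap)
{
    size_t len = 0;
    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        unsigned char tmp[4];
        size_t k;
        if (c >= 0xDC80 && c <= 0xDCFF) {
            /* Escaped byte: restore the original undecodable byte. */
            tmp[0] = (unsigned char)(c - 0xDC00);
            k = 1;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            return -1;  /* any other surrogate is still an error */
        } else if (c < 0x80) {
            tmp[0] = (unsigned char)c;
            k = 1;
        } else if (c < 0x800) {
            tmp[0] = (unsigned char)(0xC0 | (c >> 6));
            tmp[1] = (unsigned char)(0x80 | (c & 0x3F));
            k = 2;
        } else if (c < 0x10000) {
            tmp[0] = (unsigned char)(0xE0 | (c >> 12));
            tmp[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            tmp[2] = (unsigned char)(0x80 | (c & 0x3F));
            k = 3;
        } else if (c <= 0x10FFFF) {
            tmp[0] = (unsigned char)(0xF0 | (c >> 18));
            tmp[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            tmp[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            tmp[3] = (unsigned char)(0x80 | (c & 0x3F));
            k = 4;
        } else {
            return -1;  /* not a Unicode code point */
        }
        if (len + k > cap)
            return -1;  /* output buffer too small */
        for (size_t j = 0; j < k; j++)
            out[len++] = tmp[j];
    }
    return (long)len;
}
```

For example, Python decodes the bytes b"a\xff" with UTF-8 and surrogateescape to 'a\udcff'. Passing the code points of that string to the sketch writes back 0x61 0xFF, so undecodable file names survive the round trip.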

The function unifies the behaviour when Py_FileSystemDefaultEncoding is NULL: it falls back to UTF-8, whereas import used ASCII. Other functions already fell back to UTF-8. For example, PyUnicode_AsEncodedString() uses PyUnicode_GetDefaultEncoding() (hardcoded to UTF-8 in Python 3) if encoding is NULL.
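
The fallback rule in the previous paragraph reduces to a choice of encoding name. The standalone C sketch below contrasts the two fallbacks described there. Both helper names are hypothetical, and the fs argument merely stands in for Py_FileSystemDefaultEncoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the old import behaviour: fall back to ASCII
   when no file system encoding is set. */
static const char *import_encoding_before(const char *fs)
{
    return fs != NULL ? fs : "ascii";
}

/* Hypothetical model of the unified rule of PyUnicode_EncodeFSDefault():
   fall back to UTF-8, like PyUnicode_AsEncodedString() with a NULL encoding. */
static const char *fs_encoding_after(const char *fs)
{
    return fs != NULL ? fs : "utf-8";
}
```

When an encoding is configured, both helpers return it unchanged. They differ only in the NULL case, and that case is the one the patch makes consistent.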

The patch also fixes the tkinter module initializer, which now uses the surrogateescape error handler instead of strict.

The patch was first attached to issue #8611.
msg105722 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-14 16:56
Oops, I attached the wrong version of the patch. Version 3 changes the documentation (Encodes => Encode).
msg105779 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-15 00:10
Notes for myself:
 - "Encodes" and "fallback" in .h documentation  => "Encode", "fall back"
 - bootstrap failure on Windows: import used the default (strict) error handler and now uses surrogateescape, but PyUnicode_AsEncodedString() has no codec "fast path" for MBCS+surrogateescape.
msg105809 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-15 13:17
> bootstrap failure on Windows: import used the default (strict) error
> handler and now uses surrogateescape, but PyUnicode_AsEncodedString()
> has no codec "fast path" for MBCS+surrogateescape.

In r81192, I enabled the "shortcuts" in PyUnicode_AsEncodedString() for any error handler, not only for the default one (strict).
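
Below is a simplified model of the change just described. Before it, the built-in encoder shortcuts were taken only when errors was NULL, which means the default strict handler. Afterwards, the same shortcuts apply to any error handler, which is then passed through to the encoder. Everything in the sketch is an assumption made for illustration: the enum, the helper names, and the exact set of encoding names compared. Under the "before" rule, MBCS combined with surrogateescape falls through to the generic codec lookup. That is presumably what broke bootstrap on Windows, because the codec registry may need the encodings package, which has not been imported yet.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    PATH_GENERIC_CODEC,  /* look the codec up in the registry */
    PATH_UTF8,
    PATH_LATIN1,
    PATH_ASCII,
    PATH_MBCS
} encode_path;

/* Hypothetical: map a few well-known names to a built-in encoder. */
static encode_path builtin_encoder(const char *encoding)
{
    assert(encoding != NULL);
    if (strcmp(encoding, "utf-8") == 0)   return PATH_UTF8;
    if (strcmp(encoding, "latin-1") == 0) return PATH_LATIN1;
    if (strcmp(encoding, "ascii") == 0)   return PATH_ASCII;
    if (strcmp(encoding, "mbcs") == 0)    return PATH_MBCS;
    return PATH_GENERIC_CODEC;
}

/* Before r81192 (as described): shortcuts only for the default handler. */
static encode_path choose_path_before(const char *encoding, const char *errors)
{
    if (errors != NULL)
        return PATH_GENERIC_CODEC;
    return builtin_encoder(encoding);
}

/* After r81192 (as described): shortcuts for any handler; the real encoder
   would receive errors, which this model does not need. */
static encode_path choose_path_after(const char *encoding, const char *errors)
{
    (void)errors;
    return builtin_encoder(encoding);
}
```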
msg105810 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-15 13:23
PyUnicode_AsEncodedString() contains a special path for the file system encoding. I don't think that it is still needed, but I don't know how to check that.

    /* During bootstrap, we may need to find the encodings
       package, to load the file system encoding, and require the
       file system encoding in order to load the encodings
       package.

       Break out of this dependency by assuming that the path to
       the encodings module is ASCII-only.  XXX could try wcstombs
       instead, if the file system encoding is the locale's
       encoding. */
    else if (Py_FileSystemDefaultEncoding &&
             strcmp(encoding, Py_FileSystemDefaultEncoding) == 0 &&
             !PyThreadState_GET()->interp->codecs_initialized)
        return PyUnicode_EncodeASCII(PyUnicode_AS_UNICODE(unicode),
                                     PyUnicode_GET_SIZE(unicode),
                                     errors);
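
The bootstrap special case quoted above reduces to a three-part condition followed by an ASCII encode. The following is a standalone model of it. The helper names are hypothetical, and codecs_initialized stands in for the interpreter field of the same name. The ASCII step assumes the strict error handler, whereas the quoted code forwards whatever errors value it was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the quoted condition: the requested encoding is the file system
   encoding and the codec machinery is not initialized yet. */
static bool use_bootstrap_ascii(const char *encoding, const char *fs_encoding,
                                bool codecs_initialized)
{
    return fs_encoding != NULL
        && strcmp(encoding, fs_encoding) == 0
        && !codecs_initialized;
}

/* Strict ASCII encoding of code points: fails on the first non-ASCII one,
   which is why the comment assumes an ASCII-only path to "encodings". */
static long encode_ascii_strict(const unsigned int *cps, size_t n,
                                unsigned char *out, size_t cap)
{
    assert(cps != NULL || n == 0);
    if (n > cap)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0x7F)
            return -1;
        out[i] = (unsigned char)cps[i];
    }
    return (long)n;
}
```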
msg105819 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-15 16:28
Committed as r81194 (py3k); blocked in 3.1 (r81195).
History
Date User Action Args
2010-05-15 16:28:41 vstinner set status: open -> closed
    resolution: fixed
    messages: + msg105819
2010-05-15 13:23:17 vstinner set messages: + msg105810
2010-05-15 13:17:10 vstinner set messages: + msg105809
2010-05-15 12:40:20 vstinner link issue8725 dependencies
2010-05-15 00:10:03 vstinner set messages: + msg105779
2010-05-14 16:56:39 vstinner set files: - pyunicode_encodefsdefault-2.patch
2010-05-14 16:56:33 vstinner set files: + pyunicode_encodefsdefault-3.patch
    messages: + msg105722
2010-05-14 16:53:52 vstinner create