This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Speed up default encode()/decode()
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: akuchling, belopolsky, ezio.melotti, lemburg
Priority: normal Keywords: patch

Created on 2011-02-24 22:29 by belopolsky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
default-encode.diff belopolsky, 2011-02-24 22:29 review
issue11313.diff ezio.melotti, 2011-02-25 00:11 Alexander's patch + tests review
Messages (10)
msg129318 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-02-24 22:29
In Python 3.x default encoding is always utf-8, but encode()/decode() still try to look it up.  Attached patch eliminates a call to normalize_encoding and several strcmp() calls.
msg129324 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-02-25 00:11
Patch looks good.
I checked the tests and couldn't fine any test for .encode()/.decode() without encoding, so I added them in the attached patch.
msg129325 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-02-25 00:25
Thanks for the review and the tests.  I have found one more place that can be easily optimized.  (See patch below.) The decode() methods in bytes and bytearray are not so easy unfortunately because for some reason they are written to accept any object as self, not only byte/bytearray.  As a result, it is not that easy to short-circuit default case in these instances.   With respect to the patch below, I'll make sure that there is a test for it.


===================================================================
--- Python/getargs.c	(revision 88545)
+++ Python/getargs.c	(working copy)
@@ -1010,8 +1010,6 @@
 
         /* Get 'e' parameter: the encoding name */
         encoding = (const char *)va_arg(*p_va, const char *);
-        if (encoding == NULL)
-            encoding = PyUnicode_GetDefaultEncoding();
 
         /* Get output buffer parameter:
            's' (recode all objects via Unicode) or
@@ -1051,9 +1049,12 @@
                     arg, msgbuf, bufsize);
 
             /* Encode object; use default error handling */
-            s = PyUnicode_AsEncodedString(u,
-                                          encoding,
-                                          NULL);
+            if (encoding == NULL)
+                s = PyUnicode_AsUTF8String(u);
+            else
+                s = PyUnicode_AsEncodedString(u,
+                                              encoding,
+                                              NULL);
             Py_DECREF(u);
             if (s == NULL)
                 return converterr("(encoding failed)",
msg129327 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-02-25 01:21
Committed issue11313.diff in revision 88553.

On the second thought, the getargs optimization is not worth the trouble because in existing sources 'e' code is used with constant encodings and one is unlikely to pass NULL as an encoding because that is equivalent to eliding the 'e' code altogether.

I will keep this issue open to consider whether remaining  

        if (encoding == NULL)
            encoding = PyUnicode_GetDefaultEncoding();

clauses can be simply removed because the lower level code accepts NULL in encoding.
msg129338 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-02-25 08:36
Alexander Belopolsky wrote:
> 
> New submission from Alexander Belopolsky <belopolsky@users.sourceforge.net>:
> 
> In Python 3.x default encoding is always utf-8, but encode()/decode() still try to look it up.  Attached patch eliminates a call to normalize_encoding and several strcmp() calls.

+1.

Please add a comment explaining why this can be done.
msg133573 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-12 11:45
> Committed issue11313.diff in revision 88553.

The revision number seems to be wrong -- maybe the commit got lost during the mercurial migration.

All the changes of the patch seem to be there though, even if they went in with other commits:
  unicodeobjects.c: 7ab569b18c98 together with changes from #11303.
  test_bytes.py: be6c38d1817b together with the normalization of encoding names.
msg171351 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-09-26 17:32
Can this be closed?
msg174729 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2012-11-04 01:06
MAL suggested adding "a comment explaining why this can be done".  If Alexander or someone wants to do that, great!  Otherwise, there seems no other reason to leave this issue open.
msg174740 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2012-11-04 02:13
I don't think a comment explaining that default encoding is utf-8 in python 3.x will improve readability of this code.  

if (encoding == NULL)
        return PyUnicode_DecodeUTF8(s, size, errors);

seems quite self-explanatory.
msg175218 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-09 09:47
Agreed, closing.
History
Date User Action Args
2022-04-11 14:57:13adminsetgithub: 55522
2012-11-09 09:47:57ezio.melottisetstatus: open -> closed
versions: + Python 3.4
messages: + msg175218

resolution: fixed
stage: commit review -> resolved
2012-11-04 02:13:41belopolskysetmessages: + msg174740
2012-11-04 01:06:21akuchlingsetnosy: + akuchling
messages: + msg174729
2012-09-26 17:32:35ezio.melottisetmessages: + msg171351
2011-04-12 11:45:24ezio.melottisetmessages: + msg133573
2011-02-25 08:36:02lemburgsetnosy: + lemburg
messages: + msg129338
2011-02-25 01:21:19belopolskysetnosy: belopolsky, ezio.melotti
messages: + msg129327
2011-02-25 00:25:59belopolskysetnosy: belopolsky, ezio.melotti
messages: + msg129325
2011-02-25 00:11:48ezio.melottisetfiles: + issue11313.diff
nosy: belopolsky, ezio.melotti
messages: + msg129324

components: + Interpreter Core
stage: commit review
2011-02-24 22:30:30ezio.melottisetnosy: + ezio.melotti
2011-02-24 22:29:46belopolskycreate