classification
Title: encode and decode should accept 'errors' as a keyword argument
Type: enhancement
Stage: needs patch
Components: Interpreter Core
Versions: Python 3.2, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: benjamin.peterson
Nosy List: benjamin.peterson, jbradberry, lemburg, r.david.murray
Priority: low
Keywords: easy, patch

Created on 2009-06-18 00:00 by r.david.murray, last changed 2009-09-18 21:15 by benjamin.peterson. This issue is now closed.

Files
File name Uploaded Description
python27.patch jbradberry, 2009-09-17 01:03 patch against revision 74852
python27.patch jbradberry, 2009-09-17 06:04 updated patch against revision 74852
python27.patch jbradberry, 2009-09-18 19:46 patch against revision 74880
Messages (11)
msg89485 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-06-18 00:00
I repeatedly find myself typing things like
"mybytestring.decode('ASCII', errors='replace')".  This seems like the
natural (I'm tempted to say Pythonic) thing to do, and is more readable
(IMO) than "mybytestring.decode('ASCII', 'replace')".  (replace what?).
However, encode and decode currently complain that they do not take any
keyword arguments.
msg92737 - Author: Jeff Bradberry (jbradberry) Date: 2009-09-17 01:03
This patch adds the requested behavior to the current 2.7 svn trunk. 
Both 'encoding' and 'errors' may be used as keyword arguments for
encode() and decode().
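
For illustration, a minimal sketch of the call style being requested. It is written for Python 3, where bytes.decode and str.encode accept these keywords; the sample data and variable names are assumptions and are not taken from the patch.

```python
# Sample byte string containing one non-ASCII byte (hypothetical data).
raw = b"Andr\x82 x"

# Positional form: the role of 'replace' is not obvious at the call site.
positional = raw.decode("ascii", "replace")

# Keyword form: both 'encoding' and 'errors' may be named.
keyword = raw.decode(encoding="ascii", errors="replace")

# The same keywords work for encode(); unencodable characters become '?'.
encoded = "Andr\u00e9 x".encode("ascii", errors="replace")

print(positional == keyword)
print(encoded)
```

Both decode calls produce the same string; only the readability of the call differs.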
msg92738 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-09-17 01:59
They should also probably be added to unicode(), str(), unicode.decode,
and unicode.encode then. (Also some tests, please!)
msg92741 - Author: Jeff Bradberry (jbradberry) Date: 2009-09-17 06:04
As it turns out, someone had previously made this adjustment to str()
and unicode().  My updated patch adds this behavior to unicode.decode
and unicode.encode, adds a couple of tests to test_unicode.py, and
updates the documentation to show that these functions (and str.format,
which had failed to be noted as taking them) now take keyword arguments.
msg92743 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-09-17 08:08
The patch looks fine, the idea is good as well.

I'm just a little worried about the performance impact this might have
(not much though).

Could you run a quick timing comparison before and after applying the
patch, using positional arguments in both cases?

Thanks.
msg92789 - Author: Jeff Bradberry (jbradberry) Date: 2009-09-17 18:20
Before:

~/python2.7$ ./python -mtimeit "u'Andr\202 x'.encode('ascii', 'replace')"
1000000 loops, best of 3: 1.8 usec per loop

After:

~/python2.7-patched$ ./python -mtimeit "u'Andr\202 x'.encode('ascii', 'replace')"
1000000 loops, best of 3: 1.73 usec per loop


The difference in performance seems to be trivial, perhaps favoring the
patched version slightly.
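
A comparable measurement can be sketched with the timeit module from within Python. This is only an illustration under stated assumptions: it targets Python 3, the helper best_usec and the loop counts are hypothetical, and it compares positional against keyword calls on one interpreter rather than a patched build against an unpatched one.

```python
import timeit

# Same sample as in the commands above: '\x82' is the octal escape '\202'.
text = "Andr\x82 x"


def encode_positional():
    return text.encode("ascii", "replace")


def encode_keyword():
    return text.encode("ascii", errors="replace")


def best_usec(func, number=100000, repeat=3):
    """Return the best-of-`repeat` time per call, in microseconds."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e6


positional_usec = best_usec(encode_positional)
keyword_usec = best_usec(encode_keyword)
print("positional: %.3f usec per call" % positional_usec)
print("keyword:    %.3f usec per call" % keyword_usec)
```

Absolute numbers depend on the machine and interpreter build, so only the relative difference is meaningful.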
msg92791 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-09-17 19:43
Perfect. Thanks for checking.

Benjamin, could you please check this in? Thanks.
msg92795 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-09-17 20:48
I still have a few things I would like changed:

- Instead of listing which methods take keyword arguments at the top of
the section, I would prefer that each used the "versionchanged: 2.7"
directive and indicated the added ability to use keyword arguments.
- Your tests only cover str.decode and unicode.encode, not vice-versa.
- In your tests, there should be a space between the comma in the
arguments and the next argument.
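
As an illustration of the kind of coverage asked for above, here is a minimal unittest sketch that exercises both encode and decode with keyword arguments. It is written for Python 3 (so it covers str.encode and bytes.decode only, not the 2.7 str/unicode pairs), and the class name and sample strings are assumptions rather than the tests from the patch.

```python
import unittest


class KeywordArgumentTests(unittest.TestCase):
    """Hypothetical tests: 'encoding' and 'errors' passed by keyword."""

    def test_encode_keywords(self):
        self.assertEqual(
            "caf\u00e9".encode(encoding="ascii", errors="replace"), b"caf?")
        self.assertEqual(
            "caf\u00e9".encode("ascii", errors="ignore"), b"caf")

    def test_decode_keywords(self):
        self.assertEqual(
            b"caf\xe9".decode(encoding="ascii", errors="replace"), "caf\ufffd")
        self.assertEqual(
            b"caf\xe9".decode("ascii", errors="ignore"), "caf")


# Run the tests programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(KeywordArgumentTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```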
msg92837 - Author: Jeff Bradberry (jbradberry) Date: 2009-09-18 19:46
Ok, fixed.  I am kind of vague, though, on the usefulness of str.encode
and unicode.decode.
msg92839 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2009-09-18 19:59
Jeff Bradberry wrote:
> 
> Jeff Bradberry <jeff.bradberry@gmail.com> added the comment:
> 
> Ok, fixed.  I am kind of vague, though, on the usefulness of str.encode
> and unicode.decode.

codecs can work on any combination of types. Here's an example of
a codec that accepts str and unicode and returns str:

>>> '123'.encode('hex')
'313233'
>>> u'123'.encode('hex')
'313233'
>>> '313233'.decode('hex')
'123'
>>> u'313233'.decode('hex')
'123'

In 3.1 the methods don't support this anymore, since they are
stricter with respect to type checking.
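
For comparison, a sketch of how the same hex round trip can be written on recent Python 3 releases, assuming the codecs.encode and codecs.decode functions with the 'hex_codec' bytes-to-bytes codec; the variable names are hypothetical.

```python
import codecs

# Bytes-to-bytes codecs are reached through the codecs module functions.
hex_bytes = codecs.encode(b"123", "hex_codec")
round_trip = codecs.decode(hex_bytes, "hex_codec")
print(hex_bytes, round_trip)

# The str.encode method rejects codecs that are not text encodings.
try:
    "123".encode("hex_codec")
    str_encode_allowed = True
except (LookupError, TypeError):
    str_encode_allowed = False
print(str_encode_allowed)
```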
msg92846 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-09-18 21:15
Applied in r74929. Thanks!
History
Date User Action Args
2009-09-18 21:15:36 benjamin.peterson set status: open -> closed; resolution: fixed; messages: + msg92846
2009-09-18 19:59:21 lemburg set messages: + msg92839
2009-09-18 19:46:41 jbradberry set files: + python27.patch; messages: + msg92837
2009-09-17 20:48:47 benjamin.peterson set messages: + msg92795
2009-09-17 19:43:40 lemburg set assignee: benjamin.peterson; messages: + msg92791
2009-09-17 18:20:14 jbradberry set messages: + msg92789
2009-09-17 08:08:30 lemburg set nosy: + lemburg; messages: + msg92743
2009-09-17 06:04:22 jbradberry set files: + python27.patch; messages: + msg92741
2009-09-17 01:59:56 benjamin.peterson set nosy: + benjamin.peterson; messages: + msg92738
2009-09-17 01:03:55 jbradberry set files: + python27.patch; nosy: + jbradberry; messages: + msg92737; keywords: + patch
2009-06-18 09:08:25 amaury.forgeotdarc set keywords: + easy; stage: needs patch
2009-06-18 00:00:13 r.david.murray create