classification
Title: base64.encodestring does not actually accept strings
Type: behavior Stage: test needed
Components: Documentation, Library (Lib) Versions: Python 3.0
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: ajaksu2, ddvoinikov, georg.brandl, gvanrossum, kawai, mgiuca, pitrou
Priority: high Keywords: patch

Created on 2008-08-20 06:00 by ddvoinikov, last changed 2009-06-04 09:13 by georg.brandl. This issue is now closed.

Files
File name Uploaded Description
get_host_info.diff ajaksu2, 2009-02-08 20:02 Use unquote_to_bytes to feed base64.encodestring, then decode
encodestring_rename.patch mgiuca, 2009-04-24 02:59 base64.py with encodestring/decodestring renamed to encodebytes/decodebytes.
encodebytes_new_types.patch mgiuca, 2009-04-24 03:04 encodestring/decodestring with new input/output types.
Messages (12)
msg71513 - Author: Dmitry Dvoinikov (ddvoinikov) Date: 2008-08-20 06:00
This quote from base64.py:

---
bytes_types = (bytes, bytearray)  # Types acceptable as binary data
...
def encodestring(s):
    """Encode a string into multiple lines of base-64 data.

    Argument and return value are bytes.
    """
    if not isinstance(s, bytes_types):
        raise TypeError("expected bytes, not %s" % s.__class__.__name__)
    ...
---

shows that encodestring() won't accept a str argument, only bytes.
Perhaps this is by design, but then wouldn't it make sense to rename
the function?

Anyway, this behavior breaks (at least) xmlrpc.client, line 1168, when
basic authentication is present:

---
auth = base64.encodestring(urllib.parse.unquote(auth))
---

because unquote() returns str, not bytes.
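
For illustration, a minimal sketch of the mismatch described above. It assumes a current Python 3, where the function is available as base64.encodebytes (the rename this issue eventually led to). The sample credentials and the helper name try_encode are made up for the example.

```python
import base64
import urllib.parse


def try_encode(value):
    """Return the encoded bytes, or the TypeError raised for non-bytes input."""
    try:
        return base64.encodebytes(value)
    except TypeError as exc:
        return exc


# unquote() returns str, which the encoder rejects.
auth = urllib.parse.unquote("user%3Apass")
rejected = try_encode(auth)

# The same credentials as bytes are accepted.
accepted = try_encode(auth.encode("utf-8"))

print(type(rejected).__name__)
print(accepted)
```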
msg71531 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-20 10:00
The encodestring() function is referred to in the docs as "the legacy
interface". Perhaps it should simply be deprecated in 3.0?
msg71532 - Author: Matt Giuca (mgiuca) Date: 2008-08-20 10:03
Hi Dmitry,

RE the method behaviour: I think it probably is correct to NOT accept a
string. Given that it's base64 encoding it, it only makes sense to
encode bytes, not arbitrary Unicode characters which have no
well-defined binary representation.

RE the method name: I agree, it should be renamed to encodestring. I
argued a similar case for the array.tostring and fromstring methods
(which actually act on bytes in Python 3.0) - here:
http://bugs.python.org/issue3565. So far nobody replied on that issue; I
think it may be too late to rename them. Best we can do is document them.

RE xmlrpc.client:1168. We just checked in a patch to urllib which adds
an unquote_to_bytes function (see
http://docs.python.org/dev/3.0/library/urllib.parse.html#urllib.parse.unquote_to_bytes).
(Unquote itself still returns a string). It should be correct to just
change xmlrpc.client:1168 to call urllib.parse.unquote_to_bytes. (Though
I've not tested it).
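
A rough sketch of what that change could look like, written as a standalone helper rather than the actual xmlrpc.client code. The name basic_auth_value and the sample credentials are hypothetical, and base64.encodebytes is assumed to be available (it is on current Python 3).

```python
import base64
import urllib.parse


def basic_auth_value(quoted_auth):
    """Build a Basic auth header value from percent-quoted credentials."""
    # unquote_to_bytes() yields bytes, which the Base64 encoder accepts.
    raw = urllib.parse.unquote_to_bytes(quoted_auth)
    encoded = base64.encodebytes(raw).decode("ascii")
    # encodebytes() inserts newlines, which must not appear in a header.
    return "Basic " + "".join(encoded.split())


header = basic_auth_value("user:p%40ss")
print(header)
```

Decoding to str at the end is needed because the header value is concatenated with other text.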
msg71535 - Author: Dmitry Dvoinikov (ddvoinikov) Date: 2008-08-20 10:42
> I think it probably is correct to NOT accept a string

I agree.

> it should be renamed to encodestring

Huh ? It is already called that :) IMO it should be renamed to
encodebytes or simply encode if the module is only (or most frequently)
used to encode bytes.

> Best we can do is document them.

Oh well.
msg71536 - Author: Matt Giuca (mgiuca) Date: 2008-08-20 10:47
> > it should be renamed to encodestring
> Huh ? It is already called that :)

Um ... yes. I mean encodebytes :)

> > Best we can do is document them.
> Oh well.

But I don't know the rules. People are saying things like "no new
features after beta3" but I take it that
backwards-compatibility-breaking changes are included in this.

But maybe it's still OK for us to break code after the beta. Perhaps
someone involved in the release can comment on this issue (and hopefully
with a view to my array patch - http://bugs.python.org/issue3565 - as well).
msg71550 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-08-20 15:17
Did someone fix xmlrpc.client:1168 yet?

IMO it's okay to add encodebytes(), but let's leave encodestring()
around with a deprecation warning, since it's so late in the release cycle.
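
One way such an alias could be sketched, assuming base64.encodebytes exists as the new name. This is an illustration of the idea only, not the code that was committed, and the warning text is made up.

```python
import base64
import warnings


def encodestring(s):
    """Legacy alias (illustrative): warn, then defer to encodebytes()."""
    warnings.warn(
        "encodestring() is a deprecated alias, use encodebytes()",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller
    )
    return base64.encodebytes(s)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = encodestring(b"abc")

print(result, caught[0].category.__name__)
```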
msg81413 - Author: Daniel Diniz (ajaksu2) Date: 2009-02-08 20:02
Here's a trivial patch for xmlrpc.client:1168. The testcase below
doesn't seem to fit well in test_xmlrpc, should it just be hacked in?


import xmlrpc.client
transp = xmlrpc.client.Transport()
transp.get_host_info("user@host.tld")
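
As a sketch, the same call could be turned into explicit checks. This assumes get_host_info returns a (host, extra_headers, x509) tuple, as it does in current CPython; the expected header value is computed here rather than hard-coded.

```python
import base64
import xmlrpc.client

transport = xmlrpc.client.Transport()
host, extra_headers, x509 = transport.get_host_info("user@host.tld")

# The user part should end up Base64-encoded in a Basic auth header.
expected = "Basic " + base64.b64encode(b"user").decode("ascii")
headers = dict(extra_headers)

print(host, headers.get("Authorization"))
```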
msg81904 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-02-13 10:50
Applied the patch in r69575.
msg86307 - Author: Daniel Diniz (ajaksu2) Date: 2009-04-22 17:18
We still need to solve the encodebytes/encodestring stuff.
msg86390 - Author: Matt Giuca (mgiuca) Date: 2009-04-24 02:59
I've attached a patch which renames encodestring to encodebytes (keeping
encodestring around as an alias). Updated test and documentation.

I also renamed decodestring to decodebytes, because it also refuses to
accept a string (only a bytes). I have an alternative suggestion, which
I'll post in a separate comment (in a minute).
msg86391 - Author: Matt Giuca (mgiuca) Date: 2009-04-24 03:04
Now, base64.encodestring and decodestring seem a bit weird because the
Base64 encoded string is also required to be a bytes.

It seems to me that once something is Base64-encoded, it's considered to
be ASCII text, not just some byte string, and therefore it should be a
str, not a bytes. (For example, they end with a '\n'. That's something
which strings do, not bytes).

Hence, base64.encodestring (which should be "encodebytes") should take a
bytes and return a str. base64.decodestring should take a str (required
to be ASCII-only) and return a bytes.

I've attached an alternative patch, encodebytes_new_types.patch (which,
unlike my other patch, doesn't rename decodestring to decodebytes). This
patch:

- Renames encodestring to encodebytes.
- Changes the output of encodebytes to return an ASCII str*, not a bytes.
- Changes the input of decodestring to accept an ASCII str, not a bytes.

* An ASCII str is a Unicode string with only ASCII characters.

This isn't a proper patch (it breaks a lot of other code which I haven't
bothered to fix). I'm just submitting it as an idea, in case this is
something we want to do. Most likely not, due to the breakage. Also we
have the same problem for the non-legacy functions, b64encode and
b64decode, etc, so the problem is more widespread than just these two
functions.
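
To make the proposed types concrete, here is a small sketch using wrapper functions instead of changing base64 itself. The names encode_to_text and decode_from_text are hypothetical, and base64.encodebytes / base64.decodebytes are assumed to be the bytes-only functions.

```python
import base64


def encode_to_text(data: bytes) -> str:
    """bytes in, ASCII-only str out."""
    return base64.encodebytes(data).decode("ascii")


def decode_from_text(text: str) -> bytes:
    """ASCII-only str in, bytes out; non-ASCII input raises UnicodeEncodeError."""
    return base64.decodebytes(text.encode("ascii"))


encoded = encode_to_text(b"\x00\xffbinary")
decoded = decode_from_text(encoded)
print(repr(encoded), decoded)
```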
msg88869 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-06-04 09:13
Applied a patch to rename (and keep old aliases) in r73204.
History
Date User Action Args
2009-06-04 09:13:14 georg.brandl set status: open -> closed; resolution: fixed; messages: + msg88869
2009-04-24 03:04:19 mgiuca set files: + encodebytes_new_types.patch; messages: + msg86391
2009-04-24 02:59:03 mgiuca set files: + encodestring_rename.patch; messages: + msg86390
2009-04-22 17:18:13 ajaksu2 set messages: + msg86307; stage: test needed
2009-02-13 10:50:13 georg.brandl set messages: + msg81904
2009-02-08 20:02:02 ajaksu2 set files: + get_host_info.diff; nosy: + ajaksu2; messages: + msg81413; keywords: + patch
2009-01-19 08:04:21 kawai set nosy: + kawai
2008-08-20 15:17:51 gvanrossum set nosy: + gvanrossum; messages: + msg71550
2008-08-20 10:47:44 mgiuca set messages: + msg71536
2008-08-20 10:42:24 ddvoinikov set messages: + msg71535
2008-08-20 10:03:05 mgiuca set nosy: + mgiuca; messages: + msg71532
2008-08-20 10:00:11 pitrou set priority: high; nosy: + georg.brandl, pitrou; messages: + msg71531; components: + Documentation; assignee: georg.brandl
2008-08-20 06:00:33 ddvoinikov create