classification
Title: array documentation, method names not 3.x-compliant
Type: enhancement
Stage: resolved
Components: Documentation, Library (Lib)
Versions: Python 3.3
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: array constructor and array.fromstring should accept bytearray.
View: 8990
Assigned To: docs@python
Nosy List: LambertDW, ajaksu2, benjamin.peterson, docs@python, eric.araujo, georg.brandl, loewis, mgiuca, pitrou, terry.reedy
Priority: normal
Keywords: easy, patch

Created on 2008-08-16 09:14 by mgiuca, last changed 2011-07-13 03:57 by mgiuca. This issue is now closed.

Files
File name | Uploaded | Description
doc-only-old.patch | mgiuca, 2008-08-16 09:14 | Superseded.
doc+bytesmethods-old.patch | mgiuca, 2008-08-16 10:30 | Superseded.
doc-only.patch | mgiuca, 2009-04-24 03:36 | Fixes array documentation; patch against r71822
doc+bytesmethods.patch | mgiuca, 2009-04-24 03:36 | Fixes array documentation, renames string methods to bytes methods; patch against r71822
Messages (20)
msg71201 - (view) Author: Matt Giuca (mgiuca) Date: 2008-08-16 09:14
A few weeks ago I fixed the struct module's documentation which wasn't
3.0 compliant (basically renaming "strings" to "bytes" and "unicode" to
"string"). Now I've had a look at the array module, and it's got similar
problems.

http://docs.python.org/dev/3.0/library/array.html

Unfortunately, the method names are wrong as far as Py3K is concerned.
"tostring" returns what is now called a "bytes", and "tounicode" returns
what is now called a "string".
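
For illustration, here is a minimal sketch of the bytes/str distinction. It assumes a release from 3.2 onward, where (as noted later in this thread) the byte-oriented methods are available under the names tobytes/frombytes. The variable names are made up for the example.

```python
import warnings
from array import array

# Byte-oriented round trip: tobytes() returns a bytes object.
ints = array('i', [1, 2, 3])
raw = ints.tobytes()
restored = array('i')
restored.frombytes(raw)

# Text-oriented round trip: tounicode() returns a str.
# The 'u' type code is itself deprecated in recent releases, so any
# DeprecationWarning is silenced here for the sake of the illustration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    chars = array('u', 'abc')
    text = chars.tounicode()

print(type(raw).__name__, type(text).__name__)  # bytes str
```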

There are a few other errors in the documentation too, like the 'c' type
code (which no longer exists, but is still documented), and examples
using Python 2 syntax. Those are trivial to fix.
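
As a quick check of the 'c' type code claim, the sketch below asks a Python 3 interpreter directly. It assumes Python 3.3 or later, where the array module exposes the `typecodes` string. The `error` variable is just a name chosen for the example.

```python
from array import array, typecodes

# 'c' is not among the type codes a Python 3 interpreter reports.
has_c = 'c' in typecodes

# Asking for it anyway is rejected with ValueError.
error = None
try:
    array('c')
except ValueError as exc:
    error = exc

print(has_c, type(error).__name__)  # False ValueError
```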

I suggest a 3-step process for fixing this:
1. Update the documentation to describe the 3.0 behaviour using 3.0
terminology, even though the method names are wrong (I've done this
already).
2. Rename "tostring" and "fromstring" methods to "tobytes" and
"frombytes". I think this is quite important as the value being returned
can no longer be described as a "string".
3. Rename "tounicode" and "fromunicode" methods to "tostring" and
"fromstring". I think this is less important, as the name "unicode"
isn't ambiguous, and potentially undesirable, as we'd be re-using method
names which previously did something else.

I'm aware we've got the final beta in 4 days, and there's no way my
phase 2-3 can be done after that. I think we should aim to do phase 2,
but probably not phase 3.

I've fixed the documentation to accurately describe the current
behaviour, using Python 3 terminology. This doesn't change any behaviour
at all, so it should be able to be committed immediately.

I'll have a go at a "phase 2" patch shortly. Is it feasible to even
think about renaming a method at this stage?

Commit log:

Doc/library/array.rst, Modules/arrayobject.c:

Updated array module documentation to be Python 3.0 compliant.

* Removed references to 'c' type code (no longer valid).
* References to "string" changed to "bytes".
* References to "unicode" changed to "string".
* Updated examples to use Python 3.0 syntax (and show the output of
evaluating them).
msg71202 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-08-16 09:22
> 2. Rename "tostring" and "fromstring" methods to "tobytes" and
> "frombytes". I think this is quite important as the value being returned
> can no longer be described as a "string".

I'm not a native speaker (of English), but my understanding is that the
noun "string", in itself, can very well be used to describe this type:
the result is a "byte string", as opposed to a "character string".
Merriam-Webster's seems to agree; meaning 5b(2) is "a sequence of like
items (as bits, characters, or words)"
msg71203 - (view) Author: Matt Giuca (mgiuca) Date: 2008-08-16 09:59
> I'm not a native speaker (of English), but my understanding is that the
> noun "string", in itself, can very well be used to describe this type:
> the result is a "byte string", as opposed to a "character string".
> Merriam-Webster's seems to agree; meaning 5b(2) is "a sequence of like
> items (as bits, characters, or words)"

Ah yes, that's quite right (and computer science literature will
strongly support that claim as well).

However the word "string", unqualified, and in Python 3.0 terminology
(as described in PEP 358) now refers only to the "str" type (formerly
known as "unicode"), so it is very confusing to have a method "tostring"
which returns a bytes object.

For array to become a good Py3k citizen, I'd strongly argue that
tostring/fromstring should be renamed to tobytes/frombytes. I'm
currently writing a patch for that - it looks like there's very minimal
damage.

However as a separate issue, I think the documentation update should be
approved first.
msg71204 - (view) Author: Matt Giuca (mgiuca) Date: 2008-08-16 10:00
(Fixed issue title)
msg71205 - (view) Author: Matt Giuca (mgiuca) Date: 2008-08-16 10:26
I renamed tostring/fromstring to tobytes/frombytes in the array module,
as described above. I then grepped the entire py3k tree for "tostring"
and "fromstring", and carefully replaced all references which pertain to
array objects.

The relatively minor number of these references suggests this won't be a
big problem. All the test cases pass.

I haven't (yet) renamed tounicode/fromunicode to tostring/fromstring.
The more I think about it, the more that sounds like a bad idea (and
could create confusion as to whether this is a character string or byte
string, as Martin pointed out).

The patch (doc+bytesmethods.patch) does both the original
doc-only.patch, plus the renaming and updating of all usages. Use the
above commit log, plus:

Renamed array.tostring to array.tobytes, and array.fromstring to
array.frombytes, to reflect the Python 3.0 terminology.

Updated all references to these methods in Lib to the new names.
msg71206 - (view) Author: Matt Giuca (mgiuca) Date: 2008-08-16 10:30
Oops .. forgot to update the array.rst docs with the new method names.
Replaced doc+bytesmethods.patch with a fixed version.
msg71555 - (view) Author: Matt Giuca (mgiuca) Date: 2008-08-20 16:15
A similar issue came up in another bug
(http://bugs.python.org/issue3613), and Guido said:

"IMO it's okay to add encodebytes(), but let's leave encodestring()
around with a deprecation warning, since it's so late in the release cycle."

I think that's probably wise RE this bug as well - my original
suggestion to REPLACE tostring/fromstring with tobytes/frombytes was
probably a bit over-zealous.

I'll have another go at this during some spare cycles tomorrow -
basically taking my current patch and adding tostring/fromstring back
in, to call tobytes/frombytes with deprecation warnings. Does this sound
like a good plan?
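
A rough sketch of what that plan amounts to, written in pure Python rather than in Modules/arrayobject.c. `CompatArray` is a hypothetical subclass invented for this example, not the patch itself, and it assumes an interpreter where tobytes/frombytes already exist.

```python
import warnings
from array import array


class CompatArray(array):
    """Hypothetical array subclass keeping the old names as deprecated aliases."""

    def tostring(self):
        # Old name: warn, then delegate to the new method.
        warnings.warn("tostring() is deprecated; use tobytes()",
                      DeprecationWarning, stacklevel=2)
        return self.tobytes()

    def fromstring(self, data):
        # Old name: warn, then delegate to the new method.
        warnings.warn("fromstring() is deprecated; use frombytes()",
                      DeprecationWarning, stacklevel=2)
        self.frombytes(data)


legacy = CompatArray('i', [1, 2, 3])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    payload = legacy.tostring()  # still works, but records a warning
```

Existing callers keep working, and anyone running with warnings enabled is pointed at the new name.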

(Also, a policy question: when you have deprecated functions, how do you
document them? I assume you say "deprecated" in the docs; is there a
standard template for this?)
msg72439 - (view) Author: Matt Giuca (mgiuca) Date: 2008-09-04 00:02
Can I just remind people that I have a documentation patch ready here
(and have had for about a month)?

Of course the doc+bytesmethods.patch may be debatable and probably too
late to go in 3.0. But you should be able to commit doc-only.patch with
no problems.

Current array documentation
(http://docs.python.org/dev/3.0/library/array.html) is clearly wrong in
Python 3.0 (even containing syntax errors).
msg83664 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-03-17 10:31
Benjamin, do you think this should be fixed in 3.1?
msg83668 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-03-17 11:48
It would be nice to deprecate the old names in 3.1 and remove them in
3.2, but I think it should get approval on python-dev.
msg83670 - (view) Author: Matt Giuca (mgiuca) Date: 2009-03-17 12:15
Note that, irrespective of the changes to the library itself, the
documentation is out of date since it still uses the old
"string/unicode" nomenclature, rather than the new "bytes/string". I
have provided a separate documentation patch which should be applicable
with relatively little fuss.

(It's from August so it will probably conflict, but I can update it if
necessary).
msg86295 - (view) Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-04-22 14:37
The doc patch is in scope for the Bug Day.
msg86393 - (view) Author: Matt Giuca (mgiuca) Date: 2009-04-24 03:36
OK, since the patches I submitted are now eight months old, I just did an
update and re-applied them. I am submitting new patch files which don't
change anything, but are patches against revision 71822 (so they should be
much easier to apply).

I'd still like to see doc+bytesmethods.patch applied (since it fixes
method names which make no sense at all in Python 3.0 context), but
since it's getting a bit late for that, I'll be happy for the doc-only
patch to be accepted (which merely corrects the documentation which is
still using Python 2.x terminology).
msg86394 - (view) Author: Matt Giuca (mgiuca) Date: 2009-04-24 03:36
Full method renaming patch.
msg86397 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-04-24 05:04
I think this patch is unacceptable for Python 3.1. It is an incompatible
change (removing a method); one would have to deprecate the method to be
removed first. I also agree with Benjamin that a wider-audience approval
of the deprecation would be required. I, myself, remain opposed to this
change.
msg86398 - (view) Author: Matt Giuca (mgiuca) Date: 2009-04-24 05:12
I agree with that -- too big a change to make now.

But can we please get the documentation patch accepted? It's been
waiting here for eight months with corrections to clearly-incorrect
documentation.
msg130404 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-03-09 02:55
In 3.2, a change *was* committed (by whom?) but not recorded here:
.from/.tostring were renamed .from/.tobytes and kept as deprecated aliases. Is there anything more to this issue other than removing the deprecated aliases in 3.3 (which could be done now if that was the intention, or in 3.4 if not)?

Is there still any idea/intention of renaming .from/.tounicode to
.from/.tostring? That would have to be done after at least one version with the 'string' names absent, and would have little gain as 'unicode' is clear.
msg140189 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-12 13:43
It was Antoine in fa8b57f987c5, for #8990.
msg140210 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-12 19:51
> Is there still any idea/intention of renaming .from/.tounicode to
> .from/.tostring? That would have to be done after at least one version
> with the 'string' names absent, and would have little gain as 'unicode'
> is clear.

Indeed, not only would it bring little benefit, but it might also confuse users porting from 2.x (since the from/tostring methods would then have a totally different meaning).
msg140224 - (view) Author: Matt Giuca (mgiuca) Date: 2011-07-13 03:57
There are still some inconsistencies in the documentation (in particular, incorrectly using the word "string" to refer to a bytes object, which made sense in Python 2 but not 3), which I fixed in my doc-only.patch file that's coming up to its third birthday.

Most of it has been fixed with the previous change which added 'tobytes' and 'frombytes' and made tostring and fromstring aliases. But there are some places which don't make sense:

array: "If given a list or string" needs to be "If given a list, bytes or string" (since a bytes is not a string).
frombytes: "Appends items from the string" needs to be "Appends items from the bytes object", since this does not work if you give it a string.
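
To make the two wording fixes concrete, here is a small sketch of the behaviour they describe, assuming Python 3.2 or later. The variable names are chosen for the example only.

```python
from array import array

# The constructor accepts a list or a bytes object as its initializer.
from_list = array('B', [1, 2, 3])
from_bytes = array('B', b'\x01\x02\x03')
assert from_list == from_bytes

# frombytes() appends items taken from a bytes object...
from_list.frombytes(b'\x04')

# ...but refuses a str, which is why "string" is the wrong word in the docs.
try:
    from_list.frombytes('\x05')
except TypeError:
    rejected = True
else:
    rejected = False

print(from_list.tolist(), rejected)  # [1, 2, 3, 4] True
```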

Less importantly, I also recommended renaming "unicode string" to just "string", since in Python 3 there is no such thing as a non-unicode string. For instance, there is an example that uses a variable named "unicodestring" that could be renamed to just "string".

> Indeed, not only it would bring little benefit, but may also confuse
> users porting from 2.x (since the from/tostring methods would then
> have a totally different meaning).
Well, by that logic, you shouldn't have renamed "unicode" to "str" since that would also confuse users porting from 2.x. It generally seems like a good idea in Python 3 to rename all mentions of "string" to "bytes" and all mentions of "unicode" to "string", so as to be consistent with the new names of the types (it is better to be internally consistent than consistent with the previous version).

Though I do agree that it would be chaos to rename array.from/tounicode to from/tostring now, given that array.from/tostring already has a different meaning in Python 3.
History
Date | User | Action | Args
2011-07-13 03:57:47 | mgiuca | set | messages: + msg140224
2011-07-12 19:51:43 | pitrou | set | status: open -> closed
    resolution: duplicate
    messages: + msg140210
    superseder: array constructor and array.fromstring should accept bytearray.
    stage: patch review -> resolved
2011-07-12 13:43:46 | eric.araujo | set | nosy: + eric.araujo
    messages: + msg140189
2011-03-09 02:55:51 | terry.reedy | set | versions: + Python 3.3, - Python 3.1, Python 2.7, Python 3.2
    nosy: + terry.reedy
    messages: + msg130404
    components: + Library (Lib), - Interpreter Core
2010-08-24 22:50:39 | eric.araujo | set | nosy: + docs@python
    title: array documentation, method names not 3.0 compliant -> array documentation, method names not 3.x-compliant
    assignee: docs@python
    versions: + Python 2.7, Python 3.2
    stage: test needed -> patch review
2009-04-24 05:12:48 | mgiuca | set | messages: + msg86398
2009-04-24 05:04:51 | loewis | set | messages: + msg86397
2009-04-24 03:37:00 | mgiuca | set | files: + doc+bytesmethods.patch
    messages: + msg86394
2009-04-24 03:36:14 | mgiuca | set | files: + doc-only.patch
    messages: + msg86393
2009-04-22 14:37:23 | ajaksu2 | set | type: enhancement
    versions: + Python 3.1, - Python 3.0
    keywords: + easy
    nosy: + ajaksu2
    messages: + msg86295
    stage: test needed
2009-03-17 12:15:10 | mgiuca | set | messages: + msg83670
2009-03-17 11:48:20 | benjamin.peterson | set | assignee: georg.brandl -> (no value)
    messages: + msg83668
2009-03-17 10:31:51 | pitrou | set | nosy: + pitrou, benjamin.peterson
    messages: + msg83664
2009-03-16 05:24:55 | LambertDW | set | nosy: + LambertDW
2008-09-04 00:02:28 | mgiuca | set | messages: + msg72439
2008-08-20 16:15:01 | mgiuca | set | messages: + msg71555
2008-08-16 10:30:37 | mgiuca | set | files: + doc+bytesmethods-old.patch
    messages: + msg71206
2008-08-16 10:29:05 | mgiuca | set | files: - doc+bytesmethods.patch
2008-08-16 10:26:07 | mgiuca | set | files: + doc+bytesmethods.patch
    messages: + msg71205
2008-08-16 10:00:18 | mgiuca | set | messages: + msg71204
    title: array documentation, method names not 3.0 compliant -> array documentation, method names not 3.0 compliant
2008-08-16 09:59:15 | mgiuca | set | messages: + msg71203
2008-08-16 09:22:34 | loewis | set | nosy: + loewis
    messages: + msg71202
    title: array documentation, method names not 3.0 compliant -> array documentation, method names not 3.0 compliant
2008-08-16 09:14:12 | mgiuca | create |