classification
Title: Efficiently create empty array.array, consistent with bytearray
Type: enhancement Stage: test needed
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: loewis, ncoghlan, pfalcon, serhiy.storchaka, terry.reedy, vstinner
Priority: normal Keywords:

Created on 2014-04-08 13:32 by pfalcon, last changed 2014-06-04 18:46 by gvanrossum. This issue is now closed.

Messages (8)
msg215757 - (view) Author: Paul Sokolovsky (pfalcon) * Date: 2014-04-08 13:32
With bytearray, you can do:

>>> bytearray(3)
bytearray(b'\x00\x00\x00')

However, with arrays:

>>> array.array('i', 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable

Given that an int passed as the second argument is not handled specially in array, I'd like to propose treating it as the initial size of a zero-filled array to create. This will make it: a) consistent with bytearray; b) efficient for the scenarios where one needs to pre-create an array of a given length, pass it to some (native) function to fill in, then do the rest of the processing. For the latter case, assuming that both fill-in and further processing are efficient (e.g. done by a native function), the initial array creation becomes a bottleneck.
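Editorial sketch of scenario (b), using only the API that exists today. Here io.BytesIO is an assumption standing in for the "native function" that fills the buffer, and the values 10, 20, 30 are made up for illustration; this is not code from the report.

```python
import array
import io
import struct

n = 3

# Preallocate n zeroed C ints with today's API (the repetition idiom).
buf = array.array('i', [0]) * n

# io.BytesIO stands in for a native producer of n native-order C ints.
src = io.BytesIO(struct.pack('%di' % n, 10, 20, 30))

# readinto() writes directly into the array's memory via the buffer protocol,
# so no intermediate Python list of ints is built.
nread = src.readinto(buf)

print(nread, buf)
```

The preallocation line is the step the report wants to make cheaper; the fill step is already done in place.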
msg215774 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-08 17:12
>>> array.array('i', [0]) * 3
array('i', [0, 0, 0])
msg215950 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-11 19:38
You are proposing to copy behavior that will likely be deprecated and removed (see http://legacy.python.org/dev/peps/pep-0467/#id5). Let's reject that idea. The same PEP proposes to replace byte(s/array)(n) "by two more explicit alternate constructors provided as class methods" -- perhaps called .zeros(n). A proposal to copy the new replacements to array might be accepted.
msg216018 - (view) Author: Paul Sokolovsky (pfalcon) * Date: 2014-04-13 15:19
> >>> array.array('i', [0]) * 3

@Serhiy Storchaka:

The keyword is "efficiently". Let's analyze: this creates a useless array.array('i', [0]) object destined only for garbage collection. Then, it forces using a loop of loops to fill in the new object. Whereas array.array('i', 3) immediately reduces to a memset().

You can say that these implementation efficiency issues are of little concern to CPython. But what's being reported here is that, while Python is generally pretty good at allowing raw binary data to be processed efficiently (memoryview and all the other bits and pieces), there are inconsistent, accidental gaps in the API here and there which jeopardize efficiency in an obvious way (and are also obvious to resolve), which may be of concern for other Python implementations (my case).
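Editorial sketch related to the efficiency argument above: with the existing constructor, a zero-filled array can also be built from a zero-filled bytes object, which avoids the one-element seed array. The helper name zeros() is hypothetical. Note that this still allocates a temporary bytes object and copies it, so it is not the single memset() asked for here.

```python
import array

def zeros(typecode, n):
    """Hypothetical helper: return a zero-filled array of n items."""
    # Item size in bytes for this typecode (platform dependent).
    itemsize = array.array(typecode).itemsize
    # bytes(k) is k zero bytes; array() accepts a bytes initializer
    # whose length is a multiple of the item size.
    return array.array(typecode, bytes(n * itemsize))

a = zeros('i', 3)
b = array.array('i', [0]) * 3   # the repetition idiom, for comparison
print(a == b)
```

Whether either spelling is fast enough is implementation dependent; no timings are claimed here.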
msg216020 - (view) Author: Paul Sokolovsky (pfalcon) * Date: 2014-04-13 15:29
@Terry J. Reedy:

Thanks for the pointer. My initial response is <sadness>, another bloating of namespace. But I'm adjusting.

But that PEP shows the issue with all that activity: CPython stdlib got so big and bloated, that it lives its own life and people consider it normal. So, there're PEPs to perfectalize bytearray, without paying attention to the fact that array.array('B') is pretty much the same thing, but apirots (cf. bitrot) behind it.

So, following PEP467, this request should be updated to call for array.array.zero(typecode, size) factory method. Note that it would need to take 2 arguments to follow array.array API. Is that good or bad? IMHO, not much worse than introducing separate factory method at all. But again, author of PEP467 should rather consider how those proposals extend to other related types.
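Editorial sketch of what such a two-argument factory could look like, written as a subclass because array.array itself has no such method. The class name ZArray and the method name zeros (following the ".zeros(n)" guess earlier in the thread) are assumptions, not an existing or accepted API.

```python
import array

class ZArray(array.array):
    """Hypothetical subclass; not part of the stdlib."""

    @classmethod
    def zeros(cls, typecode, size):
        # Takes the typecode as well as the size, mirroring array.array().
        itemsize = array.array(typecode).itemsize
        return cls(typecode, bytes(size * itemsize))

a = ZArray.zeros('i', 3)
print(a.typecode, a.tolist())
```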

If you participate in the discussion of the PEP, I'd appreciate it if you shared a link to this ticket there - I'm not in the loop of general CPython development and have limited resources to learn/follow it. Thanks.
msg216021 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-04-13 15:38
Paul: Discussion of the PEP is out of the scope of this issue. The primary point of the PEP process is that it steers discussion. So if you object to the PEP, contact the PEP author and ask for your objection to be considered, and if not that, at least be recorded.

I feel that your proposed change is highly confusing. People might expect that 

array.array('i', 3)

creates an array with the single value 3 (just as they factually do expect that for bytes). Keyword-only parameters might clear that ambiguity, e.g.

array.array('i', len=3)
array.array(type='i', len=3, value=0)
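Editorial sketch of the keyword-only idea as a wrapper function, since array.array() does not accept keyword arguments. The name make_array and the parameter names length and value are hypothetical (chosen to avoid shadowing the builtins len and type).

```python
import array

def make_array(typecode, *, length=0, value=0):
    """Hypothetical wrapper: length and value must be passed by keyword."""
    # Repeat a one-element seed array; 'value' need not be zero.
    return array.array(typecode, [value]) * length

a = make_array('i', length=3)
b = make_array('d', length=2, value=1.5)
print(a, b)
```

With this signature, make_array('i', 3) raises TypeError, so the "size or single value?" ambiguity cannot arise positionally.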

In any case, it is unlikely that anything will be done with this issue unless you provide a patch (and that still would be no guarantee that somebody accepts it).
msg216025 - (view) Author: Paul Sokolovsky (pfalcon) * Date: 2014-04-13 16:17
Martin:

> People might expect that array.array('i', 3) creates an array with the single value 3.

I don't know which people would expect that. Personally I recognize the need to create an empty array of a given size, have read the docs for the builtin "bytearray" type, and expect the extension type array.array to work the same. I'd even say that I find any other expectations ungrounded; in particular, the way to create an array of a single int is obviously bytearray([3]) or array.array('i', [3]).

Anyway, as I say in "2014-04-13 15:29" comment, I'm only glad to follow with PEP467 changes. Unfortunately, I don't seem to be able to update initial ticket description with updated proposal.


> array.array('i', len=3)

The whole scope of this ticket is to have consistent API between bytearray and array.array. So, the only way I could suggest (or back) the above is if PEP467 was changed too, and as you suggest, I don't do that. (I personally also dislike overuse of keyword args.)

> In any case, it is unlikely that anything will be done with this issue unless you provide a patch (and that still would be no guarantee that somebody accepts it).

The ending is the saddest. As I mentioned, the best outcome I could imagine from such reports is that people who consider changing builtin types also consider the obvious stdlib counterparts. (There's another issue - stdlib is indeed enormous, so it would be nice to separate it (speculatively) into ["builtin types/lib"], "core stdlib", and "extended stdlib". There's a clear difference between the array module and, for example, unittest or logging.)

Thanks for the hint re: mailing Nick Coghlan - nice to know that's acceptable, that's certainly easier than figuring out which mailing list to use and then following it. (I see that Nick is in the nosy list of this ticket, but I guess I'll mail him anyway to have my conscience clear that I did everything to make Python better (== more consistent)).
msg216029 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-13 18:58
A few notes: This issue depends on PEP467, but there is no corresponding tracker issue yet to put in the dependency box. Title and other headers can be edited. Messages and uploaded files can be unlinked from the issue but not edited (or deleted from the database). Nick and other developers are busy with PyCon, so please be patient.

Why this issue depends on the PEP: There is a general feeling that a default class constructor can be overloaded too far, and that a separate constructor method is sometimes better. Many people think that byte(s/array) is the worst stdlib example of 'too much'. In particular, few seem to like the 0 initialization and many dislike it. Changing it is the motivation for the PEP. From Guido's comments, I expect that some version of this change will be accepted even if other parts of the PEP are rejected and eliminated (as some already have been).
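Editorial illustration of the overloading described above, using only existing builtin behavior: the same constructor means different things depending on the argument type.

```python
# An int argument is a length: three zero bytes.
zero_filled = bytes(3)

# An iterable of ints gives the items themselves: one byte with value 3.
single_item = bytes([3])

# A str argument (with an encoding) is encoded text.
from_text = bytes('3', 'ascii')

print(zero_filled, single_item, from_text)
```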

In summary, while Martin and I agree that 'copy the existing bytearray api' should be rejected, we also think that 'copy the new api' can be considered if and when there is one.
History
Date                 User              Action  Args
2014-06-04 18:46:37  gvanrossum        set     status: open -> closed; resolution: rejected
2014-04-16 07:45:30  vstinner          set     nosy: + vstinner
2014-04-13 18:58:02  terry.reedy       set     versions: + Python 3.5, - Python 3.4; title: Cannot efficiently create empty array.array of given size, inconsistency with bytearray -> Efficiently create empty array.array, consistent with bytearray; messages: + msg216029; type: performance -> enhancement; stage: test needed
2014-04-13 16:17:33  pfalcon           set     messages: + msg216025
2014-04-13 15:38:49  loewis            set     nosy: + loewis; messages: + msg216021
2014-04-13 15:29:57  pfalcon           set     messages: + msg216020
2014-04-13 15:19:09  pfalcon           set     messages: + msg216018
2014-04-11 19:38:37  terry.reedy       set     nosy: + terry.reedy, ncoghlan; messages: + msg215950
2014-04-08 17:12:23  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg215774
2014-04-08 13:32:06  pfalcon           create