Author rhettinger
Recipients joern, rhettinger, serhiy.storchaka
Date 2015-05-22.22:00:49
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1432332049.45.0.358258447041.issue23509@psf.upfronthosting.co.za>
In-reply-to
Content
Sorry, I don't want to any of these changes (though it is a close call on a couple of them).

Before the particulars, here are some high-level thoughts (not definitive).  I would like to confine the optimizations and complexities to the more important parts of the API (actually counting as opposed to counter-to-counter operations).  Also, I don't want to preclude some of the future possibilities under consideration (for example, I am leaning toward guaranteeing the order of updates so that the OrderedCounter recipe has guaranteed behavior).  Also, I'm considering removing the existing self.get(elem, 0) in update() and substract() so that subclassers can usefully override/extend the __missing__ method to return other types (like decimals, fractions, etc) or have other behaviors like logging missing entries, etc.  And the self.get optimization doesn't seem to perform well under PyPy in contrast to an inherited __getitem__.  The current code choices were biased towards simplicity, space-over-speed, and keeping a predictable operation order where possible.

Particulars:

1) get() bound method optimization:  This is a close call.  We already use this in update() and subtract() though I'm thinking of removing those two cases.  Switching from c[k] to c.get(k, 0) is a semantic change that affects subclasses that define, __getitem__(), get(), or __missing__().  Speedwise, c.get() is faster than a fallback to __missing__() for missing keys; conversely, the inherited __getitem__() is faster than c.get(k, 0) when the keys are present.  There is some room for debate about which is the common case (it really depends on what your application is) and I would prefer at this point not to shift assumptions about is more common.  Clarity-wise:  The square brackets are clearer than the boundmethod trick which I would like to use only where it really matters.

2)  The current _keep_positive() is shorter, clearer, maintains order for the OrderedCounter use case, and is more space-efficient (never using more space than the original counter and intentionally choosing to remove elements rather than building a new one from scratch).  This is how it was done in setobject.c for the same reasons.

3) Other than brevity, I don't see any advantage to __add__ and __or__ being defined via inplace operations.  That is a semantic change that can affect subclassers, violating the open-closed-principle (I want people to be able to override/extend the in-place methods without unintentionally breaking add/or methods).  Also, the current approach has a space saving bias (not storing negative counts in the first place rather than using a follow-on call to _keep_positive pass to eliminate the negatives after they have been stored).

4) The code expansion for __pos__ and __neg__ grows the code and is less clear (IMO).  The change for __pos__ scrambles the order, interfering with the OrderedCounter example.  Also, I want the meaning of +c to be the same as c plus an empty counter (at least, that is how I think of the operation).  FWIW, the unary plus operation was intended to be a trigger for _keep_positive.  It was modeled after the unary plus in Decimal which serves the purpose of triggering rounding.

I'm sure there is room for argument about any one of the points above, some are just judgment calls.  I'm closing this because the OP's original concern about wanting an in-place operation was already solved and because the proposed optimizations slightly change semantics, aren't really the important part of the API, the snarl the code a bit, and they interfere with some future directions I want keep open.  Also, I've already spent several hours to reviewing this patch and need to return attention to other matters.
History
Date User Action Args
2015-05-22 22:00:49rhettingersetrecipients: + rhettinger, joern, serhiy.storchaka
2015-05-22 22:00:49rhettingersetmessageid: <1432332049.45.0.358258447041.issue23509@psf.upfronthosting.co.za>
2015-05-22 22:00:49rhettingerlinkissue23509 messages
2015-05-22 22:00:49rhettingercreate