classification
Title: Adding a way to strip annotations from compiled bytecode
Type: enhancement Stage: resolved
Components: Interpreter Core Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: cary, gvanrossum, levkivskyi, rhettinger
Priority: normal Keywords: patch

Created on 2019-03-29 02:11 by cary, last changed 2019-04-18 01:03 by cary. This issue is now closed.

Files
File name Uploaded Description
strip_annotations.patch cary, 2019-03-29 02:11
Messages (14)
msg339091 - Author: cary (cary) * Date: 2019-03-29 02:11
Similar to how `-OO` currently strips docstrings from compiled bytecode, it would be nice if there were a way to strip annotations as well to further compact the bytecode.

Attached is my initial attempt. From some simple manual testing, Python with this patch applied will generate the same bytecode (verified with `marshal` and `dis`) for two files with the same logic, but with annotations manually removed from one of them.

This will probably need some new flag/optimization level rather than relying on `-OO` (as it would be a breaking change).

Open to initial reviews of the patch and idea in general, and suggestions on how to best thread the option through to the module.
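One way to reproduce the "same bytecode" check from Python, without patching the compiler, is to strip annotations at the AST level and then compare instruction streams. This is a minimal sketch, not the attached patch, which works inside the C compiler. `AnnotationStripper`, `compile_stripped` and `opcodes` are hypothetical names. Bare annotations such as `y: int` are simply deleted, which can also skip evaluating part of a non-simple target such as `obj.attr: int`. Instructions are compared as `(opname, argval)` pairs rather than raw `marshal` bytes, because the line/column position tables can still differ between the two sources.

```python
import ast
import dis
import types


class AnnotationStripper(ast.NodeTransformer):
    """Remove annotations from a module AST (illustrative sketch only)."""

    _BLOCKS = ("body", "orelse", "finalbody")

    def generic_visit(self, node):
        # Note which statement blocks were non-empty before transforming them.
        had_stmts = {f: bool(getattr(node, f, None)) for f in self._BLOCKS}
        super().generic_visit(node)
        for field in self._BLOCKS:
            block = getattr(node, field, None)
            # A block emptied by dropping bare annotations gets a `pass`.
            if had_stmts[field] and isinstance(block, list) and not block:
                setattr(node, field, [ast.Pass()])
        return node

    def visit_arg(self, node):
        node.annotation = None  # def f(x: int) -> def f(x)
        return node

    def visit_FunctionDef(self, node):
        node.returns = None  # drop the "-> T" return annotation
        return self.generic_visit(node)

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_AnnAssign(self, node):
        if node.value is None:
            return None  # bare "x: T" is deleted outright
        # "x: T = v" becomes the plain assignment "x = v".
        assign = ast.Assign(targets=[node.target], value=node.value, type_comment=None)
        return self.generic_visit(ast.copy_location(assign, node))


def compile_stripped(source, filename="<stripped>"):
    """Parse, strip annotations, and compile to a module code object."""
    tree = AnnotationStripper().visit(ast.parse(source, filename))
    return compile(ast.fix_missing_locations(tree), filename, "exec")


def opcodes(code):
    """(opname, argval) pairs for a code object, recursing into nested code."""
    result = []
    for ins in dis.get_instructions(code):
        if isinstance(ins.argval, types.CodeType):
            result.append((ins.opname, opcodes(ins.argval)))
        else:
            result.append((ins.opname, ins.argval))
    return result
```

With this sketch, `opcodes(compile_stripped(src))` should equal `opcodes(compile(hand_stripped, ...))` when `hand_stripped` is `src` with its annotations deleted by hand and the line layout unchanged.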
msg339182 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-03-30 07:04
-0 At first blush, this seems reasonable. Like removing docstrings, it would make the bytecode more compact.  That said, annotations can be used for more things than typing (they predate typing and could be used for anything). It's unclear whether stripping them might break published modules that rely on the annotations being present.

Leaving this feature request open so that it can gather more comments from the other devs.
msg339215 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-03-30 17:18
+1 from me. Looking for a better way to enable this from the command line. Our alternative would be to maintain a local patch, since this definitely helps us.

-- 
--Guido (mobile)
msg339228 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-03-30 23:04
+1 for a command-line option that decouples this from eliminating docstrings. The latter generally has no semantic effect, but the former might.

Ideally, we don't want to break non-typing uses of annotations. One example below uses annotations for argument validation and documentation. Another example would be using __annotations__ for dynamic dispatch.

--- Example -----------------------------------------------------------

>>> class Limit:
...     def __init__(self, low, high):
...         self.low = low
...         self.high = high
...     def valid(self, x):
...         return self.low <= x <= self.high
...     def __repr__(self):
...         return f'{type(self).__name__}(low={self.low}, high={self.high})'
...
>>> def validate(function, parameter, value):
...     assert function.__annotations__[parameter].valid(value)
...
>>> def maneuver(thrust: Limit(100, 150), angle: Limit(-10, 20)):
...     'Engage OMS burn (orbital maneuvering system).'
...     validate(maneuver, 'thrust', thrust)
...     validate(maneuver, 'angle', angle)
...     ...
...
>>> help(maneuver)
Help on function maneuver in module __main__:

maneuver(thrust: Limit(low=100, high=150), angle: Limit(low=-10, high=20))
    Engage OMS burn (orbital maneuvering system).

>>> maneuver(120, 7)
>>> maneuver(120, 35)
Traceback (most recent call last):
  File "<pyshell#41>", line 1, in <module>
    maneuver(120, 35)
  File "<pyshell#38>", line 4, in maneuver
    validate(maneuver, 'angle', angle)
  File "<pyshell#30>", line 2, in validate
    assert function.__annotations__[parameter].valid(value)
AssertionError
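The same validation idea can also be written as a standalone script, with one deliberate change: the check raises `ValueError` explicitly instead of using `assert`, so it keeps working under `-O`. `simulate_stripping` is a hypothetical helper that empties `__annotations__` to mimic what an annotation-stripping compile would leave behind. This is a sketch for illustration, not part of any proposed patch.

```python
class Limit:
    """Annotation object carrying an inclusive range of valid values."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def valid(self, x):
        return self.low <= x <= self.high

    def __repr__(self):
        return f"{type(self).__name__}(low={self.low}, high={self.high})"


def validate(function, parameter, value):
    # Look up the Limit stored as the parameter's annotation and apply it.
    # An explicit raise (rather than assert) survives running under -O.
    limit = function.__annotations__[parameter]
    if not limit.valid(value):
        raise ValueError(f"{parameter}={value!r} is outside {limit!r}")


def maneuver(thrust: Limit(100, 150), angle: Limit(-10, 20)):
    "Engage OMS burn (orbital maneuvering system)."
    validate(maneuver, "thrust", thrust)
    validate(maneuver, "angle", angle)
    return "burn accepted"


def simulate_stripping(function):
    """Hypothetical: mimic an annotation-stripping compile by dropping annotations."""
    function.__annotations__ = {}
```

Here `maneuver(120, 7)` returns `"burn accepted"` and `maneuver(120, 35)` raises `ValueError`. After `simulate_stripping(maneuver)`, even the valid call `maneuver(120, 7)` fails with `KeyError: 'thrust'`, because the range information no longer exists at runtime.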
msg339244 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-03-31 04:37
Also note that functools.singledispatch depends on type annotations being present.¹,²

¹ https://docs.python.org/3/library/functools.html#functools.singledispatch
² https://forum.dabeaz.com/t/singledispatch-and-the-visitor-pattern/395
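The following is a small illustration of that dependency. `register` can read the dispatch type from the first parameter's annotation, or it can take the type explicitly. The function names below are made up for the example. `handle_float` stands in for an annotated `obj: float` handler after its annotation has been stripped.

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    """Fallback used when no more specific handler is registered."""
    return "object"


@describe.register
def _(obj: int):
    # No type passed to register(): it is read from the `obj: int` annotation.
    return "int"


@describe.register(str)
def _(obj):
    # Type passed explicitly: this registration does not need annotations.
    return "str"


def handle_float(obj):
    # Stand-in for an annotated `obj: float` handler after stripping.
    return "float"
```

Calling `describe.register(handle_float)` raises `TypeError` because there is no annotation to read the type from. The explicit form `describe.register(float, handle_float)` still works.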
msg339261 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-03-31 15:36
There's a similar thing with docstrings, though. Some code depends on docstrings (e.g. David Beazley's parser generator), and help() of course also depends on them. And yet we have a way to disable them. (Same with asserts: plenty of code depends on them even though we recommend against it.)

So as long as we have a separate mechanism to disable this I'm not worried -- it's up to the person that runs the program to ensure that they don't use functionality that breaks when annotations are suppressed.

(Note that functools.singledispatch has an alternate registration syntax that doesn't require annotations.)
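For comparison, the existing docstring/assert switch can be exercised per call through the `optimize` argument of the built-in `compile()`. Level 2 corresponds to `-OO`. In this sketch, `SOURCE`, `area` and `load_area` are illustrative names. Level 2 drops the docstring and the assert, but the annotations survive, which is the gap this issue is about.

```python
SOURCE = '''
def area(width: float, height: float) -> float:
    """Return the area of a rectangle."""
    assert width >= 0 and height >= 0
    return width * height
'''


def load_area(optimize):
    """Compile SOURCE at the given optimization level and return `area`."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["area"]


plain = load_area(0)      # like running without -O
stripped = load_area(2)   # like running with -OO
```

`stripped.__doc__` is `None` and `stripped(-1.0, 2.0)` no longer trips the assert. `stripped.__annotations__` still maps `width`, `height` and `return` to `float`.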
msg339266 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-03-31 16:23
> So as long as we have a separate mechanism to disable this I'm not worried 

Having a separate mechanism is a good solution.
msg339277 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-03-31 22:23
Here are some quick measurements on a single module¹ that uses typing. It shows about a 7% space savings between the baseline and patched versions:

  -rw-r--r--  1 raymond  staff  3490 Mar 31 15:07 kmeans.cpython-38.opt-2.pyc
  -rw-r--r--  1 raymond  staff  3245 Mar 31 15:10 kmeans.cpython-38.opt-2.pyc

Since unique types are singletons, the savings will likely be much less on bigger modules where the same types are used over and over again in the signatures:

    >>> List[int] is List[int]
    True

It would be nice if someone could measure the effect on a big project. It's possible that the benefits are negligible compared to the savings from docstrings.

¹ https://raw.githubusercontent.com/rhettinger/modernpython/master/kmeans.py
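This measurement can also be repeated without writing `.pyc` files. `marshal.dumps()` of the compiled code object is essentially the `.pyc` payload; since Python 3.7 the file adds a 16-byte header. `code_size` and `savings` are hypothetical helpers. The two source strings stand in for a module and its hand-stripped counterpart. Measuring a large project would also need an automated stripping step, which this sketch does not include.

```python
import marshal


def code_size(source, filename="<measure>"):
    """Size in bytes of the marshalled module code, compiled as with -OO.

    A .pyc file is this payload plus a 16-byte header (Python 3.7+).
    """
    return len(marshal.dumps(compile(source, filename, "exec", optimize=2)))


def savings(before, after):
    """Percentage reduction going from `before` bytes to `after` bytes."""
    return 100.0 * (before - after) / before


ANNOTATED = (
    "from typing import List\n"
    "def mean(xs: List[float]) -> float:\n"
    "    return sum(xs) / len(xs)\n"
)
STRIPPED = (
    "from typing import List\n"
    "def mean(xs):\n"
    "    return sum(xs) / len(xs)\n"
)
```

With the file sizes from the listing above, `savings(3490, 3245)` comes out at roughly 7.0, which matches the reported 7%.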
msg339403 - Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2019-04-03 17:04
+1 from me.

There are two ways to enable this:
* Add -OOO that would remove all three: asserts, docstrings, annotations
* Add separate --O-asserts --O-docstrings --O-annotations (or similar)

I think I like the second option more.

@cary Please note that our workflow changed, you can now submit a PR to our GitHub repo instead of sending a patch.

Also please include tests and docs in your PR.
msg339409 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-04-03 18:23
@cary are you planning on updating with the suggested/requested improvements to the patch? If not, let us know and we'll see if someone else is interested in taking over.
msg339421 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-04 00:53
FYI, this partially breaks functools.singledispatch() and completely breaks both typing.NamedTuple() and dataclasses.dataclass().  A user may be able to avoid these in their own code, but I don't see how they can avoid it in third-party code.
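The following makes the dataclass and NamedTuple breakage concrete. Both collect their fields from the class annotations. The hand-stripped classes below stand in for what an annotation-stripping compile would produce, and they end up with no fields at all. A bare `x: int` has no value and disappears, while `y: int = 0` degrades to an ordinary class attribute. The class names are illustrative.

```python
from dataclasses import dataclass, fields
from typing import NamedTuple


@dataclass
class Point:
    x: int
    y: int = 0


@dataclass
class StrippedPoint:
    # Point with its annotations removed by hand: `x: int` has no value and
    # vanishes, and `y: int = 0` becomes an ordinary class attribute.
    y = 0


class Pair(NamedTuple):
    left: str
    right: str = ""


class StrippedPair(NamedTuple):
    # Same treatment: no annotations, so NamedTuple sees no fields.
    right = ""
```

`fields(Point)` names `x` and `y`, but `fields(StrippedPoint)` is empty, so `StrippedPoint(1)` raises `TypeError`. Likewise, `StrippedPair._fields` is `()`.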
msg339425 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-04-04 02:51
One way would be to compile only their own source to bytecode using this flag. But I agree it doesn't look very viable in general. I'll talk to Cary offline.
msg339426 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-04-04 02:58
(I just found out that Cary is on vacation until 4/17.)
msg340460 - Author: cary (cary) * Date: 2019-04-18 01:03
Thanks for the feedback! I wasn't aware that modules depended on this during runtime.

Abandoning this :)
History
Date                 User        Action  Args
2019-04-18 01:03:04  cary        set     status: open -> closed
                                         resolution: rejected
                                         messages: + msg340460
                                         stage: resolved
2019-04-04 02:58:06  gvanrossum  set     messages: + msg339426
2019-04-04 02:51:46  gvanrossum  set     messages: + msg339425
2019-04-04 00:53:50  rhettinger  set     messages: + msg339421
2019-04-03 18:23:33  gvanrossum  set     messages: + msg339409
2019-04-03 17:04:35  levkivskyi  set     messages: + msg339403
2019-03-31 22:23:25  rhettinger  set     messages: + msg339277
2019-03-31 16:23:53  rhettinger  set     messages: + msg339266
2019-03-31 15:36:55  gvanrossum  set     messages: + msg339261
2019-03-31 04:37:50  rhettinger  set     messages: + msg339244
2019-03-30 23:04:08  rhettinger  set     messages: + msg339228
2019-03-30 17:18:23  gvanrossum  set     messages: + msg339215
2019-03-30 07:04:25  rhettinger  set     nosy: + rhettinger
                                         messages: + msg339182
2019-03-29 19:56:15  levkivskyi  set     nosy: + gvanrossum, levkivskyi
2019-03-29 02:11:47  cary        create