classification
Title: remove *_INTERNED opcodes from marshal
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, inada.naoki, serhiy.storchaka
Priority: normal Keywords:

Created on 2017-09-07 04:54 by benjamin.peterson, last changed 2018-07-11 07:58 by inada.naoki.

Messages (8)
msg301569 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 04:54
The *_INTERN opcodes inform the marsahl reader to intern the encoded string after deserialization. I believe for pycs this is pointless because PyCode_New ends up interning all strings that are interesting to intern. Writing this opcodes makes pycs non-deterministic because the intern state may be inconsistent in the writer. See https://bugzilla.opensuse.org/show_bug.cgi?id=1049186
msg301571 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-07 06:25
Marshal is used not only in pyc files. It is used for fast data serialization, faster than pickle, json, etc.
msg301572 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 06:41
Used but not really supported. Anyway, I doubt intern round-tripping is a particularly important.
msg301576 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-09-07 08:17
w_ref() depends on refcnt already.
I don't think removing *_INTERN opcode makes PYC reproducible.
https://github.com/python/cpython/blob/1f06a680de465be0c24a78ea3b610053955daa99/Python/marshal.c#L269-L271

I think "intern one string, then share it 10 times" is faster than
"share one string 10 times, then intern each of 10 references".
msg301592 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 16:36
On Thu, Sep 7, 2017, at 01:17, INADA Naoki wrote:
> 
> INADA Naoki added the comment:
> 
> w_ref() depends on refcnt already.
> I don't think removing *_INTERN opcode makes PYC reproducible.
> https://github.com/python/cpython/blob/1f06a680de465be0c24a78ea3b610053955daa99/Python/marshal.c#L269-L271

I know—we're going to have to do something about that, too. In practice,
though, the interning behavior seems to be a bigger reproducibility
problem.

> I think "intern one string, then share it 10 times" is faster than
> "share one string 10 times, then intern each of 10 references".

We end up interning each reference individually currently.
msg301593 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-09-07 16:46
> We end up interning each reference individually currently.

But interning interned string is much faster. It only checks flag.
Interning normal string requires dict lookup.
msg301594 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 16:54
On Thu, Sep 7, 2017, at 09:46, INADA Naoki wrote:
> 
> INADA Naoki added the comment:
> 
> > We end up interning each reference individually currently.
> 
> But interning interned string is much faster. It only checks flag.
> Interning normal string requires dict lookup.

We could makes sure the version in the internal marshal memo is interned
if appropriate.
msg321413 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-07-11 07:58
I doubt that interning cause reproduciblity problem.

AFAIK, all strings in code object are interned or not
interned deterministically.

https://bugzilla.opensuse.org/show_bug.cgi?id=1049186
This issue seems be caused by w_ref() based on object refcnt,
not interning.
History
Date User Action Args
2018-07-11 10:37:20inada.naokilinkissue34033 dependencies
2018-07-11 07:58:22inada.naokisetmessages: + msg321413
2017-09-07 16:54:17benjamin.petersonsetmessages: + msg301594
2017-09-07 16:46:03inada.naokisetmessages: + msg301593
2017-09-07 16:36:52benjamin.petersonsetmessages: + msg301592
2017-09-07 08:17:39inada.naokisetnosy: + inada.naoki
messages: + msg301576
2017-09-07 06:41:54benjamin.petersonsetmessages: + msg301572
2017-09-07 06:25:43serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg301571
2017-09-07 04:54:12benjamin.petersoncreate