Title: remove *_INTERNED opcodes from marshal
Components: Interpreter Core Versions: Python 3.7
Nosy List: benjamin.peterson, methane, serhiy.storchaka
Created on 2017-09-07 04:54 by benjamin.peterson

Messages (8)
msg301569 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 04:54
The *_INTERN opcodes inform the marshal reader to intern the encoded string after deserialization. I believe for pycs this is pointless because PyCode_New ends up interning all strings that are interesting to intern. Writing these opcodes makes pycs non-deterministic because the intern state may be inconsistent in the writer. See
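For readers who want to see the opcode difference described above, here is a minimal sketch. It assumes CPython 3.4 or later, where the default marshal format has separate type codes for interned and non-interned ASCII strings. The helper name type_code and the sample string are made up for illustration.

```python
import marshal
import sys


def type_code(data):
    """Return the type code of the first marshalled object, without the FLAG_REF bit."""
    return chr(data[0] & 0x7F)


# Build two equal strings at runtime so that neither starts out interned.
plain = "".join(["spam_", "eggs"])
interned = sys.intern("".join(["spam_", "eggs"]))

plain_bytes = marshal.dumps(plain)
interned_bytes = marshal.dumps(interned)

# Same text, but the writer chooses the type code from the intern state.
print(type_code(plain_bytes), type_code(interned_bytes))

# Both byte strings still decode to equal values.
print(marshal.loads(plain_bytes) == marshal.loads(interned_bytes))
```

The two type codes differ even though the strings compare equal, so the bytes written depend on whether the writer happened to hold an interned copy.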
msg301571 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-07 06:25
Marshal is used not only in pyc files. It is used for fast data serialization, faster than pickle, json, etc.
msg301572 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 06:41
Used but not really supported. Anyway, I doubt intern round-tripping is particularly important.
msg301576 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-09-07 08:17
w_ref() depends on refcnt already.
I don't think removing *_INTERN opcode makes PYC reproducible.

I think "intern one string, then share it 10 times" is faster than
"share one string 10 times, then intern each of 10 references".
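A rough illustration of the "intern one string, then share it" case, assuming current CPython behavior where the reader interns a string written with an interned type code and back-references reuse the same object. The variable names are hypothetical.

```python
import marshal
import sys

# One interned string referenced ten times from the same list.
name = sys.intern("".join(["shared_", "name"]))
payload = [name] * 10

loaded = marshal.loads(marshal.dumps(payload))

# Every element of the reloaded list is one and the same object...
print(all(item is loaded[0] for item in loaded))

# ...and that object is already interned, so sys.intern() hands it back unchanged.
print(loaded[0] is sys.intern(loaded[0]))
```

This only shows the observable result after loading. It does not measure the cost difference between the two orderings discussed in the message.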
msg301592 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 16:36
On Thu, Sep 7, 2017, at 01:17, INADA Naoki wrote:
> INADA Naoki added the comment:
> w_ref() depends on refcnt already.
> I don't think removing *_INTERN opcode makes PYC reproducible.

I know—we're going to have to do something about that, too. In practice,
though, the interning behavior seems to be a bigger reproducibility

> I think "intern one string, then share it 10 times" is faster than
> "share one string 10 times, then intern each of 10 references".

We end up interning each reference individually currently.
msg301593 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-09-07 16:46
> We end up interning each reference individually currently.

But interning an already interned string is much faster. It only checks a flag.
Interning a normal string requires a dict lookup.
msg301594 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 16:54
On Thu, Sep 7, 2017, at 09:46, INADA Naoki wrote:
> INADA Naoki added the comment:
> > We end up interning each reference individually currently.
> But interning an already interned string is much faster. It only checks a flag.
> Interning a normal string requires a dict lookup.

We could make sure the version in the internal marshal memo is interned
if appropriate.
msg321413 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-07-11 07:58
I doubt that interning causes the reproducibility problem.

AFAIK, all strings in code object are interned or not
interned deterministically.
This issue seems to be caused by w_ref() being based on the object refcnt,
not interning.
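
The refcnt dependence of w_ref() can be probed with a sketch like the following. It assumes the high bit of the type byte (FLAG_REF, 0x80 in marshal.c) marks an object recorded for back-references. Whether the two outputs actually differ depends on the interpreter version and on how many references exist at the time of the call, so the script only reports what it observes.

```python
import marshal

FLAG_REF = 0x80  # high bit of the type byte (assumed from marshal.c)

# A temporary object that no variable refers to while it is being written.
temp_bytes = marshal.dumps(bytes(bytearray(b"payload")))

# An equal object that is additionally held by two names.
kept = bytes(bytearray(b"payload"))
alias = kept
kept_bytes = marshal.dumps(kept)

print("temporary has FLAG_REF:", bool(temp_bytes[0] & FLAG_REF))
print("aliased has FLAG_REF:  ", bool(kept_bytes[0] & FLAG_REF))
print("identical output:      ", temp_bytes == kept_bytes)
```

If the flags differ, equal values produced different bytes purely because of reference counts, which is the reproducibility concern raised in this message.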
