classification
Title: Make number serialization ES6/V8 compatible
Type: enhancement Stage:
Components: Extension Modules Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: anders.rundgren.net@gmail.com, eric.smith, ezio.melotti, mark.dickinson, pitrou, rhettinger
Priority: normal Keywords:

Created on 2016-01-28 07:25 by anders.rundgren.net@gmail.com, last changed 2016-02-02 20:31 by anders.rundgren.net@gmail.com.

Messages (10)
msg259105 - Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2016-01-28 07:25
ECMA has in their latest release defined that JSON elements must be ordered during serialization. This is easy to accomplish using Python's OrderedDict. What is less trivial is that numbers have to be formatted in a certain way as well. I have tested 100 million specific and random values and found that Python 3.5.1 is mathematically identical to ES6/V8 but has some differences in formatting:

   IEEE Double        ECMAScript 6/V8            Python 3.5.1

c43211ede4974a35, -333333333333333300000,    -3.333333333333333e+20
c3fce97ca0f21056, -33333333333333336000,     -3.3333333333333336e+19
c3c7213080c1a6ac, -3333333333333334000,      -3.333333333333334e+18
c39280f39a348556, -333333333333333400,       -3.333333333333334e+17
c35d9b1f5d20d557, -33333333333333340,        -3.333333333333334e+16

c327af4c4a80aaac, -3333333333333334,         -3333333333333334.0

bf0179ec9cbd821e, -0.000033333333333333335,  -3.3333333333333335e-05
becbf647612f3696, -0.0000033333333333333333, -3.3333333333333333e-06

4024000000000000, 10,                        10.0
0000000000000000, 0,                         0.0
4014000000000000, 5,                         5.0
3f0a36e2eb1c432d, 0.00005,                   5e-05
3ed4f8b588e368f1, 0.000005,                  5e-06

3ea0c6f7a0b5ed8d, 5e-7,                      5e-07
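
For readers who want to reproduce a few rows of the table, here is a minimal sketch that decodes the hex column (taken to be a big-endian IEEE 754 double) and prints what Python's json module emits for it. The helper name double_from_hex is hypothetical and not part of the original test program; the sample values are copied from the table above.

```python
import json
import struct


def double_from_hex(hex_digits):
    """Decode up to 16 hex digits (big-endian IEEE 754 binary64) into a float.

    Hypothetical helper; short inputs are left-padded with zeros, as the
    original test program does.
    """
    return struct.unpack('>d', bytes.fromhex(hex_digits.rjust(16, '0')))[0]


# A few bit patterns copied from the first column of the table.
samples = ['c43211ede4974a35', '4024000000000000', '3ea0c6f7a0b5ed8d']

for hex_digits in samples:
    value = double_from_hex(hex_digits)
    # json.dumps() formats floats with float.__repr__, i.e. the "Python" column.
    print(hex_digits, json.dumps(value))
```

The printed strings correspond to the Python column; the ES6/V8 column has to come from a JavaScript engine.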

Why could this be important?

https://github.com/Microsoft/ChakraCore/issues/149

# Python test program
import binascii
import struct
import json

f = open('c:\\es6\\numbers\\es6testfile100m.txt','rb')

l = 0
string = ''

while True:
  byte = f.read(1)
  if len(byte) == 0:
    exit(0)
  if byte == b'\n':
    l = l + 1
    i = string.find(',')
    if i <= 0 or i >= len(string) - 1:
      print('Bad string: ' + str(i))
      exit(0)
    hex = string[:i]
    while len(hex) < 16:
      hex = '0' + hex
    o = dict()
    o['double'] = struct.unpack('>d',binascii.a2b_hex(hex))[0]
    py3Double = json.dumps(o)[11:-1]
    es6Double = string[i + 1:]
    if es6Double != py3Double:
      es6Dpos = es6Double.find('.')
      py3Dpos = py3Double.find('.')
      es6Epos = es6Double.find('e')
      py3Epos = py3Double.find('e')
      if py3Epos > 0:
        py3Exp = int(py3Double[py3Epos + 1:])
      if es6Dpos < 0 and py3Dpos > 0:
        if es6Epos < 0 and py3Epos > 0:
          py3New = py3Double[:py3Dpos] + py3Double[py3Dpos + 1:py3Epos - len(py3Double)]
          q = py3Exp - py3Epos + py3Dpos
          while q >= 0:
            py3New += '0'
            q -= 1
          if py3New != es6Double:
            print('E1: ' + py3New)
            exit(0)
        elif py3Epos < 0:
          py3New = py3Double[:-2]
          if py3New != es6Double:
            print('E2: ' + py3New)
            exit(0)
        else:
          print('Error: ' + hex + '#' + es6Double + '#' + py3Double)
          exit(0)
      elif es6Dpos > 0 and py3Dpos > 0 and py3Epos > 0 and es6Epos < 0:
        py3New = py3Double[py3Dpos - 1:py3Dpos] + py3Double[py3Dpos + 1:py3Epos - len(py3Double)]
        q = py3Exp + 1
        while q < 0:
          q += 1
          py3New = '0' + py3New
        py3New = py3Double[0:py3Dpos - 1] + '0.' + py3New 
        if py3New != es6Double:
          print('E3: ' + py3New + '#' + es6Double)
          exit(0)
      elif es6Dpos == py3Dpos and py3Epos > 0 and es6Epos > 0:
        py3New = py3Double[:py3Epos + 2] + str(abs(py3Exp))
        if py3New != es6Double:
          print('E4: ' + py3New + '#' + es6Double)
          exit(0)
      elif es6Dpos > 0 and py3Dpos < 0 and py3Epos > 0 and es6Epos < 0:
        py3New = py3Double[:py3Epos - len(py3Double)]
        q = py3Exp + 1
        while q < 0:
          q += 1
          py3New = '0' + py3New
        py3New = '0.' + py3New 
        if py3New != es6Double:
          print('E5: ' + py3New + '#' + es6Double)
          exit(0)
      else:
        print ('Unexpected: ' + hex + '#' + es6Double + '#' + py3Double)
        exit(0)
    string = ''
    if l % 10000 == 0:
      print(l)
  else:
    string += byte.decode(encoding='UTF-8')
msg259210 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2016-01-29 14:34
Do you have a pointer to the spec which requires -333333333333333300000 vs. -3.333333333333333e+20, for example?
msg259219 - Author: Mark Dickinson (mark.dickinson) (Python committer) Date: 2016-01-29 20:32
Eric: I suspect he's talking about section 7.1.12.1 of the 6th edition of ECMA-262; a PDF can be found here: http://www.ecma-international.org/ecma-262/6.0/ECMA-262.pdf. Clause 6 applies to this particular example:

"""
If k <= n <= 21, return the String consisting of the code units of the k digits of the decimal representation of s (in order, with no leading zeroes), followed by n-k occurrences of the code unit 0x0030 (DIGIT ZERO).
"""

here 'k' is the number of significant digits in the shortest possible representation (i.e., the number of significant digits that Python's repr will use), and n is the "adjusted exponent" of the input (so k = 16 and n = 21 in this case, because 10**20 <= target_value < 10**21).
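
As a rough illustration of the k and n described above, the following sketch derives both from Python's repr (relying on the statement above that repr yields the shortest digit string) and applies only the quoted clause. The function names shortest_digits and clause6 are made up for this example, zero and non-finite values are rejected rather than handled, and the other clauses of section 7.1.12.1 are deliberately left out.

```python
import math
from decimal import Decimal


def shortest_digits(x):
    """Return (sign, digits, k, n) for a finite, nonzero float x.

    digits is the shortest round-tripping digit string (from repr),
    k is its length and n is the adjusted exponent, so that
    10**(n-1) <= abs(x) < 10**n.
    """
    if not math.isfinite(x) or x == 0:
        raise ValueError('only finite, nonzero values are handled in this sketch')
    # normalize() strips trailing zeros such as the ".0" in "10.0".
    sign, digits, exponent = Decimal(repr(x)).normalize().as_tuple()
    k = len(digits)
    n = k + exponent
    return ('-' if sign else ''), ''.join(map(str, digits)), k, n


def clause6(x):
    """Apply only the quoted clause; return None when it does not apply."""
    sign, digits, k, n = shortest_digits(x)
    if k <= n <= 21:
        return sign + digits + '0' * (n - k)
    return None


print(shortest_digits(-3.333333333333333e+20)[2:])  # k and n for the example
print(clause6(-3.333333333333333e+20))
```

For the value under discussion this gives k = 16 and n = 21, so the clause appends five zeros to the 16 significant digits.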

I'm not convinced of the importance / value of making Python's json implementation exactly correspond to that of Google's JS engine, though. For one thing, there's no spec: ECMA-262 isn't good enough, since (as noted in the spec), the least significant digit isn't necessarily defined by their requirements, so it's still perfectly possible for two JS implementations to both conform with the specification and still give different output strings.
msg259220 - Author: Mark Dickinson (mark.dickinson) (Python committer) Date: 2016-01-29 20:37
Here's the relevant part of the JCS document, from Appendix A of https://cyberphone.github.io/openkeystore/resources/docs/jcs.html#ECMAScript_Compatibility_Mode:

"""
Numbers *must* be expressed as specified by ECMAScript [ES6] using the improved serialization algorithm featured in Google's V8 JavaScript engine [V8]. That is, in the ECMAScript compatibility mode there are no requirements saving the textual value of numbers. This also means that the JCS Sample Signature is incompatible with the ECMAScript mode since it uses unnormalized numbers.
"""

I think exactly matching Google's implementation is an unreasonable requirement, and I don't see any evidence that JCS usage is widespread enough to warrant making changes to the JSON float output format.
msg259221 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2016-01-29 20:40
That said, someone interested in this should probably voice their concerns towards the JCS standardizers, as the restrictions it imposes on number serialization are clearly an impediment to implementing their protocol.
msg259245 - Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2016-01-30 09:29
As I said, the problem is close to fixed in 3.5.

You should not consider the JCS specification as the [sole] target but rather the ability to create a normalized JSON object, which has many uses, including calculating a hash of such objects.

##################################################################
# Convert a Python double/float into an ES6/V8 compatible string #
##################################################################
def convert2Es6Format(value):
    # Convert double/float to str using the native Python formatter
    pyDouble = str(value)
    pySign = ''
    if pyDouble.find('-') == 0:
        # Save the sign separately; it plays no role in the rest
        pySign = '-'
        pyDouble = pyDouble[1:]
    pyExpStr = ''
    pyExpVal = 0
    q = pyDouble.find('e')
    if q > 0:
        # Grab the exponent and remove it from the number
        pyExpStr = pyDouble[q:]
        if pyExpStr[2:3] == '0':
            # Suppress leading zero on exponents
            pyExpStr = pyExpStr[0:2] + pyExpStr[3:]
        pyDouble = pyDouble[0:q]
        pyExpVal = int(pyExpStr[1:])
    # Split number into pyFirst + pyDot + pyLast
    pyFirst = pyDouble
    pyDot = ''
    pyLast = ''
    q = pyDouble.find('.')
    if q > 0:
        pyDot = '.'
        pyFirst = pyDouble[0:q]
        pyLast = pyDouble[q + 1:]
    # Now the string is split into: pySign + pyFirst + pyDot + pyLast + pyExpStr
    if pyLast == '0':
        # Always remove trailing .0
        pyDot = ''
        pyLast = ''
    if pyExpVal > 0 and pyExpVal < 21:
        # Integers are shown as is with up to 21 digits
        pyFirst += pyLast
        pyLast = ''
        pyDot = ''
        pyExpStr = ''
        q = pyExpVal - len(pyFirst)
        while q >= 0:
            q -= 1
            pyFirst += '0'
    elif pyExpVal < 0 and pyExpVal > -7:
        # Small numbers are shown as 0.etc with e-6 as lower limit
        pyLast = pyFirst + pyLast
        pyFirst = '0'
        pyDot = '.'
        pyExpStr = ''
        q = pyExpVal
        while q < -1:
            q += 1
            pyLast = '0' + pyLast
    # The resulting sub-strings are concatenated
    return pySign + pyFirst + pyDot + pyLast + pyExpStr
msg259363 - Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2016-02-02 06:26
An easier fix than mucking around in the pretty complex number serializer code would be adding an "ES6Format" option to the "json.dump*" methods which could use the supplied conversion code as is.

For JSON parsing in an ES6-compatible way you must anyway use an "OrderedDict" hook option to get the right (=original) property order.
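
The "OrderedDict" hook mentioned here is presumably the object_pairs_hook parameter of json.loads. A minimal sketch follows; the sample document and variable names are invented for illustration.

```python
import json
from collections import OrderedDict

# Invented sample document; the keys are deliberately not in sorted order.
text = '{"b": 1, "a": 2, "c": 3}'

# object_pairs_hook receives the (key, value) pairs in document order,
# so an OrderedDict keeps the original property order.
obj = json.loads(text, object_pairs_hook=OrderedDict)

# Serializing again (compact separators, no sort_keys) keeps that order.
round_tripped = json.dumps(obj, separators=(',', ':'))
print(list(obj.keys()))
print(round_tripped)
```

This only covers property order; number formatting on output is still whatever json.dumps produces.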
msg259366 - Author: Mark Dickinson (mark.dickinson) (Python committer) Date: 2016-02-02 07:43
> For JSON parsing in an ES6-compatible way you must anyway use an "OrderedDict" hook option to get the right (=original) property order.

Why? From the JSON spec: "An object is an *unordered* set of name/value pairs." (emphasis mine). What do you mean by "JSON parsing in an ES6-compatible way"? Surely the JSON specification is all that should matter when determining how to parse JSON?
msg259368 - Author: Mark Dickinson (mark.dickinson) (Python committer) Date: 2016-02-02 08:16
> An easier fix than mucking around in the pretty complex number serializer
> code would be adding an "ES6Format" option to the "json.dump*" methods
> which could use the supplied conversion code as is.

Certainly if this were added we'd want to do it in a backwards compatible way; adding (yet) another flag to the json.dump* methods is one possibility.
msg259427 - Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2016-02-02 20:31
In ES6/V8-compatible implementations which include "node.js", Chrome, Firefox, Safari and (of course) my Java reference implementation you can take a cryptographic hash of a JSON object with a predictable result.

That is, this request is in no way limited to JCS.

Other solutions to this problem have been to create something like XML's canonicalization, which is much more complex.

The JSON RFC is still valid, it just isn't very useful for people who are interested in security solutions.  The predictable property order introduced in ES6 makes a huge difference!  Now it is just the number thing left...
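
To make "the number thing" concrete, here is a small sketch that hashes the serialized text of two objects differing only in how one number is written. The helper json_sha256 and the sample data are hypothetical; SHA-256 and compact separators are arbitrary choices for the illustration, not something prescribed in this thread.

```python
import hashlib
import json
from collections import OrderedDict


def json_sha256(obj):
    """Hash the UTF-8 bytes of a compact JSON serialization (hypothetical helper)."""
    text = json.dumps(obj, separators=(',', ':'), ensure_ascii=False)
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


# Same property order and the same mathematical value, but Python writes
# the float as 10.0 while the int is written as 10.
as_float = OrderedDict([('id', 1), ('amount', 10.0)])
as_int = OrderedDict([('id', 1), ('amount', 10)])

print(json.dumps(as_float, separators=(',', ':')))
print(json.dumps(as_int, separators=(',', ':')))
print(json_sha256(as_float) == json_sha256(as_int))
```

Since the serialized bytes differ, the two digests differ as well, even though both objects describe the same data.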

The other alternative is dressing your JSON objects in Base64 to maintain a predictable signature like in IETF's JOSE.  I doubt that this is going to be mainstream except for OpenID/OAuth which JOSE stems from.
History
Date                 User                           Action  Args
2016-02-02 20:31:58  anders.rundgren.net@gmail.com  set     messages: + msg259427
2016-02-02 08:16:21  mark.dickinson                 set     messages: + msg259368
2016-02-02 07:43:31  mark.dickinson                 set     messages: + msg259366
2016-02-02 06:26:48  anders.rundgren.net@gmail.com  set     messages: + msg259363
2016-01-30 09:29:14  anders.rundgren.net@gmail.com  set     messages: + msg259245
2016-01-29 20:40:21  pitrou                         set     messages: + msg259221
2016-01-29 20:37:53  mark.dickinson                 set     messages: + msg259220
2016-01-29 20:32:11  mark.dickinson                 set     messages: + msg259219
2016-01-29 20:02:38  mark.dickinson                 set     nosy: + mark.dickinson
2016-01-29 14:34:33  eric.smith                     set     nosy: + eric.smith; messages: + msg259210
2016-01-28 18:44:32  SilentGhost                    set     nosy: + rhettinger, pitrou, ezio.melotti; components: + Extension Modules, - Interpreter Core; versions: + Python 3.6, - Python 3.5
2016-01-28 07:25:11  anders.rundgren.net@gmail.com  create