classification
Title: Only READ support for Decimal in json
Type: behavior
Stage:
Components: Extension Modules
Versions: Python 3.4, Python 3.5, Python 2.7

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: bob.ippolito
Nosy List: anders.rundgren.net@gmail.com, barry, bob.ippolito, eli.bendersky, mark.dickinson, ncoghlan, pitrou, rhettinger
Priority: normal
Keywords:

Created on 2014-12-27 17:10 by anders.rundgren.net@gmail.com, last changed 2022-04-11 14:58 by admin.

Messages (24)
msg233139 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-27 17:10
import collections
import json
from decimal import Decimal

jsonString = '{"t":6,"h":4.50, "g":"text","j":1.40e450}'
jsonObject = json.loads(jsonString, object_pairs_hook=collections.OrderedDict, parse_float=Decimal)
for item in jsonObject:
  print jsonObject[item]

Output:
6
4.50
text
1.40E+450

Works as expected.

However, as far as I can tell there is no way to get back to the original JSON string: when using json.dumps you have to convert Decimal to str in a custom encoder passed as cls, and that adds "" around the values.
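
The quoting problem described above can be reproduced with a minimal sketch. It is written for Python 3 and uses the `default` hook rather than a `cls` subclass (an assumption made for brevity; both routes end up returning a str). The helper name `decimal_default` is hypothetical.

```python
import json
from decimal import Decimal


def decimal_default(obj):
    """Hypothetical fallback hook: the only portable thing it can return is a str."""
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError("not JSON serializable: %r" % (obj,))


# Reading works: every JSON float token is handed to Decimal as a string.
data = json.loads('{"h": 4.50, "j": 1.40e450}', parse_float=Decimal)

# Writing does not round-trip: the values come back as JSON strings.
out = json.dumps(data, default=decimal_default)
print(out)  # {"h": "4.50", "j": "1.40E+450"}
```

The numbers are now JSON strings, so a consumer that expects numbers sees a different document.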
msg233156 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-12-28 23:36
It is unfortunate that there doesn't seem to be a way to round-trip Decimals.  That would seem to be a fundamental capability that we should expect to support.

I have a vague recollection that you used to be able to trick the encoder by returning a subclass of float with a custom __str__; however, I don't think that hack would work anymore because float subclasses now get coerced back to a regular float in order to make the json module work with Enums (which have a __str__ that is meaningless in JSON).

In Python 3.5, it would be nice to add a hook that affords more control than "cls" currently does.  Ideally, it should allow any class to specify exactly what it wants written out.

Another option, for Py2.7, 3.4, and 3.5 is to add direct support for decimal instances (much like the enum support was backported).
msg233157 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-12-29 01:20
Enums (and other numeric subclasses), do not round-trip back to themselves.  An IntEnum with the value of 4 is written as 4 and converted back from json as the integer 4 (not Settings.TabSpaces, or whatever).

Given that json is a multi-language format (or a javascript format -- but either way not a Python-specific format) I don't know that we can expect much more from it.
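
The enum behaviour described in this message can be checked with a short Python 3 sketch. The class name `Settings` and member `TabSpaces` follow the message; the dictionary key `indent` is made up for illustration.

```python
import json
from enum import IntEnum


class Settings(IntEnum):
    TabSpaces = 4


# The member is written as its plain integer value ...
encoded = json.dumps({"indent": Settings.TabSpaces})
print(encoded)  # {"indent": 4}

# ... and read back as an ordinary int, not as Settings.TabSpaces.
decoded = json.loads(encoded)
print(type(decoded["indent"]))
```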
msg233159 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2014-12-29 05:32
simplejson has had a use_decimal flag for output since 2.1.0, and it has been enabled by default since 2.2.0. simplejson 3.2.0 introduced a for_json argument that checks objects for a method of that name for serialization.

https://github.com/simplejson/simplejson/blob/master/CHANGES.txt
msg233160 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-29 05:46
I was actually hoping to implement the final part of this:
https://openkeystore.googlecode.com/svn/resources/trunk/docs/jcs.html#Normalization_and_Signature_Validation

It seems that the current Decimal implementation wouldn't save me anyway since it modifies the input :-(

Anyway, floats in JSON have rather little use so maybe my existing Python (PoC) solution will be "good enough":
https://code.google.com/p/openkeystore/source/browse/python/trunk/src/org/webpki/json/JCSValidator.py
msg233161 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2014-12-29 07:31
I'm sure there's some hack that would allow you to preserve the input. I would try using parse_float and have it return some object that preserves the string and will be output in precisely the same way. It may need to be a Decimal subclass. I'm traveling for the next few weeks so I won't have much of a chance to investigate myself.
msg233162 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-29 07:48
It would be great if I could use a sub-classed Decimal during parsing, but since there doesn't appear to be a way to serialize the result using the "json" package I'm probably stuck with the current "99%" solution.

I have solved this in Java and JavaScript by writing my own JSON stuff
http://webpki.org/papers/keygen2/doc/org/webpki/json/package-summary.html
but that method obviously doesn't scale and I'm a real n00b when it comes to Python although it was more fun than I had expected :-)

A minor patch addressing serialization of Decimal would probably do fine (after sub-classing) and would be generally useful.
msg233163 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2014-12-29 07:53
Subclass Decimal and implement __str__ to return your own representation. Use parse_float to use your Decimal subclass. Should work with simplejson, a similar hack may be possible with the json module.
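
A rough Python 3 sketch of the parsing half of this suggestion, using only the standard library. The class name `RawDecimal` and its `raw` attribute are hypothetical. The second half shows that the stdlib `json.dumps` still rejects the object, which is the gap this issue is about.

```python
import json
from decimal import Decimal


class RawDecimal(Decimal):
    """Hypothetical Decimal subclass that remembers the JSON token it came from."""

    def __new__(cls, text):
        obj = super().__new__(cls, text)
        obj.raw = text  # parse_float hands over the number token as a str
        return obj

    def __str__(self):
        return self.raw


doc = json.loads('{"h": 4.50, "j": 1.40e450}', parse_float=RawDecimal)
print(str(doc["h"]), str(doc["j"]))  # 4.50 1.40e450

# The stdlib encoder has no Decimal support, so writing fails.
try:
    json.dumps(doc)
    serialized = True
except TypeError as exc:
    serialized = False
    print("json.dumps failed:", exc)
```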
msg233165 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-12-29 08:30
> Given that json is multi-language format ... I don't know 
> that we can expect much more from it.

JSON specifies a textual number format but doesn't dictate whether that format represents a fixed precision binary floating point number or a decimal floating point number.  It is perfectly reasonable for someone to want to read and write a JSON number format to and from a decimal (we also see this with database formats such as sqlite).
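
To illustrate the difference between the two interpretations, here is a small Python 3 sketch that reads the same text once as binary floats and once as Decimals. The key names are made up for the example.

```python
import json
from decimal import Decimal

text = '{"price": 4.50, "big": 1.40e450}'

as_float = json.loads(text)                         # default: binary floats
as_decimal = json.loads(text, parse_float=Decimal)  # decimal floating point

print(as_float["price"], as_float["big"])      # 4.5 inf
print(as_decimal["price"], as_decimal["big"])  # 4.50 1.40E+450
```

The float reading drops the trailing zero and overflows to infinity, while the Decimal reading keeps both values exactly.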

This bug report isn't a JSON spec issue; rather, it is about how the JSON module API can support (or inhibit) valid use cases.

AFAICT, the patch to make the API better support enums had the side-effect of inhibiting the API's ability to support number objects that want to control their output via __str__ or __repr__.  This seems to block-off decimal support and support for controlling displayed precision.

I think the Enum patch is suspect and could be considered a regression.  That said, we could simply add direct support for decimals and leave the enum patch in-place (though it still impairs a user's ability to control the displayed precision).
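
The side-effect described above can be observed with a minimal Python 3 sketch. The class name `TwoPlaces` is hypothetical; it tries to control its displayed precision through __repr__ and __str__, and the encoder ignores both.

```python
import json


class TwoPlaces(float):
    """Hypothetical float subclass that wants to be written with two decimals."""

    def __repr__(self):
        return "%.2f" % self

    __str__ = __repr__


value = TwoPlaces(4.5)
print(repr(value))          # 4.50

# The encoder formats float subclasses with float.__repr__, not the override.
out = json.dumps([value])
print(out)                  # [4.5]
```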
msg233166 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-29 09:08
I guess my particular requirement/wish is unusual (keeping the original textual representation of a floating point number intact) while using Decimal should be fairly universal.

If these things could be combined in a Decimal support option I would (of course) be extremely happy.  They do not appear to be in conflict.

Currently I'm a bit bogged down by the crypto-stuff since it is spread over different and incompatible modules which makes it awkward creating a nice unified RSA/EC solution.  I may end-up writing a wrapper...
msg233168 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2014-12-29 09:30
I don't think it's reasonable to expect Decimal to always output precisely the same string it was given. It's a waste of complexity and space and the only time you would want this behavior is when you really should've left it accessible as a string in the first place.

It sounds like the spec for that signature may be poorly designed (with regard to portability). Relying on the precise string output of a number is not going to work in any JSON parser I've ever seen. You'd need to work at the tokenizer level and not all of the parsers provide an interface at that layer (since many of them combine tokenization and parsing).
msg233169 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-29 09:40
Bob,
You're right, I have put up a requirement for JSON serializing that may be "over the top".  OTOH, there are (AFAICT...) only two possible solutions:
1. Outlaw floating point data from the plot
2. Insist that serializers conform to the spec

As a pragmatist I have settled on something in between :-)
https://openkeystore.googlecode.com/svn/resources/trunk/docs/jcs.html#Interoperability

I don't think that the overhead in Decimal would be a problem but I'm not a Python platform maintainer so I leave it to you guys.
msg233170 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-29 09:58
Well, I could have insisted on canonicalization of floating-point data but that's so awkward that outlawing such data is a cleaner approach.  Since the target for JCS is security- and payment-protocols, I don't think the absence of floating-point support will be a show-stopper. It does though make the IETF folks unhappy.

Another reason for still wanting it to work as currently specified is because it would be nice to have JCS running on three fully compatible platforms, including one which I haven't designed :-)
msg233175 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-29 15:23
Using simplejson I got it to work!!!
I just wonder what you think of the solution:

import collections
import simplejson as json
from decimal import Decimal

class EnhancedDecimal(Decimal):
    # Return the exact text that was parsed instead of Decimal's own formatting
    def __str__(self):
        return self.saved_string

    # parse_float passes the original JSON number token as a string
    def __new__(cls, value="0", context=None):
        obj = Decimal.__new__(cls, value, context)
        obj.saved_string = value
        return obj

jsonString = '{"t":6,"h":4.50, "g":"text","j":1.40e450}'
jsonObject = json.loads(jsonString, object_pairs_hook=collections.OrderedDict, parse_float=EnhancedDecimal)
for item in jsonObject:
  print jsonObject[item]
print json.dumps(jsonObject)

Output:
6
4.50
text
1.40e450
{"t": 6, "h": 4.50, "g": "text", "j": 1.40e450}
msg233183 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2014-12-29 20:42
Yeah, that's the hack I was suggesting.

I suppose I don't see the point of having a protocol that normalizes *almost* everything. Normalization should be all or nothing. Other options would be to define the signature at the encoded byte level with no normalization (in which case you could use any off the shelf signing), or at the value level and prescribe a specific interpretation for data types. I would've done it at the value level and prescribed that dictionaries should be key sorted, strings dealt with as UTF-8, and numbers as IEEE 754. I would make sure not to depend on the decimal conversion of numbers, and just work with the serialized bit representation in a particular byte order (which you can even do efficiently in modern browser JS with Float64Array, DataView and ArrayBuffer). For JS portability it'd probably treat *all* numbers as floats in the same way, whether they had a decimal point to begin with or not.
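
A value-level scheme along these lines might look roughly like the following Python 3 sketch. The type tags, length prefixes and the function names `canonical_bytes` and `digest` are all invented for illustration; this is not JCS or any published format. Keys are sorted, strings are UTF-8, and every number is packed as a big-endian IEEE 754 double (very large integers would lose precision or overflow here).

```python
import hashlib
import json
import struct


def canonical_bytes(value):
    """Hypothetical value-level encoding: one tag byte followed by a payload."""
    if isinstance(value, bool):          # must be tested before int
        return b"T" if value else b"F"
    if value is None:
        return b"N"
    if isinstance(value, (int, float)):  # all numbers as big-endian doubles
        return b"D" + struct.pack(">d", float(value))
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"S" + struct.pack(">I", len(data)) + data
    if isinstance(value, list):
        body = b"".join(canonical_bytes(v) for v in value)
        return b"L" + struct.pack(">I", len(value)) + body
    if isinstance(value, dict):
        body = b"".join(canonical_bytes(k) + canonical_bytes(value[k])
                        for k in sorted(value))
        return b"O" + struct.pack(">I", len(value)) + body
    raise TypeError("unsupported type: %r" % type(value))


def digest(value):
    return hashlib.sha256(canonical_bytes(value)).hexdigest()


# Different text, same values: the digests agree.
a = digest(json.loads('{"a": 1, "b": 4.50}'))
b = digest(json.loads('{"b":4.5,"a":1.0}'))
print(a == b)
```

Because the digest is computed over values rather than text, whitespace, key order and number spelling in the JSON source no longer matter.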
msg233184 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-29 21:19
Bob,
I'm not sure I understand why you say that JCS requires *almost* full normalization.  Using browsers you can generate fully compliant JCS objects using like 20 lines of javascript/webcrypto (here excluding base64 support).  No normalization step is needed.

But sure, the IETF JOSE WG has taken an entirely different approach and requires JSON objects to be serialized and Base64-encoded.  Then the Base64 is signed.  Boring.  And in conflict with complex messaging like:
https://openkeystore.googlecode.com/svn/wcpp-payment-demo/trunk/docs/messages.html#UserAuthorizesTransaction

Thanx anyway, I'm pretty happy with how it works now!

Well, if Decimal didn't manipulate its argument I would be even happier :-) because then there wouldn't even be a hack.
msg233187 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-29 23:20
The current JCS validator is only 150 lines and does both RSA and EC signatures:

https://code.google.com/p/openkeystore/source/browse/python/trunk/src/org/webpki/json/JCSValidator.py

My Java-version is much more advanced but this is quite useful anyway
msg233188 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-12-30 00:19
Raymond Hettinger added the comment:
-----------------------------------
> This bug report isn't a JSON spec issue; rather, it is about how the JSON module API can
> support (or inhibit) valid use cases.
> 
> AFAICT, the patch to make the API better support enums had the side-effect of inhibiting
> the API's ability to support number objects that want to control their output via __str__
> or __repr__.  This seems to block-off decimal support and support for controlling displayed
> precision.
> 
> I think the Enum patch is suspect and could be considered a regression.  That said, we
> could simply add direct support for decimals and leave the enum patch in-place (though it
> still impairs a user's ability to control the displayed precision).

The enum patch is in issue18264 if anyone wants to read the discussion.

I am not a regular json user, but my impression is the format is pretty basic, and we would be overloading it to try and keep numbers with three decimal places as Decimal, and anything else as float.

Isn't json's main purpose to support data exchange between different programs of different languages?  Not between different Python programs?
msg233191 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-30 05:43
Ethan Furman added the comment:

> I am not a regular json user, but my impression is the format is
> pretty  basic, and we would be overloading it to try and keep numbers
> with three decimal places as Decimal, and anything else as float.

> Isn't json's main purpose to support data exchange between different
> programs of different languages?  Not between different Python
> programs?

Right, unfortunately the need to support non-native data types like big decimals, dates and blobs has led to a certain amount of confusion and innovation among JSON tool designers.

I (FWIW) do actually NOT want to extend a single bit from the RFC, I just want serializing to be "non-invasive".   If the parse_float option stays "as is" it seems that both the people who want big (non-standard) numbers and I who want somewhat non-standard serialization would be happy.  I.e. a documentation snippet would be sufficient as far as I can tell.

Serialization order of objects is apparently a hot topic
https://code.google.com/p/v8/issues/detail?id=164
but Python has no problem with that.
msg233193 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-12-30 08:28
"""To cope with this potential problem, compliant parsers must preserve the original textual representation of properties internally in order to support JCS normalization requirements"""

That sounds ridiculous. Did someone try to reason with the "IETF guys"? :)
msg233195 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-30 09:16
> Antoine Pitrou added the comment:
> 
> "To cope with this potential problem, compliant parsers must preserve the original textual representation of properties internally in order to support JCS normalization requirements"
> 
> That sounds ridiculous. Did someone try to reason with the "IETF guys"? :)

The alternative is either doing what Bob suggested which is almost the same as writing a new parser or take the IETF route and shroud the message payload in base64.

So all solutions are "by definition" baaaaaaaaaaaaaaaad :-)

FWIW my super-bad solution has the following compatibility issues:
- Whitespace: None, all parsers can "stringify", right?
- Escaping: None, all parsers MUST do it to follow the JSON spec.
- Property order: A problem in some parsers.  If you take a look at Stack Overflow, lots of folks request that insertion/reader order should be honored since computers <> humans.  Fixed in Python. Works in browsers as well.
- Floating point: an almost useless JSON feature anyway, it doesn't work for crypto-numbers or money.  It is "only" a validation problem though.  Now fixed in Python.

http://www.ietf.org/mail-archive/web/acme/current/msg00200.html
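
The property-order point in the list above can be sketched in Python 3 as follows. The sample text is made up and deliberately contains no floats, since those are the one item that does not survive the stdlib round trip.

```python
import json
from collections import OrderedDict

text = '{"t":6,"g":"text","a":[1,2]}'

# object_pairs_hook receives the pairs in document order.
obj = json.loads(text, object_pairs_hook=OrderedDict)
print(list(obj))  # ['t', 'g', 'a']

# Compact separators reproduce the original whitespace-free text.
again = json.dumps(obj, separators=(",", ":"))
print(again == text)
```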
msg233198 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-12-30 09:39
I won't claim to know/understand the specifics, but "message payload in base64" actually sounds reasonable to me, if far from optimal (both from readability and space overhead POV) :-).
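
For comparison, the "payload in base64" approach can be sketched in Python 3 as below. This is a simplified illustration in the style of a compact signed token, assuming an HMAC-SHA256 signature with a made-up shared key; the thread itself is about RSA and EC signatures, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url(data):
    """URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_compact(payload, key):
    header = {"alg": "HS256"}
    signing_input = (b64url(json.dumps(header).encode("utf-8")) + "." +
                     b64url(json.dumps(payload).encode("utf-8")))
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def verify_compact(token, key):
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("ascii"),
                        hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), sig)


token = sign_compact({"amount": 8600550, "currency": "USD"}, b"demo-key")
print(token)
print(verify_compact(token, b"demo-key"))
```

The signature covers the exact encoded bytes, so no normalization is needed, but the payload is no longer readable JSON on the wire.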
msg233199 - (view) Author: Anders Rundgren (anders.rundgren.net@gmail.com) Date: 2014-12-30 09:47
> Antoine Pitrou added the comment:
> 
> I won't claim to know/understand the specifics, but "message payload in base64" actually sounds reasonable to me, if far from optimal (both from readability and space overhead POV) :-).

It is indeed a working solution.  I do though think that communities that previously used XML would accept base64-encoded messages.  It becomes really messy when applied to counter-signed messages like the following:

{
  "@context": "http://xmlns.webpki.org/wcpp-payment-demo",
  "@qualifier": "AuthData",
  "paymentRequest": 
    {
      "commonName": "Demo Merchant",
      "amount": 8600550,
      "currency": "USD",
      "referenceId": "#1000000",
      "dateTime": "2014-12-18T13:39:35Z",
      "signature": 
        {
          "algorithm": "RS256",
          "signerCertificate": 
            {
              "issuer": "CN=Merchant Network Sub CA5,C=DE",
              "serialNumber": "1413983542582",
              "subject": "CN=Demo Merchant,2.5.4.5=#1306383936333235,C=DE"
            },
          "certificatePath": 
            [
              "MIIDQzCCAiugAwIBAgIGAUk3_J02M...eMGlY734U3NasQfAhTUhxrdDbphEvsWTc",
              "MIIEPzCCAiegAwIBAgIBBTANBgkqh...gU1IyRGA7IbdHOeDB2RUpsXloU2QKfLrk"
            ],
          "value": "Ny4Qe6FQhd5_qcSc3xiH8Kt7tIZ9Z...9LEjC6_Rulg_G20fGxJ-wzezFpsAGbmuFQg"
        }
    },
  "domainName": "merchant.com",
  "cardType": "SuperCard",
  "pan": "1618342124919252",
  "dateTime": "2014-12-18T13:40:59Z",
  "signature": 
    {
      "algorithm": "RS256",
      "signerCertificate": 
        {
          "issuer": "CN=Mybank Client Root CA1,C=US",
          "serialNumber": "1413983550045",
          "subject": "CN=The Cardholder,2.5.4.5=#13083935363733353232"
        },
      "certificatePath": ["MIIENzCCAh-gAwIBAgIGAUk3_LpdM...IGcN1md5feo5DndNnV8D0UM-oBRkUDDFiWlhCU"],
      "value": "wyUcFcHmvM5ZozZKOEwOQkYic0D7M...S_HbaPGau5KfZjCaksvb5U1lYZaXxP8kAbuGPQ"
    }
}
msg233238 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-12-31 10:00
See also #16535.
History
Date                 User                           Action  Args
2022-04-11 14:58:11  admin                          set     github: 67312
2015-07-21 07:40:49  ethan.furman                   set     nosy: - ethan.furman
2014-12-31 10:00:25  mark.dickinson                 set     nosy: + mark.dickinson; messages: + msg233238
2014-12-30 09:47:17  anders.rundgren.net@gmail.com  set     messages: + msg233199
2014-12-30 09:39:25  pitrou                         set     messages: + msg233198
2014-12-30 09:16:22  anders.rundgren.net@gmail.com  set     messages: + msg233195
2014-12-30 08:28:01  pitrou                         set     nosy: + pitrou; messages: + msg233193
2014-12-30 05:43:19  anders.rundgren.net@gmail.com  set     messages: + msg233191
2014-12-30 00:19:27  ethan.furman                   set     nosy: + barry, ncoghlan, eli.bendersky; messages: + msg233188
2014-12-29 23:20:23  anders.rundgren.net@gmail.com  set     messages: + msg233187
2014-12-29 21:19:04  anders.rundgren.net@gmail.com  set     messages: + msg233184
2014-12-29 20:42:26  bob.ippolito                   set     messages: + msg233183
2014-12-29 15:23:07  anders.rundgren.net@gmail.com  set     messages: + msg233175
2014-12-29 09:58:30  anders.rundgren.net@gmail.com  set     messages: + msg233170
2014-12-29 09:40:06  anders.rundgren.net@gmail.com  set     messages: + msg233169
2014-12-29 09:30:16  bob.ippolito                   set     messages: + msg233168
2014-12-29 09:08:06  anders.rundgren.net@gmail.com  set     messages: + msg233166
2014-12-29 08:30:19  rhettinger                     set     messages: + msg233165
2014-12-29 07:53:59  bob.ippolito                   set     messages: + msg233163
2014-12-29 07:48:54  anders.rundgren.net@gmail.com  set     messages: + msg233162
2014-12-29 07:31:06  bob.ippolito                   set     messages: + msg233161
2014-12-29 05:46:09  anders.rundgren.net@gmail.com  set     messages: + msg233160
2014-12-29 05:32:17  bob.ippolito                   set     messages: + msg233159
2014-12-29 01:20:03  ethan.furman                   set     messages: + msg233157
2014-12-28 23:36:04  rhettinger                     set     versions: + Python 3.4, Python 3.5; nosy: + rhettinger, ethan.furman, bob.ippolito; messages: + msg233156; assignee: bob.ippolito
2014-12-27 17:10:51  anders.rundgren.net@gmail.com  create