Author serhiy.storchaka
Recipients ezio.melotti, pitrou, rhettinger, serhiy.storchaka
Date 2013-05-05.13:10:30
Message-id <1367759430.76.0.988181674037.issue17909@psf.upfronthosting.co.za>
RFC 4627 specifies a method to determine the encoding (one of UTF-8, UTF-16(BE|LE) or UTF-32(BE|LE)) of encoded JSON text: since the first two characters of a JSON text are always ASCII, the pattern of zero octets among the first four bytes identifies the encoding. The proposed preliminary patch (it doesn't include the documentation yet) allows the load() and loads() functions to accept bytes data when it is encoded with a standard Unicode encoding. Data with a BOM is also accepted (this is not specified in RFC 4627, but is widely used).
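The detection rule can be sketched in Python as below. This is a minimal illustration, not the patch's code: the helper names detect_json_encoding and loads_bytes are hypothetical. It checks for a BOM first and then applies the zero-octet table from RFC 4627, section 3. For UTF-16 it only inspects the first two octets, so that a one-character text such as 1 is still classified. Error handling is simplified to the default strict handler.

```python
import codecs
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the codec name for JSON text given as bytes (hypothetical helper)."""
    # BOMs are not covered by RFC 4627 but occur in practice.
    # Check UTF-32 first: the UTF-32LE BOM (FF FE 00 00) starts with
    # the UTF-16LE BOM (FF FE).
    if data.startswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
        return 'utf-32'      # this codec reads the BOM to pick the byte order
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return 'utf-16'
    if data.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'   # this codec strips the BOM
    # RFC 4627, section 3: the first two characters are ASCII, so the
    # positions of zero octets reveal the encoding.
    head = data[:4]
    if len(head) == 4:
        if head[0] == head[1] == head[2] == 0:
            return 'utf-32-be'   # 00 00 00 xx
        if head[1] == head[2] == head[3] == 0:
            return 'utf-32-le'   # xx 00 00 00
    if len(head) >= 2:
        if head[0] == 0:
            return 'utf-16-be'   # 00 xx (00 xx)
        if head[1] == 0:
            return 'utf-16-le'   # xx 00 (xx 00)
    return 'utf-8'               # xx xx xx xx


def loads_bytes(data: bytes):
    """Decode bytes with the detected codec, then parse them as JSON."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

For example, loads_bytes('{"a": [1, 2]}'.encode('utf-32')) returns {'a': [1, 2]}, since the BOM written by the utf-32 codec is recognised before the zero-octet table is consulted.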

There is only one case where the method can misfire: a serialized string "\x00..." encoded in UTF-16LE may be erroneously detected as UTF-32LE, because its first four octets are 22 00 00 00. This case violates two rules of RFC 4627: a string was serialized instead of an object or an array, and the control character U+0000 was not escaped. Standard-conforming JSON text is always detected correctly.
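The misfire can be reproduced with a bare classifier of the RFC 4627 table. The helper rfc4627_pattern is hypothetical and deliberately omits BOM handling; it only shows that the non-conforming input matches the UTF-32LE row while a conforming equivalent does not.

```python
def rfc4627_pattern(data: bytes) -> str:
    """Classify bytes by the zero-octet table of RFC 4627, section 3.

    Hypothetical helper: no BOM handling, and inputs shorter than four
    octets simply fall through to UTF-8.
    """
    nulls = tuple(octet == 0 for octet in data[:4])
    table = {
        (True, True, True, False): 'utf-32-be',    # 00 00 00 xx
        (True, False, True, False): 'utf-16-be',   # 00 xx 00 xx
        (False, True, True, True): 'utf-32-le',    # xx 00 00 00
        (False, True, False, True): 'utf-16-le',   # xx 00 xx 00
    }
    return table.get(nulls, 'utf-8')               # xx xx xx xx


# The misfire: a bare string whose first character is a raw U+0000.
bad = '"\x00abc"'.encode('utf-16-le')
print(bad[:4].hex())            # 22000000
print(rfc4627_pattern(bad))     # utf-32-le (wrong)

# Conforming text: an array, with U+0000 escaped as \u0000.
good = '["\\u0000abc"]'.encode('utf-16-le')
print(rfc4627_pattern(good))    # utf-16-le
```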

This patch requires the "surrogatepass" error handler for UTF-16 and UTF-32 (see issue12892 and issue13916).
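One plausible reason the handler matters is an assumption here, not something stated in the patch. Python's json module accepts str values containing lone surrogates, and with ensure_ascii=False it also produces them. The strict UTF-16/32 codecs reject such text. The roundtrip helper below is hypothetical. It assumes Python 3.4 or later, where surrogatepass works with the UTF-16 and UTF-32 codecs.

```python
import json


def roundtrip(text: str, encoding: str = 'utf-16-le') -> str:
    """Encode and decode text, letting lone surrogates pass through."""
    data = text.encode(encoding, 'surrogatepass')
    return data.decode(encoding, 'surrogatepass')


# Python's json module emits a raw lone surrogate when ensure_ascii=False.
text = json.dumps('\ud800', ensure_ascii=False)

try:
    text.encode('utf-16-le')          # default 'strict' handler
except UnicodeEncodeError as exc:
    print('strict handler fails:', exc)

print(json.loads(roundtrip(text)) == '\ud800')   # True
```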