
Author rhettinger
Recipients Brian.Merrell, belopolsky, rhettinger, vstinner
Date 2011-03-14.20:09:35
Message-id <1300133375.67.0.828879918057.issue11489@psf.upfronthosting.co.za>
Content
> We seem to be in the worst of both worlds right now 
> as I've generated and stored a lot of json that can 
> not be read back in

This is unfortunate. dumps() should never have succeeded on that data in the first place.

I don't think loads() should be changed to accommodate the dumps() error, though. JSON is UTF-8 by definition, and it is a useful feature that invalid UTF-8 won't load.
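To make "invalid UTF-8 won't load" concrete, here is a minimal sketch assuming Python 3, whose strict UTF-8 codec refuses the bytes b"\xed\xa0\x80" (the lone surrogate U+D800 written out as if it were an ordinary character). `is_valid_utf8` is a hypothetical helper name, not part of the json module.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        # The default errors="strict" rejects encoded surrogates.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b'"caf\xc3\xa9"'))   # True: "café" in UTF-8
print(is_valid_utf8(b'"\xed\xa0\x80"'))  # False: encoded lone surrogate U+D800
```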

To fix the data you've already created (data that other compliant JSON readers wouldn't be able to parse), I think you need to reprocess those files to make them valid:

   bs.decode('utf-8', errors='ignore').encode('utf-8')
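The one-liner above can be wrapped into a small repair helper. A minimal sketch, assuming Python 3 and that the stored files hold raw UTF-8 bytes; `repair_json_bytes` and `repair_json_file` are hypothetical names.

```python
from pathlib import Path


def repair_json_bytes(bs: bytes) -> bytes:
    """Drop every byte that is not part of valid UTF-8 and re-encode the rest."""
    # errors="ignore" discards invalid sequences such as b"\xed\xa0\x80"
    # (an encoded lone surrogate); valid text round-trips unchanged.
    return bs.decode("utf-8", errors="ignore").encode("utf-8")


def repair_json_file(path: Path) -> bool:
    """Rewrite path in place if it contained invalid UTF-8; return True if it changed."""
    original = path.read_bytes()
    fixed = repair_json_bytes(original)
    if fixed == original:
        return False
    path.write_bytes(fixed)
    return True


print(repair_json_bytes(b'["a\xed\xa0\x80b"]'))  # b'["ab"]'
```

errors="ignore" drops the bad bytes outright; errors="replace" would leave U+FFFD markers instead, which keeps the damage visible. Files where the surrogate was written as a backslash-u escape are plain ASCII and pass through this repair unchanged.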

Then we need to fix dumps() so that it doesn't silently create invalid JSON.

> This on the other hand should probably be 
> fixed by either rejecting lone surrogates 
> in json.dumps or accepting them in json.loads or both.

Rejection is the right way to go. It is almost never helpful to create invalid JSON files that other readers can't, and shouldn't, read.
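Until dumps() itself rejects such input, the rejection behavior can be approximated with a thin wrapper. A minimal sketch, assuming Python 3: `dumps_utf8` is a hypothetical helper, not a json module API. It serializes with ensure_ascii=False so a lone surrogate stays a raw code point instead of being hidden in an escape, then relies on the strict UTF-8 encoder to reject it.

```python
import json


def dumps_utf8(obj, **kwargs) -> bytes:
    """Serialize obj to UTF-8 JSON bytes, raising ValueError on lone surrogates.

    Extra keyword arguments go to json.dumps; ensure_ascii must not be passed.
    """
    # ensure_ascii=False keeps each surrogate as a raw code point
    # instead of hiding it behind a backslash-u escape.
    text = json.dumps(obj, ensure_ascii=False, **kwargs)
    try:
        # Strict UTF-8 encoding fails on any surrogate code point.
        return text.encode("utf-8")
    except UnicodeEncodeError as exc:
        raise ValueError(
            f"refusing to write invalid JSON: lone surrogate "
            f"{text[exc.start]!r} at offset {exc.start}"
        ) from exc


print(dumps_utf8({"name": "caf\u00e9"}))  # b'{"name": "caf\xc3\xa9"}'
try:
    dumps_utf8(["\ud800"])
except ValueError as err:
    print(err)
```

Returning bytes rather than str fits the point that JSON is UTF-8 by definition: the caller gets exactly what will be written to disk, and the error surfaces at write time instead of when some other reader fails later.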