Message 130889 - Python tracker



Author	rhettinger
Recipients	Brian.Merrell, belopolsky, rhettinger, vstinner
Date	2011-03-14.20:09:35
Message-id	<1300133375.67.0.828879918057.issue11489@psf.upfronthosting.co.za>
In-reply-to

Content

> We seem to be in the worst of both worlds right now 
> as I've generated and stored a lot of json that can 
> not be read back in

This is unfortunate.  dumps() should never have worked in the first place.

I don't think that loads() should be changed to accommodate the dumps() error, though.  JSON is UTF-8 by definition, and it is a useful feature that invalid UTF-8 won't load.

To fix the data you've already created (data that other compliant JSON readers wouldn't be able to parse), I think you need to preprocess those files to make them valid:

   bs.decode('utf-8', errors='ignore').encode('utf-8')
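
As a rough sketch of that cleanup step (assuming Python 3, with a made-up payload and a hypothetical helper name, drop_invalid_utf8), the one-liner can be wrapped and checked like this.  Note that errors='ignore' silently discards the offending bytes, so the characters they stood for are lost:

```python
import json

def drop_invalid_utf8(bs):
    """Return bs with any byte sequences that are not valid UTF-8 removed."""
    return bs.decode('utf-8', errors='ignore').encode('utf-8')

# Hypothetical damaged payload: b'\xed\xa0\x80' is a lone surrogate (U+D800)
# written out as bytes, which a strict UTF-8 decoder rejects.
damaged = b'{"name": "a\xed\xa0\x80b"}'

repaired = drop_invalid_utf8(damaged)

# The repaired bytes are valid UTF-8 and load normally.
record = json.loads(repaired.decode('utf-8'))
```

If losing those characters is not acceptable, errors='replace' is a standard alternative that substitutes U+FFFD instead of dropping the bytes.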

Then we need to fix dumps so that it doesn't silently create invalid JSON.

> This on the other hand should probably be 
> fixed by either rejecting lone surrogates 
> in json.dumps or accepting them in json.loads or both.

Rejection is the right way to go.  It is almost never
helpful to create invalid JSON files that other readers
can't and shouldn't read.
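
To illustrate what rejection on the writing side could look like, here is a minimal sketch, assuming Python 3.  The wrapper name dumps_strict is hypothetical and is not how the json module itself behaves; it simply refuses any output that cannot be encoded as UTF-8:

```python
import json

def dumps_strict(obj):
    """Serialize obj to JSON text, rejecting lone surrogates.

    Hypothetical wrapper for illustration; not part of the json module.
    """
    # ensure_ascii=False keeps non-ASCII characters as-is, so a lone
    # surrogate stays in the text instead of becoming a \uXXXX escape.
    text = json.dumps(obj, ensure_ascii=False)
    try:
        # A strict UTF-8 encode fails on lone surrogates.
        text.encode('utf-8')
    except UnicodeEncodeError as exc:
        raise ValueError('cannot serialize lone surrogate to JSON') from exc
    return text
```

With this wrapper, dumps_strict({"a": "ok"}) returns the usual JSON text, while dumps_strict("\ud800") raises ValueError instead of producing a file that cannot be read back.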

History
Date	User	Action	Args
2011-03-14 20:09:35	rhettinger	set	recipients: + rhettinger, belopolsky, vstinner, Brian.Merrell
2011-03-14 20:09:35	rhettinger	set	messageid: <1300133375.67.0.828879918057.issue11489@psf.upfronthosting.co.za>
2011-03-14 20:09:35	rhettinger	link	issue11489 messages
2011-03-14 20:09:35	rhettinger	create