Message130889
> We seem to be in the worst of both worlds right now
> as I've generated and stored a lot of json that can
> not be read back in
This is unfortunate. dumps() should never have worked in the first place.
I don't think that loads() should be changed to accommodate the dumps() error though. JSON is UTF-8 by definition and it is a useful feature that invalid UTF-8 won't load.
To fix the data you've already created (data that other compliant JSON readers wouldn't be able to parse either), I think you need to reprocess those files to make them valid:
bs.decode('utf-8', errors='ignore').encode('utf-8')
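As a minimal sketch of that reprocessing step, assuming the stored files contain raw invalid byte sequences (rather than ASCII `\uXXXX` escapes, which this approach would not touch) and that Python 3 is used. The helper name `repair_utf8` and the sample bytes are hypothetical, for illustration only:

```python
import json


def repair_utf8(bs: bytes) -> bytes:
    """Drop byte sequences that are not valid UTF-8 and re-encode the rest.

    Hypothetical helper; it silently discards the offending bytes,
    so the repaired text may differ from what was originally intended.
    """
    return bs.decode("utf-8", errors="ignore").encode("utf-8")


# Sample input (an assumption): a JSON string whose middle three bytes
# are the UTF-8-style encoding of the lone surrogate U+D800.
damaged = b'"a\xed\xa0\x80b"'

repaired = repair_utf8(damaged)

# The repaired bytes are valid UTF-8 and load as ordinary JSON.
value = json.loads(repaired.decode("utf-8"))
```

Note that `errors='ignore'` loses data: the invalid sequences are removed, not replaced, so it is worth keeping a backup of the original files.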
Then we need to fix dumps so that it doesn't silently create invalid JSON.
> This on the other hand should probably be
> fixed by either rejecting lone surrogates
> in json.dumps or accepting them in json.loads or both.
Rejection is the right way to go. It is almost never
helpful to create invalid JSON files that other readers
can't and shouldn't read.
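To illustrate what rejecting lone surrogates at serialization time could look like, here is a rough sketch written as a wrapper around `json.dumps`. This is not how the `json` module itself behaves, and `strict_dumps` is a hypothetical name; it assumes Python 3, where encoding a lone surrogate to UTF-8 raises `UnicodeEncodeError`:

```python
import json


def strict_dumps(obj) -> str:
    """Serialize obj to JSON, refusing output that is not encodable as UTF-8.

    Hypothetical wrapper: ensure_ascii=False keeps non-ASCII characters
    unescaped, so a lone surrogate survives into the output string and
    the trial encode below can detect it.
    """
    text = json.dumps(obj, ensure_ascii=False)
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        raise ValueError("refusing to serialize a lone surrogate") from exc
    return text


# Well-formed text, including characters outside the BMP, passes through.
ok = strict_dumps({"greeting": "h\u00e9llo \U0001F600"})

# A lone surrogate is rejected instead of being written out.
try:
    strict_dumps({"k": "\ud800"})
    rejected = False
except ValueError:
    rejected = True
```

Failing early like this means the bad value is reported where it is produced, rather than later when some reader refuses the file.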
Date | User | Action | Args
2011-03-14 20:09:35 | rhettinger | set | recipients: + rhettinger, belopolsky, vstinner, Brian.Merrell
2011-03-14 20:09:35 | rhettinger | set | messageid: <1300133375.67.0.828879918057.issue11489@psf.upfronthosting.co.za>
2011-03-14 20:09:35 | rhettinger | link | issue11489 messages
2011-03-14 20:09:35 | rhettinger | create |