Message 208430 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	mu_mind
Recipients	mu_mind
Date	2014-01-19.02:19:12
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1390097954.47.0.6743273993.issue20298@psf.upfronthosting.co.za>
In-reply-to

Content
Many sloppy JSON APIs return data with poorly-encoded strings, either with single-quotes as delimiters: {'foo': 'bar'} or with no delimiters at all on keys: {foo: 'bar'} The json library is useless for making sense of this data, because all it will tell you is "No JSON object could be decoded". It would be incredibly helpful if the json library had some special non-strict decoding mode that could interpret these common misspellings of JSON strings. Or, more generally, it could accept another decoder hook to reformat the remaining string, something like: def malformed_hook(remaining, parent): if remaining.startswith("'"): # We know this is a string, regex it to find the end. m = re.match(pyparsing.quotedString.reString, remaining) if m is not None: # Use json.dumps to add quotes around string literal. return json.dumps(eval(m.group(0))) + remaining[m.end():] # If we're inside an object, this could be a naked object key. if isinstance(parent, dict): m = re.match(r'([a-zA-Z_]\w*):', remaining) if m is not None: return json.dumps(m.group(1)) + remaining[len(m.group(1)):] print(json.loads("['foo', null, {a: 'b'}]", malformed_hook=malformed_hook)) ['foo', None, {'a': 'b'}] This would at least save you having to write a parser/tokenizer from scratch.

Many sloppy JSON APIs return data with poorly-encoded strings, either with single-quotes as delimiters:
  {'foo': 'bar'}
or with no delimiters at all on keys:
  {foo: 'bar'}

The json library is useless for making sense of this data, because all it will tell you is "No JSON object could be decoded".

It would be incredibly helpful if the json library had some special non-strict decoding mode that could interpret these common misspellings of JSON strings. Or, more generally, it could accept another decoder hook to reformat the remaining string, something like:
  def malformed_hook(remaining, parent):
      if remaining.startswith("'"):
          # We know this is a string, regex it to find the end.
          m = re.match(pyparsing.quotedString.reString, remaining)
          if m is not None:
              # Use json.dumps to add quotes around string literal.
              return json.dumps(eval(m.group(0))) + remaining[m.end():]
      # If we're inside an object, this could be a naked object key.
      if isinstance(parent, dict):
          m = re.match(r'([a-zA-Z_]\w*):', remaining)
          if m is not None:
              return json.dumps(m.group(1)) + remaining[len(m.group(1)):]
  print(json.loads("['foo', null, {a: 'b'}]", malformed_hook=malformed_hook))
  ['foo', None, {'a': 'b'}]

This would at least save you having to write a parser/tokenizer from scratch.

History
Date	User	Action	Args
2014-01-19 02:19:14	mu_mind	set	recipients: + mu_mind
2014-01-19 02:19:14	mu_mind	set	messageid: <1390097954.47.0.6743273993.issue20298@psf.upfronthosting.co.za>
2014-01-19 02:19:14	mu_mind	link	issue20298 messages
2014-01-19 02:19:12	mu_mind	create