|
msg126772 - (view) |
Author: (hhas) |
Date: 2011-01-21 19:01 |
json.loads() accepts strings but errors on bytes objects. Documentation and API indicate that both should work. Review of json/__init__.py code shows that the loads() function's 'encoding' arg is ignored and no decoding takes place before the object is passed to JSONDecoder.decode()
Tested on Python 3.1.2 and Python 3.2rc1; fails on both.
Example:
#################################################
#!/usr/local/bin/python3.2
import json
print(json.loads('123'))
# 123
print(json.loads(b'123'))
# /Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/json/decoder.py:325:
# TypeError: can't use a string pattern on a bytes-like object
print(json.loads(b'123', encoding='utf-8'))
# /Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/json/decoder.py:325:
# TypeError: can't use a string pattern on a bytes-like object
#################################################
Patch attached.
|
|
msg126782 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2011-01-21 20:35 |
Hmm. According to issue 4136, all bytes support was supposed to have been removed.
|
|
msg126785 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-01-21 20:46 |
Indeed, the documentation (and function docstring) needs fixing instead. It's a pity we didn't remove the useless `encoding` parameter.
|
|
msg126786 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2011-01-21 20:54 |
Georg: Is there still time to deprecate the encoding parameter in 3.2?
|
|
msg126788 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-01-21 21:38 |
I've committed a doc fix in r88137.
|
|
msg126831 - (view) |
Author: (hhas) |
Date: 2011-01-22 12:28 |
Doc fix works for me.
|
|
msg126986 - (view) |
Author: Anthony Long (antlong) |
Date: 2011-01-25 03:38 |
Works for me, py2.7 on snow leopard.
|
|
msg126997 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2011-01-25 11:42 |
Anthony: this is a Python 3-only problem.
|
|
msg133645 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2011-04-13 07:23 |
Now it's too late for 3.2, should this be done for 3.3?
|
|
msg133672 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2011-04-13 15:40 |
If you’re talking about deprecating the obsolete encoding argument (maybe it’s time for a new bug report), +1.
|
|
msg145343 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2011-10-11 13:44 |
I'll just mention that the elimination of bytes handling is a bit unfortunate, since this idiom which works in Python 2 no longer works:
fp = urlopen(url)
json_data = json.load(fp)
/me sad
|
|
msg145345 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-10-11 13:51 |
> I'll just mention that the elimination of bytes handling is a bit
> unfortunate, since this idiom which works in Python 2 no longer works:
>
> fp = urlopen(url)
> json_data = json.load(fp)
What if the returned JSON uses a charset other than utf-8 ?
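One way to keep the idiom while making the charset explicit is to wrap the binary file object in a text layer before handing it to json.load(). A minimal sketch, assuming the charset is known ahead of time and the response behaves like a readable binary stream; io.BytesIO stands in here for the object returned by urlopen(), and load_json_from_binary is a hypothetical helper name, not part of the json module:

```python
import io
import json


def load_json_from_binary(fp, encoding="utf-8"):
    """Decode a binary file-like object as text, then parse it as JSON."""
    # TextIOWrapper turns the byte stream into a text stream, so json.load()
    # receives str data from fp.read() instead of bytes.
    text_fp = io.TextIOWrapper(fp, encoding=encoding)
    return json.load(text_fp)


# Stand-in for a network response: JSON encoded as Latin-1 rather than UTF-8.
fake_response = io.BytesIO('{"name": "caf\u00e9"}'.encode("latin-1"))
result = load_json_from_binary(fake_response, encoding="latin-1")
```

The caller has to supply the encoding, which is exactly the information that a bare json.load(fp) on a byte stream would have to guess.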
|
|
msg159359 - (view) |
Author: Balthazar Rouberol (Balthazar.Rouberol) |
Date: 2012-04-26 08:20 |
I know this does not fix anything at the core, but it would allow you to use json.loads() with python 3.2 (maybe 3.1?):
Replace
json.loads(raw_data)
by
raw_data = raw_data.decode('utf-8') # or whichever encoding the data actually uses
json.loads(raw_data)
|
|
msg159360 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-04-26 08:34 |
> What if the returned JSON uses a charset other than utf-8 ?
According to RFC 4627: "JSON text SHALL be encoded in Unicode. The default encoding is UTF-8." RFC 4627 also offers a way to autodetect other Unicode encodings.
|
|
msg159364 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-04-26 13:03 |
Well, adding support for bytes objects using the spec from RFC 4627 (or at least with utf-8 as a default) may be an enhancement for 3.3.
|
|
msg159366 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-04-26 14:07 |
Things are a little more complicated. '123' is not a valid JSON according to RFC 4627 (the top-level element can only be an object or an array). This means that the autodetection algorithm will not always work for such non-standard data.
If we can parse binary data, then there must be a way to generate binary data in at least one of the Unicode encodings.
By the way, the documentation should give a link to RFC 4627 and explain how the current implementation differs from it.
|
|
msg159368 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-04-26 14:21 |
> Things are a little more complicated. '123' is not a valid JSON
> according to RFC 4627 (the top-level element can only be an object or
> an array). This means that the autodetection algorithm will not always
> work for such non-standard data.
The autodetection algorithm needn't examine all of the first 4 bytes. If the
first 2 bytes are non-zero, you have UTF-8 data. Otherwise, the JSON text
will be at least 4 bytes long (since it's either UTF-16 or UTF-32).
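The null-byte patterns described in RFC 4627, section 3, can be sketched roughly as below. This is only an illustration, assuming the data carries no byte order mark and that the first two characters of the text are ASCII (which holds for texts that start with an object or an array); detect_json_encoding is a hypothetical name, not an existing json function:

```python
import json


def detect_json_encoding(data):
    """Guess the Unicode encoding of JSON bytes from their null-byte pattern."""
    head = data[:4]
    # Shortcut from the discussion: two leading non-zero bytes mean UTF-8.
    if len(head) >= 2 and head[0] != 0 and head[1] != 0:
        return "utf-8"
    # Too short to match a UTF-16/UTF-32 pattern; fall back to the default.
    if len(head) < 4:
        return "utf-8"
    nulls = tuple(byte == 0 for byte in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Anything that matches no pattern is treated as UTF-8.
    return patterns.get(nulls, "utf-8")


def loads_autodetect(data):
    """Decode JSON bytes using the guessed encoding, then parse the text."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

For example, loads_autodetect('[1, 2]'.encode('utf-16-le')) goes through the xx 00 xx 00 branch before parsing.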
|
|
msg159388 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-04-26 15:48 |
I mean a string that starts with '\u0000'. b'"\x00...'.
|
|
msg159391 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-04-26 16:12 |
On Thursday 26 April 2012 at 15:48 +0000, Serhiy Storchaka wrote:
>
> I mean a string that starts with '\u0000'. b'"\x00...'.
According to the RFC, that should be escaped:
All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
And indeed:
>>> json.loads('"\u0000"')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/antoine/opt/lib/python3.2/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/home/antoine/opt/lib/python3.2/json/decoder.py", line 351, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/antoine/opt/lib/python3.2/json/decoder.py", line 367, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 1 (char 1)
>>> json.loads('"\\u0000"')
'\x00'
|
|
msg159395 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-04-26 16:21 |
According to current implementation this is acceptable.
>>> json.loads('"\u0000"', strict=False)
'\x00'
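A small sketch of why this matters for autodetection: in non-strict mode a raw U+0000 is accepted inside a string, so the UTF-8 encoding of such a text can have a zero second byte, which a check based on the first two bytes would not classify as UTF-8. The variable names below are illustrative only:

```python
import json

text = '"\u0000"'  # a JSON string containing a raw (unescaped) U+0000

# Strict mode (the default) rejects raw control characters with a ValueError.
try:
    json.loads(text)
    strict_accepts = True
except ValueError:
    strict_accepts = False

# Non-strict mode accepts the same text.
lenient_result = json.loads(text, strict=False)

# Encoded as UTF-8, the second byte is zero even though the data is not
# UTF-16 or UTF-32.
raw = text.encode("utf-8")
second_byte_is_zero = raw[1] == 0
```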
|
|
msg159454 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-04-27 14:06 |
> According to current implementation this is acceptable.
Then perhaps auto-detection can be restricted to strict mode? Non-strict mode would always use utf-8.
Or we can just skip auto-detection altogether (I don't think many people produce utf-16 or utf-32 JSON; that would be a waste of bandwidth for no obvious benefit).
|
|
msg159469 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-04-27 15:28 |
Related to this question is a question about errors. How should the user be informed if an error occurs while decoding with the detected encoding? Should we leave the UnicodeDecodeError or convert it to ValueError? If there is a syntax error in the JSON, the exception will refer to a position in the decoded string; should we translate it to the position in the original binary string?
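Both options can be sketched in a few lines. Note that UnicodeDecodeError is already a subclass of ValueError, so callers catching ValueError see either failure. The sketch below is an assumption-laden illustration, not a proposed patch: loads_bytes and byte_offset are hypothetical names, json.JSONDecodeError and its pos and msg attributes exist only in later Python versions (3.5 and up), and the offset calculation assumes a codec that does not prepend a byte order mark:

```python
import json


def byte_offset(text, char_pos, encoding):
    """Translate a character position in text to a byte offset in its encoding."""
    # Re-encode the prefix; its length is the offset in the original bytes.
    return len(text[:char_pos].encode(encoding))


def loads_bytes(data, encoding="utf-8"):
    """Parse JSON bytes, reporting every failure as ValueError."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Already a ValueError subclass; re-raised here with a plainer message.
        raise ValueError("cannot decode JSON as %s: %s" % (encoding, exc)) from exc
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos counts characters in the decoded text, not bytes.
        offset = byte_offset(text, exc.pos, encoding)
        raise ValueError("%s at byte offset %d" % (exc.msg, offset)) from exc
```

Whether the extra re-encoding step is worth its cost on large inputs is a separate design question.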
|
|
| Date |
User |
Action |
Args |
| 2012-04-27 15:28:06 | serhiy.storchaka | set | messages:
+ msg159469 |
| 2012-04-27 14:06:12 | pitrou | set | messages:
+ msg159454 |
| 2012-04-26 16:21:34 | serhiy.storchaka | set | messages:
+ msg159395 |
| 2012-04-26 16:12:44 | pitrou | set | messages:
+ msg159391 |
| 2012-04-26 15:48:23 | serhiy.storchaka | set | messages:
+ msg159388 |
| 2012-04-26 15:09:07 | eric.araujo | set | title: json.loads() throws TypeError on bytes object -> json.loads() raises TypeError on bytes object |
| 2012-04-26 14:21:40 | pitrou | set | messages:
+ msg159368 |
| 2012-04-26 14:07:45 | serhiy.storchaka | set | messages:
+ msg159366 |
| 2012-04-26 13:03:55 | pitrou | set | versions:
+ Python 3.3, - Python 3.2 messages:
+ msg159364
assignee: docs@python -> components:
+ Library (Lib), - Documentation type: behavior -> enhancement stage: needs patch |
| 2012-04-26 08:34:31 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg159360
|
| 2012-04-26 08:20:56 | Balthazar.Rouberol | set | nosy:
+ Balthazar.Rouberol messages:
+ msg159359
|
| 2011-10-11 13:51:37 | pitrou | set | messages:
+ msg145345 |
| 2011-10-11 13:44:47 | barry | set | nosy:
+ barry messages:
+ msg145343
|
| 2011-04-13 15:40:46 | eric.araujo | set | messages:
+ msg133672 versions:
- Python 3.1 |
| 2011-04-13 07:23:28 | ezio.melotti | set | nosy:
+ ezio.melotti messages:
+ msg133645
|
| 2011-01-25 11:42:30 | r.david.murray | set | nosy:
georg.brandl, hhas, pitrou, eric.araujo, r.david.murray, docs@python, antlong messages:
+ msg126997 |
| 2011-01-25 03:38:49 | antlong | set | nosy:
+ antlong messages:
+ msg126986
|
| 2011-01-22 12:28:33 | hhas | set | nosy:
georg.brandl, hhas, pitrou, eric.araujo, r.david.murray, docs@python messages:
+ msg126831 |
| 2011-01-21 21:38:06 | pitrou | set | nosy:
georg.brandl, hhas, pitrou, eric.araujo, r.david.murray, docs@python messages:
+ msg126788 |
| 2011-01-21 20:54:35 | eric.araujo | set | nosy:
+ eric.araujo, georg.brandl messages:
+ msg126786
|
| 2011-01-21 20:46:48 | pitrou | set | nosy:
+ docs@python messages:
+ msg126785
assignee: docs@python components:
+ Documentation, - Library (Lib) |
| 2011-01-21 20:35:32 | r.david.murray | set | nosy:
+ r.david.murray, pitrou messages:
+ msg126782
|
| 2011-01-21 19:01:47 | hhas | create | |