Title: parse fails for multibyte characters; output is encoded as \xxx
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
Status: closed Resolution: fixed
Assigned To: Nosy List: ezio.melotti, methane, miss-islington, vstinner, zhou.ronghua
Priority: normal Keywords: patch

Created on 2018-05-29 14:29 by zhou.ronghua, last changed 2022-04-11 14:59 by admin. This issue is now closed.

msg318039 - (view) Author: zhou.ronghua (zhou.ronghua) * Date: 2018-05-29 14:29
When running this command on Windows (XP or Windows 7, the behavior is the same):
python -m json.tool xxx.txt xxx.json
and xxx.txt contains Chinese (or other multibyte) characters:
If xxx.txt is encoded in ANSI, xxx.json contains the Chinese characters as \xxx escape sequences, which makes them very hard to read.
If xxx.txt is encoded in UTF-8 (usually without a BOM), json.tool assumes the file is ANSI because there is no BOM, and decoding fails.

Since UTF-8 is now widely used, the default should be UTF-8 whenever encoding auto-detection fails.
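
The two symptoms can be reproduced outside of json.tool. The following is a minimal sketch, assuming the escaped output comes from the json module's default ensure_ascii=True behavior and using cp1252 as a stand-in for the "ANSI" code page; the sample data is made up for illustration and is not from the report.

```python
import json

# Sample data (an assumption for illustration, not from the report).
data = {"name": "中文"}

# Default behavior: non-ASCII characters are escaped as \uXXXX.
escaped = json.dumps(data)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(data, ensure_ascii=False)

# UTF-8 bytes read back with a legacy single-byte code page do not
# round-trip. cp1252 stands in for "ANSI" here; errors="replace"
# keeps the sketch from raising on bytes the code page leaves undefined.
raw = readable.encode("utf-8")
misread = raw.decode("cp1252", errors="replace")

print(escaped)              # ASCII-only text with \u escapes
print(readable)             # text with the original characters
print(misread == readable)  # False: the text was misinterpreted
```

Depending on the code page and the bytes involved, a strict decode either raises UnicodeDecodeError or silently produces mojibake; both outcomes stem from the same wrong-encoding assumption.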
msg357786 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-12-04 09:39
New changeset 808769f3a4cbdc47cf1a5708dd61b1787bb192d4 by Inada Naoki in branch 'master':
bpo-33684: json.tool: Use utf-8 for infile and outfile. (GH-17460)
msg357789 - (view) Author: miss-islington (miss-islington) Date: 2019-12-04 09:57
New changeset a75cad440ab50d823af5f06e51dfed3a319f1e8c by Miss Islington (bot) in branch '3.8':
bpo-33684: json.tool: Use utf-8 for infile and outfile. (GH-17460)
msg357791 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-12-04 10:26
New changeset e0f148e6635480521036415bd782c3424fe6c619 by Inada Naoki in branch '3.7':
bpo-33684: json.tool: Use utf-8 for infile and outfile. (GH-17460)
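
The changesets above are described only as "Use utf-8 for infile and outfile". As a rough sketch of that idea, and not the actual patch, the following opens both files with an explicit UTF-8 encoding instead of relying on the locale default. The helper name reformat_json and the temporary file names are hypothetical.

```python
import json
import os
import tempfile


def reformat_json(infile_path, outfile_path):
    """Hypothetical helper: pretty-print JSON, using UTF-8 for both files."""
    # Explicit encoding avoids falling back to the locale code page.
    with open(infile_path, encoding="utf-8") as infile:
        obj = json.load(infile)
    with open(outfile_path, "w", encoding="utf-8") as outfile:
        json.dump(obj, outfile, indent=4)
        outfile.write("\n")


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "xxx.txt")
    dst = os.path.join(tmp, "xxx.json")
    # Write a BOM-less UTF-8 input file, as in the report.
    with open(src, "w", encoding="utf-8") as f:
        f.write('{"name": "中文"}')
    reformat_json(src, dst)
    with open(dst, encoding="utf-8") as f:
        result = json.load(f)

print(result == {"name": "中文"})
```

With the encoding fixed on both sides, a BOM-less UTF-8 input round-trips regardless of the system code page.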
