encoding problem: coding:gbk cause syntaxError #79321

Open
anmikf mannequin opened this issue Nov 2, 2018 · 12 comments
Labels
3.7 (EOL): end of life · 3.8: only security fixes · OS-windows · type-bug: An unexpected behavior, bug, or error

Comments

@anmikf
Mannequin

anmikf mannequin commented Nov 2, 2018

BPO 35140
Nosy @pfmoore, @tjguk, @zware, @zooba, @animalize, @eamanu, @Windsooon
Files
  • encoding_problem_gbk.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-11-02.02:33:33.628>
    labels = ['3.8', 'type-bug', '3.7', 'OS-windows']
    title = 'encoding problem: coding:gbk cause syntaxError'
    updated_at = <Date 2018-11-11.20:48:47.064>
    user = 'https://bugs.python.org/anmikf'

    bugs.python.org fields:

    activity = <Date 2018-11-11.20:48:47.064>
    actor = 'eamanu'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Windows']
    creation = <Date 2018-11-02.02:33:33.628>
    creator = 'anmikf'
    dependencies = []
    files = ['47901']
    hgrepos = []
    issue_num = 35140
    keywords = []
    message_count = 12.0
    messages = ['329098', '329108', '329113', '329117', '329118', '329119', '329120', '329121', '329122', '329658', '329674', '329686']
    nosy_count = 8.0
    nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'steve.dower', 'malin', 'eamanu', 'Windson Yang', 'anmikf']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue35140'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8']

    @anmikf
    Mannequin Author

    anmikf mannequin commented Nov 2, 2018

    OS Name: Microsoft Windows 10 Pro
    OS Version: 10.0.15063 N/A Build 15063
    OS Manufacturer: Microsoft Corporation
    OS Configuration: Standalone Workstation
    OS Build Type: Multiprocessor Free
    Registered Owner: Windows User
    Registered Organization:
    Product ID: 00330-80000-00000-AA183
    Original Install Date: 2017/04/10, 17:24:40
    System Boot Time: 2018/09/18, 09:44:52
    System Manufacturer: Dell Inc.
    System Model: OptiPlex 9010
    System Type: x64-based PC
    Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32

    @anmikf anmikf mannequin added 3.7 (EOL) end of life OS-windows type-bug An unexpected behavior, bug, or error labels Nov 2, 2018
    @Windsooon
    Mannequin

    Windsooon mannequin commented Nov 2, 2018

    If I understand your question correctly, you should save the file (the one that contains Chinese characters) with GBK encoding in your editor. Otherwise, your editor will save it using its default encoding, and Python will not be able to decode it correctly.

    @animalize
    Mannequin

    animalize mannequin commented Nov 2, 2018

    Let me explain.
    Running encoding_problem_gbk.py produces an error:

    D:\>encoding_problem_gbk.py
    File "D:\encoding_problem_gbk.py", line 1
    SyntaxError: encoding problem: gbk

    If the comment line is removed, the script runs as expected.

    @Windsooon
    Mannequin

    Windsooon mannequin commented Nov 2, 2018

    Thank you, Lin. Can you reproduce it on your machine? I guess it is related to the terminal encoding or the text file's line endings. However, I can't reproduce it on macOS.

    @animalize
    Mannequin

    animalize mannequin commented Nov 2, 2018

    Yes, I can reproduce it on my Windows 10 (Simplified Chinese).
    The file is a pure ASCII file, and doesn't have a BOM prefix.
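A minimal sketch of how the two properties mentioned above (pure ASCII content, no BOM prefix) could be checked on the raw bytes of a file. The helper names `is_pure_ascii` and `has_bom` are hypothetical and are not taken from the attached script or from CPython.

```python
import codecs

# Byte-order marks that commonly prefix text files.
_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
)


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte decodes as 7-bit ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


def has_bom(data: bytes) -> bool:
    """Return True if the data starts with a known byte-order mark."""
    return data.startswith(_BOMS)
```

To inspect a real file, read it in binary mode first, for example `data = open("encoding_problem_gbk.py", "rb").read()`, and pass `data` to both helpers.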

    @anmikf
    Mannequin Author

    anmikf mannequin commented Nov 2, 2018

    This problem does not exist on macOS.
    This problem does not exist in Python 2.

    Windows10x64 Python 3.7.0 (v3.7.0:1bf9cc5093

    The script has no problem with 15 blank lines.
    The script has the problem with a first line of '#coding:gbk' followed by 14 blank lines.

    @anmikf
    Mannequin Author

    anmikf mannequin commented Nov 2, 2018

    I'm sorry for my English.
    Can I use Chinese?

    @tjguk
    Member

    tjguk commented Nov 2, 2018

    I'm afraid you'll have to use English in this forum so that all current and future readers have the best chance of understanding the situation. Thank you very much for making the effort this far.

    If anyone on this issue knows of a Chinese-language forum where this issue could be explored before coming back here, please say so. Otherwise I'll ask around on Twitter etc. to see what's available.

    @Windsooon
    Mannequin

    Windsooon mannequin commented Nov 2, 2018

    It's fine @anmikf, keep practicing :D. Let's recap what happened:

    Running encoding_problem_gbk.py on Windows 10 with Python 3.7.0 causes "SyntaxError: encoding problem: gbk". But it runs as expected if any of the following is true:

    1. The file has fewer than 15 lines.
    2. coding:gbk is changed to another encoding (such as utf-8).
    3. The coding:gbk line is removed.
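The recap above can be turned into a small test harness. This is only a sketch: `build_source` and `run_source` are hypothetical helpers, and the LF line ending used by default is an assumption, since the thread does not state which line endings the attached file uses. The failure is reported only on Windows, so on macOS or Linux every variant is expected to exit cleanly.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_source(cookie: str = "gbk", total_lines: int = 15,
                 newline: bytes = b"\n") -> bytes:
    """Build a script whose first line is a coding cookie (or blank if
    cookie is empty) and whose remaining lines are all blank.

    The newline default (LF) is an assumption, not taken from the thread.
    """
    first = f"#coding:{cookie}".encode("ascii") if cookie else b""
    return first + newline + newline * (total_lines - 1)


def run_source(source: bytes) -> int:
    """Write the bytes to a temporary .py file, run it with the current
    interpreter, and return the process exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "encoding_problem_gbk.py"
        script.write_bytes(source)
        result = subprocess.run([sys.executable, str(script)],
                                capture_output=True)
        return result.returncode


if __name__ == "__main__":
    variants = {
        "gbk cookie, 15 lines": build_source("gbk", 15),
        "gbk cookie, 14 lines": build_source("gbk", 14),
        "utf-8 cookie, 15 lines": build_source("utf-8", 15),
        "no cookie, 15 lines": build_source("", 15),
    }
    for name, source in variants.items():
        print(f"{name}: exit code {run_source(source)}")
```

A non-zero exit code for the first variant only would match the behavior described in the recap; passing `newline=b"\r\n"` to `build_source` lets the same harness check whether the line ending matters.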

    @Windsooon Windsooon mannequin changed the title from "encoding problem: gbk" to "encoding problem: coding:gbk cause syntaxError" Nov 2, 2018
    @animalize
    Mannequin

    animalize mannequin commented Nov 11, 2018

    I debugged this; it is a duplicate of bpo-20844 and bpo-27797.
    Eryk Sun analyzed it in detail: it is a problem in the Windows CRT.

    @animalize animalize mannequin added the 3.8 only security fixes label Nov 11, 2018
    @zooba
    Member

    zooba commented Nov 11, 2018

    Yes, seems like we should be opening the file in binary mode, though I haven't tried it. The CRT's interpretation of text mode really isn't compatible with Python's own interpretation of text mode, and chaining them makes even less sense.
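As a Python-level illustration of the binary-mode idea (not the C tokenizer change being discussed, which is not shown in this thread), the standard library's `tokenize.detect_encoding` reads raw bytes and leaves all decoding to Python, so no platform text-mode layer sits in between. The wrappers `detect_source_encoding` and `read_source` are hypothetical names used only for this sketch.

```python
import io
import tokenize


def detect_source_encoding(raw: bytes) -> str:
    """Detect the declared source encoding from raw bytes.

    tokenize.detect_encoding expects a readline callable that returns
    bytes, so the data never passes through a text-mode stream.
    """
    encoding, _consumed_lines = tokenize.detect_encoding(
        io.BytesIO(raw).readline)
    return encoding


def read_source(path: str) -> str:
    """Read a source file in binary mode and decode it in Python."""
    with open(path, "rb") as f:
        raw = f.read()
    # No newline translation is performed here; the bytes are decoded as-is.
    return raw.decode(detect_source_encoding(raw))
```

For a file that starts with `#coding:gbk`, `detect_source_encoding` reports `gbk`; with no cookie and no BOM it falls back to `utf-8`. The standard library also offers `tokenize.open(path)`, which combines the same detection with opening the file.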

    @eamanu
    Mannequin

    eamanu mannequin commented Nov 11, 2018

    I cannot reproduce this issue on my Debian 9.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022