built-in compile() should take encoding option. #50161

Closed
methane opened this issue May 3, 2009 · 6 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode type-feature A feature request or enhancement

Comments

@methane
Member

methane commented May 3, 2009

BPO 5911
Nosy @facundobatista, @benjaminp, @Trundle, @methane
Files
  • compile_with_encoding.patch: add encoding option and test.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-08-25.04:12:08.065>
    created_at = <Date 2009-05-03.01:55:23.839>
    labels = ['interpreter-core', 'type-feature', 'expert-unicode']
    title = 'built-in compile() should take encoding option.'
    updated_at = <Date 2010-08-25.04:12:08.064>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2010-08-25.04:12:08.064>
    actor = 'benjamin.peterson'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-08-25.04:12:08.065>
    closer = 'benjamin.peterson'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2009-05-03.01:55:23.839>
    creator = 'methane'
    dependencies = []
    files = ['15030']
    hgrepos = []
    issue_num = 5911
    keywords = ['patch']
    message_count = 6.0
    messages = ['86994', '93501', '93897', '114814', '114879', '114880']
    nosy_count = 5.0
    nosy_names = ['facundobatista', 'benjamin.peterson', 'Trundle', 'methane', 'BreamoreBoy']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'test needed'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue5911'
    versions = ['Python 3.2']

    @methane
    Member Author

    methane commented May 3, 2009

    The built-in compile() assumes that its source is encoded in UTF-8.
    This makes it harder to implement alternative shells such as IDLE
    and IPython. (http://bugs.python.org/issue1542677 and
    https://bugs.launchpad.net/ipython/+bug/339642 are related bugs.)

    The current behavior of compile() is shown below.

    # Python's interactive shell in Windows cp932 console.
    >>> "あ"
    '\x82\xa0'
    >>> u"あ"
    u'\u3042'
    
    # compile() fails to decode str.
    >>> code = compile('u"あ"', '__interactive__', 'single')
    >>> exec code
    u'\x82\xa0'  # u'\u3042' expected.
    
    # compile() encodes unicode to utf-8.
    >>> code = compile(u'"あ"', '__interactive__', 'single')
    >>> exec code
    '\xe3\x81\x82' # '\x82\xa0' (cp932) wanted, but I get utf-8.

    Currently, a PEP 263 coding declaration has to be prepended, as shown
    below, to get code compiled in the expected encoding.

    >>> code = compile('# coding: cp932\n%s' % ('"あ"',), '__interactive__', 'single')
    >>> exec code
    '\x82\xa0'
    >>> code = compile('# coding: cp932\n%s' % ('u"あ"',), '__interactive__', 'single')
    >>> exec code
    u'\u3042'
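The session above is Python 2. As a rough Python 3 sketch of the same workaround: compile() also accepts bytes there and honors a PEP 263 coding declaration in them. The helper name `compile_with_cookie` and the sample source are assumptions made for illustration, not part of the attached patch.

```python
def compile_with_cookie(source_bytes, encoding, filename="<interactive>", mode="exec"):
    """Hypothetical helper: prepend a PEP 263 coding declaration to raw bytes.

    compile() decodes bytes input according to the declaration on the
    first line, so the caller does not have to decode the source itself.
    """
    cookie = ("# coding: %s\n" % encoding).encode("ascii")
    return compile(cookie + source_bytes, filename, mode)


# Source as a cp932 console might deliver it (assumed input).
source = 'value = "あ"\n'.encode("cp932")

namespace = {}
exec(compile_with_cookie(source, "cp932"), namespace)
print(ascii(namespace["value"]))
```

One cost of this approach is that the extra first line shifts every line number reported for the compiled code by one.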

    But I feel that using compile() with a PEP 263 declaration is a bit of a dirty hack.
    I think adding an 'encoding' argument to compile(), with 'utf-8' as its default value,
    is a cleaner way, and it doesn't break backward compatibility.

    The following example describes the behavior of compile() with the encoding option.

    # coding: utf-8 (in utf-8 context)
    code = compile('"あ"', '__foo.py', 'single')
    exec code #=> '\xe3\x81\x82'
    
    code = compile('"あ"', '__foo.py', 'single', encoding='cp932') #=> UnicodeDecodeError
    
    code = compile(u'"あ"', '__foo.py', 'single')
    exec code #=> '\xe3\x81\x82'
    
    code = compile(u'"あ"', '__foo.py', 'single', encoding='cp932')
    exec code #=> '\x82\xa0'

    @methane methane added the type-feature A feature request or enhancement label May 3, 2009
    @devdanzin devdanzin mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode labels May 8, 2009
    @methane
    Member Author

    methane commented Oct 3, 2009

    Added a sample implementation.

    @benjaminp
    Contributor

    The patch as it currently stands is unacceptable because it changes
    public APIs.

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Aug 24, 2010

    Anyone interested in producing an updated patch?

    @methane
    Member Author

    methane commented Aug 25, 2010

    This problem is not serious on Python 3, because Python 3's byte
    strings can't contain non-ASCII characters directly.
    So passing a unicode string to compile() is good enough for all cases I can imagine.
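A minimal Python 3 sketch of that approach: decode the console bytes first, then pass the resulting str to compile(). The wrapper name `compile_with_encoding` and its `encoding` parameter are hypothetical and only mirror the option proposed in this issue; the real compile() has no such argument.

```python
def compile_with_encoding(source, filename, mode, encoding="utf-8"):
    """Hypothetical wrapper: decode bytes explicitly, then compile the str.

    str input is passed through unchanged, so no encoding guesswork
    happens inside compile() itself.
    """
    if isinstance(source, (bytes, bytearray)):
        source = bytes(source).decode(encoding)
    return compile(source, filename, mode)


# Bytes as read from a cp932 console (assumed input).
raw = 'x = "あ"\n'.encode("cp932")

ns = {}
exec(compile_with_encoding(raw, "<demo>", "exec", encoding="cp932"), ns)
print(ascii(ns["x"]))
```

Decoding up front keeps the encoding decision in the shell, which already knows its console encoding, rather than in compile().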

    @benjaminp
    Contributor

    I'll close this then.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022