unicode-internal encoder reports wrong length #47989

Closed
doerwalter opened this issue Aug 30, 2008 · 4 comments

@doerwalter
Contributor

BPO 3739
Nosy @doerwalter, @vstinner
Files
  • issue3739.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-05-06.14:45:28.619>
    created_at = <Date 2008-08-30.13:05:18.752>
    labels = ['expert-unicode']
    title = 'unicode-internal encoder reports wrong length'
    updated_at = <Date 2009-05-06.15:14:01.052>
    user = 'https://github.com/doerwalter'

    bugs.python.org fields:

    activity = <Date 2009-05-06.15:14:01.052>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-05-06.14:45:28.619>
    closer = 'doerwalter'
    components = ['Unicode']
    creation = <Date 2008-08-30.13:05:18.752>
    creator = 'doerwalter'
    dependencies = []
    files = ['13875']
    hgrepos = []
    issue_num = 3739
    keywords = ['patch']
    message_count = 4.0
    messages = ['72193', '87161', '87335', '87336']
    nosy_count = 2.0
    nosy_names = ['doerwalter', 'vstinner']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue3739'
    versions = ['Python 3.0']

    @doerwalter
    Contributor Author

    The encoder for the "unicode-internal" codec reports the wrong length:

    Python 3.0b3+ (py3k, Aug 30 2008, 11:55:21) 
    [GCC 4.0.1 (Apple Inc. build 5484)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import codecs
    >>> codecs.getencoder("unicode-internal")("a")
    (b'a\x00', 2)

    I would have expected it to output:

    (b'a\x00', 1)

    instead.
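For context, a stateless encoder returned by `codecs.getencoder` yields a tuple of (encoded output, amount of input consumed), so for a one-character string the second item is expected to be 1. The sketch below illustrates that contract with `utf-16-le`, which is only a stand-in chosen here because `unicode-internal` is not available in recent Python versions; on a little-endian narrow build the bytes happen to match the output shown above.

```python
import codecs

# Assumption: utf-16-le is used purely for illustration; it is not the
# unicode-internal codec discussed in this issue.
encode = codecs.getencoder("utf-16-le")

data, consumed = encode("a")

print(data)      # the encoded bytes
print(consumed)  # number of input characters consumed, not len(data)
```

Here `len(data)` is 2 while `consumed` is 1, which is the distinction the report is about.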

    @vstinner
    Member

    vstinner commented May 4, 2009

    Patch fixing the unicode-internal encoder for unicode string input: it now
    returns the length of the input string (number of characters) rather than
    the internal size (number of bytes needed to store the text). I wrote a
    small test; I hope it is enough to cover the function.

    If the input is not a unicode string, the encoder still returns the number
    of bytes (I left this case unchanged).
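The behaviour described in this comment can be sketched in pure Python. This is not the attached patch (which changes the C implementation); `internal_encode_sketch` is a hypothetical name, and `utf-16-le` is assumed as a stand-in for the interpreter's internal representation.

```python
def internal_encode_sketch(obj):
    """Return (encoded_bytes, length) following the rule described above.

    Hypothetical helper for illustration only; not part of the codecs module.
    """
    if isinstance(obj, str):
        # Assumption: utf-16-le stands in for the internal storage format.
        data = obj.encode("utf-16-le")
        # For string input, report the number of characters consumed.
        return data, len(obj)
    # For any other bytes-like input, report the number of bytes (unchanged case).
    data = bytes(obj)
    return data, len(data)


str_result = internal_encode_sketch("a")
bytes_result = internal_encode_sketch(b"ab")
print(str_result)
print(bytes_result)
```

For string input the reported length follows the character count, while bytes-like input keeps reporting its byte count.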

    @doerwalter
    Contributor Author

    Checked in:
    r72404,72406 (trunk)
    r72408 (py3k)

    As IMHO this is somewhat between a feature and a bugfix, I didn't check
    it into release26-maint and release30-maint.

    @vstinner
    Member

    vstinner commented May 6, 2009

    > I didn't check it into release26-maint and release30-maint.

    I agree, and anyway this encoder is not really important (it appears to
    be unused...). Thanks for the commit.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022