Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decoding functions in _codecs module accept str arguments #49124

Closed
pitrou opened this issue Jan 7, 2009 · 12 comments
Closed

decoding functions in _codecs module accept str arguments #49124

pitrou opened this issue Jan 7, 2009 · 12 comments
Assignees
Labels
extension-modules C modules in the Modules dir release-blocker type-bug An unexpected behavior, bug, or error

Comments

@pitrou
Copy link
Member

pitrou commented Jan 7, 2009

BPO 4874
Nosy @malemburg, @amauryfa, @pitrou, @vstinner
Files
  • _codecs_bytes-2.patch
  • mbdecode-unicode.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pitrou'
    closed_at = <Date 2009-01-22.12:25:56.020>
    created_at = <Date 2009-01-07.23:48:15.674>
    labels = ['extension-modules', 'type-bug', 'release-blocker']
    title = 'decoding functions in _codecs module accept str arguments'
    updated_at = <Date 2009-01-22.12:25:56.019>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2009-01-22.12:25:56.019>
    actor = 'pitrou'
    assignee = 'pitrou'
    closed = True
    closed_date = <Date 2009-01-22.12:25:56.020>
    closer = 'pitrou'
    components = ['Extension Modules']
    creation = <Date 2009-01-07.23:48:15.674>
    creator = 'pitrou'
    dependencies = []
    files = ['12770', '12831']
    hgrepos = []
    issue_num = 4874
    keywords = ['patch']
    message_count = 12.0
    messages = ['79384', '79387', '79402', '79741', '79843', '79991', '80362', '80363', '80364', '80365', '80366', '80367']
    nosy_count = 4.0
    nosy_names = ['lemburg', 'amaury.forgeotdarc', 'pitrou', 'vstinner']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4874'
    versions = ['Python 3.0', 'Python 3.1']

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 7, 2009

    The following function calls should raise a TypeError instead. Encoding
    functions are fine (they only accept str).

    >>> import codecs
    >>> codecs.utf_8_decode('aa')
    ('aa', 2)
    >>> codecs.utf_8_decode('éé')
    ('éé', 4)
    >>> codecs.latin_1_decode('éé')
    ('éé', 4)

    @pitrou pitrou added release-blocker extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Jan 7, 2009
    @vstinner
    Copy link
    Member

    vstinner commented Jan 8, 2009

    Patch replacing "s*" parsing format by "y*" for:

    • utf_7_decode()
    • utf_8_decode()
    • utf_16_decode()
    • utf_16_le_decode()
    • utf_16_be_decode()
    • utf_16_ex_decode()
    • utf_32_decode()
    • utf_32_le_decode()
    • utf_32_be_decode()
    • utf_32_ex_decode()
    • unicode_escape_decode()
    • raw_unicode_escape_decode()
    • latin_1_decode()
    • ascii_decode()
    • charmap_decode()
    • mbcs_decode()

    Using run_tests.sh, all tests are ok (with 19 skipped tests). I guess
    that there is not tests for all these functions :-/

    Note: codecs documentation was already correct:

    .. method:: Codec.decode(input[, errors])
    (...)
    *input* must be a bytes object or one which provides the read-only
    character
    buffer interface -- for example, buffer objects and memory mapped
    files.

    @malemburg
    Copy link
    Member

    On 2009-01-08 01:59, STINNER Victor wrote:

    STINNER Victor <victor.stinner@haypocalc.com> added the comment:

    Patch replacing "s*" parsing format by "y*" for:

    • utf_7_decode()
    • utf_8_decode()
    • utf_16_decode()
    • utf_16_le_decode()
    • utf_16_be_decode()
    • utf_16_ex_decode()
    • utf_32_decode()
    • utf_32_le_decode()
    • utf_32_be_decode()
    • utf_32_ex_decode()
    • latin_1_decode()
    • ascii_decode()
    • charmap_decode()
    • mbcs_decode()

    These are fine.

    • unicode_escape_decode()
    • raw_unicode_escape_decode()

    These changes are in line with their C API codec interfaces as well,
    but those particular codecs could well also be made to work on Unicode
    input, since unescaping can well be applied to Unicode as well.

    I'll probably open a new item for this.

    Using run_tests.sh, all tests are ok (with 19 skipped tests). I guess
    that there is not tests for all these functions :-/

    The mbcs codec is only available on Windows.

    All others are tested by test_codecs.py.

    Which ones are skipped in your case ?

    Note: codecs documentation was already correct:

    .. method:: Codec.decode(input[, errors])
    (...)
    *input* must be a bytes object or one which provides the read-only
    character
    buffer interface -- for example, buffer objects and memory mapped
    files.

    That's not entirely correct: codecs are allowed to accept any
    object type and can also return any object type. It up to them
    to decide, e.g. a codec may accept both bytes and Unicode input
    and always generate Unicode output when decoding.

    I guess I have to review those doc changes...

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 13, 2009

    The patch is probably fine, but it would be nice to add some unit tests
    for the new behaviour.

    @malemburg
    Copy link
    Member

    On 2009-01-13 14:14, Antoine Pitrou wrote:

    Antoine Pitrou <pitrou@free.fr> added the comment:

    The patch is probably fine, but it would be nice to add some unit tests
    for the new behaviour.

    +1 from my side.

    Thanks for the patch, Viktor.

    @vstinner
    Copy link
    Member

    New patch:

    • Leave unicode_escape_decode() and raw_unicode_escape_decode()
      unchanged (still accept unicode string)
    • Test changes (reject unicode for most codecs decode functions)
    • Write tests for unicode_escape_decode() and
      raw_unicode_escape_decode() (there was no test for these functions)

    @pitrou pitrou self-assigned this Jan 22, 2009
    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 22, 2009

    Committed in r68855, r68856. Thanks!

    @pitrou pitrou closed this as completed Jan 22, 2009
    @amauryfa
    Copy link
    Member

    IMO, Modules/cjkcodecs/multibytecodec.c should be changed as well.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 22, 2009

    Right, I hadn't thought of that.

    @pitrou pitrou reopened this Jan 22, 2009
    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 22, 2009

    "Fixing" multibytecodec.c produces a TypeError in the following test:

        def test_errorcallback_longindex(self):
            dec = codecs.getdecoder('euc-kr')
            myreplace  = lambda exc: ('', sys.maxsize+1)
            codecs.register_error('test.cjktest', myreplace)
            self.assertRaises(IndexError, dec,
                              'apple\x92ham\x93spam', 'test.cjktest')

    TypeError: decode() argument 1 must be bytes or buffer, not str

    Since the test is meant to test recovery from a misbehaving callback, I
    guess the type of the input string is not really important and can be
    changed to a bytes string instead. What do you think?

    (in any case, here is a patch)

    @amauryfa
    Copy link
    Member

    The patch looks good.
    I think the missing b in test_errorcallback_longindex is an overlook
    when the tests were updated for py3k. You are right to change the test.

    @pitrou
    Copy link
    Member Author

    pitrou commented Jan 22, 2009

    Committed in r68857, r68858.

    @pitrou pitrou closed this as completed Jan 22, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir release-blocker type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants