Add link to alternatives for bytes-to-bytes codecs #62044

Closed
serhiy-storchaka opened this issue Apr 25, 2013 · 15 comments
Labels
docs Documentation in the Doc dir type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Member

BPO 17844
Nosy @malemburg, @doerwalter, @ncoghlan, @ezio-melotti, @florentx, @serhiy-storchaka
Files
  • doc_codecs_impl.patch: Patch for 3.x
  • doc_codecs_impl-2.7_2.patch: Patch for 2.7
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2013-05-23.10:26:34.141>
    created_at = <Date 2013-04-25.11:37:22.833>
    labels = ['type-feature', 'docs']
    title = 'Add link to alternatives for bytes-to-bytes codecs'
    updated_at = <Date 2013-05-23.10:26:34.140>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2013-05-23.10:26:34.140>
    actor = 'ncoghlan'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2013-05-23.10:26:34.141>
    closer = 'ncoghlan'
    components = ['Documentation']
    creation = <Date 2013-04-25.11:37:22.833>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['30331', '30333']
    hgrepos = []
    issue_num = 17844
    keywords = ['patch']
    message_count = 15.0
    messages = ['187777', '189540', '189578', '189579', '189740', '189743', '189747', '189749', '189761', '189797', '189811', '189812', '189821', '189856', '189857']
    nosy_count = 8.0
    nosy_names = ['lemburg', 'doerwalter', 'ncoghlan', 'ezio.melotti', 'flox', 'docs@python', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue17844'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']

    @serhiy-storchaka
    Member Author

    The proposed patch adds links to alternative interfaces for bytes-to-bytes codecs. For example, base64.b64encode and base64.b64decode for base64_codec.

    The patch for 2.7 should mention other functions/modules, since some of them are missing there.
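
For illustration, a minimal Python 3 sketch of the kind of equivalence the patch documents: the `base64` codec next to the direct `base64` module functions. The thread does not show which functions the patch finally links to, so the comparison with `base64.encodebytes` is an assumption based on matching output. Python 3.4 or later is assumed.

```python
import base64
import codecs

data = b"Python"

# Codec interface: the codec is selected by name at run time.
via_codec = codecs.encode(data, "base64")

# Direct interface named in the issue: no trailing newline is added.
via_b64encode = base64.b64encode(data)

# encodebytes() adds line breaks the same way the codec does.
via_encodebytes = base64.encodebytes(data)

print(via_codec)        # b'UHl0aG9u\n'
print(via_b64encode)    # b'UHl0aG9u'
print(via_encodebytes)  # b'UHl0aG9u\n'

# Decoding through either interface recovers the original bytes.
round_trip_codec = codecs.decode(via_codec, "base64")
round_trip_direct = base64.b64decode(via_b64encode)
print(round_trip_codec == round_trip_direct == data)  # True
```

The direct functions are the natural choice when the transform is known in advance; the codec name is useful mainly when the transform is chosen from a string at run time.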

    @serhiy-storchaka serhiy-storchaka added docs Documentation in the Doc dir type-feature A feature request or enhancement labels Apr 25, 2013
    @serhiy-storchaka
    Member Author

    Any opinions?

    @ncoghlan
    Contributor

    I like this, both because it quite clearly defines the encode and decode directions, and because it also notes the more direct entry points for cases where the codec isn't being specified as an input string.

    So +1 from me.

    @malemburg
    Member

    Not a bad idea. More information is always better when it comes to documentation :-)

    @serhiy-storchaka
    Member Author

    > Not a bad idea.

    How about the implementation? Here are updated patches for 3.x and 2.7. Note that in 2.7 I split the codecs table as in 3.x.

    @ncoghlan
    Contributor

    I like the idea of splitting the table in 2.7 rather than using a result type column. However, the two intro paragraphs need a bit of work. How does the following sound:

    1. Create a new subheading at the same level as the current "Standard Encodings" heading: "Python Specific Encodings"

    2. Split out rot-13 to its own table in Python 2.7 as well

    3. Under the new subheading, have the following text introducing the tables:

    ----
    A number of predefined codecs are specific to Python, so their codec names have no meaning outside Python. These are listed in the tables below based on the expected input and output types (note that while text encodings are the most common use case for codecs, the underlying codec infrastructure supports arbitrary data transforms rather than just text encodings). For asymmetric codecs, the stated purpose describes the encoding direction.

    The following codecs provide text-to-binary encoding and binary-to-text decoding, similar to the Unicode text encodings.
    ----
    The following codecs provide binary-to-binary encoding and decoding.
    ----
    The following codecs provide text-to-text encoding and decoding.
    ----
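
The three categories in the proposed wording can be sketched in Python 3 as follows. The codecs chosen here (`unicode_escape`, `hex_codec`, `rot_13`) are just one representative per table, picked for illustration; the thread does not list the table contents. Python 3.4 or later is assumed.

```python
import codecs

# Text-to-binary: str in, bytes out (like the Unicode text encodings).
escaped = "\u20ac".encode("unicode_escape")

# Binary-to-binary: bytes in, bytes out.
hexed = codecs.encode(b"Py", "hex_codec")

# Text-to-text: str in, str out.
rotated = codecs.encode("Python", "rot_13")

for result in (escaped, hexed, rotated):
    print(type(result).__name__, repr(result))
# bytes b'\\u20ac'
# bytes b'5079'
# str 'Clguba'
```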

    @serhiy-storchaka
    Member Author

    > However, the two intro paragraphs need a bit of work.

    Yes, that is the help I needed. Thank you.

    However, your wording is not entirely correct. In 2.7, the binary-to-binary codecs and rot-13 work with Unicode strings (ASCII-compatible ones only) as well as with byte strings.

    >>> u'Python'.encode('base64')
    'UHl0aG9u\n'
    >>> u'UHl0aG9u'.decode('base64')
    'Python'
    >>> u'Python\u20ac'.encode('base64')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/serhiy/py/cpython-2.7/Lib/encodings/base64_codec.py", line 24, in base64_encode
        output = base64.encodestring(input)
      File "/home/serhiy/py/cpython-2.7/Lib/base64.py", line 315, in encodestring
        pieces.append(binascii.b2a_base64(chunk))
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 6: ordinal not in range(128)

    Rot-13 works like an ordinary text-to-binary encoding (encode returns str, decode returns unicode).

    >>> u'Python'.encode('rot13')
    'Clguba'
    >>> u'Python'.decode('rot13')
    u'Clguba'
    >>> 'Python'.encode('rot13')
    'Clguba'
    >>> 'Python'.decode('rot13')
    u'Clguba'
    >>> u'Python\u20ac'.encode('rot13')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/serhiy/py/cpython-2.7/Lib/encodings/rot_13.py", line 17, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 6: character maps to <undefined>
    >>> u'Python\u20ac'.decode('rot13')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/serhiy/py/cpython-2.7/Lib/encodings/rot_13.py", line 20, in decode
        return codecs.charmap_decode(input,errors,decoding_map)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 6: ordinal not in range(128)
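
For contrast, a short sketch of how the same kind of call behaves on Python 3.4 and later. This is an observation about current CPython rather than something stated in this thread: the `str.encode` and `bytes.decode` methods only accept text encodings, and the other codecs are reached through `codecs.encode` and `codecs.decode`.

```python
import codecs

str_method_error = None
try:
    # The str method refuses codecs that are not text encodings.
    "Python".encode("base64")
except LookupError as exc:
    str_method_error = exc

bytes_method_error = None
try:
    # Likewise, bytes.decode() refuses the text-to-text rot13 codec.
    b"Clguba".decode("rot13")
except LookupError as exc:
    bytes_method_error = exc

# The generic entry points accept these codecs, with explicit types.
encoded = codecs.encode(b"Python", "base64")
unrotated = codecs.decode("Clguba", "rot13")

print(type(str_method_error).__name__)    # LookupError
print(type(bytes_method_error).__name__)  # LookupError
print(encoded)                            # b'UHl0aG9u\n'
print(unrotated)                          # Python
```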

    @ncoghlan
    Contributor

    While the Python 2 text model was almost certainly a necessary transition step to full unicode support, it is things like this that highlight how fundamentally broken implicit conversion turned out to be at a conceptual level :P

    Perhaps the following would work for 2.7 then (with rot-13 in the first table), with footnotes added to cover the quirks of the implicit type conversions between str and unicode:

    ----
    A number of predefined codecs are specific to Python, so their codec names have no meaning outside Python. These are listed in the tables below based on the expected input and output types (note that while text encodings are the most common use case for codecs, the underlying codec infrastructure supports arbitrary data transforms rather than just text encodings). For asymmetric codecs, the stated purpose describes the encoding direction.

    The following codecs provide unicode-to-str encoding [#1] and str-to-unicode decoding [#2], similar to the Unicode text encodings.
    ----
    The following codecs provide str-to-str encoding and decoding [#2].
    ----

    .. [#1] str objects are also accepted as input in place of unicode objects. They are implicitly converted to unicode by decoding them using the default encoding. If this conversion fails, it may lead to encoding operations raising :exc:`UnicodeDecodeError`.

    .. [#2] unicode objects are also accepted as input in place of str objects. They are implicitly converted to str by encoding them using the default encoding. If this conversion fails, it may lead to decoding operations raising :exc:`UnicodeEncodeError`.
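
The implicit conversions described in the two footnotes can be emulated roughly in Python 3. This is only a sketch: `py2_style_encode` and `py2_style_decode` are hypothetical helpers written for illustration, and the default encoding is assumed to be ASCII, as it is in an unmodified Python 2.

```python
def py2_style_encode(value, encoding, default_encoding="ascii"):
    """Hypothetical helper: encode, accepting bytes as Python 2 accepted str."""
    if isinstance(value, bytes):
        # Footnote [#1]: implicit decode with the default encoding first.
        value = value.decode(default_encoding)
    return value.encode(encoding)


def py2_style_decode(value, encoding, default_encoding="ascii"):
    """Hypothetical helper: decode, accepting str as Python 2 accepted unicode."""
    if isinstance(value, str):
        # Footnote [#2]: implicit encode with the default encoding first.
        value = value.encode(default_encoding)
    return value.decode(encoding)


print(py2_style_encode(b"Python", "utf-8"))  # b'Python'
print(py2_style_decode("Python", "utf-8"))   # Python

encode_error = None
try:
    # An *encoding* call fails with a *decode* error, as in footnote [#1].
    py2_style_encode("Python\u20ac".encode("utf-8"), "utf-8")
except UnicodeDecodeError as exc:
    encode_error = exc

decode_error = None
try:
    # A *decoding* call fails with an *encode* error, as in footnote [#2].
    py2_style_decode("Python\u20ac", "utf-8")
except UnicodeEncodeError as exc:
    decode_error = exc

print(type(encode_error).__name__)  # UnicodeDecodeError
print(type(decode_error).__name__)  # UnicodeEncodeError
```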

    @serhiy-storchaka
    Member Author

    Thank you Nick. Here is an updated patch for 2.7.

    @ncoghlan
    Contributor

    Thanks Serhiy, that version looks great.

    @python-dev
    Mannequin

    python-dev mannequin commented May 22, 2013

    New changeset 85c04fdaa404 by Serhiy Storchaka in branch '2.7':
    Issue bpo-17844: Refactor a documentation of Python specific encodings.
    http://hg.python.org/cpython/rev/85c04fdaa404

    New changeset 039dc6dd2bc0 by Serhiy Storchaka in branch '3.3':
    Issue bpo-17844: Add links to encoders and decoders for bytes-to-bytes codecs.
    http://hg.python.org/cpython/rev/039dc6dd2bc0

    New changeset 9afdd88fe33a by Serhiy Storchaka in branch 'default':
    Issue bpo-17844: Add links to encoders and decoders for bytes-to-bytes codecs.
    http://hg.python.org/cpython/rev/9afdd88fe33a

    @serhiy-storchaka
    Member Author

    Thank you Nick. It's mainly your patch.

    Do you want to forward-port your changes (the "Python Specific Encodings" subheading and the paragraph that follows it) to 3.x?

    @ncoghlan
    Contributor

    That sounds like a good idea. Yay for not needing those arcane footnotes, though :)

    @python-dev
    Mannequin

    python-dev mannequin commented May 23, 2013

    New changeset 85e8414060b4 by Nick Coghlan in branch '3.3':
    bpo-17844: Clarify meaning of different codec tables
    http://hg.python.org/cpython/rev/85e8414060b4

    New changeset 801567d6302c by Nick Coghlan in branch 'default':
    Merge bpo-17844 from 3.3
    http://hg.python.org/cpython/rev/801567d6302c

    @ncoghlan
    Contributor

    Thanks for initiating this Serhiy :)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022