Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncore and asynchat incompatible with Py3k str and bytes #45904

Closed
djarb mannequin opened this issue Dec 6, 2007 · 24 comments
Closed

asyncore and asynchat incompatible with Py3k str and bytes #45904

djarb mannequin opened this issue Dec 6, 2007 · 24 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@djarb
Copy link
Mannequin

djarb mannequin commented Dec 6, 2007

BPO 1563
Nosy @josiahcarlson, @giampaolo, @benjaminp, @bitdancer, @intgr
Files
  • asynchat.diff
  • asynchat.diff
  • asyn_aggregate.diff
  • asyn_aggregate.diff
  • aggregate.zip
  • asyn_py3k.diff: Update asyncore and asynchat to py3k revision 60780
  • asyn_py3k_restructured.diff: Update asyncore and asynchat to py3k revision 60780, and restructure for better iterator support
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/josiahcarlson'
    closed_at = <Date 2010-04-23.14:07:37.056>
    created_at = <Date 2007-12-06.18:50:51.164>
    labels = ['type-bug', 'library']
    title = 'asyncore and asynchat incompatible with Py3k str and bytes'
    updated_at = <Date 2010-04-23.14:07:37.055>
    user = 'https://bugs.python.org/djarb'

    bugs.python.org fields:

    activity = <Date 2010-04-23.14:07:37.055>
    actor = 'r.david.murray'
    assignee = 'josiahcarlson'
    closed = True
    closed_date = <Date 2010-04-23.14:07:37.056>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2007-12-06.18:50:51.164>
    creator = 'djarb'
    dependencies = []
    files = ['8890', '8908', '8926', '8932', '8938', '9427', '9428']
    hgrepos = []
    issue_num = 1563
    keywords = []
    message_count = 24.0
    messages = ['58251', '58277', '58352', '58406', '58451', '58504', '58513', '58534', '70474', '70496', '70500', '70520', '70593', '70601', '70602', '70604', '70622', '88545', '88559', '90489', '90490', '90491', '90492', '104013']
    nosy_count = 10.0
    nosy_names = ['josiahcarlson', 'exarkun', 'janssen', 'giampaolo.rodola', 'djarb', 'josiah.carlson', 'benjamin.peterson', 'r.david.murray', 'intgr', 'JPosthuma']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1563'
    versions = ['Python 3.2']

    @djarb
    Copy link
    Mannequin Author

    djarb mannequin commented Dec 6, 2007

    djarb@highenergymagic.org is working up a patch for this.

    @djarb djarb mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 6, 2007
    @djarb
    Copy link
    Mannequin Author

    djarb mannequin commented Dec 7, 2007

    The attached patch introduces async_chat.set_terminator_str and
    async_chat.push_str methods, accepting a string and an encoding. It also
    modifies the test suite and stdlib to use the _str versions, where
    appropriate, adds tests for the new methods, and adds documentation.

    The patch also includes two things not materially related to its main
    purpose: a lot of docstrings, and a new async_chat.push_iterable method
    allowing the use of an iterable as an async_chat data producer (along
    with test and documentation). I figure docstrings are always justified,
    and push_iterable is both short and highly useful.

    @djarb
    Copy link
    Mannequin Author

    djarb mannequin commented Dec 10, 2007

    Some documentation changes thanks to feedback from Giampaolo Rodola and
    James Y Knight.

    @giampaolo
    Copy link
    Contributor

    There's a patch pending which should be included in the 2.6 trunk before
    solving issues related to py3k and/or applying other changes, imho:
    http://bugs.python.org/issue1736190

    @djarb
    Copy link
    Mannequin Author

    djarb mannequin commented Dec 11, 2007

    Those patches can't be applied directly to the py3k branch, so instead
    I've ported them and merged them with my patch where appropriate. This
    merged patch should address both py3k porting and the issues in raised
    in 1736190

    In addition, some changes have been made to the async_chat interface:
    specifically, collect_incoming_data has a useful default implementation,
    and internal variables are now protected by name mangling from
    accidental overriding.

    The file asyn_aggregate.diff contains the merged patches.

    @djarb
    Copy link
    Mannequin Author

    djarb mannequin commented Dec 12, 2007

    Fixed a bug, and added the test that would have caught it.

    @josiahcarlson
    Copy link
    Mannequin

    josiahcarlson mannequin commented Dec 13, 2007

    The patches that giampaolo sent you were for 2.x, not 3.x . Arguably
    they should be applied to 2.6 first, tested, etc., then be run through
    the 2.6 to 3.0 converter, then adjusted for str/bytes stuff.

    One of my concerns with your changes (which are hard to work out because
    the diff is so huge, could you attach your actual files?) is that they
    seem to change asyncore/asynchat for no other reason than to be different.

    The changes that giampaolo sent you (that were from some earlier work
    that I did) preserved compatibility while adding better error handling, etc.

    But maybe I'm just not reading the diff correctly.

    @djarb
    Copy link
    Mannequin Author

    djarb mannequin commented Dec 13, 2007

    Certainly, I'll attach the complete asyncore and asynchat files. I had a
    similar problem making heads or tails of the patch that Giampaolo
    pointed out to me.

    Regarding the "changes for changes sake" argument, I have two responses.

    First, it's perfectly reasonable to use the Dec 10th "asynchat.diff"
    file, in which I made no significant changes, aside from the addition of
    an adapter that would turn an iterator into a producer (rather than vice
    versa). That patch just fixes the problems I was tasked with. It's still
    a fairly large patch, but it consists entirely of trivial chunks.

    Second, after applying Giampaolo's patch, I felt rather more free to fix
    things, since that was what that patch did. Most of those changes you're
    complaining about come from there. My own additions were as follows: I
    created a default collect_incoming_data, which implements the common
    usage case. This will not alter behavior for old implementations. I
    applied name mangling to the internal names of async_chat, so as to
    prevent children from accidentally stepping on the names (which causes
    weird bugs that can be a pain to figure out). This also will not alter
    behavior for existing implementations, unless they were doing something
    very odd indeed. I replaced producers with iterators, and adjusted
    initiate_send to fit. I did that because producers have been redundant
    in Python ever since iterators were introduced, and doubly so since
    generators. This makes it possible to easily and efficiently do things
    like place an itertools.ifilter on the outgoing fifo, etc. It's more
    flexible, and has better compatibility with the rest of the standard
    library. Again, this change doesn't alter or remove existing
    functionality, it just changes the way it's implemented.

    Regarding the correct way to port Giampaolo's patches, I followed a
    similar course, though I didn't stop there.

    @benjaminp
    Copy link
    Contributor

    Any progress?

    @giampaolo
    Copy link
    Contributor

    Now that the major bugs have been fixed I think this patch needs to be
    re-adapted to the new asynchat.py currently available in the trunk.

    @djarb
    Copy link
    Mannequin Author

    djarb mannequin commented Jul 31, 2008

    I'll update the patch and post it again.

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Jul 31, 2008

    Thanks. I'll read it.

    Bill

    On Thu, Jul 31, 2008 at 6:16 AM, Daniel Arbuckle <report@bugs.python.org>wrote:

    Daniel Arbuckle <djarb@highenergymagic.org> added the comment:

    I'll update the patch and post it again.


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1563\>


    @josiahcarlson
    Copy link
    Mannequin

    josiahcarlson mannequin commented Aug 1, 2008

    The current revision of 3.0 handles the case where reading from the
    socket returns a Python 3.0 str object, which it then translates into
    bytes objects. This is sufficient for passing the 3.0 unittests. See
    asynchat.async_chat.use_encoding and asynchat.async_chat.encoding .

    From what I understand, the OP wants to be able to pass unicode strings
    across a network connection. I'm not sure that such is generally
    desirable to be within the standard library. I would actually argue
    that being able to transparently pass unicode across a socket is a
    misfeature; especially considering 1 unciode character can become 2 or
    more bytes when encoded as utf-8, etc., and sockets make no guarantees
    about an entire packet being delivered.

    In terms of sending (push_str), it should be an error that the user
    software is attempting to send textual data (all reasonable RFCs define
    protocols in terms of ascii, not utf). Handling incoming data
    (set_terminator_str) follows the same argument against handling unicode
    data as push_str, only in reverse.

    This should not be backported to any version of Python.

    Before discussion about actually applying any patch to
    asyncore/asynchat/smtpd continues, I would like to hear of a use-case
    for wanting to pass textual unicode data across a socket connection.

    @djarb
    Copy link
    Mannequin Author

    djarb mannequin commented Aug 1, 2008

    From what I understand, the OP wants to be able to pass unicode strings
    across a network connection.

    That's incorrect.

    At the time I formulated this patch, asyncore/asynchat were slated for
    removal from the standard lib unless somebody stepped up and made it
    work correctly after the str/bytes transition. That's what
    asyn_py3k.diff and its predecessors do, as well as adding sorely-needed
    docstrings to the module. This patch may be a little more complete than
    the currently committed code, or it may be largely redundant. The
    docstrings, at least, are still needed.

    The asyn_py3k_restructured.diff patch is a superset of asyn_py3k.diff,
    which additionally removes the "producer" type from the module,
    replacing it with iterators. This is controversial, and seems unlikely
    to be committed.

    @josiahcarlson
    Copy link
    Mannequin

    josiahcarlson mannequin commented Aug 1, 2008

    Asyncore and asynchat are not going to be removed, and were not being
    seriously discussed as being removable in Python 3.0 since January of
    2007 when I took over maintenance. If there was a miscommunication in
    an email thread on python-3000 or python-dev, then I'm sorry, as I was
    relatively incommunicado for about a year.

    As of right now, the tests for 3.0 pass, and when passing only bytes in
    and out of asyncore and asynchat, everything works just fine. Mixing
    and matching might or might not work, but explicitly allowing an error
    to pass silently (passing text where bytes should be passed) is the
    wrong thing to do.

    If you want to change the docstrings, that's fine, submit a patch that
    includes docstring changes only, and I'll probably commit it some time
    next week. Functionality changes will need to be discussed.

    @djarb
    Copy link
    Mannequin Author

    djarb mannequin commented Aug 1, 2008

    As of December 2007, Guido did not believe that you or anyone else cared
    enough about asyncore/asynchat to update them to py3k:
    http://mail.python.org/pipermail/python-dev/2007-December/075574.html

    Thankfully there's been a resurgence of activity on the modules, and
    they're well out of danger now.

    I _am_not_ proposing that passing unicode into asyncore should do
    anything other than fail. I have never proposed such, and my patch
    tightened those constraints rather than loosening them. Please stop
    beating that horse; it's dead.

    On a more positive note, I'll put together that docstring patch for you
    at the same time I'm evaluating whether any of the rest of my patch
    remains necessary.

    @josiahcarlson
    Copy link
    Mannequin

    josiahcarlson mannequin commented Aug 2, 2008

    Sounds good. I look forward to seeing the patch :) .

    @bitdancer
    Copy link
    Member

    Reducing priority since the critical issues seem to have been resolved
    already. Also retargeting for 3.2 since 3.1 is about to go rc.

    @josiahcarlson
    Copy link
    Mannequin

    josiahcarlson mannequin commented May 30, 2009

    You can probably close this unless someone says otherwise. asyncore/asynchat work in Python 3.0+, as long as only bytes are passed.
    As of right now, this is a request for documentation stating "you can only
    send/receive bytes"...which may be implicit with the fact that we are
    reading/writing to sockets.

    @JPosthuma
    Copy link
    Mannequin

    JPosthuma mannequin commented Jul 13, 2009

    It's not 'that' clear you should only work with bytes on a socket.
    Especially not when using a wrapper like asynchat. They are supposed to
    make it easier, and passing a string is easier. Maybe it's possible to do
    a default byte conversion when the user is working with strings.

    @exarkun
    Copy link
    Mannequin

    exarkun mannequin commented Jul 13, 2009

    It's not 'that' clear you should only work with bytes on a socket.

    It's pretty clear to me. :) That's what sockets can deal with - bytes.
    If you want to transfer something other than bytes via a socket, then
    you need to convert it to bytes. In the case of unicode, there are many
    different choices which can be made for how to do this conversion.
    asyncore cannot know what the correct choice is in any particular
    situation, so it shouldn't try to make it.

    The attached patch forces the application to make this choice,
    fortunately. However, since push_str is only one line, I'm not sure
    what the attraction is. Why is push_str(foo, bar) preferable to
    push(foo.encode(bar))?

    Maybe it's possible to do a default byte conversion when the user is
    working with strings.

    This definitely isn't reasonable and should not be done. It's also not
    what the last proposed patch does, so it doesn't seem to the direction
    the other interested parties have been working in.

    @JPosthuma
    Copy link
    Mannequin

    JPosthuma mannequin commented Jul 13, 2009

    On 13 jul 2009, at 17:33, Jean-Paul Calderone wrote:

    Jean-Paul Calderone <exarkun@divmod.com> added the comment:

    > It's not 'that' clear you should only work with bytes on a socket.

    It's pretty clear to me. :) That's what sockets can deal with -
    bytes.

    It's allso clear to me, but there are people that don't know that.

    If you want to transfer something other than bytes via a socket, then
    you need to convert it to bytes. In the case of unicode, there are
    many
    different choices which can be made for how to do this conversion.
    asyncore cannot know what the correct choice is in any particular
    situation, so it shouldn't try to make it.

    The attached patch forces the application to make this choice,
    fortunately. However, since push_str is only one line, I'm not sure
    what the attraction is. Why is push_str(foo, bar) preferable to
    push(foo.encode(bar))?

    It's not, I was more thinking of push_str(foo), where it uses a
    default encoding. I think some people don't know what unicode or
    encoding is, or don't want to worry about it.

    > Maybe it's possible to do a default byte conversion when the user
    > is working with strings.

    This definitely isn't reasonable and should not be done. It's also
    not
    what the last proposed patch does, so it doesn't seem to the direction
    the other interested parties have been working in.

    It's not an attack ;), i'm pretty new to Python, and it was just
    something that i noticed (after changing from 2 to 3)

    @exarkun
    Copy link
    Mannequin

    exarkun mannequin commented Jul 13, 2009

    It's not, I was more thinking of push_str(foo), where it uses a
    default encoding. I think some people don't know what unicode or
    encoding is, or don't want to worry about it.

    Nice as it sounds, the problem with this is that those people cannot
    write correct programs without understanding this. A push_str API which
    implicitly encodes unicode to str would just be a point for bugs to be
    introduced.

    @bitdancer
    Copy link
    Member

    Since no doc only patch has been proposed and there is some disagreement as to whether it is needed anyway, I'm closing this per msg88559.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants