classification
Title: asyncore and asynchat incompatible with Py3k str and bytes
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: josiahcarlson Nosy List: JPosthuma, benjamin.peterson, djarb, exarkun, giampaolo.rodola, intgr, janssen, josiah.carlson, josiahcarlson, r.david.murray
Priority: normal Keywords:

Created on 2007-12-06 18:50 by djarb, last changed 2010-04-23 14:07 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
asynchat.diff djarb, 2007-12-07 18:04
asynchat.diff djarb, 2007-12-10 16:12
asyn_aggregate.diff djarb, 2007-12-11 18:01
asyn_aggregate.diff djarb, 2007-12-12 18:56
aggregate.zip djarb, 2007-12-13 14:34
asyn_py3k.diff djarb, 2008-02-14 05:59 Update asyncore and asynchat to py3k revision 60780
asyn_py3k_restructured.diff djarb, 2008-02-14 06:00 Update asyncore and asynchat to py3k revision 60780, and restructure for better iterator support
Messages (24)
msg58251 - (view) Author: Daniel Arbuckle (djarb) * Date: 2007-12-06 18:50
djarb@highenergymagic.org is working up a patch for this.
msg58277 - (view) Author: Daniel Arbuckle (djarb) * Date: 2007-12-07 18:04
The attached patch introduces async_chat.set_terminator_str and
async_chat.push_str methods, accepting a string and an encoding. It also
modifies the test suite and stdlib to use the _str versions, where
appropriate, adds tests for the new methods, and adds documentation.

The patch also includes two things not materially related to its main
purpose: a lot of docstrings, and a new async_chat.push_iterable method
allowing the use of an iterable as an async_chat data producer (along
with test and documentation). I figure docstrings are always justified,
and push_iterable is both short and highly useful.
msg58352 - (view) Author: Daniel Arbuckle (djarb) * Date: 2007-12-10 16:12
Some documentation changes thanks to feedback from Giampaolo Rodola and
James Y Knight.
msg58406 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2007-12-11 02:37
There's a patch pending which should be included in the 2.6 trunk before
solving issues related to py3k and/or applying other changes, imho: 
http://bugs.python.org/issue1736190
msg58451 - (view) Author: Daniel Arbuckle (djarb) * Date: 2007-12-11 18:01
Those patches can't be applied directly to the py3k branch, so instead
I've ported them and merged them with my patch where appropriate. This
merged patch should address both py3k porting and the issues in raised
in 1736190

In addition, some changes have been made to the async_chat interface:
specifically, collect_incoming_data has a useful default implementation,
and internal variables are now protected by name mangling from
accidental overriding.

The file asyn_aggregate.diff contains the merged patches.
msg58504 - (view) Author: Daniel Arbuckle (djarb) * Date: 2007-12-12 18:56
Fixed a bug, and added the test that would have caught it.
msg58513 - (view) Author: Josiah Carlson (josiah.carlson) * (Python committer) Date: 2007-12-13 01:54
The patches that giampaolo sent you were for 2.x, not 3.x .  Arguably
they should be applied to 2.6 first, tested, etc., then be run through
the 2.6 to 3.0 converter, then adjusted for str/bytes stuff.

One of my concerns with your changes (which are hard to work out because
the diff is so huge, could you attach your actual files?) is that they
seem to change asyncore/asynchat for no other reason than to be different.

The changes that giampaolo sent you (that were from some earlier work
that I did) preserved compatibility while adding better error handling, etc.

But maybe I'm just not reading the diff correctly.
msg58534 - (view) Author: Daniel Arbuckle (djarb) * Date: 2007-12-13 14:34
Certainly, I'll attach the complete asyncore and asynchat files. I had a
similar problem making heads or tails of the patch that Giampaolo
pointed out to me.

Regarding the "changes for changes sake" argument, I have two responses. 

First, it's perfectly reasonable to use the Dec 10th "asynchat.diff"
file, in which I made no significant changes, aside from the addition of
an adapter that would turn an iterator into a producer (rather than vice
versa). That patch just fixes the problems I was tasked with. It's still
a fairly large patch, but it consists entirely of trivial chunks.

Second, after applying Giampaolo's patch, I felt rather more free to fix
things, since that was what that patch did. Most of those changes you're
complaining about come from there. My own additions were as follows: I
created a default collect_incoming_data, which implements the common
usage case. This will not alter behavior for old implementations. I
applied name mangling to the internal names of async_chat, so as to
prevent children from accidentally stepping on the names (which causes
weird bugs that can be a pain to figure out). This also will not alter
behavior for existing implementations, unless they were doing something
very odd indeed. I replaced producers with iterators, and adjusted
initiate_send to fit. I did that because producers have been redundant
in Python ever since iterators were introduced, and doubly so since
generators. This makes it possible to easily and efficiently do things
like place an itertools.ifilter on the outgoing fifo, etc. It's more
flexible, and has better compatibility with the rest of the standard
library. Again, this change doesn't alter or remove existing
functionality, it just changes the way it's implemented.

Regarding the correct way to port Giampaolo's patches, I followed a
similar course, though I didn't stop there.
msg70474 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-07-31 02:08
Any progress?
msg70496 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2008-07-31 10:37
Now that the major bugs have been fixed I think this patch needs to be
re-adapted to the new asynchat.py currently available in the trunk.
msg70500 - (view) Author: Daniel Arbuckle (djarb) * Date: 2008-07-31 13:15
I'll update the patch and post it again.
msg70520 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2008-07-31 15:22
Thanks.  I'll read it.

Bill

On Thu, Jul 31, 2008 at 6:16 AM, Daniel Arbuckle <report@bugs.python.org>wrote:

>
> Daniel Arbuckle <djarb@highenergymagic.org> added the comment:
>
> I'll update the patch and post it again.
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1563>
> _______________________________________
>
msg70593 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2008-08-01 21:20
The current revision of 3.0 handles the case where reading from the
socket returns a Python 3.0 str object, which it then translates into
bytes objects.  This is sufficient for passing the 3.0 unittests.  See
asynchat.async_chat.use_encoding and asynchat.async_chat.encoding .

From what I understand, the OP wants to be able to pass unicode strings
across a network connection.  I'm not sure that such is generally
desirable to be within the standard library.  I would actually argue
that being able to transparently pass unicode across a socket is a
misfeature; especially considering 1 unciode character can become 2 or
more bytes when encoded as utf-8, etc., and sockets make no guarantees
about an entire packet being delivered.

In terms of sending (push_str), it should be an error that the user
software is attempting to send textual data (all reasonable RFCs define
protocols in terms of ascii, not utf).  Handling incoming data
(set_terminator_str) follows the same argument against handling unicode
data as push_str, only in reverse.

This should not be backported to any version of Python.


Before discussion about actually applying any patch to
asyncore/asynchat/smtpd continues, I would like to hear of a use-case
for wanting to pass textual unicode data across a socket connection.
msg70601 - (view) Author: Daniel Arbuckle (djarb) * Date: 2008-08-01 22:15
> From what I understand, the OP wants to be able to pass unicode strings
> across a network connection.

That's incorrect.

At the time I formulated this patch, asyncore/asynchat were slated for
removal from the standard lib unless somebody stepped up and made it
work correctly after the str/bytes transition. That's what
asyn_py3k.diff and its predecessors do, as well as adding sorely-needed
docstrings to the module. This patch may be a little more complete than
the currently committed code, or it may be largely redundant. The
docstrings, at least, are still needed.

The asyn_py3k_restructured.diff patch is a superset of asyn_py3k.diff,
which additionally removes the "producer" type from the module,
replacing it with iterators. This is controversial, and seems unlikely
to be committed.
msg70602 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2008-08-01 22:26
Asyncore and asynchat are not going to be removed, and were not being
seriously discussed as being removable in Python 3.0 since January of
2007 when I took over maintenance.  If there was a miscommunication in
an email thread on python-3000 or python-dev, then I'm sorry, as I was
relatively incommunicado for about a year.

As of right now, the tests for 3.0 pass, and when passing only bytes in
and out of asyncore and asynchat, everything works just fine.  Mixing
and matching might or might not work, but explicitly allowing an error
to pass silently (passing text where bytes should be passed) is the
wrong thing to do.

If you want to change the docstrings, that's fine, submit a patch that
includes docstring changes only, and I'll probably commit it some time
next week.  Functionality changes will need to be discussed.
msg70604 - (view) Author: Daniel Arbuckle (djarb) * Date: 2008-08-01 23:05
As of December 2007, Guido did not believe that you or anyone else cared
enough about asyncore/asynchat to update them to py3k:
http://mail.python.org/pipermail/python-dev/2007-December/075574.html

Thankfully there's been a resurgence of activity on the modules, and
they're well out of danger now.

I _am_not_ proposing that passing unicode into asyncore should do
anything other than fail. I have never proposed such, and my patch
tightened those constraints rather than loosening them. Please stop
beating that horse; it's dead.

On a more positive note, I'll put together that docstring patch for you
at the same time I'm evaluating whether any of the rest of my patch
remains necessary.
msg70622 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2008-08-02 08:03
Sounds good.  I look forward to seeing the patch :) .
msg88545 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-05-29 22:43
Reducing priority since the critical issues seem to have been resolved
already.  Also retargeting for 3.2 since 3.1 is about to go rc.
msg88559 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2009-05-30 01:30
You can probably close this unless someone says otherwise.  asyncore/asynchat work in Python 3.0+, as long as only bytes are passed.  
As of right now, this is a request for documentation stating "you can only 
send/receive bytes"...which may be implicit with the fact that we are 
reading/writing to sockets.
msg90489 - (view) Author: Jorrit Posthuma (JPosthuma) Date: 2009-07-13 15:26
It's not 'that' clear you should only work with bytes on a socket. 
Especially not when using a wrapper like asynchat. They are supposed to 
make it easier, and passing a string is easier. Maybe it's possible to do 
a default byte conversion when the user is working with strings.
msg90490 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2009-07-13 15:33
> It's not 'that' clear you should only work with bytes on a socket. 

It's pretty clear to me. :)  That's what sockets can deal with - bytes.
 If you want to transfer something other than bytes via a socket, then
you need to convert it to bytes.  In the case of unicode, there are many
different choices which can be made for how to do this conversion. 
asyncore cannot know what the correct choice is in any particular
situation, so it shouldn't try to make it.

The attached patch forces the application to make this choice,
fortunately.  However, since push_str is only one line, I'm not sure
what the attraction is.  Why is push_str(foo, bar) preferable to
push(foo.encode(bar))?

> Maybe it's possible to do a default byte conversion when the user is
working with strings.

This definitely isn't reasonable and should not be done.  It's also not
what the last proposed patch does, so it doesn't seem to the direction
the other interested parties have been working in.
msg90491 - (view) Author: Jorrit Posthuma (JPosthuma) Date: 2009-07-13 15:46
On 13 jul 2009, at 17:33, Jean-Paul Calderone wrote:
>
> Jean-Paul Calderone <exarkun@divmod.com> added the comment:
>
>> It's not 'that' clear you should only work with bytes on a socket.
>
> It's pretty clear to me. :)  That's what sockets can deal with -  
> bytes.

It's allso clear to me, but there are people that don't know that.

> If you want to transfer something other than bytes via a socket, then
> you need to convert it to bytes.  In the case of unicode, there are  
> many
> different choices which can be made for how to do this conversion.
> asyncore cannot know what the correct choice is in any particular
> situation, so it shouldn't try to make it.
>
> The attached patch forces the application to make this choice,
> fortunately.  However, since push_str is only one line, I'm not sure
> what the attraction is.  Why is push_str(foo, bar) preferable to
> push(foo.encode(bar))?

It's not, I was more thinking of push_str(foo), where it uses a  
default encoding. I think some people don't know what unicode or  
encoding is, or don't want to worry about it.

>
>> Maybe it's possible to do a default byte conversion when the user  
>> is working with strings.
>
> This definitely isn't reasonable and should not be done.  It's also  
> not
> what the last proposed patch does, so it doesn't seem to the direction
> the other interested parties have been working in.

It's not an attack ;), i'm pretty new to Python, and it was just  
something that i noticed (after changing from 2 to 3)
msg90492 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2009-07-13 15:50
> It's not, I was more thinking of push_str(foo), where it uses a
default encoding. I think some people don't know what unicode or
encoding is, or don't want to worry about it.

Nice as it sounds, the problem with this is that those people cannot
write correct programs without understanding this.  A push_str API which
implicitly encodes unicode to str would just be a point for bugs to be
introduced.
msg104013 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-23 14:07
Since no doc only patch has been proposed and there is some disagreement as to whether it is needed anyway, I'm closing this per msg88559.
History
Date User Action Args
2010-04-23 14:07:37r.david.murraysetstatus: open -> closed
resolution: out of date
messages: + msg104013

stage: resolved
2010-04-23 14:04:55r.david.murraysetfiles: - unnamed
2009-07-13 15:50:45exarkunsetmessages: + msg90492
2009-07-13 15:46:52JPosthumasetmessages: + msg90491
2009-07-13 15:33:21exarkunsetnosy: + exarkun
messages: + msg90490
2009-07-13 15:26:32JPosthumasetnosy: + JPosthuma
messages: + msg90489
2009-05-30 01:30:24josiahcarlsonsetmessages: + msg88559
2009-05-29 22:43:39r.david.murraysetpriority: critical -> normal
versions: + Python 3.2, - Python 3.0
nosy: + r.david.murray

messages: + msg88545
2009-03-28 12:17:27intgrsetnosy: + intgr
2008-08-02 08:03:02josiahcarlsonsetmessages: + msg70622
2008-08-01 23:05:19djarbsetmessages: + msg70604
2008-08-01 22:26:24josiahcarlsonsetmessages: + msg70602
2008-08-01 22:15:29djarbsetmessages: + msg70601
2008-08-01 21:20:03josiahcarlsonsetmessages: + msg70593
2008-07-31 15:22:10janssensetfiles: + unnamed
messages: + msg70520
2008-07-31 13:15:59djarbsetmessages: + msg70500
2008-07-31 10:38:02giampaolo.rodolasetmessages: + msg70496
2008-07-31 02:08:14benjamin.petersonsetassignee: josiahcarlson
messages: + msg70474
nosy: + josiahcarlson, benjamin.peterson
2008-05-03 23:01:05brett.cannonsetpriority: normal -> critical
2008-02-14 06:00:32djarbsetfiles: + asyn_py3k_restructured.diff
2008-02-14 05:59:25djarbsetfiles: + asyn_py3k.diff
2008-02-14 02:50:07janssensetnosy: + janssen
2008-01-06 22:29:44adminsetkeywords: - py3k
versions: Python 3.0
2007-12-13 14:34:54djarbsetfiles: + aggregate.zip
messages: + msg58534
2007-12-13 01:54:18josiah.carlsonsetnosy: + josiah.carlson
messages: + msg58513
2007-12-12 18:56:02djarbsetfiles: + asyn_aggregate.diff
messages: + msg58504
2007-12-11 18:01:12djarbsetfiles: + asyn_aggregate.diff
messages: + msg58451
2007-12-11 02:37:24giampaolo.rodolasetnosy: + giampaolo.rodola
messages: + msg58406
2007-12-10 16:12:56djarbsetfiles: + asynchat.diff
messages: + msg58352
2007-12-07 18:04:16djarbsetfiles: + asynchat.diff
messages: + msg58277
2007-12-06 20:45:14christian.heimessetpriority: normal
keywords: + py3k
2007-12-06 18:50:51djarbcreate