Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wsgiref package totally broken #48968

Closed
hdima mannequin opened this issue Dec 22, 2008 · 45 comments
Closed

wsgiref package totally broken #48968

hdima mannequin opened this issue Dec 22, 2008 · 45 comments
Labels
release-blocker stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@hdima
Copy link
Mannequin

hdima mannequin commented Dec 22, 2008

BPO 4718
Nosy @pjeby, @pitrou, @benjaminp
Files
  • wsgiref.patch: WSGI 1.0+ fixes for wsgiref (version 2)
  • wsgiref2.patch: WSGI 1.0+ fixes for wsgiref (with wsgi.input.read() argument check)
  • wsgiref-bb.patch
  • no_proxy.patch: patch for test_urllib
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-01-31.23:30:45.350>
    created_at = <Date 2008-12-22.11:30:56.870>
    labels = ['library', 'type-crash', 'release-blocker']
    title = 'wsgiref package totally broken'
    updated_at = <Date 2009-01-31.23:30:45.332>
    user = 'https://bugs.python.org/hdima'

    bugs.python.org fields:

    activity = <Date 2009-01-31.23:30:45.332>
    actor = 'benjamin.peterson'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-01-31.23:30:45.350>
    closer = 'benjamin.peterson'
    components = ['Library (Lib)']
    creation = <Date 2008-12-22.11:30:56.870>
    creator = 'hdima'
    dependencies = []
    files = ['12447', '12454', '12573', '12577']
    hgrepos = []
    issue_num = 4718
    keywords = ['patch']
    message_count = 45.0
    messages = ['78171', '78172', '78188', '78192', '78193', '78194', '78196', '78207', '78214', '78215', '78229', '78232', '78235', '78236', '78237', '78241', '78251', '78261', '78269', '78279', '78280', '78292', '78294', '78298', '78304', '78348', '78349', '78351', '78352', '78353', '78354', '78612', '78738', '78787', '78990', '78996', '78997', '78999', '79000', '79002', '79003', '79005', '79013', '79020', '80889']
    nosy_count = 5.0
    nosy_names = ['pje', 'hdima', 'pitrou', 'benjamin.peterson', 'grahamd']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue4718'
    versions = ['Python 3.0']

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 22, 2008

    It seems the wsgiref package was copied from Python 2.* without any
    modifications. There are already 3 issues about that but they only
    describe a part of the problem so I decided to start a new one. The
    issues was:

    http://bugs.python.org/issue3348
    http://bugs.python.org/issue3795
    http://bugs.python.org/issue4522

    The attached patch fix the problem with the following changes:

    • Fixed headers handling in wsgiref/simple_server.py;

    • Fixed encoding problems. Now WSGI applications must return iterable
      with bytes but start_response() allow status and headers as bytes and as
      strings. "wsgi.input" file-like now use BytesIO instead of StringIO;

    • Fixed tests;

    • Updated documentation examples;

    @hdima hdima mannequin added stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump labels Dec 22, 2008
    @pitrou
    Copy link
    Member

    pitrou commented Dec 22, 2008

    Phillip, do you have time to take a look at it? We really *must* fix
    wsgiref in py3k...

    @pitrou
    Copy link
    Member

    pitrou commented Dec 22, 2008

    FYI, instead of trying to do exhaustive type checking in _check_type(),
    you can just rely on duck typing and catch the TypeError:

    >>> str(b"a", "utf-8")
    'a'
    >>> str(bytearray(b"a"), "utf-8")
    'a'
    >>> str(memoryview(b"a"), "utf-8")
    'a'
    >>> str(1, "utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: coercing to str: need string or buffer, int found

    @pjeby
    Copy link
    Mannequin

    pjeby mannequin commented Dec 22, 2008

    If you want to change to using bytes, you're going to have to take it to
    the Web-SIG and hash out a revision to PEP-333, which at the moment
    requires the use of strings, period.

    This has nothing to do with the desirability of bytes vs. strings; I am
    sure that if Python had bytes from day 1, bytes would've been the way to
    go with it. But simply changing the reference library is not the way to
    change the spec.

    In the meantime, as far as I'm aware, there are two other patches
    pending to address these issues, but I'm not in a position to assess
    their readiness/correctness since I have yet to even download Py3K. In
    principle, I approve their approaches, so if someone else can handle the
    code review, those fixes could in principle be put in without changing
    the PEP. But to put *this* patch in, the PEP would have to be changed.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 22, 2008

    If you want to change to using bytes, you're going to have to take it
    to the Web-SIG and hash out a revision to PEP-333, which at the moment
    requires the use of strings, period.

    What was called str in 2.x has become the bytes object in py3k.
    What was called unicode in 2.x has become str in py3k.
    (roughly)

    Given the meaning of the term "string" and its possible acceptions have
    dramatically changed between 2.x and py3k, how does this patch violate
    the PEP more than any other?

    Actually, the PEP says:

        HTTP does not directly support Unicode, and neither does this
        interface. All encoding/decoding must be handled by the
        application; all strings passed to or from the server must be
        standard Python *byte strings*, not Unicode objects.
        [emphasis mine]
    

    So, not accepting bytes in py3k is clearly a violation of the PEP!

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 22, 2008

    Antoine Pitrou wrote:

    FYI, instead of trying to do exhaustive type checking in _check_type(),
    you can just rely on duck typing and catch the TypeError:

    Good point! I'll update the patch soon.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 22, 2008

    Antoine Pitrou wrote:

    > If you want to change to using bytes, you're going to have to take it
    > to the Web-SIG and hash out a revision to PEP-333, which at the moment
    > requires the use of strings, period.

    What was called str in 2.x has become the bytes object in py3k.
    What was called unicode in 2.x has become str in py3k.
    (roughly)

    Agreed, moreover it's time for Python 3.0.1 and we need to decide -
    apply a patch or just remove wsgiref completely for now. Keep wsgiref
    just as nonworking piece of code is the worse solution which can made
    questionable all WSGI effort.

    Given that old str has been replaced by bytes in Python 3 I think the
    patch is a correct implementation of the PEP from the Python 3's point
    of view. To avoid confusion note about the meaning of the term *string*
    can be added to the PEP later.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 22, 2008

    New version of the patch:

    • Now only Unicode strings are allowed as status and headers because
      allowing bytes leads to big changes in wsgiref.validate and
      wsgiref.handlers;

    @pjeby
    Copy link
    Mannequin

    pjeby mannequin commented Dec 22, 2008

    At 03:37 PM 12/22/2008 +0000, Antoine Pitrou wrote:

    So, not accepting bytes in py3k is clearly a violation of the PEP!

    On the contrary. Please read the two paragraphs *after* the one you quoted.

    @pjeby
    Copy link
    Mannequin

    pjeby mannequin commented Dec 22, 2008

    To be quite clear: this change requires discussion on the Web-SIG and an
    appropriate revision of the PEP. Ideally, the patch should include the
    necessary PEP revision.

    The Web-SIG discussion regarding a switch to bytes should also take into
    consideration the effects of running 2to3 on existing WSGI applications
    and/or servers. Will their code be converted to use bytes, or Unicode?

    The previous choice to use Unicode was based on source compatibility
    across Python implementations, so this shouldn't be thrown out on the
    basis of simple expediency.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 23, 2008

    OK, I've attached PEP-333 compatible fixes for wsgiref. I think there is
    only one problem remains:

    • wsgiref expects io.BytesIO as input and output streams because of
      http.server module. I didn't find any restrictions on data returned by
      read() method of the "wsgi.input" stream in the PEP. Maybe I've missed
      some details?

    @pitrou
    Copy link
    Member

    pitrou commented Dec 23, 2008

    Please read the two paragraphs *after* the one you quoted.

    I don't see anything forbidding bytes objects in those two paragraphs.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 23, 2008

    Le mardi 23 décembre 2008 à 11:15 +0000, Dmitry Vasiliev a écrit :

    Dmitry Vasiliev <dima@hlabs.spb.ru> added the comment:

    OK, I've attached PEP-333 compatible fixes for wsgiref.

    I may be mistaken, but it seems that your patch forces iso-8859-1
    encoding of http bodies.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 23, 2008

    Antoine Pitrou wrote:

    Le mardi 23 décembre 2008 à 11:15 +0000, Dmitry Vasiliev a écrit :
    > OK, I've attached PEP-333 compatible fixes for wsgiref.

    I may be mistaken, but it seems that your patch forces iso-8859-1
    encoding of http bodies.

    No, just as PEP said str used as a container for binary data. For
    example to return UTF-8 encoded data you can use the following code:

         def app(environ, start_response):
             ...
             return [data.encode("utf-8").decode("iso-8859-1")]

    I don't like it but I guess it's strictly follow the PEP (actually I
    didn't notice this sections before):

    """
    On Python platforms where the str or StringType type is in fact
    Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all
    "strings" referred to in this specification must contain only code
    points representable in ISO-8859-1 encoding (\u0000 through \u00FF,
    inclusive). It is a fatal error for an application to supply strings
    containing any other Unicode character or code point. Similarly, servers
    and gateways must not supply strings to an application containing any
    other Unicode characters.

    Again, all strings referred to in this specification must be of type str
    or StringType, and must not be of type unicode or UnicodeType. And, even
    if a given platform allows for more than 8 bits per character in
    str/StringType objects, only the lower 8 bits may be used, for any value
    referred to in this specification as a "string".
    """

    We definitely need to use bytes in the future but it requires PEP update
    and some migration guide.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 23, 2008

    No, just as PEP said str used as a container for binary data.

    This is clearly the wrong thing to do. The only (immutable) string-like
    object adequate for binary data in py3k is bytes, not str.

    I understand the desire to stick to the PEP, but the PEP was devised
    even before the first py3k alphas, and it clearly wasn't written with
    py3k in mind. For example the following sentence becomes nonsensical:

        Again, all strings referred to in this specification must be of
        type str or StringType, and must not be of type unicode or
        UnicodeType.
    

    since "str" objects *are* of type UnicodeType in py3k (and the C API is
    still named PyUnicode_*)...

    When a legal text becomes nonsensical wrt. reality, one has to adapt his
    interpretation of the text to reality, not adapt reality to match the
    nonsense.

    In other words, wsgiref should accept/expose HTTP bodies as bytes, not
    str. Confusing binary data with iso-8859-1 text is the kind of mess py3k
    was designed to avoid.

    @pjeby
    Copy link
    Mannequin

    pjeby mannequin commented Dec 23, 2008

    Antoine, you have three choices here:

    1. Follow the PEP,
    2. Take it to the Web-SIG and get the appropriate discussion,
      consensus, and PEP revision, or
    3. Drop wsgiref from the current release of Py3K until Rename README to README.rst and enhance formatting #2 can be done.

    Which would you prefer?

    Please note that your arguments regarding what revision should take
    place are irrelevant here; the correct forum for that discussion is
    the Web-SIG. Personally, I think they are valid arguments; WSGI
    simply did not have the benefit of having a sane (and standard!)
    "bytes" type available, and were we writing the spec today, I would
    absolutely have specified it in a bytes-oriented way, and treated
    older Pythons as the special case.

    However, we have to take into consideration how applications will be
    *migrated* to Py3K. I am not an expert in this, nor do I personally
    have huge volumes of code that will be migrated. Therefore, the
    correct forum for discussing migration impact and how best to write
    the spec is the Web-SIG.

    Making the change to bytes is not something to be undertaken on a
    whim: the spec must include how to handle inadvertent mixing of bytes
    and unicode, in order to allow unambiguous error handling and
    migration support. It will not be solved by the fiat of one
    individual: certainly not by you or I. And it has absolutely nothing
    to do with what is "right" in the technical sense, because it is not
    a technical problem. A specification is a social construct, not a
    technical one, so changing wsgiref by itself solves nothing.

    And that's why those three choices are the only available options, so
    far as I am aware.

    @grahamd
    Copy link
    Mannequin

    grahamd mannequin commented Dec 23, 2008

    Note that the page:

    http://www.wsgi.org/wsgi/Amendments_1.0

    contains clarifications for WSGI PEP in respect of Python 3.0. This list
    was previously come up with on WEB-SIG list.

    As another reference implementation for Python 3.0, you might look at
    mod_wsgi (source code from subversion trunk), as that has been updated to
    support Python 3.0 in line with those list of proposed clarifications for
    WSGI PEP.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 24, 2008

    Attached new WSGI 1.0+ version of the patch.

    @pjeby
    Copy link
    Mannequin

    pjeby mannequin commented Dec 24, 2008

    Graham: thanks for pointing that out; I completely forgot we already
    *had* the migration discussion on the Web-SIG! It just slipped my
    mind because I didn't have any 3.0 work on the horizon.

    Dmitry: A question about the new patch. Are bytearray and memoryview
    objects in 3.0 really the same as bytestrings? It seems to me that
    allowing mutable bytes objects is a mistake from a bug-finding
    standpoint, even if it could be a win from a performance
    standpoint. I think it might be better to be more restrictive to
    start out, and then let people lobby for supporting other types,
    rather than the other way around, where we'll never get to narrow the
    list. Apart from that, the patch looks pretty good. Thank you!

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 25, 2008

    Phillip J. Eby wrote:

    Graham: thanks for pointing that out; I completely forgot we already
    *had* the migration discussion on the Web-SIG! It just slipped my
    mind because I didn't have any 3.0 work on the horizon.

    Good to see we came to conclusion. Actually my first patch went in the
    right direction. :-)

    Dmitry: A question about the new patch. Are bytearray and memoryview
    objects in 3.0 really the same as bytestrings? It seems to me that
    allowing mutable bytes objects is a mistake from a bug-finding
    standpoint, even if it could be a win from a performance
    standpoint. I think it might be better to be more restrictive to
    start out, and then let people lobby for supporting other types,
    rather than the other way around, where we'll never get to narrow the
    list. Apart from that, the patch looks pretty good. Thank you!

    Actually I thought about functionality, not performance but I think
    you're right and mutable bytes objects also can open doors for
    unexpected side effects. I'll update the patch today.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 25, 2008

    Attached updated version of the patch.

    @grahamd
    Copy link
    Mannequin

    grahamd mannequin commented Dec 26, 2008

    If making changes in wsgireg.validate, may be worthwhile also fixing up one area where it isn't strictly correct
    according to WSGI PEP.

    As per discussion:

    http://groups.google.com/group/python-web-sig/browse_frm/thread/b14b862ec4c620c0

    the check for number of arguments supplied to wsgi.input.read() is wrong as it allows for an optional argument,
    when argument is supposed to mandatory.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 26, 2008

    Graham Dumpleton wrote:

    the check for number of arguments supplied to wsgi.input.read() is wrong as it allows for an optional argument,
    when argument is supposed to mandatory.

    I think it's a good idea. I'll update the patch soon.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 26, 2008

    Added check for wsgi.input.read() argument.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 26, 2008

    Hi,

    The patch looks ok to me, although the tests against mutable byte-like
    types are probably useless.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 27, 2008

    Antoine Pitrou wrote:

    The patch looks ok to me, although the tests against mutable byte-like
    types are probably useless.

    Hmm, it's strange because such tests was removed two versions ago (per
    discussion with Phillip). But at the time they really was needed.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 27, 2008

    Hmm, it's strange because such tests was removed two versions ago (per
    discussion with Phillip). But at the time they really was needed.

    Not a big deal anyway, let's keep them and we'll see.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 27, 2008

    Antoine Pitrou wrote:

    > Hmm, it's strange because such tests was removed two versions ago (per
    > discussion with Phillip). But at the time they really was needed.

    Not a big deal anyway, let's keep them and we'll see.

    I'm afraid I've lost your point here. Are you proposing to add back
    tests for mutable bytes-like objects?

    @pitrou
    Copy link
    Member

    pitrou commented Dec 27, 2008

    Why do you say they were removed? I see code like "assert
    isinstance(value, bytes)".

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Dec 27, 2008

    Antoine Pitrou wrote:

    Why do you say they were removed? I see code like "assert
    isinstance(value, bytes)".

    Support and tests for mutable "bytearray" and "memoryview" was removed.
    All subclasses of "bytes" must be immutable so isinstance() should be OK
    here.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 27, 2008

    Ok, sorry for the misunderstanding.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 31, 2008

    Philip, Graham, do you have any objections to the current patch?
    Otherwise I think I'm gonna commit it soon.

    @grahamd
    Copy link
    Mannequin

    grahamd mannequin commented Jan 1, 2009

    One interesting thing of note that has occurred to me looking at the patch
    is that although with Python <3.0 you technically could return a str as
    iterable from application, ie., because iteration over str returns str for
    each character, the same doesn't really apply to bytes in Python 3.0. This
    is because iterating over bytes yields an int fdor each item.

    Thus have odd situation where with Python 3.0, one could technically return
    str as iterable, with rule that would apply would be that each str returned
    would then be converted to bytes by way of latin-1 conversion, but for
    bytes returned as iterable, should fail.

    Not sure how this plays out in wsgiref server yet as haven't looked.
    Anyway, make the validator code:

    @@ -426,6 +436,6 @@
         # Technically a string is legal, which is why it's a really bad
         # idea, because it may cause the response to be returned
         # character-by-character
    -    assert_(not isinstance(iterator, str),
    +    assert_(not isinstance(iterator, (str, bytes)),
             "You should not return a string as your application iterator, "
             "instead return a single-item list containing that string.")

    quite a good thing to have.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Jan 2, 2009

    Graham Dumpleton wrote:

    Thus have odd situation where with Python 3.0, one could technically return
    str as iterable, with rule that would apply would be that each str returned
    would then be converted to bytes by way of latin-1 conversion, but for
    bytes returned as iterable, should fail.

    Not sure how this plays out in wsgiref server yet as haven't looked.

    If application return bytes instead of an iterable AssertionError will
    be raised in handlers.BaseHandler.write().

    @pitrou
    Copy link
    Member

    pitrou commented Jan 3, 2009

    It is committed now in py3k and the 3.0 maintenance branch. Thanks all
    for your participation!

    @pitrou pitrou closed this as completed Jan 3, 2009
    @pitrou
    Copy link
    Member

    pitrou commented Jan 3, 2009

    Reopening, the patch actually produces failures when run with "python
    -bb", that is there are comparisons between str and bytes.

    See the errors at the end of
    http://www.python.org/dev/buildbot/3.x.stable/ppc%20Debian%20unstable%203.0/builds/26/step-test/0

    @pitrou pitrou reopened this Jan 3, 2009
    @pitrou
    Copy link
    Member

    pitrou commented Jan 3, 2009

    People, does this patch look ok to you?

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Jan 3, 2009

    Antoine Pitrou wrote:

    People, does this patch look ok to you?

    Oh, didn't know about -bb.
    The patch looks OK for me.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 3, 2009

    Le samedi 03 janvier 2009 à 20:24 +0000, Dmitry Vasiliev a écrit :

    Dmitry Vasiliev <dima@hlabs.spb.ru> added the comment:

    Antoine Pitrou wrote:
    > People, does this patch look ok to you?

    Oh, didn't know about -bb.

    Well, it's meant to catch potential bugs. str and bytes always compare
    unequal in py3k (i.e. "a" != b"a"), so it's good to use -bb when porting
    some stuff from 2.x.

    There's another problem in that buildbot failure with the environment
    variable "NO_PROXY". We'll see if it's still there after the patch.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Jan 3, 2009

    Antoine Pitrou wrote:

    There's another problem in that buildbot failure with the environment
    variable "NO_PROXY". We'll see if it's still there after the patch.

    Strange error and it seems there is only part of the traceback. I've
    already seen such "partially displayed" Python 3 traceback and error
    actually can be in very different place.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 3, 2009

    Strange error and it seems there is only part of the traceback. I've
    already seen such "partially displayed" Python 3 traceback and error
    actually can be in very different place.

    If you can reproduce such a problem, please open a bug.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Jan 3, 2009

    Antoine Pitrou wrote:

    > Strange error and it seems there is only part of the traceback. I've
    > already seen such "partially displayed" Python 3 traceback and error
    > actually can be in very different place.

    If you can reproduce such a problem, please open a bug.

    OK, I'll try to reproduce.

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Jan 3, 2009

    Attached patch for test_urllib, possible source of the "NO_PROXY" problem.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 3, 2009

    Nice catch! I've committed the two patches and we'll see whether it
    makes the buildbots feel better.

    @benjaminp
    Copy link
    Contributor

    I assume the buildbots were placated?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    release-blocker stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants