classification
Title: Thread hangs on str.encode() when locale is not set
Type: behavior
Stage:
Components: Documentation
Versions: Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: brett.cannon, docs@python, eric.snow, ezio.melotti, joshpurvis, ncoghlan, r.david.murray, vstinner
Priority: normal
Keywords:

Created on 2016-06-25 22:03 by joshpurvis, last changed 2016-06-27 18:21 by r.david.murray.

Messages (6)
msg269262 - Author: Josh Purvis (joshpurvis) Date: 2016-06-25 22:03
This bug manifests itself in at least one very specific situation:

    1. No locale is set on the machine
    2. A file (test1.py) imports a second (test2.py)
    3. The second file (test2.py) calls str.encode() from inside a thread
    4. Running Python 2.7

[Environment with no locale set]:

    # both of these are unset:
    $ echo $LC_CTYPE

    $ echo $LANG        

    $

[test1.py]:

    import test2

[test2.py]:

    from threading import Thread

    class TestThread(Thread):
        def run(self):
            msg = 'Error from server: code=000a'
            print msg
            msg = msg.encode('utf-8')

    t = TestThread()
    t.start()
    t.join()

    print 'done'

[Expected behavior]:

    $ python test1.py                                                                         
    Error from server: code=000a
    done

[Actual behavior]: 

    $ python test1.py                                                                         
    Error from server: code=000a
    [script hangs here indefinitely]

Much thanks to Alan Boudreault, a developer of the cassandra-driver Python package, for helping me locate this bug and further narrow it down to the threading module. The above code snippet was copied from his comment on my issue over there (https://datastax-oss.atlassian.net/browse/PYTHON-592).

Another curious behavior is that if you modify test1.py to decode any string prior to the import, it implicitly fixes the issue:

[test1.py']:

    "any string".decode('utf-8')
    import test2

I realize that one should probably always have a locale set; however, this proved to be very difficult to isolate, especially given that it works if no import occurs or if a string is decoded prior to the import.
msg269281 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-26 08:58
It is a deadlock on the import lock. You should avoid creating and waiting
for a thread when a module is imported. Defer the creation of the thread.
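
A minimal sketch of the deferral suggested above, assuming test2.py is restructured so that the thread is created inside a function rather than at module level. The `main()` entry point and the `encoded` attribute are hypothetical names introduced for illustration; they are not part of the original report.

```python
from threading import Thread


class TestThread(Thread):
    def run(self):
        msg = u'Error from server: code=000a'
        # The first use of a codec may trigger an implicit import of its
        # encodings module. That is harmless here as long as main() is
        # not itself called while another module is being imported.
        self.encoded = msg.encode('utf-8')


def main():
    # Hypothetical entry point: the thread is created only when main()
    # is called, not as a side effect of importing this module.
    t = TestThread()
    t.start()
    t.join()
    return t.encoded


if __name__ == '__main__':
    main()
    print('done')
```

With this layout, `import test2` only defines the class and the function; the importing script decides when to call `test2.main()`, after its own imports have completed.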
msg269310 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-06-26 20:54
This situation is warned about explicitly in the threading docs (https://docs.python.org/2/library/threading.html#importing-in-threaded-code). The import deadlock is fixed in Python 3, but it is still a really bad idea to launch threads on module import.

What isn't obvious, of course, is that calling encode for the first time for a given encoding does an implicit import of the relevant encoding.  I don't think encodings is the only stdlib module that does implicit imports, but it is probably the most used case.  Maybe it is worth adding a warning to that section of the 2.7 docs about implicit imports in general and encode/decode in particular?
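
The implicit import can be observed by checking `sys.modules` around the first use of a codec. The sketch below is illustrative only: the helper name is hypothetical, `cp037` is chosen simply because it is unlikely to have been loaded already, and the code assumes the codec's module is named `encodings.<encoding>` (true for `cp037`, but not for spellings such as `utf-8`, whose module is `encodings.utf_8`).

```python
import sys


def codec_module_loaded_by_encode(encoding='cp037'):
    # Assumption: the codec lives in a module named 'encodings.<encoding>'.
    module_name = 'encodings.' + encoding
    loaded_before = module_name in sys.modules
    # The first encode() with this codec makes the codec registry look it
    # up, which imports the codec's module if it is not loaded yet.
    u'abc'.encode(encoding)
    loaded_after = module_name in sys.modules
    return loaded_before, loaded_after


if __name__ == '__main__':
    print(codec_module_loaded_by_encode())
```

In a fresh interpreter the first value is typically False, and the second should be True once the codec has been used.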
msg269386 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-06-27 16:33
Adding a note to the docs sounds reasonable.
msg269387 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-27 16:50
> Maybe it is worth adding a warning to that section of the 2.7 docs about implicit imports in general and encode/decode in particular?

Ok to add a note to str.encode and str.decode methods to explain that
an import is needed the first time that an encoding is used.

I'm not OK with a warning: we should not discourage developers from using
these methods! They are not dangerous by themselves.
msg269392 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-06-27 18:21
No, I'm talking about the threading docs, not the encoding docs.  I think that's the only place it matters.  Specifically, in the section that I linked to, in the bullet point that warns against launching threads on import, it can note that even if you try to make your own code avoid the import lock, implicit imports such as the one done by encode/decode can trip you up.
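
One way to reduce exposure to such implicit imports, sketched here under assumptions, is to resolve the needed codecs in the main thread before any worker thread starts. This mirrors the reporter's observation that decoding a string before the import avoided the hang. The helper names `prime_codecs` and `encode_in_thread` are hypothetical, and this is a workaround rather than a replacement for the advice not to launch threads during import.

```python
import codecs
from threading import Thread


def prime_codecs(names=('utf-8',)):
    # codecs.lookup() resolves each codec in the calling thread and the
    # registry caches the result, so later encode/decode calls with
    # these names should not need to import anything.
    return [codecs.lookup(name).name for name in names]


def encode_in_thread(text, encoding='utf-8'):
    # Run the encode call in a worker thread and hand the result back.
    result = []
    worker = Thread(target=lambda: result.append(text.encode(encoding)))
    worker.start()
    worker.join()
    return result[0]


if __name__ == '__main__':
    prime_codecs()
    print(encode_in_thread(u'Error from server: code=000a'))
```

Only the codecs named in the call are primed; any other implicit import made inside a thread can still hit the import lock on Python 2.7.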
History
Date                 User              Action  Args
2016-06-27 18:21:39  r.david.murray    set     messages: + msg269392
2016-06-27 16:50:39  vstinner          set     messages: + msg269387
2016-06-27 16:33:41  brett.cannon      set     messages: + msg269386
2016-06-26 20:54:07  r.david.murray    set     nosy: + r.david.murray, docs@python; messages: + msg269310; assignee: docs@python; components: + Documentation, - Interpreter Core, Unicode
2016-06-26 08:58:24  vstinner          set     messages: + msg269281
2016-06-26 05:44:45  serhiy.storchaka  set     nosy: + brett.cannon, ncoghlan, eric.snow
2016-06-25 22:03:39  joshpurvis        create