Thread hangs on str.encode() when locale is not set #71574

Closed
joshpurvis mannequin opened this issue Jun 25, 2016 · 7 comments
Labels
docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error

Comments

@joshpurvis
Mannequin

joshpurvis mannequin commented Jun 25, 2016

BPO 27387
Nosy @ncoghlan, @vstinner, @ezio-melotti, @bitdancer, @ericsnowcurrently, @iritkatriel

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2020-11-30.10:15:56.191>
created_at = <Date 2016-06-25.22:03:39.351>
labels = ['type-bug', 'docs']
title = 'Thread hangs on str.encode() when locale is not set'
updated_at = <Date 2020-11-30.10:15:56.190>
user = 'https://bugs.python.org/joshpurvis'

bugs.python.org fields:

activity = <Date 2020-11-30.10:15:56.190>
actor = 'iritkatriel'
assignee = 'docs@python'
closed = True
closed_date = <Date 2020-11-30.10:15:56.191>
closer = 'iritkatriel'
components = ['Documentation']
creation = <Date 2016-06-25.22:03:39.351>
creator = 'joshpurvis'
dependencies = []
files = []
hgrepos = []
issue_num = 27387
keywords = []
message_count = 7.0
messages = ['269262', '269281', '269310', '269386', '269387', '269392', '382136']
nosy_count = 8.0
nosy_names = ['ncoghlan', 'vstinner', 'ezio.melotti', 'r.david.murray', 'docs@python', 'eric.snow', 'joshpurvis', 'iritkatriel']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue27387'
versions = ['Python 2.7']

@joshpurvis
Mannequin Author

joshpurvis mannequin commented Jun 25, 2016

This bug manifests itself in at least one very specific situation:

1. No locale is set on the machine
2. A file (test1.py) imports a second (test2.py)
3. The second file (test2.py) calls str.encode() from inside a thread
4. Running Python 2.7

[Environment with no locale set]:

    # both of these are unset:
    $ echo $LC_CTYPE
    $ echo $LANG
    $

[test1.py]:

    import test2

[test2.py]:

    from threading import Thread

    class TestThread(Thread):
        def run(self):
            msg = 'Error from server: code=000a'
            print msg
            msg = msg.encode('utf-8')

    t = TestThread()
    t.start()
    t.join()
    print 'done'

[Expected behavior]:

    $ python test1.py                                                                         
    Error from server: code=000a
    done

[Actual behavior]:

    $ python test1.py                                                                         
    Error from server: code=000a
    [script hangs here indefinitely]

Much thanks to Alan Boudreault, a developer of the cassandra-driver Python package, for helping me locate this bug and further narrow it down to the threading module. The above code snippet was copied from his comment on my issue over there (https://datastax-oss.atlassian.net/browse/PYTHON-592).

Another curious behavior is that if you modify test1.py to decode any string prior to the import, it implicitly fixes the issue:

[test1.py']:

    "any string".decode('utf-8')
    import test2

I realize that one should probably always have a locale set; however, this proved to be very difficult to isolate, especially given that it works if no import occurs or if a string is decoded prior to the import.

@joshpurvis joshpurvis mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode type-bug An unexpected behavior, bug, or error labels Jun 25, 2016
@vstinner
Member

It is a deadlock on the import lock. You should avoid creating and waiting
for a thread when a module is imported. Defer the creation of the thread.
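
A minimal sketch of that advice, assuming test2.py from the report is restructured so the thread is only created inside a function. The `main` function and the `encoded` attribute are illustrative names, not part of the original report:

```python
from threading import Thread


class TestThread(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.encoded = None  # hypothetical attribute, used to hand the result back

    def run(self):
        msg = u'Error from server: code=000a'
        # The first use of an encoding may trigger an implicit import.
        self.encoded = msg.encode('utf-8')


def main():
    """Start and join the thread only when called, never as an import side effect."""
    t = TestThread()
    t.start()
    t.join()
    return t.encoded


if __name__ == '__main__':
    main()
    print('done')
```

With this layout, test1.py would run `import test2` and then call `test2.main()`. The import has already finished by the time the thread starts, so the main thread is no longer holding the import lock while it waits in `join()`.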

@bitdancer
Member

This situation is warned about explicitly in the threading docs (https://docs.python.org/2/library/threading.html#importing-in-threaded-code). The import deadlock is fixed in Python 3, but it is still a really bad idea to launch threads on module import.

What isn't obvious, of course, is that calling encode for the first time for a given encoding does an implicit import of the relevant encoding. I don't think encodings is the only stdlib module that does implicit imports, but it is probably the most used case. Maybe it is worth adding a warning to that section of the 2.7 docs about implicit imports in general and encode/decode in particular?
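
A rough way to observe the implicit import is to check `sys.modules` around the first use of a codec. This is only a sketch: the helper name is made up, and `cp037` is chosen as an assumption that nothing else in the process has loaded that codec yet. It also assumes the encoding name matches its module name under `encodings`, which is not true for spellings like `utf-8`:

```python
import sys


def first_use_imports_codec(encoding='cp037'):
    """Report whether encodings.<encoding> is loaded before and after an encode call."""
    module_name = 'encodings.' + encoding
    loaded_before = module_name in sys.modules
    # The codec lookup behind encode() imports the codec module if it is not cached.
    u'code=000a'.encode(encoding)
    loaded_after = module_name in sys.modules
    return loaded_before, loaded_after


before, after = first_use_imports_codec()
print(before, after)
```

On a fresh interpreter the first value is normally false and the second true, which is the import that can collide with the import lock when it happens inside a thread started at import time.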

@bitdancer bitdancer added docs Documentation in the Doc dir and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode labels Jun 26, 2016
@brettcannon
Member

Adding a note to the docs sounds reasonable.

@vstinner
Member

> Maybe it is worth adding a warning to that section of the 2.7 docs about implicit imports in general and encode/decode in particular?

OK to add a note to the str.encode and str.decode methods to explain that
an import is needed the first time that an encoding is used.

I'm not OK with a warning; we should not discourage developers from using
these methods! They are not dangerous by themselves.

@bitdancer
Member

No, I'm talking about the threading docs, not the encoding docs. I think that's the only place it matters. Specifically, in the section that I linked to, in the bullet point that warns against launching threads on import, it can note that even if you try to make your own code avoid the import lock, implicit imports such as the one done by encode/decode can trip you up.
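
For illustration, one way to keep the implicit import out of the thread is to resolve the codec in the main thread first. That is plausibly why the decode-before-import workaround in the original report helps. The helper names below are hypothetical, and this is a sketch rather than a recommended fix:

```python
import codecs
from threading import Thread


def warm_codecs(names=('utf-8',)):
    """Resolve each codec in the calling thread so later lookups hit the registry cache."""
    # codecs.lookup() imports the codec module if needed and caches the result.
    return [codecs.lookup(name).name for name in names]


def encode_in_thread(msg, encoding='utf-8'):
    """Encode msg in a worker thread and return the resulting bytes."""
    results = []
    t = Thread(target=lambda: results.append(msg.encode(encoding)))
    t.start()
    t.join()
    return results[0]
```

Calling `warm_codecs()` before any thread is started means the worker's `encode` call should find the codec already cached. It does not make launching threads at import time safe in general, since other implicit imports can still occur.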

@iritkatriel
Member

Python 2 issue.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022