Author william.ayd
Recipients william.ayd
Date 2019-12-21.03:32:54
Message-id <1576899174.72.0.883256961456.issue39113@roundup.psfhosted.org>
In-reply-to
Content
With the attached extension module, if I run the following in the REPL:

>>> import libtest
>>>
>>> libtest.error_if_not_utf8("foo")
'foo'
>>> libtest.error_if_not_utf8("\ud83d")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 0: surrogates not allowed
>>> libtest.error_if_not_utf8("foo")
'foo'

Things seem OK, but the next invocation of

>>> libtest.error_if_not_utf8("\ud83d")

then causes a segfault. Note that the order of the inputs seems important; simply repeating the call with the lone surrogate doesn't cause the segfault.
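
For comparison, the same call sequence can be sketched in pure Python. This is only a stand-in: `error_if_not_utf8_py` is a hypothetical helper written for illustration, not the code in the attached extension, and it assumes the extension does nothing more than attempt a UTF-8 encode and return its argument. It shows the behaviour one would expect on every call, namely a UnicodeEncodeError for the lone surrogate and never a crash:

```python
def error_if_not_utf8_py(s):
    """Hypothetical pure-Python stand-in for libtest.error_if_not_utf8.

    Assumption: the extension only tries to encode its argument as UTF-8
    and returns the argument unchanged when that succeeds.
    """
    # str.encode raises UnicodeEncodeError for a lone surrogate such as "\ud83d".
    s.encode("utf-8")
    return s


# Same order of inputs as in the REPL session above.
for value in ("foo", "\ud83d", "foo", "\ud83d"):
    try:
        print(repr(error_if_not_utf8_py(value)))
    except UnicodeEncodeError as exc:
        print("UnicodeEncodeError:", exc.reason)
```

If the pure-Python version raises cleanly on the fourth call while the extension segfaults, that would suggest the problem lies in the C code path rather than in the UTF-8 codec's error reporting.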
History
Date User Action Args
2019-12-21 03:32:54 william.ayd set recipients: + william.ayd
2019-12-21 03:32:54 william.ayd set messageid: <1576899174.72.0.883256961456.issue39113@roundup.psfhosted.org>
2019-12-21 03:32:54 william.ayd link issue39113 messages
2019-12-21 03:32:54 william.ayd create