classification
Title: collections.UserString encode method returns a string
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.8, Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: Mariatta, cheryl.sabella, dfortunov, mblahay, rhettinger, trey, xtreak
Priority: normal
Keywords: easy, patch

Created on 2019-04-09 23:38 by trey, last changed 2019-05-06 21:23 by dfortunov.

Pull Requests
URL Status Linked
PR 13138 open dfortunov, 2019-05-06 21:19
Messages (7)
msg339818 - (view) Author: Trey Hunner (trey) * Date: 2019-04-09 23:38
It looks like the encode method for UserString incorrectly wraps its return value in a str call.

```
>>> from collections import UserString
>>> UserString("hello").encode('utf-8') == b'hello'
False
>>> UserString("hello").encode('utf-8')
"b'hello'"
>>> type(UserString("hello").encode('utf-8'))
<class 'collections.UserString'>
```
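For readers on an interpreter where this has since been fixed, the behaviour above can be reproduced with a small stand-in. This is only a sketch: `WrappingString` is a hypothetical subclass written for illustration, not the code in `Lib/collections/__init__.py`, and it assumes the problem is that the bytes result is passed back through the class constructor, which calls `str()` on anything that is not already a string.

```python
from collections import UserString


class WrappingString(UserString):
    """Hypothetical subclass that re-creates the reported behaviour."""

    def encode(self, encoding="utf-8", errors="strict"):
        # Passing the bytes result back to the constructor makes
        # UserString.__init__ call str() on it, so the stored data is the
        # repr of the bytes object rather than the bytes themselves.
        return self.__class__(self.data.encode(encoding, errors))


wrapped = WrappingString("hello").encode("utf-8")
print(type(wrapped).__name__)  # name of the wrapper class, not "bytes"
print(repr(wrapped.data))      # the stored text is the repr of b'hello'
print(wrapped == b"hello")     # comparison with real bytes fails
```

The key point is that `str(b'hello')` yields the seven-character text `b'hello'`, which is why the wrapped result never compares equal to the bytes object.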
msg339824 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-10 05:22
Trey, would you like to submit a PR to fix this?  (Be sure to add a test case).
msg341550 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-05-06 15:55
I think this is an easy issue. The relevant code is at https://github.com/python/cpython/blob/cec01849f142ea96731b4725975b89d3af757656/Lib/collections/__init__.py#L1210 where the encoded result has to be fixed. Trey, if you haven't started working on it, I think it's a good first issue for sprints.

Here is a simple unittest patch that fails on master. It could be extended with additional tests for the cases where both encoding and errors are present and where both are absent, so that all three code paths in the function are hit.

diff --git a/Lib/test/test_userstring.py b/Lib/test/test_userstring.py
index 71528223d3..81a4908dbd 100644
--- a/Lib/test/test_userstring.py
+++ b/Lib/test/test_userstring.py
@@ -39,6 +39,11 @@ class UserStringTest(
         # we don't fix the arguments, because UserString can't cope with it
         getattr(object, methodname)(*args)

+    def test_encode(self):
+        data = UserString("hello")
+        self.assertEqual(data.encode(encoding='utf-8'), b'hello')

 if __name__ == "__main__":
     unittest.main()
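The additional cases mentioned above could be sketched as a standalone test class. This is an illustration only: the class name `UserStringEncodeTest` and the sample strings are assumptions, not part of the patch, and the tests are expected to pass only on an interpreter where `UserString.encode()` returns bytes.

```python
import unittest
from collections import UserString


class UserStringEncodeTest(unittest.TestCase):
    """Hypothetical tests covering the three argument combinations."""

    def test_no_arguments(self):
        # Neither encoding nor errors given.
        result = UserString("hello").encode()
        self.assertIsInstance(result, bytes)
        self.assertEqual(result, b"hello")

    def test_encoding_only(self):
        # Encoding given, errors left at its default.
        result = UserString("h\xe9llo").encode(encoding="latin-1")
        self.assertIsInstance(result, bytes)
        self.assertEqual(result, b"h\xe9llo")

    def test_encoding_and_errors(self):
        # Both given; "replace" substitutes "?" for unencodable characters.
        result = UserString("h\xe9llo").encode("ascii", "replace")
        self.assertIsInstance(result, bytes)
        self.assertEqual(result, b"h?llo")
```

Saved to a file, this can be run with `python -m unittest`; on an affected interpreter all three tests should fail because the result is a UserString rather than bytes.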
msg341563 - (view) Author: Daniel Fortunov (dfortunov) * Date: 2019-05-06 16:55
I'll pick this up in the PyCon US 2019 sprint this afternoon.
msg341593 - (view) Author: Michael Blahay (mblahay) * Date: 2019-05-06 18:38
I will pick this one up
msg341594 - (view) Author: Michael Blahay (mblahay) * Date: 2019-05-06 18:42
My mistake, dfortunov is already working on this one.
msg341649 - (view) Author: Daniel Fortunov (dfortunov) * Date: 2019-05-06 21:23
PR submitted here:
https://github.com/python/cpython/pull/13138

Rather than adding three different tests for the different code paths, I chose to collapse the three code paths into one by surfacing the underlying str.encode() defaults in the method signature of UserString.encode(), which takes it down to a one-line implementation.
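A minimal sketch of that approach, for illustration only: `FixedUserString` is a hypothetical subclass rather than the code in the PR, and it assumes the defaults surfaced in the signature are those of `str.encode()`, namely `encoding='utf-8'` and `errors='strict'`.

```python
from collections import UserString


class FixedUserString(UserString):
    """Hypothetical subclass showing a single-code-path encode()."""

    def encode(self, encoding="utf-8", errors="strict"):
        # With str.encode()'s defaults in the signature there is no need to
        # branch on which arguments were supplied, and the bytes result is
        # returned directly instead of being wrapped in the class.
        return self.data.encode(encoding, errors)


print(FixedUserString("hello").encode())
print(FixedUserString("h\xe9llo").encode("ascii", "replace"))
```

One trade-off of this shape is that a caller who explicitly passes `None` for either argument gets a `TypeError` from `str.encode()` instead of falling back to the defaults.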

@xtreak: Thanks for the super-helpful triage and failing test case!
History
Date User Action Args
2019-05-06 21:23:42 dfortunov set messages: + msg341649
2019-05-06 21:19:49 dfortunov set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request13051
2019-05-06 18:42:31 mblahay set messages: + msg341594
2019-05-06 18:38:51 mblahay set nosy: + mblahay; messages: + msg341593
2019-05-06 16:55:11 dfortunov set nosy: + dfortunov; messages: + msg341563
2019-05-06 15:56:32 serhiy.storchaka set keywords: + easy; stage: needs patch
2019-05-06 15:55:10 xtreak set nosy: + xtreak, cheryl.sabella, Mariatta; messages: + msg341550
2019-04-10 05:22:31 rhettinger set assignee: rhettinger; type: behavior; messages: + msg339824; versions: + Python 3.8
2019-04-09 23:51:04 xtreak set nosy: + rhettinger
2019-04-09 23:38:53 trey create