
Author gvanrossum
Recipients gvanrossum
Date 2013-07-15.23:37:12
Message-id <1373931432.85.0.594082146564.issue18468@psf.upfronthosting.co.za>
Content
I discovered that the Python 3 version of the re module's Match
object behaves subtly differently from the Python 2 version when the
target string (i.e. the haystack, not the needle) is a buffer object.

In Python 2, the type of the return value of group() is always either
a Unicode string or an 8-bit string, and the type is determined by
looking at the target string -- if the target is unicode, group()
returns a unicode string, otherwise, group() returns an 8-bit string.
In particular, if the target is a buffer object, group() returns an
8-bit string. I think this is the appropriate behavior: otherwise
using regular expression matching to extract a small substring from a
large target string would unnecessarily keep the large target string
alive as long as the substring is alive.

But in Python 3, the behavior of group() has changed so that its
return type always matches that of the target string. I think this is
bad -- apart from the lifetime concern, it means that if your target
happens to be a bytearray, the return value isn't even hashable!
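
A minimal sketch of the two concerns, written for Python 3 and deliberately not calling re at all, so it does not depend on what any particular version's group() returns. It assumes a memoryview slice is a fair stand-in for a buffer-typed result; the names big, view, and is_hashable are illustrative only.

```python
# Illustration only: a large binary target with a small piece of interest at the end.
big = bytearray(b"x" * 10**6 + b"needle")

# Lifetime concern: a memoryview slice is a view, not a copy.
# It holds a reference to the whole underlying buffer.
view = memoryview(big)[-6:]
view_keeps_target_alive = view.obj is big

# A bytes copy holds only the six bytes it needs.
copy = bytes(view)


def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# Hashability concern: bytearray is mutable and therefore unhashable,
# so it cannot be used as a dict key or set member; bytes can.
bytearray_hashable = is_hashable(bytearray(b"needle"))
bytes_hashable = is_hashable(copy)

print(view_keeps_target_alive, bytearray_hashable, bytes_hashable)
```

As long as view is alive, the million-byte bytearray cannot be freed; copy has no such tie to it.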

The proper behavior would be for .group() to return a bytes object if the input is binary data and a str object if the input is Unicode data (str), regardless of the specific type that holds the target data.

Probably little, if anything, depends on getting a bytearray out of .group(). Fix this in 3.4? Users of 3.3 and earlier are stuck with an extra bytes() call and data copy in these cases.
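
The extra bytes() call mentioned above might look roughly like this. The helper name group_as_bytes is hypothetical, not part of the re module, and the sketch only applies to binary targets (calling bytes() on a str without an encoding raises TypeError). On an interpreter where group() already returns bytes, the wrapper is harmless.

```python
import re


def group_as_bytes(match, group=0):
    """Return one matched group as immutable bytes.

    Hypothetical helper for binary targets only. Returns None when the
    group did not participate in the match.
    """
    value = match.group(group)
    if value is None:
        return None
    # bytes() copies the data out of a bytearray or other buffer-like
    # result, so the return value is hashable and independent of the target.
    return bytes(value)


target = bytearray(b"key=value; other=data")
m = re.search(br"key=(\w+)", target)
token = group_as_bytes(m, 1)

# The normalized result can be used as a dict key regardless of what
# type group() itself returned.
counts = {token: 1}
print(type(token).__name__, token)
```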

[Further discussion at http://mail.python.org/pipermail/python-dev/2013-July/127332.html]
History
Date User Action Args
2013-07-15 23:37:12 gvanrossum set recipients: + gvanrossum
2013-07-15 23:37:12 gvanrossum set messageid: <1373931432.85.0.594082146564.issue18468@psf.upfronthosting.co.za>
2013-07-15 23:37:12 gvanrossum link issue18468 messages
2013-07-15 23:37:12 gvanrossum create