Title: re.compile's repr truncates patterns at 200 characters
Type: behavior Stage:
Components: Regular Expressions Versions: Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, ezio.melotti, matpi, mrabarnett, serhiy.storchaka, xtreak
Priority: normal Keywords:

Created on 2020-06-15 13:29 by matpi, last changed 2020-06-16 13:06 by matpi.

Messages (8)
msg371541 - Author: Quentin Wenger (matpi) Date: 2020-06-15 13:29
This seems somewhat arbitrary and yields unusable results, going against the doc:

> repr(object)
> Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a __repr__() method.

The truncated representation neither "yields an object with the same value" (it raises a SyntaxError, of course, due to the missing quote and closing parenthesis), nor is "enclosed in angle brackets".

>>> import re
>>> re.compile("()"*99)
>>> re.compile("()"*100)
msg371542 - Author: Quentin Wenger (matpi) Date: 2020-06-15 13:33
Note: it actually truncates at 200 characters, counting the initial quote of the argument's repr.
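The arithmetic behind the 99/100 boundary can be checked directly. This is a minimal sketch that only measures plain string reprs; the 200-character limit is the figure reported in this issue, hard-coded here as an assumption rather than read from CPython.

```python
# Limit reported in this issue (assumed; not queried from the interpreter).
LIMIT = 200

for n in (99, 100):
    pattern = "()" * n
    # The string repr that gets embedded in re.compile(...), quotes included.
    arg_repr = repr(pattern)
    print(n, len(pattern), len(arg_repr), len(arg_repr) <= LIMIT)
# 99 198 200 True
# 100 200 202 False
```

With 99 pairs the quoted argument is exactly 200 characters and fits; with 100 pairs it is 202, which exceeds the limit, so the argument's closing quote cannot survive the cut.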
msg371556 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-06-15 14:53
This seems similar to
msg371627 - Author: Quentin Wenger (matpi) Date: 2020-06-16 09:59
Pardon me, but I see an important difference with the other bug report: that one is about a repr in angle brackets, and as such does not require an exact output, so an ellipsis is good enough.

In this bug, the output of repr gives a string that can, at least for small enough patterns, be passed to eval() to reconstruct an object. So there is no good reason for this to work for patterns up to 200 characters but not above; furthermore, the limit is undocumented and goes against the doc on repr.

Compare with a complexly nested structure of, say, lists, dicts and strings: the repr will always be "reconstructible", even if it is well above 200 characters.

Also, a common way to write repr is to draw the outer "container" as a string, and fill it with the (full!) repr of the object's parameters. E.g. the repr of a list containing a 1000-character string will simply write square brackets around the 1002-character repr of the string. re.compile doesn't conform to this "rule".
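The list behaviour described above is easy to confirm, and the same convention carries over to user-defined classes. In this sketch, `Wrapper` is a hypothetical class, not something from the stdlib; it embeds the full `!r` repr of its argument, so the result can be passed back to `eval()` whatever its length.

```python
class Wrapper:
    """Hypothetical container whose repr embeds the full repr of its argument."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # !r inserts repr(self.value) untruncated, so eval() can rebuild the object.
        return f"Wrapper({self.value!r})"

    def __eq__(self, other):
        return isinstance(other, Wrapper) and self.value == other.value


s = "x" * 1000
print(len(repr(s)))                 # 1002: the 1000 characters plus two quotes
print(repr([s]) == f"[{repr(s)}]")  # True: brackets around the full element repr
print(eval(repr(Wrapper(s)), {"Wrapper": Wrapper}) == Wrapper(s))  # True
```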
msg371632 - Author: Quentin Wenger (matpi) Date: 2020-06-16 10:26
All in all, it is simply a matter of compliance. The doc of repr says that a repr is either

- a string that can be eval()'ed back to (an equivalent of) the original object
- or a "more loose" angle-bracket representation.

re.compile with small patterns falls in the first category. The other bug report corresponds to the second one, no problem.

However, re.compile with large patterns doesn't fall in either category, nor would it if changed to use an ellipsis.
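The two categories above can be probed mechanically. `classify_repr` is a hypothetical helper, not a stdlib function: it treats anything wrapped in `<...>` as the "loose" form and otherwise checks whether `eval()` rebuilds an equal object. Its `namespace` argument is an assumption needed so that `eval()` can resolve names such as `re`.

```python
import re


def classify_repr(obj, namespace):
    """Hypothetical helper: sort repr(obj) into the two categories of the repr() docs."""
    text = repr(obj)
    if text.startswith("<") and text.endswith(">"):
        return "angle-bracket"
    try:
        if eval(text, namespace) == obj:
            return "eval-able"
    except Exception:
        # A truncated repr typically fails here with a SyntaxError.
        pass
    return "neither"


print(classify_repr(re.compile("a+b"), {"re": re}))   # eval-able
print(classify_repr(re.match("a", "a"), {"re": re}))  # angle-bracket
```

Per the report in this issue, a pattern whose quoted form exceeds 200 characters (for example `"()" * 100`) would land in `"neither"` on an affected interpreter.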
msg371640 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-06-16 12:28
Any change to the repr should take place on the other issue. I don't feel very strongly that the repr must be eval-able, but in any event it should be raised on issue39949 if you feel strongly about it.

I do think it's reasonable to say that if the repr is truncated, it must not be eval-able. Maybe the ellipsis should go outside the quotes, or something else to make it fail. But this should be discussed on issue39949.
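The "ellipsis outside the quotes" idea can be sketched as follows. `truncated_pattern_repr` and `is_valid_expression` are hypothetical helpers; unlike the behaviour reported in this issue, `limit` here counts pattern characters rather than repr characters, and flags are ignored. Slicing before calling `repr()` keeps the quotes balanced, so the trailing `...` reliably makes the text fail to parse.

```python
import ast
import re


def is_valid_expression(text):
    """Return True if text parses as a single Python expression."""
    try:
        ast.parse(text, mode="eval")
    except SyntaxError:
        return False
    return True


def truncated_pattern_repr(p, limit=200):
    """Hypothetical sketch: cut long patterns, but put the ellipsis outside
    the quotes. limit=200 mirrors this issue; flags are ignored for brevity."""
    if len(p.pattern) <= limit:
        return f"re.compile({p.pattern!r})"
    # Slice the pattern *before* taking its repr: the quotes stay balanced,
    # and "..." right after the closing quote cannot parse.
    return f"re.compile({p.pattern[:limit]!r}...)"


short_repr = truncated_pattern_repr(re.compile("()" * 100))
long_repr = truncated_pattern_repr(re.compile("()" * 300))
print(is_valid_expression(short_repr), is_valid_expression(long_repr))  # True False
```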

I'm going to close this.
msg371649 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-06-16 12:54
Re-opening this because issue39949 is about match objects, not compiled re objects.

Still, I don't think the repr "rule" about being eval-able is hard and fast. That said, changing the repr to use angle brackets wouldn't be unreasonable, just to drive the point home.
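For comparison, the angle-bracket option might look like the following. `bracketed_pattern_repr` is a hypothetical helper modelled on the existing `re.Match` repr format (`<re.Match object; span=..., match=...>`); it is not how `re.Pattern` is actually printed.

```python
import re


def bracketed_pattern_repr(p, limit=200):
    """Hypothetical alternative: an angle-bracket repr in the style of re.Match,
    signalling informative text rather than eval()-able code."""
    shown = p.pattern[:limit]
    # Mark the cut only when something was actually dropped.
    suffix = "..." if len(p.pattern) > limit else ""
    return f"<re.Pattern object; pattern={shown!r}{suffix}>"


print(bracketed_pattern_repr(re.compile("a+b")))
# <re.Pattern object; pattern='a+b'>
```

Because the text is enclosed in angle brackets, truncating it falls within the second category of the repr() docs.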
msg371653 - Author: Quentin Wenger (matpi) Date: 2020-06-16 13:06
I welcome any counter-example to the eval()'able property in the stdlib.

I do believe in this rule as hard and fast, because it works for small patterns and only bites you when they grow, probably programmatically (which is exactly when you could actually need the repr).

Furthermore, 200 seems very low by today's standards. I mean, if you want a repr in the first place, then chances are that you want it in full if (reasonably) possible.

If a string always reprs itself in full, no matter what, why should re.compile arbitrarily decide to truncate its argument?
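For readers who need the complete repr in the meantime, one workaround is to build it from the public `pattern` and `flags` attributes. `full_pattern_repr` is a hypothetical helper, not part of the `re` module; it emits flags as a plain integer rather than the symbolic names `repr()` uses, which is a simplification.

```python
import re


def full_pattern_repr(p):
    """Hypothetical workaround: an eval()-able repr that never truncates the pattern."""
    # str patterns carry re.UNICODE implicitly; drop it so the output stays minimal.
    flags = p.flags & ~int(re.UNICODE) if isinstance(p.pattern, str) else p.flags
    if flags:
        # Plain integer flags keep the output short; re.compile() accepts them.
        return f"re.compile({p.pattern!r}, {flags})"
    return f"re.compile({p.pattern!r})"


p = re.compile("()" * 300)
print(eval(full_pattern_repr(p), {"re": re}) == p)  # True
```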
Date                 User        Action  Args
2020-06-16 13:06:47  matpi       set     messages: + msg371653
2020-06-16 12:54:54  eric.smith  set     status: closed -> open
                                         superseder: truncating match in regular expression match objects repr ->
                                         messages: + msg371649
                                         resolution: duplicate ->
                                         stage: resolved ->
2020-06-16 12:28:14  eric.smith  set     status: open -> closed
                                         nosy: + eric.smith
                                         messages: + msg371640
                                         resolution: duplicate
                                         superseder: truncating match in regular expression match objects repr
2020-06-16 10:26:05  matpi       set     messages: + msg371632
2020-06-16 09:59:08  matpi       set     status: closed -> open
                                         resolution: duplicate -> (no value)
                                         messages: + msg371627
2020-06-16 07:08:07  rhettinger  set     status: open -> closed
                                         resolution: duplicate
                                         stage: resolved
2020-06-15 14:53:38  xtreak      set     nosy: + serhiy.storchaka, xtreak
                                         messages: + msg371556
2020-06-15 13:33:51  matpi       set     title: re.compile's repr truncates patterns at 199 characters -> re.compile's repr truncates patterns at 200 characters
2020-06-15 13:33:13  matpi       set     messages: + msg371542
2020-06-15 13:29:51  matpi       create