This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: re.compile() repr end quote truncated
Type: behavior Stage:
Components: Regular Expressions Versions: Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ThiefMaster, abarry, ezio.melotti, gvanrossum, mrabarnett, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-01-09 21:18 by ThiefMaster, last changed 2022-04-11 14:58 by admin.

Messages (11)
msg257860 - Author: ThiefMaster (ThiefMaster) Date: 2016-01-09 21:18
```
Python 3.4.3 (default, Jan  5 2016, 23:13:10)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> for n in range(198, 201):
...     print(re.compile('x' * n))
...
re.compile('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
re.compile('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
re.compile('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
```

The closing quote in the repr disappears once the regex exceeds a certain length. This smells like an off-by-one error somewhere that causes the last character to be lost. In any case, it's pretty ugly, since the repr clearly pretends to be executable Python code, which it no longer is once the quote is missing.
msg257861 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-09 21:27
It's simply the effect of the "%.200R" format here:
https://hg.python.org/cpython/file/default/Modules/_sre.c#l1417

I recommend not bothering to fix this: it would just be more code, and to what end?
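Python's printf-style `%` operator accepts the same kind of precision on `%r`, so the truncation can be reproduced without touching C. The sketch below is a hypothetical emulation, not the `_sre.c` code; it assumes the repr is built as `re.compile(%.200R)` and that no flags are shown.

```python
def emulate_pattern_repr(pattern):
    # Hypothetical emulation (assumption: flags are not part of the repr).
    # In printf-style formatting, "%.200r" keeps at most 200 characters of
    # repr(pattern), just as "%.200R" does in PyUnicode_FromFormat().
    return "re.compile(%.200r)" % (pattern,)


for n in range(198, 201):
    print(emulate_pattern_repr('x' * n))
```

For n = 198 the quoted literal is exactly 200 characters long and survives intact. For n = 199 and n = 200 it is longer, so everything past the 200th character, including the closing quote, is dropped and both calls yield the same string, matching the last two lines of the session above.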
msg257863 - Author: ThiefMaster (ThiefMaster) Date: 2016-01-09 21:30
I think it's pretty ugly to have a repr that is valid Python code in most cases but suddenly stops being so.

The repr of a string is not truncated, so why truncate it in a pattern object?

Given the truncation, why not use a repr style that clearly cannot be executed to recreate the original object, e.g. `<re pattern %.200R>`?
msg257865 - Author: Anilyka Barry (abarry) * (Python triager) Date: 2016-01-09 21:39
Truncating at 200 characters is actually common in the C code; hardly anyone notices, because most expressions don't need more than 200 characters.

I don't think this needs to be changed at all; the rare case should not affect the common ones. If eval()'ing your repr fails, you can always access the full regex from `exp.pattern`, which isn't truncated at 200 characters :)
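A minimal sketch of that workaround: `.pattern` holds the complete source string, so an untruncated, evaluable repr can be rebuilt from it. `full_pattern_repr` is a hypothetical helper, and it deliberately ignores `.flags` to keep the example short.

```python
import re


def full_pattern_repr(compiled):
    # Hypothetical helper: rebuild an untruncated "re.compile(...)" string
    # from .pattern, which always holds the full source string.
    # Assumption for brevity: compiled.flags is ignored.
    return "re.compile(%r)" % (compiled.pattern,)


p = re.compile('x' * 300)
print(len(p.pattern))  # 300: nothing is lost
rebuilt = eval(full_pattern_repr(p), {"re": re})
print(rebuilt.pattern == p.pattern)
```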
msg257866 - Author: ThiefMaster (ThiefMaster) Date: 2016-01-09 21:40
I'm not eval'ing it; I just wondered why the repr looks so weird when printing an object that contains a compiled regex ;)
msg257867 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2016-01-09 21:42
I'm going to have to agree with ThiefMaster. String literals don't truncate like that, nor do lists, nor tuples.

Are there any similar cases of truncation elsewhere in the standard library?
msg257868 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-09 21:44
Yes, the C code pretty much always uses %.<number><format> in order to protect itself from buffer overflows. Pulling off an unabbreviated str() here would be a major piece of work.
msg257869 - Author: ThiefMaster (ThiefMaster) Date: 2016-01-09 21:46
Would it be possible to preserve the quotes even in case of truncation?
msg257871 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-01-09 21:51
That would be an interesting exercise. You'd have to patch PyUnicode_FromFormat(). It would be nice to have this. It should probably also insert some dots (the universal sign to indicate that something was truncated).
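At the Python level, the behavior proposed here (keep both quotes, mark the cut with dots) might look like the sketch below. `truncated_str_repr` is a hypothetical helper, not a patch to `PyUnicode_FromFormat()`. As a design choice it shortens the string before calling `repr()`, so an escape sequence such as `\n` is never split in half.

```python
def truncated_str_repr(s, max_chars=200):
    """Hypothetical 'smart' truncation: keep both quotes, mark the cut with '...'.

    The string itself is shortened before repr() is called, so escape
    sequences in the output are always complete.
    """
    if len(s) <= max_chars:
        return repr(s)
    r = repr(s[:max_chars])
    # repr() of a str always ends with its closing quote; put the dots before it.
    return r[:-1] + "..." + r[-1]


print(truncated_str_repr('x' * 250, 10))
print(truncated_str_repr('\n' * 5, 3))
```

Because the limit counts characters of the original string, escapes can make the result longer than `max_chars`; C code that must bound its output, as the `%.<number>` formats do, would have to count repr characters instead. The standard `reprlib` module takes a similar approach for strings, keeping both quotes and placing `...` in the middle.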
msg258028 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2016-01-11 21:55
FTR the repr was added in #13592.
msg258094 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-12 09:56
Smarter truncation, keeping the closing quote and adding dots, is an idea I have wanted to propose ever since %R was introduced, but I didn't have time to formulate it. I opened issue26090 for this.
History
| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:26 | admin | set | github: 70256 |
| 2016-01-12 09:56:19 | serhiy.storchaka | set | messages: + msg258094 |
| 2016-01-11 21:55:05 | ezio.melotti | set | versions: + Python 3.5, Python 3.6, - Python 3.4; nosy: + serhiy.storchaka; messages: + msg258028; type: behavior |
| 2016-01-09 21:51:10 | gvanrossum | set | messages: + msg257871 |
| 2016-01-09 21:46:07 | ThiefMaster | set | messages: + msg257869 |
| 2016-01-09 21:44:43 | gvanrossum | set | messages: + msg257868 |
| 2016-01-09 21:42:04 | mrabarnett | set | messages: + msg257867 |
| 2016-01-09 21:40:42 | ThiefMaster | set | messages: + msg257866 |
| 2016-01-09 21:39:30 | abarry | set | nosy: + abarry; messages: + msg257865 |
| 2016-01-09 21:30:12 | ThiefMaster | set | messages: + msg257863 |
| 2016-01-09 21:27:36 | gvanrossum | set | nosy: + gvanrossum; messages: + msg257861 |
| 2016-01-09 21:18:18 | ThiefMaster | create | |