Add a text truncation function #62785

pitrou · 2013-07-29T12:49:11Z

BPO	18585
Nosy	@warsaw, @birkenfeld, @pitrou, @ezio-melotti, @stevendaprano, @bitdancer, @dholth, @serhiy-storchaka, @vajrasky
Files	summarize.patch summarize2.patch shorten.patch shorten2.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2013-08-12.20:40:22.114>
created_at = <Date 2013-07-29.12:49:10.600>
labels = ['type-feature', 'library']
title = 'Add a text truncation function'
updated_at = <Date 2014-01-03.04:22:26.775>
user = 'https://github.com/pitrou'

bugs.python.org fields:

activity = <Date 2014-01-03.04:22:26.775>
actor = 'dholth'
assignee = 'none'
closed = True
closed_date = <Date 2013-08-12.20:40:22.114>
closer = 'pitrou'
components = ['Library (Lib)']
creation = <Date 2013-07-29.12:49:10.600>
creator = 'pitrou'
dependencies = []
files = ['31072', '31084', '31250', '31257']
hgrepos = []
issue_num = 18585
keywords = ['patch']
message_count = 27.0
messages = ['193862', '193867', '193873', '193874', '193879', '193880', '193898', '193920', '193921', '193975', '194187', '194188', '194189', '194483', '194484', '194486', '194501', '194987', '195005', '195012', '195013', '195020', '195412', '195413', '200050', '207203', '207204']
nosy_count = 10.0
nosy_names = ['barry', 'georg.brandl', 'pitrou', 'ezio.melotti', 'steven.daprano', 'r.david.murray', 'dholth', 'python-dev', 'serhiy.storchaka', 'vajrasky']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue18585'
versions = ['Python 3.4']

pitrou · 2013-07-29T12:49:10Z

Following patch proposed to add a function named textwrap.summarize():

   >>> textwrap.summarize("Hello  world!", width=12)
   'Hello world!'
   >>> textwrap.summarize("Hello  world!", width=11)
   'Hello (...)'

pitrou · 2013-07-29T13:55:31Z

Perhaps the "placeholder" argument should actually include the last whitespace, to allow people to omit the whitespace, or use a non-breaking space instead?

   >>> textwrap.summarize("Hello  world!", width=11, placeholder='...')
   'Hello...'

warsaw · 2013-07-29T15:20:38Z

On Jul 29, 2013, at 01:55 PM, Antoine Pitrou wrote:

Perhaps the "placeholder" argument should actually include the last whitespace, to allow people to omit the whitespace, or use a non-breaking space instead?

>>> textwrap.summarize("Hello world!", width=11, placeholder='...')
'Hello...'

I guess the placeholder default is ' (...)' then?

pitrou · 2013-07-29T15:34:38Z

>Perhaps the "placeholder" argument should actually include the last
>whitespace, to allow people to omit the whitespace, or use a
>non-breaking space instead?
>
> >>> textwrap.summarize("Hello world!", width=11,
> >>> placeholder='...')
> 'Hello...'

I guess the placeholder default is ' (...)' then?

Yeah.
(of course, another default could be used)

vajrasky · 2013-07-29T16:09:02Z

Something is not right if we use more than one space.

>>> textwrap.summarize('hello      world!', width=12)
'hello world!'
>>> textwrap.summarize('hello      world!', width=11)
'hello (...)'
>>> textwrap.summarize('hello      world!', width=10)
'(...)'

I expect the last statement would give result: 'hello (...)' because 'hello' is just 5 characters, less than 10.

vajrasky · 2013-07-29T16:33:24Z

Beside of that, I notice the new lines are deleted silently.

>>> textwrap.summarize('republicans are red,\ndemocrats are blue,\nneither one of them,\ncares about you.', width=46)
'republicans are red, democrats are blue, (...)'

pitrou · 2013-07-29T20:30:22Z

Vajrasky, thanks. The former is a bug, but the latter is a feature. summarize() re-uses the textwrap machinery to normalize spaces.

pitrou · 2013-07-30T09:34:30Z

Oops, sorry, I was mistaken. There is no bug actually here:

>>> textwrap.summarize('hello      world!', width=10)
'(...)'

'hello (...)' cannot be the right answer since its len() is 11, greater than 10.

pitrou · 2013-07-30T10:05:43Z

Updated patch to not add any space before the placeholder, with " (...)" as default placeholder value.

vajrasky · 2013-07-31T06:11:41Z

Monsieur Pitrou, thanks for the explanation. Actually, IMHO I prefer, 'hello (...)' should be the minimum words we can use not '(...)' because '(...)' does not make any sense. But, anyway, it's your call. :)

Anyway, using your summarize2.patch:

>>> textwrap.summarize('hello      world!', width=6)
'(...)'

>>> textwrap.summarize('hello      world!', width=5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ethan/Documents/code/python/cpython/Lib/textwrap.py", line 378, in summarize
    return w.summarize(text, placeholder=placeholder)
  File "/home/ethan/Documents/code/python/cpython/Lib/textwrap.py", line 314, in summarize
    raise ValueError("placeholder too large for max width")
ValueError: placeholder too large for max width

Why? '(...)' is 5 characters only. I checked the patch and found out that the placeholder is ' (...)' (with space) and you compare the width with the placeholder.

stevendaprano · 2013-08-02T17:12:05Z

A function like this often gets called to truncate lots of lines. Unfortunately for many use-cases, the part truncated is the most significant part of the line. E.g.:

Scanning file:
/home/fred/documents/datafil...
/home/fred/documents/datafil...
/home/fred/documents/datafil...
/home/fred/documents/datafil...

It's often better in cases like this to truncate in the middle:

Scanning file:
/home/fred/doc...datafile01.txt
/home/fred/doc...datafile02.txt
/home/fred/doc...datafile03.txt
/home/fred/doc...datafile04.txt

(I believe Mac OS-X routinely truncates long lines in the middle in this way.)

Perhaps there could be an argument controlling where to truncate (left, right or centre). A good use-case for the new Enums, perhaps? :-)

stevendaprano · 2013-08-02T17:19:05Z

Bike-shedding here... why "(...)"? Is it common to use round brackets for this purpose? In English-speaking countries, it is usual to use square brackets for editorial comments, including ellipsis "[...]".

Either way, if you wanted to be more Unicode aware, you could save two characters by using \N{HORIZONTAL ELLIPSIS} "(…)" as the default.

pitrou · 2013-08-02T17:40:43Z

Bike-shedding here... why "(...)"? Is it common to use round brackets
for this purpose? In English-speaking countries, it is usual to use
square brackets for editorial comments, including ellipsis "[...]".

Ah, really? French uses "[...]" but I thought English-speaking people,
being different, used "(...)". So I'll change it.

Either way, if you wanted to be more Unicode aware, you could save two
characters by using \N{HORIZONTAL ELLIPSIS} "(…)" as the default.

I'd rather stay on the ASCII side of things here.
(also, the ellipsis character doesn't often look nice)

ezio-melotti · 2013-08-05T14:41:28Z

[...] and ASCII are fine with me.

Perhaps there could be an argument controlling where to truncate
(left, right or centre). A good use-case for the new Enums, perhaps? :-)

I wrote a similar function once and in addition to the width it had this feature too (defaulting on "center"), so it sounds like a reasonable addition to me. Back then I was simply passing a "left"/"right"/"center" string -- not sure it's worth adding an enum for this (FWIW for text alignment there are 3 separate methods: ljust, center, and rjust).

ezio-melotti · 2013-08-05T14:48:10Z

Perhaps "shorten" would be a better name -- "summarize" sounds smarter than it actually is :)

bitdancer · 2013-08-05T15:14:26Z

Looking just at the proposed functionality (taking a prefix) and ignoring the requested complexification :), the usual name for the text produced by this process is a "lead" (http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Lead_section), although formally a lead is actually written to be used as such, as opposed to just taking a prefix, so that word really has the same problem as 'summarize'.

I think 'truncate' would be a better name. Or, if you don't mind being wordier, extract_prefix. The fact that it is part of the textwrap module should be enough clue that the truncation happens at whitespace. Truncate could also apply to the expanded version if you squint a little, if Antoine is interested in that. On the other hand, the use case presented for that is not going to be served by this function anyway, since this function (being part of textwrap) breaks on whitespace...it shouldn't (IMO) elide text other than at whitespace. If you want that functionality it belongs in some other module, I think.

The placeholder argument could alternatively be named 'ellipsis', but placeholder is certainly fine.

shorten would probably be better if you are going with the expanded version, but I like truncate. It is probably significant that that is what the title of the issue calls it :)

pitrou · 2013-08-05T18:36:11Z

Looking just at the proposed functionality (taking a prefix) and
ignoring the requested complexification :), the usual name for the
text produced by this process is a
"lead" (http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Lead_section), although formally a lead is actually written to be used as such, as opposed to just taking a prefix, so that word really has the same problem as 'summarize'.

Good point.

The placeholder argument could alternatively be named 'ellipsis', but
placeholder is certainly fine.

I would certainly like ellipsis if it didn't already mean something else
in Python.

shorten would probably be better if you are going with the expanded
version, but I like truncate. It is probably significant that that is
what the title of the issue calls it :)

I'm a bit negative towards truncate(), mostly because I've worked on the
I/O stack and truncate means something much less careful there (e.g.
StringIO.truncate()).
But really, I'm fine with either shorten() or truncate(). I agree
summarize() may try to look a bit too smart.

pitrou · 2013-08-12T19:15:30Z

Updated patch renaming summarize() to shorten(), and adding docs and a fix for a nit reported by Vajrasky.

pitrou · 2013-08-12T20:05:23Z

Updated patch addressing Ezio's comments.

python-dev · 2013-08-12T20:39:19Z

New changeset c27ec198d3d1 by Antoine Pitrou in branch 'default':
Issue bpo-18585: Add :func:`textwrap.shorten` to collapse and truncate a piece of text to a given length.
http://hg.python.org/cpython/rev/c27ec198d3d1

pitrou · 2013-08-12T20:39:48Z

Ok, I've committed after having addressed (most of) RDM's comments.

serhiy-storchaka · 2013-08-12T21:22:11Z

What about a multiline summarize? The textwrap module is designed to work with multiline text.

Let we want wrap 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.' in 40 column and shorten it to three lines:

Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore (...)

For this we need to add two arguments for TextWrapper: max_lines and placeholder. Then shorten() will be just fill() with max_lines=1.

python-dev · 2013-08-16T20:31:21Z

New changeset be5481bf4c57 by Antoine Pitrou in branch 'default':
Fix the default placeholder in textwrap.shorten() to be " [...]".
http://hg.python.org/cpython/rev/be5481bf4c57

pitrou · 2013-08-16T20:31:54Z

(Ezio noticed that I had left the placeholder as " (...)". This is now fixed.)

python-dev · 2013-10-16T10:08:47Z

New changeset 0bd257cd3e88 by Serhiy Storchaka in branch 'default':
Add shorten to __all_ (issues bpo-18585 and bpo-18725).
http://hg.python.org/cpython/rev/0bd257cd3e88

python-dev · 2014-01-03T04:20:49Z

New changeset 536a2cf5f1d2 by Daniel Holth in branch 'default':
Issue bpo-18585: speed zipfile import by only generating zipfile._ZipDecryptor on demand
http://hg.python.org/cpython/rev/536a2cf5f1d2

dholth · 2014-01-03T04:22:27Z

Previous changeset was meant for bpo-18515

pitrou added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jul 29, 2013

pitrou closed this as completed Aug 12, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a text truncation function #62785

Add a text truncation function #62785

pitrou commented Jul 29, 2013

pitrou commented Jul 29, 2013

pitrou commented Jul 29, 2013

warsaw commented Jul 29, 2013

pitrou commented Jul 29, 2013

vajrasky mannequin commented Jul 29, 2013

vajrasky mannequin commented Jul 29, 2013

pitrou commented Jul 29, 2013

pitrou commented Jul 30, 2013

pitrou commented Jul 30, 2013

vajrasky mannequin commented Jul 31, 2013

stevendaprano commented Aug 2, 2013

stevendaprano commented Aug 2, 2013

pitrou commented Aug 2, 2013

ezio-melotti commented Aug 5, 2013

ezio-melotti commented Aug 5, 2013

bitdancer commented Aug 5, 2013

pitrou commented Aug 5, 2013

pitrou commented Aug 12, 2013

pitrou commented Aug 12, 2013

python-dev mannequin commented Aug 12, 2013

pitrou commented Aug 12, 2013

serhiy-storchaka commented Aug 12, 2013

python-dev mannequin commented Aug 16, 2013

pitrou commented Aug 16, 2013

python-dev mannequin commented Oct 16, 2013

python-dev mannequin commented Jan 3, 2014

dholth mannequin commented Jan 3, 2014

Add a text truncation function #62785

Add a text truncation function #62785

Comments

pitrou commented Jul 29, 2013

pitrou commented Jul 29, 2013

pitrou commented Jul 29, 2013

warsaw commented Jul 29, 2013

pitrou commented Jul 29, 2013

vajrasky mannequin commented Jul 29, 2013

vajrasky mannequin commented Jul 29, 2013

pitrou commented Jul 29, 2013

pitrou commented Jul 30, 2013

pitrou commented Jul 30, 2013

vajrasky mannequin commented Jul 31, 2013

stevendaprano commented Aug 2, 2013

stevendaprano commented Aug 2, 2013

pitrou commented Aug 2, 2013

ezio-melotti commented Aug 5, 2013

ezio-melotti commented Aug 5, 2013

bitdancer commented Aug 5, 2013

pitrou commented Aug 5, 2013

pitrou commented Aug 12, 2013

pitrou commented Aug 12, 2013

python-dev mannequin commented Aug 12, 2013

pitrou commented Aug 12, 2013

serhiy-storchaka commented Aug 12, 2013

python-dev mannequin commented Aug 16, 2013

pitrou commented Aug 16, 2013

python-dev mannequin commented Oct 16, 2013

python-dev mannequin commented Jan 3, 2014

dholth mannequin commented Jan 3, 2014