msg193862 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-07-29 12:49 |
Following patch proposed to add a function named textwrap.summarize():
>>> textwrap.summarize("Hello world!", width=12)
'Hello world!'
>>> textwrap.summarize("Hello world!", width=11)
'Hello (...)'
|
msg193867 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-07-29 13:55 |
Perhaps the "placeholder" argument should actually include the last whitespace, to allow people to omit the whitespace, or use a non-breaking space instead?
>>> textwrap.summarize("Hello world!", width=11, placeholder='...')
'Hello...'
|
msg193873 - (view) |
Author: Barry A. Warsaw (barry) * |
Date: 2013-07-29 15:20 |
On Jul 29, 2013, at 01:55 PM, Antoine Pitrou wrote:
>Perhaps the "placeholder" argument should actually include the last whitespace, to allow people to omit the whitespace, or use a non-breaking space instead?
>
> >>> textwrap.summarize("Hello world!", width=11, placeholder='...')
> 'Hello...'
I guess the placeholder default is ' (...)' then?
|
msg193874 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-07-29 15:34 |
> >Perhaps the "placeholder" argument should actually include the last
> >whitespace, to allow people to omit the whitespace, or use a
> >non-breaking space instead?
> >
> > >>> textwrap.summarize("Hello world!", width=11,
> > >>> placeholder='...')
> > 'Hello...'
>
> I guess the placeholder default is ' (...)' then?
Yeah.
(of course, another default could be used)
|
msg193879 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-07-29 16:09 |
Something is not right if we use more than one space.
>>> textwrap.summarize('hello world!', width=12)
'hello world!'
>>> textwrap.summarize('hello world!', width=11)
'hello (...)'
>>> textwrap.summarize('hello world!', width=10)
'(...)'
I expect the last statement would give result: 'hello (...)' because 'hello' is just 5 characters, less than 10.
|
msg193880 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-07-29 16:33 |
Beside of that, I notice the new lines are deleted silently.
>>> textwrap.summarize('republicans are red,\ndemocrats are blue,\nneither one of them,\ncares about you.', width=46)
'republicans are red, democrats are blue, (...)'
|
msg193898 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-07-29 20:30 |
Vajrasky, thanks. The former is a bug, but the latter is a feature. summarize() re-uses the textwrap machinery to normalize spaces.
|
msg193920 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-07-30 09:34 |
Oops, sorry, I was mistaken. There is no bug actually here:
>>> textwrap.summarize('hello world!', width=10)
'(...)'
'hello (...)' cannot be the right answer since its len() is 11, greater than 10.
|
msg193921 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-07-30 10:05 |
Updated patch to not add any space before the placeholder, with " (...)" as default placeholder value.
|
msg193975 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-07-31 06:11 |
Monsieur Pitrou, thanks for the explanation. Actually, IMHO I prefer, 'hello (...)' should be the minimum words we can use not '(...)' because '(...)' does not make any sense. But, anyway, it's your call. :)
Anyway, using your summarize2.patch:
>>> textwrap.summarize('hello world!', width=6)
'(...)'
>>> textwrap.summarize('hello world!', width=5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ethan/Documents/code/python/cpython/Lib/textwrap.py", line 378, in summarize
return w.summarize(text, placeholder=placeholder)
File "/home/ethan/Documents/code/python/cpython/Lib/textwrap.py", line 314, in summarize
raise ValueError("placeholder too large for max width")
ValueError: placeholder too large for max width
Why? '(...)' is 5 characters only. I checked the patch and found out that the placeholder is ' (...)' (with space) and you compare the width with the placeholder.
|
msg194187 - (view) |
Author: Steven D'Aprano (steven.daprano) * |
Date: 2013-08-02 17:12 |
A function like this often gets called to truncate lots of lines. Unfortunately for many use-cases, the part truncated is the most significant part of the line. E.g.:
Scanning file:
/home/fred/documents/datafil...
/home/fred/documents/datafil...
/home/fred/documents/datafil...
/home/fred/documents/datafil...
It's often better in cases like this to truncate in the middle:
Scanning file:
/home/fred/doc...datafile01.txt
/home/fred/doc...datafile02.txt
/home/fred/doc...datafile03.txt
/home/fred/doc...datafile04.txt
(I believe Mac OS-X routinely truncates long lines in the middle in this way.)
Perhaps there could be an argument controlling where to truncate (left, right or centre). A good use-case for the new Enums, perhaps? :-)
|
msg194188 - (view) |
Author: Steven D'Aprano (steven.daprano) * |
Date: 2013-08-02 17:19 |
Bike-shedding here... why "(...)"? Is it common to use round brackets for this purpose? In English-speaking countries, it is usual to use square brackets for editorial comments, including ellipsis "[...]".
Either way, if you wanted to be more Unicode aware, you could save two characters by using \N{HORIZONTAL ELLIPSIS} "(…)" as the default.
|
msg194189 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-02 17:40 |
> Bike-shedding here... why "(...)"? Is it common to use round brackets
> for this purpose? In English-speaking countries, it is usual to use
> square brackets for editorial comments, including ellipsis "[...]".
Ah, really? French uses "[...]" but I thought English-speaking people,
being different, used "(...)". So I'll change it.
> Either way, if you wanted to be more Unicode aware, you could save two
> characters by using \N{HORIZONTAL ELLIPSIS} "(…)" as the default.
I'd rather stay on the ASCII side of things here.
(also, the ellipsis character doesn't often look nice)
|
msg194483 - (view) |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2013-08-05 14:41 |
[...] and ASCII are fine with me.
> Perhaps there could be an argument controlling where to truncate
> (left, right or centre). A good use-case for the new Enums, perhaps? :-)
I wrote a similar function once and in addition to the width it had this feature too (defaulting on "center"), so it sounds like a reasonable addition to me. Back then I was simply passing a "left"/"right"/"center" string -- not sure it's worth adding an enum for this (FWIW for text alignment there are 3 separate methods: ljust, center, and rjust).
|
msg194484 - (view) |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2013-08-05 14:48 |
Perhaps "shorten" would be a better name -- "summarize" sounds smarter than it actually is :)
|
msg194486 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2013-08-05 15:14 |
Looking just at the proposed functionality (taking a prefix) and ignoring the requested complexification :), the usual name for the text produced by this process is a "lead" (http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Lead_section), although formally a lead is actually written to be used as such, as opposed to just taking a prefix, so that word really has the same problem as 'summarize'.
I think 'truncate' would be a better name. Or, if you don't mind being wordier, extract_prefix. The fact that it is part of the textwrap module should be enough clue that the truncation happens at whitespace. Truncate could also apply to the expanded version if you squint a little, if Antoine is interested in that. On the other hand, the use case presented for that is not going to be served by this function anyway, since this function (being part of textwrap) breaks on whitespace...it shouldn't (IMO) elide text other than at whitespace. If you want that functionality it belongs in some other module, I think.
The placeholder argument could alternatively be named 'ellipsis', but placeholder is certainly fine.
shorten would probably be better if you are going with the expanded version, but I like truncate. It is probably significant that that is what the title of the issue calls it :)
|
msg194501 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-05 18:36 |
> Looking just at the proposed functionality (taking a prefix) and
> ignoring the requested complexification :), the usual name for the
> text produced by this process is a
> "lead" (http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Lead_section), although formally a lead is actually written to be used as such, as opposed to just taking a prefix, so that word really has the same problem as 'summarize'.
Good point.
> The placeholder argument could alternatively be named 'ellipsis', but
> placeholder is certainly fine.
I would certainly like ellipsis if it didn't already mean something else
in Python.
> shorten would probably be better if you are going with the expanded
> version, but I like truncate. It is probably significant that that is
> what the title of the issue calls it :)
I'm a bit negative towards truncate(), mostly because I've worked on the
I/O stack and truncate means something much less careful there (e.g.
StringIO.truncate()).
But really, I'm fine with either shorten() or truncate(). I agree
summarize() may try to look a bit too smart.
|
msg194987 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-12 19:15 |
Updated patch renaming summarize() to shorten(), and adding docs and a fix for a nit reported by Vajrasky.
|
msg195005 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-12 20:05 |
Updated patch addressing Ezio's comments.
|
msg195012 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-08-12 20:39 |
New changeset c27ec198d3d1 by Antoine Pitrou in branch 'default':
Issue #18585: Add :func:`textwrap.shorten` to collapse and truncate a piece of text to a given length.
http://hg.python.org/cpython/rev/c27ec198d3d1
|
msg195013 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-12 20:39 |
Ok, I've committed after having addressed (most of) RDM's comments.
|
msg195020 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-08-12 21:22 |
What about a multiline summarize? The textwrap module is designed to work with multiline text.
Let we want wrap 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.' in 40 column and shorten it to three lines:
Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore (...)
For this we need to add two arguments for TextWrapper: max_lines and placeholder. Then shorten() will be just fill() with max_lines=1.
|
msg195412 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-08-16 20:31 |
New changeset be5481bf4c57 by Antoine Pitrou in branch 'default':
Fix the default placeholder in textwrap.shorten() to be " [...]".
http://hg.python.org/cpython/rev/be5481bf4c57
|
msg195413 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-16 20:31 |
(Ezio noticed that I had left the placeholder as " (...)". This is now fixed.)
|
msg200050 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-10-16 10:08 |
New changeset 0bd257cd3e88 by Serhiy Storchaka in branch 'default':
Add shorten to __all_ (issues #18585 and #18725).
http://hg.python.org/cpython/rev/0bd257cd3e88
|
msg207203 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2014-01-03 04:20 |
New changeset 536a2cf5f1d2 by Daniel Holth in branch 'default':
Issue #18585: speed zipfile import by only generating zipfile._ZipDecryptor on demand
http://hg.python.org/cpython/rev/536a2cf5f1d2
|
msg207204 - (view) |
Author: Daniel Holth (dholth) * |
Date: 2014-01-03 04:22 |
Previous changeset was meant for #18515
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:48 | admin | set | github: 62785 |
2014-01-03 04:22:26 | dholth | set | nosy:
+ dholth messages:
+ msg207204
|
2014-01-03 04:20:48 | python-dev | set | messages:
+ msg207203 |
2013-10-16 10:08:46 | python-dev | set | messages:
+ msg200050 |
2013-08-16 20:31:54 | pitrou | set | messages:
+ msg195413 |
2013-08-16 20:31:21 | python-dev | set | messages:
+ msg195412 |
2013-08-12 21:22:10 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg195020
|
2013-08-12 20:40:22 | pitrou | set | status: open -> closed |
2013-08-12 20:39:48 | pitrou | set | resolution: fixed messages:
+ msg195013 stage: patch review -> resolved |
2013-08-12 20:39:19 | python-dev | set | nosy:
+ python-dev messages:
+ msg195012
|
2013-08-12 20:05:22 | pitrou | set | files:
+ shorten2.patch
messages:
+ msg195005 |
2013-08-12 19:15:30 | pitrou | set | files:
+ shorten.patch
messages:
+ msg194987 |
2013-08-05 18:36:11 | pitrou | set | messages:
+ msg194501 |
2013-08-05 15:14:26 | r.david.murray | set | nosy:
+ r.david.murray messages:
+ msg194486
|
2013-08-05 14:48:10 | ezio.melotti | set | messages:
+ msg194484 |
2013-08-05 14:41:28 | ezio.melotti | set | messages:
+ msg194483 |
2013-08-02 17:40:43 | pitrou | set | messages:
+ msg194189 |
2013-08-02 17:19:04 | steven.daprano | set | messages:
+ msg194188 |
2013-08-02 17:12:04 | steven.daprano | set | nosy:
+ steven.daprano messages:
+ msg194187
|
2013-07-31 06:11:41 | vajrasky | set | messages:
+ msg193975 |
2013-07-30 10:05:44 | pitrou | set | files:
+ summarize2.patch
messages:
+ msg193921 |
2013-07-30 09:34:29 | pitrou | set | messages:
+ msg193920 |
2013-07-29 20:30:21 | pitrou | set | messages:
+ msg193898 |
2013-07-29 16:33:24 | vajrasky | set | messages:
+ msg193880 |
2013-07-29 16:09:02 | vajrasky | set | nosy:
+ vajrasky messages:
+ msg193879
|
2013-07-29 15:36:40 | ezio.melotti | set | nosy:
+ ezio.melotti
|
2013-07-29 15:34:37 | pitrou | set | messages:
+ msg193874 |
2013-07-29 15:20:38 | barry | set | messages:
+ msg193873 |
2013-07-29 13:55:30 | pitrou | set | messages:
+ msg193867 |
2013-07-29 12:49:10 | pitrou | create | |