This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: shorten function of textwrap module is susceptible to non-normalized whitespaces
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, pitrou, vajrasky
Priority: normal Keywords: patch

Created on 2013-08-13 06:26 by vajrasky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
fix_for_non_normalized_whitespaces_in_placeholder.patch vajrasky, 2013-08-13 06:26 review
fix_for_non_normalized_whitespaces_in_placeholder_v2.patch vajrasky, 2013-08-13 11:05 review
fix_for_non_normalized_whitespaces_in_placeholder_v3.patch vajrasky, 2013-08-13 13:22 review
fix_for_non_normalized_whitespaces_in_placeholder_v4.patch vajrasky, 2013-08-13 15:10 review
fix_for_non_normalized_whitespaces_in_placeholder_v5.patch vajrasky, 2013-08-13 15:48 review
Messages (11)
msg195045 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 06:26
In shorten function of textwrap module, the placeholder becomes a hole where we can inject non-normalized whitespaces to the text.

>>> text = "Hello there, how are you this fine day? I'm glad to hear it!"
>>> from textwrap import shorten
>>> shorten(text, 40, placeholder="              ")
'Hello there, how are you'

We normalize the only-whitespaces placeholder.

But....

>>> shorten(text, 17, placeholder="  \n\t(...)  \n\t[...]  \n\t")
'(...)  \n\t[...]'
>>> shorten(text, 40, placeholder="  \n\t(...)  \n\t[...]  \n\t")
'Hello there, how  \n\t(...)  \n\t[...]'

Attached the patch to normalize the non-normalized whitespaces in placeholder.
msg195046 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-13 06:44
I'm not convinced this is a bug. The whitespace right-stripping is more of an implementation detail. You can really put what you want inside the placeholder.
msg195047 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 06:54
Okay, nevermind about non-normalized whitespaces in placeholder, but what about this case?

>>> text = "Hello there, how are you this fine day? I'm glad to hear it!"
>>> from textwrap import shorten
>>> shorten(text, 10, placeholder="     ")
'Hello'
>>> shorten(text, 9, placeholder="     ")
''
>>> len('Hello')
5

Isn't that weird?
msg195059 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-13 09:23
> Isn't that weird?

Agreed, this one is a bug. The stripping in shorten() should be smarter, i.e. it should not affect the placeholder's own spaces.
Do you want to write a patch?
msg195060 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-13 09:24
Correcting myself:

> i.e. it should not affect the placeholder's own spaces.

... except for leading whitespace in case the placeholder ends up alone in the result.
i.e. shorten("somethingtoolong") should return "(...)", not " (...)".
msg195067 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 11:05
> Do you want to write a patch?
My pleasure.

Attached the second version of the patch to accomodate Pitrou's request.
msg195073 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 13:19
Attached more refined patch. Removed unnecessary test. Added more robust test. Added shorten in __all__.
msg195079 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 15:10
I missed this case:

>>> from textwrap import shorten
>>> shorten('hell', 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 386, in shorten
    return w.shorten(text, placeholder=placeholder)
  File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 322, in shorten
    raise ValueError("placeholder too large for max width")
ValueError: placeholder too large for max width

Also, in this patch, I removed the unnecessary stripping of the text part.
msg195080 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-13 15:34
> I missed this case:
> 
> >>> from textwrap import shorten
> >>> shorten('hell', 4)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
>   "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py",
>   line 386, in shorten
>     return w.shorten(text, placeholder=placeholder)
>   File
>   "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py",
>   line 322, in shorten
>     raise ValueError("placeholder too large for max width")
> ValueError: placeholder too large for max width

This is by design. Passing a placeholder larger than the width is a
programming error, regardless of whether the text is small enough.
msg195081 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 15:48
Okay, attached the fifth version not to care about the case where text is smaller than the placeholder.
msg200038 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-16 07:38
Serhiy's commit http://hg.python.org/cpython/rev/2e8c424dc638 fixed this issue already. So I closed this ticket.
History
Date User Action Args
2022-04-11 14:57:49adminsetgithub: 62923
2013-10-16 07:38:56vajraskysetstatus: open -> closed
resolution: fixed
messages: + msg200038
2013-08-13 15:48:41vajraskysetfiles: + fix_for_non_normalized_whitespaces_in_placeholder_v5.patch

messages: + msg195081
2013-08-13 15:34:15pitrousetmessages: + msg195080
title: shorten function of textwrap module is susceptible to non-normalized whitespaces -> shorten function of textwrap module is susceptible to non-normalized whitespaces
2013-08-13 15:10:07vajraskysetfiles: + fix_for_non_normalized_whitespaces_in_placeholder_v4.patch

messages: + msg195079
2013-08-13 13:22:45vajraskysetfiles: + fix_for_non_normalized_whitespaces_in_placeholder_v3.patch
2013-08-13 13:21:33vajraskysetfiles: - fix_for_non_normalized_whitespaces_in_placeholder_v3.patch
2013-08-13 13:19:47vajraskysetfiles: + fix_for_non_normalized_whitespaces_in_placeholder_v3.patch

messages: + msg195073
2013-08-13 11:05:29vajraskysetfiles: + fix_for_non_normalized_whitespaces_in_placeholder_v2.patch

messages: + msg195067
2013-08-13 09:24:07pitrousetmessages: + msg195060
2013-08-13 09:23:09pitrousetmessages: + msg195059
2013-08-13 09:12:42ezio.melottisetnosy: + ezio.melotti

stage: patch review
2013-08-13 06:54:54vajraskysetmessages: + msg195047
2013-08-13 06:44:06pitrousetmessages: + msg195046
2013-08-13 06:26:52vajraskycreate