classification
Title: textwrap should treat Unicode em-dash like ASCII em-dash
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.10
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: jonathaneunice, methane, r.david.murray
Priority: normal
Keywords:

Created on 2017-06-15 19:09 by jonathaneunice, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL      Status  Linked
PR 2224  open    jonathaneunice, 2017-06-15 19:29
Messages (4)
msg296124 - Author: Jonathan Eunice (jonathaneunice) Date: 2017-06-15 19:09
The textwrap module goes to great lengths to "do the right thing" when it finds the ASCII simulation of an em-dash (two or more consecutive hyphens), but it does nothing to recognize and similarly treat true (Unicode) em-dashes (aka '\N{EM DASH}', '\u2014', or U+2014). Real em-dashes should get at least as good a treatment as simulated em-dashes.
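For illustration, here is a minimal sketch of the difference described above, assuming an interpreter without the proposed change. The sample strings and the width are arbitrary choices made for this example; `break_long_words=False` is set so that only genuine break opportunities are used.

```python
import textwrap

# The same two words, joined once by the ASCII stand-in ("--")
# and once by a real em-dash (U+2014).
ascii_text = "alpha--beta"
unicode_text = "alpha\N{EM DASH}beta"

# break_long_words=False: never split inside a chunk, so the result
# shows where textwrap actually sees a break opportunity.
ascii_lines = textwrap.wrap(ascii_text, width=8, break_long_words=False)
unicode_lines = textwrap.wrap(unicode_text, width=8, break_long_words=False)

print(ascii_lines)    # ['alpha--', 'beta']
print(unicode_lines)  # one unbroken 10-character line: no break at U+2014
```

The ASCII form is allowed to break after the dashes, while the Unicode form is treated as a single unbreakable word that overflows the requested width.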
msg296126 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-06-15 19:35
This seems sensible to me (I haven't looked at the PR, I'm talking about adding the support). When textwrap was written, Python was pretty ASCII-oriented, so it is not too much of a surprise that Unicode em-dashes were not supported.
msg296127 - Author: Jonathan Eunice (jonathaneunice) Date: 2017-06-15 20:10
Agreed. It makes great sense that textwrap started as highly ASCII-centric. But in the Python 3, Unicode-friendly era, ASCII-biased isn't where we should leave things.
msg379189 - Author: Inada Naoki (methane) (Python committer) Date: 2020-10-21 04:31
> Agreed. It makes great sense that textwrap started as highly ASCII-centric. But in the Python 3, Unicode-friendly era, ASCII-biased isn't where we should leave things.

It needs Unicode experts. If we support Unicode, we should implement UAX #14.
http://www.unicode.org/reports/tr14/tr14-45.html

But I am not sure any core developer loves textwrap and Unicode enough to implement it.
It can be implemented in a 3rd-party package before being added to the stdlib.

So is U+2014 really important to implement even though we cannot implement UAX #14 in the foreseeable future?
It doesn't make sense to me.
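As a rough sketch of the third-party route mentioned above, a subclass could split chunks at U+2014 so the dash becomes a break opportunity. This is not the code from PR 2224 and it is not UAX #14 line breaking. The class name `EmDashTextWrapper` is hypothetical, and the sketch relies on `TextWrapper._split`, a private method that may change between Python versions.

```python
import re
import textwrap

EM_DASH = "\N{EM DASH}"  # U+2014


class EmDashTextWrapper(textwrap.TextWrapper):
    """Hypothetical wrapper that allows a line break after U+2014.

    Assumption: overriding the private _split() hook is acceptable
    for a third-party experiment; it is not a public API.
    """

    def _split(self, text):
        chunks = []
        for chunk in super()._split(text):
            # Cut each chunk around runs of em-dashes, keeping the
            # dashes as their own chunk (as textwrap does for "--").
            parts = re.split(f"({EM_DASH}+)", chunk)
            chunks.extend(part for part in parts if part)
        return chunks


wrapper = EmDashTextWrapper(width=8, break_long_words=False)
emdash_lines = wrapper.wrap("alpha" + EM_DASH + "beta")
print(emdash_lines)  # two lines: the dash stays with "alpha", then "beta"
```

Text without em-dashes passes through the stock splitting logic unchanged, so the subclass only affects input that contains U+2014.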
History
Date                 User            Action  Args
2022-04-11 14:58:47  admin           set     github: 74865
2020-10-21 04:33:09  methane         set     versions: + Python 3.10, - Python 3.7
2020-10-21 04:31:49  methane         set     nosy: + methane; messages: + msg379189
2017-06-15 20:10:35  jonathaneunice  set     messages: + msg296127
2017-06-15 19:35:30  r.david.murray  set     nosy: + r.david.murray; messages: + msg296126; stage: patch review
2017-06-15 19:29:45  jonathaneunice  set     pull_requests: + pull_request2269
2017-06-15 19:09:00  jonathaneunice  create