classification
Title: textwrap.dedent doesn't work properly with strings containing CRLF
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.3, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: alexis.d, emilyemorehouse, georg.brandl, ncoghlan, pitrou, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2013-11-02 14:50 by alexis.d, last changed 2017-06-27 05:22 by serhiy.storchaka.

Files
File name Uploaded Description
dedent.patch alexis.d, 2013-11-06 22:43
Messages (5)
msg201975 - Author: Alexis Daboville (alexis.d) * Date: 2013-11-02 14:50
If a string contains an empty line and uses CRLF newlines instead of LF newlines, textwrap.dedent doesn't work properly: it returns the original string without dedenting it.

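As far as I can tell, this is because it considers the empty string to be the longest common indent (http://hg.python.org/cpython/file/2.7/Lib/textwrap.py#l372): with CRLF the empty line still contains '\r', and '[^ \t\n]' matches '\r', so that line is treated as a non-blank line with zero indentation.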
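To make the mechanism concrete, the minimal sketch below applies a regex of the kind the report points at to both strings from the repro. The exact pattern and the name `LEADING_WS` are assumptions modeled on the '[^ \t\n]' class quoted above (the leading-whitespace regex in textwrap.py of that era); newer Python versions may implement dedent differently.

```python
import re

# Assumption: modeled on the leading-whitespace pattern the report links to in
# Lib/textwrap.py; LEADING_WS is an illustrative name, not a public textwrap API.
LEADING_WS = re.compile(r'(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)

lf = '\ta\n\tb\n\n\tc'
crlf = '\ta\r\n\tb\r\n\r\n\tc'

# LF: the blank line is truly empty, so [^ \t\n] finds nothing there
# and only the three tab indents are collected.
print(LEADING_WS.findall(lf))    # ['\t', '\t', '\t']

# CRLF: the "blank" line holds a lone '\r', which [^ \t\n] accepts, so a
# zero-width indent '' is collected and the common margin shrinks to ''.
print(LEADING_WS.findall(crlf))  # ['\t', '\t', '', '\t']
```

Because the common prefix of '\t' and '' is '', nothing is removed from any line, which matches the unchanged dedent(crlf) output in the repro.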

Expected behavior: textwrap.dedent should work the same way whether lines are separated by a single LF character or by CRLF.
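One way to get the expected behavior while keeping each line's original ending is sketched below. `dedent_keep_endings` is a hypothetical helper, not part of textwrap and not necessarily what the attached patch does; it ignores whitespace-only lines when computing the margin whether they end in LF or CRLF. Note that `str.splitlines()` also splits on a few other characters (e.g. '\v', '\f') that textwrap.dedent does not treat as line breaks.

```python
import os


def dedent_keep_endings(text):
    """Hypothetical sketch: dedent while preserving LF or CRLF line endings."""
    lines = text.splitlines(keepends=True)
    # Leading spaces/tabs of every non-blank line; lines holding only
    # whitespace (including a bare '\r\n') do not constrain the margin.
    indents = [
        line[: len(line) - len(line.lstrip(' \t'))]
        for line in lines
        if line.strip()
    ]
    # Longest common indent, compared character by character.
    margin = os.path.commonprefix(indents) if indents else ''
    out = []
    for line in lines:
        if line.strip():
            out.append(line[len(margin):])
        else:
            # Blank line: drop its spaces/tabs but keep its line ending.
            out.append(line.lstrip(' \t'))
    return ''.join(out)


print(repr(dedent_keep_endings('\ta\n\tb\n\n\tc')))        # 'a\nb\n\nc'
print(repr(dedent_keep_endings('\ta\r\n\tb\r\n\r\n\tc')))  # 'a\r\nb\r\n\r\nc'
```

Both inputs lose the same one-tab margin, and the CRLF result keeps its '\r\n' separators.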

To repro:

 ✓ 15:26 dabovill @ morag in /tmp/dedent $ cat dedent.py
import textwrap

lf = '\ta\n\tb\n\n\tc'
crlf = '\ta\r\n\tb\r\n\r\n\tc'

print('- lf')
print(lf)
print('- dedent(lf)')
print(textwrap.dedent(lf))
print('- crlf')
print(crlf)
print('- dedent(crlf)')
print(textwrap.dedent(crlf))
 ✓ 15:26 dabovill @ morag in /tmp/dedent $ python2.7 dedent.py
- lf
        a
        b

        c
- dedent(lf)
a
b

c
- crlf
        a
        b

        c
- dedent(crlf)
        a
        b

        c
 ✓ 15:26 dabovill @ morag in /tmp/dedent $ python3.3 dedent.py
- lf
        a
        b

        c
- dedent(lf)
a
b

c
- crlf
        a
        b

        c
- dedent(crlf)
        a
        b

        c
msg202294 - Author: Alexis Daboville (alexis.d) * Date: 2013-11-06 22:43
Added patch.
msg296984 - Author: Emily Morehouse (emilyemorehouse) * (Python committer) Date: 2017-06-27 01:30
@georg.brandl and @terry.reedy, this issue was mentioned again recently (http://bugs.python.org/issue30754). 

Would you like to revisit it?
msg297001 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-06-27 05:03
Emily, do you have any opinion on the issue? The complaint seems plausible, but I have looked at neither the docs nor the code, so I do not yet understand the significance of '[^ \t\n]' (regex for 'anything but space, tab, newline') matching '\r'.

Alexis, you must sign the PSF contributor license agreement,
https://www.python.org/psf/contrib/
before we can use your patch.
msg297004 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-27 05:22
I'm not sure that textwrap.dedent() should support the CRLF line separator. Usually the conversion between the different line separators (CRLF, CR, LF) used in external files and the LF used internally is done at the I/O level.
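For text that did not pass through text-mode I/O (where open() with its default newline=None already translates '\r\n' and '\r' to '\n' on read), the same normalization can be done by hand before calling dedent. A minimal sketch; `dedent_any_newlines` is a hypothetical helper name, not a textwrap function:

```python
import textwrap


def dedent_any_newlines(text):
    """Hypothetical helper: normalize CRLF/CR to LF, then dedent."""
    # Replace CRLF first so that its '\r' is not turned into an extra '\n'.
    normalized = text.replace('\r\n', '\n').replace('\r', '\n')
    return textwrap.dedent(normalized)


crlf = '\ta\r\n\tb\r\n\r\n\tc'
print(repr(dedent_any_newlines(crlf)))  # 'a\nb\n\nc'
```

The trade-off is that the result always uses LF, so a caller that must write CRLF back out would convert again at output time.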

In any case, this looks to me more like a new feature than a bug fix.
History
Date                 User              Action  Args
2017-06-27 05:22:32  serhiy.storchaka  set     nosy: + pitrou, serhiy.storchaka, ncoghlan
                                               messages: + msg297004
2017-06-27 05:03:17  terry.reedy       set     messages: + msg297001
2017-06-27 01:30:46  emilyemorehouse   set     nosy: + emilyemorehouse
                                               messages: + msg296984
2013-11-08 23:35:33  terry.reedy       set     nosy: + georg.brandl, terry.reedy
2013-11-06 22:43:42  alexis.d          set     files: + dedent.patch
                                               keywords: + patch
                                               messages: + msg202294
2013-11-02 14:50:18  alexis.d          create