
classification
Title: add 'rbU' mode to open()
Components: Library (Lib)
Versions: Python 3.0, Python 2.6

process
Status: closed
Resolution: not a bug
Nosy List: amaury.forgeotdarc, georg.brandl, skip.montanaro, techtonik
Priority: normal

Created on 2008-07-15 05:21 by techtonik, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (18)
msg69673 - (view) Author: anatoly techtonik (techtonik) Date: 2008-07-15 05:21
'rU' universal newline support is useless, because the lines read always
end with '\n' regardless of the actual line ending in the source file.
Applications that care about line endings still have to open the file in
binary mode and gather the statistics manually.

So, to make this mode useful, an 'rbU' mode should be added. Otherwise it
isn't worth the complication in both the C code and the documentation.
msg69679 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-07-15 11:47
The whole idea of universal newline mode is that the various possible
line endings ('\r', '\n' and '\r\n') are all mapped to '\n' precisely
so the user doesn't have to detect and fiddle with them.  Using 'b' and
'U' together makes no sense.

* If you really want to see the line endings use 'rb'.
* If you don't care about the line endings regardless of source, use 'rU'.
* Otherwise use 'r'.
msg69709 - (view) Author: anatoly techtonik (techtonik) Date: 2008-07-15 19:05
If you open file with 'r' - all line endings will be mapped precisely to
'\n' anyways, so it has nothing to do with 'U' mode.
msg69742 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-16 01:39
> If you open file with 'r' - all line endings will be mapped precisely to
> '\n' anyways, so it has nothing to do with 'U' mode.

No they won't -- only the platform-specific newline will. On Unix, 'r'
and 'rb' are the same.
msg69764 - (view) Author: anatoly techtonik (techtonik) Date: 2008-07-16 05:08
That's weird, and the worst part is that it is not documented. The manual says:

"If Python is built without universal newline support a mode with 'U' is
the same as normal text mode."

but it gives no information about what "normal text mode" behaviour is.

The way Python works, as you describe it, is weird but true. If a developer
uses the Windows platform, Unix and Windows files will be handled in the
same way, but files from the Mac platform will not. The worst part is that
the developer can't know this, because he is unlikely to have any Mac files
to test with.

This behavior is like a long-standing mine laid for Windows and Mac
Python users. Why not fix it?
msg69845 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-16 22:05
This behavior is inherited from the C-level fopen() and therefore
"normal text mode" is whatever that defines.

Is this really nowhere documented?
msg69862 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-07-17 01:01
anatoly> If you open file with 'r' - all line endings will be mapped
anatoly> precisely to '\n' anyways, so it has nothing to do with 'U'
anatoly> mode.

Before 3.0 at least, if you copy a text file from, say, Windows to Mac, and
open it with 'r', you get lines which end in '\r\n'.  Here's a simple
example:

    >>> open("dos.txt", "rb").read()
    'a single line\r\nanother line\r\n'
    >>> f = open("dos.txt")
    >>> f.next()
    'a single line\r\n'
    >>> f = open("dos.txt", "r")
    >>> f.next()
    'a single line\r\n'
    >>> f.next()
    'another line\r\n'

If, on the other hand, you open it with 'rU', the '\r\n' literal line ending
is converted, even though CRLF is not the canonical Mac line ending:

    >>> f = open("dos.txt", "rU")
    >>> f.next()
    'a single line\n'
    >>> f.next()
    'another line\n'

Skip
msg69876 - (view) Author: anatoly techtonik (techtonik) Date: 2008-07-17 06:46
> This behavior is inherited from the C-level fopen() and therefore
> "normal text mode" is whatever that defines.

> Is this really nowhere documented?

The relation to the fopen() function may be documented, but there is no
explanation of what "normal text mode" is. Is it really Pythonic that a
script writer with no prior experience of C, stdio and fopen should have
to be aware of inherited fopen "behavior" when programming in Python?
msg70030 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-19 13:50
At least the 2.6 docs say

"The default is to use text mode, which may convert ``'\n'`` characters
to a platform-specific representation on writing and back on reading."
msg70068 - (view) Author: anatoly techtonik (techtonik) Date: 2008-07-20 09:09
That's fine with me. I just need an 'rbU' mode so that I know which format
to write the output file in if I want to preserve the proper line endings
regardless of platform.

As for the Python 2.6 note - I would replace "may convert" with "converts".
msg70069 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-07-20 10:03
If you want to write your own line endings, read with "rU" and write
with "wb".
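
A minimal sketch of that read-then-write approach, assuming Python 3, where the default text mode (newline=None) already behaves like the old 'rU'. The helper name rewrite_with_line_ending and the scratch file names are hypothetical and only serve the illustration. Note that this normalises every ending to one style and does not preserve mixed endings.

```python
import os
import tempfile


def rewrite_with_line_ending(src_path, dst_path, line_ending=b"\r\n", encoding="utf-8"):
    """Copy src_path to dst_path, normalising every line ending to line_ending."""
    # Default text mode (newline=None) is universal-newlines mode in Python 3:
    # '\r', '\n' and '\r\n' all arrive as '\n'.
    with open(src_path, "r", encoding=encoding) as src:
        lines = src.read().split("\n")
    # Binary mode on output prevents any platform-specific translation.
    with open(dst_path, "wb") as dst:
        dst.write(line_ending.join(line.encode(encoding) for line in lines))


# Usage: a scratch file with mixed endings is rewritten with CRLF only.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "mixed.txt")
dst_path = os.path.join(tmp_dir, "crlf.txt")
with open(src_path, "wb") as f:
    f.write(b"a\rb\r\nc\nd\n")
rewrite_with_line_ending(src_path, dst_path)
with open(dst_path, "rb") as f:
    result = f.read()
```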
msg70098 - (view) Author: anatoly techtonik (techtonik) Date: 2008-07-21 06:12
If the line endings are mixed, I would like to leave them as they are.
msg70130 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-22 01:10
Did you look at the io.open() function?
io is a new module in Python 2.6, and io.open() is also the builtin "open" in py3k!

"""
    * On input, if newline is None, universal newlines mode is
      enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
      these are translated into '\n' before being returned to the
      caller. If it is '', universal newline mode is enabled, but line
      endings are returned to the caller untranslated. If it has any of
      the other legal values, input lines are only terminated by the given
      string, and the line ending is returned to the caller untranslated.
"""

I suggest trying
    io.open(filename, newline="")
msg70180 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-07-23 18:14
As I indicated in msg69679 if you want to see the line endings just open
the file in binary mode ('rb').
msg70202 - (view) Author: anatoly techtonik (techtonik) Date: 2008-07-24 12:39
Thanks for the hints. It turns out that "universal text mode" is not for
cross-platform but for platform-specific programming. =)

So I gave up on it and ended up with my own 'rb' newline counter and a 'wb'
writer which writes lines in the required format.
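
The poster's own counter is not shown in the thread. As a rough sketch of what a binary-mode newline counter could look like in Python 3, the following tallies each line-ending style in a byte string. The name count_line_endings is hypothetical.

```python
import re
from collections import Counter

# Match b'\r\n' first so a CRLF pair is not counted as a CR plus an LF.
_LINE_ENDING = re.compile(rb"\r\n|\r|\n")


def count_line_endings(data):
    """Return a Counter mapping each line-ending byte string to its count."""
    return Counter(_LINE_ENDING.findall(data))


# Usage: in practice the bytes would come from open(path, "rb").read().
counts = count_line_endings(b"a\rb\r\nc\nd\n")
```

The dominant style in counts can then be used to pick the ending for a 'wb' writer.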

As for 2.6 io.open()
http://docs.python.org/dev/library/io.html#module-io
- can anybody point out what the difference is between text mode with
newline='' and binary mode?
- the comment about newline=<string>
"If it is '', universal newline mode is enabled, but line endings are
returned to the caller untranslated. If it has any of the other legal
values, input lines are only terminated by the given string, and the
line ending is returned to the caller untranslated."
does it mean that if newline='\r\n' is specified all single '\n'
characters are returned inline?
msg70204 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-24 14:20
> does it mean that if newline='\r\n' is specified all single '\n'
> characters are returned inline?
Yes.

Let's take a file with mixed newlines:
>>> io.open("c:/temp/t", "rb").read()
'a\rb\r\nc\nd\n'

rb mode splits only on '\n', so '\r\n' stays intact and a lone '\r' is not a line break (I'm on Windows)
>>> io.open("c:/temp/t", "rb").readlines()
['a\rb\r\n', 'c\n', 'd\n']

rU mode splits on every newline, and converts everything to \n
>>> io.open("c:/temp/t", "rU").readlines()
[u'a\n', u'b\n', u'c\n', u'd\n']

newline='' splits like rU, but does not translate newlines:
>>> io.open("c:/temp/t", newline='').readlines()
[u'a\r', u'b\r\n', u'c\n', u'd\n']

newline='\r\n' only splits on the specified string:
>>> io.open("c:/temp/t", newline='\r\n').readlines()
[u'a\rb\r\n', u'c\nd\n']
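
For readers on Python 3, the session above can be approximated with the builtin open(), which takes the same newline argument. This is a sketch rather than a transcript: the scratch file from tempfile and the helper read_lines are assumptions made for illustration, and newline=None stands in for the old "rU" mode.

```python
import os
import tempfile

# Hypothetical scratch file with the same mixed-newline content as above.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"a\rb\r\nc\nd\n")


def read_lines(newline):
    """Read all lines of the scratch file in text mode with the given newline setting."""
    with open(path, "r", encoding="ascii", newline=newline) as f:
        return f.readlines()


translated = read_lines(None)     # universal newlines, endings translated to '\n'
untranslated = read_lines("")     # universal newlines, endings left as found
crlf_only = read_lines("\r\n")    # lines end only at '\r\n', nothing translated

with open(path, "rb") as f:
    binary = f.readlines()        # bytes objects, split only on b'\n'

os.remove(path)
```

The difference between newline='' and binary mode is visible here: both keep the original endings, but text mode also treats a lone '\r' as a line break and returns decoded strings, while binary mode returns bytes and splits only on b'\n'.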
msg70218 - (view) Author: anatoly techtonik (techtonik) Date: 2008-07-24 17:32
This '\r' makes things worse. I am also on Windows and didn't think
that "rb" handles '\r\n' line endings only as a side effect of '\n' being
the last character. Thanks.

newline='' is just what I need. I guess there is no alternative to it in
the 2.5 series except manually splitting the lines returned from a binary
read. What about the file.newlines attribute - is it preserved in 2.6/Py3k?

BTW, it would be nice to have this example in the manual.
msg70219 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-24 18:51
Please read
http://docs.python.org/dev/library/io.html#io.TextIOBase.newlines
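
As a small illustration of that attribute, assuming Python 3, the sketch below wraps an in-memory byte stream (a stand-in for a real file) and inspects newlines after reading. With mixed endings the attribute is a tuple of the styles seen so far. The sketch makes no claim about the order of the tuple.

```python
import io

# In-memory stand-in for a file with mixed line endings.
raw = io.BytesIO(b"a\rb\r\nc\nd\n")

# newline="" keeps the endings untranslated while still recognising all three.
text = io.TextIOWrapper(raw, encoding="ascii", newline="")
lines = text.readlines()

# After reading, newlines reports which ending styles were encountered.
seen = text.newlines
```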
History
Date                 User                Action  Args
2022-04-11 14:56:36  admin               set     github: 47609
2008-07-24 18:51:45  amaury.forgeotdarc  set     messages: + msg70219
2008-07-24 17:32:57  techtonik           set     messages: + msg70218
2008-07-24 14:20:18  amaury.forgeotdarc  set     messages: + msg70204
2008-07-24 12:39:16  techtonik           set     messages: + msg70202
2008-07-23 18:14:19  skip.montanaro      set     messages: + msg70180
2008-07-22 01:10:40  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg70130
2008-07-21 06:13:00  techtonik           set     messages: + msg70098
2008-07-20 10:03:09  georg.brandl        set     messages: + msg70069
2008-07-20 09:09:51  techtonik           set     messages: + msg70068
2008-07-19 13:50:12  georg.brandl        set     messages: + msg70030
2008-07-17 06:46:41  techtonik           set     messages: + msg69876
2008-07-17 01:01:36  skip.montanaro      set     messages: + msg69862
2008-07-16 22:05:15  georg.brandl        set     messages: + msg69845
2008-07-16 05:08:16  techtonik           set     messages: + msg69764
2008-07-16 01:39:10  georg.brandl        set     nosy: + georg.brandl; messages: + msg69742
2008-07-15 19:05:40  techtonik           set     messages: + msg69709
2008-07-15 11:47:47  skip.montanaro      set     status: open -> closed; resolution: not a bug; messages: + msg69679; nosy: + skip.montanaro
2008-07-15 05:21:48  techtonik           create