classification
Title: FileInput does not allow 'rt' mode, but all its existing delegates do
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: berker.peksag, natgaertner, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2019-05-09 13:22 by natgaertner, last changed 2019-06-06 16:49 by natgaertner.

Pull Requests
URL       Status  Linked
PR 13221  open    natgaertner, 2019-05-09 17:23
Messages (3)
msg341978 - Author: Nathaniel Gaertner (natgaertner) Date: 2019-05-09 13:22
While looking at https://bugs.python.org/issue5758 I noticed that 'rt' support had been added to gzip and bz2 in 3.3, but FileInput still limited its mode options to 'r', 'rb', 'rU', and 'U'. It seems like 'rt' should be just fine here as well, and would help clarify issue 5758, since 'r' defaults to 'rt' for open(...) but defaults to 'rb' for gzip and bz2.
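The difference in defaults described above can be checked with a small sketch. The file names and the helper `first_line_types` are made up for illustration, and UTF-8 is an assumed encoding; the only point is which type each call hands back for the same mode string.

```python
import gzip
import os
import tempfile


def first_line_types(directory):
    """Return the type of the first line read under each opener/mode pair."""
    plain = os.path.join(directory, "demo.txt")      # hypothetical file name
    packed = plain + ".gz"

    # Write the same single line to a plain file and a gzip file.
    with open(plain, "w", encoding="utf-8") as f:
        f.write("Hello from Mike.\n")
    with gzip.open(packed, "wt", encoding="utf-8") as f:
        f.write("Hello from Mike.\n")

    results = {}
    # Built-in open(): 'r' means text, so lines come back as str.
    with open(plain, "r", encoding="utf-8") as f:
        results["open r"] = type(f.readline())
    # gzip.open(): 'r' means binary, so lines come back as bytes.
    with gzip.open(packed, "r") as f:
        results["gzip r"] = type(f.readline())
    # gzip.open() with an explicit 'rt' behaves like open(..., 'r').
    with gzip.open(packed, "rt", encoding="utf-8") as f:
        results["gzip rt"] = type(f.readline())
    return results


with tempfile.TemporaryDirectory() as tmp:
    types = first_line_types(tmp)
```

bz2.open() follows the same convention as gzip.open() here, which is why the same 'r' passed through FileInput can end up meaning two different things.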

I wrote up the code and modified the mode unit test to try every allowed mode. I'll attach a PR to this once I have it ready.
msg342255 - Author: Berker Peksag (berker.peksag) (Python committer) Date: 2019-05-12 11:40
Thank you for the report and the PR!

I think accepting 'rt' mode is a good idea. However, it's a new feature and it can only go into 3.8.

It seems to me that the root cause of the issue is that the fileinput module wasn't properly converted to support Python 3. Perhaps we could change 'r' to 'rt' in FileInput.__init__() or hook_compressed() to make it work properly, so the example in issue 5758 would work as expected without changing any user code:

# test.py

import fileinput

for line in fileinput.FileInput(openhook=fileinput.hook_compressed):
    print(line.rstrip())


$ ./python.exe test.py mike.txt mike.txt.gz
Hello from Mike.
This is the second line.
Why did the robot cross the road?
Hello from Mike.
This is the second line.
Why did the robot cross the road?
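As a rough sketch of the "change 'r' to 'rt'" idea, a user-level openhook can already do the translation itself. `hook_compressed_text` and `read_all` are hypothetical names, UTF-8 is an assumed encoding, and this is not meant to reflect what the attached PR or fileinput.hook_compressed() actually does.

```python
import bz2
import fileinput
import gzip
import os
import tempfile


def hook_compressed_text(filename, mode):
    """Hypothetical openhook: like hook_compressed, but 'r' always means text."""
    if mode == "r":
        mode = "rt"  # make the text/binary choice explicit for every opener
    # Only pass an encoding in text mode; gzip/bz2 reject it in binary mode.
    encoding = None if "b" in mode else "utf-8"
    ext = os.path.splitext(filename)[1]
    if ext == ".gz":
        return gzip.open(filename, mode, encoding=encoding)
    if ext == ".bz2":
        return bz2.open(filename, mode, encoding=encoding)
    return open(filename, mode, encoding=encoding)


def read_all(paths):
    """Collect every line from paths using the hypothetical hook."""
    with fileinput.FileInput(files=paths, openhook=hook_compressed_text) as stream:
        return [line.rstrip() for line in stream]


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "mike.txt")
    packed = os.path.join(tmp, "mike.txt.gz")
    with open(plain, "w", encoding="utf-8") as f:
        f.write("Hello from Mike.\n")
    with gzip.open(packed, "wt", encoding="utf-8") as f:
        f.write("Hello from Mike.\n")
    lines = read_all([plain, packed])
```

With this hook, the plain and compressed files both yield str lines, so the loop body needs no special casing.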
msg344838 - Author: Nathaniel Gaertner (natgaertner) Date: 2019-06-06 16:49
Hey, sorry for the delay in responding.

My thought with forcing 'rt' mode is that it would actually reduce the flexibility of the FileInput class. For issue 5758, I suspect the problem arose out of confusion about what "string" means in Python 2 vs. 3. If I understand correctly, a "string" in 2 is actually a sequence of bytes, displayed as if it were ASCII-encoded text. So when it prints the binary data from the gzip file in the example given on that issue, it's happy to say "aha, this is ASCII-encoded text, let's print it like a string." This leads to the case where 2 "works" (it does not explicitly mark the printed data from gzip as binary).

But in 3, strings and binary arrays are totally different kinds of objects! I am unfamiliar with the history of introducing 'rt', but I'm guessing it has to do with disambiguating 'r', since text is now stored in its own object type and goes through an explicit decoding process to get there. With the explicit 'rt' and 'rb' modes, 'r' becomes explicitly ambiguous (an oxymoron, I know), so if a user provides 'r' they are expressing no preference between text and binary. If they have a preference, 'rt' and 'rb' give them the ability to express it.
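To make the str/bytes distinction concrete, here is a minimal Python 3 sketch. The sample line is borrowed from the example output earlier in this issue, and ASCII is an assumed encoding.

```python
# In Python 3, bytes and str are distinct types with no implicit conversion.
raw = b"Hello from Mike.\n"      # what a binary-mode read returns
text = raw.decode("ascii")       # the explicit step from bytes to text

# The repr of bytes carries a b'' prefix, which is how binary data shows up
# when it is printed; the decoded text has no such marker.
raw_display = repr(raw.rstrip())
text_display = text.rstrip()

# The two objects never compare equal, even though they hold "the same" characters.
same = (raw == text)
```

Here `raw_display` is the string "b'Hello from Mike.'" while `text_display` is plain "Hello from Mike.", which matches the visible difference between binary and text reads.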

I may be totally on the wrong track here, or missing some important backward compatibility issues, but those are my thoughts. Thanks!
History
Date                 User           Action  Args
2019-06-06 16:49:27  natgaertner    set     messages: + msg344838
2019-05-12 11:40:00  berker.peksag  set     nosy: + serhiy.storchaka; messages: + msg342255
2019-05-12 10:31:00  SilentGhost    set     nosy: + berker.peksag; versions: - Python 3.5, Python 3.6, Python 3.7
2019-05-09 17:23:38  natgaertner    set     keywords: + patch; stage: patch review; pull_requests: + pull_request13131
2019-05-09 13:22:23  natgaertner    create