classification
Title: fileinput handling of unicode errors from standard input
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, jmb236, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-04-14 14:32 by jmb236, last changed 2016-05-06 09:24 by serhiy.storchaka. This issue is now closed.

Messages (4)
msg263409 - Author: Joel Barry (jmb236) Date: 2016-04-14 14:32
The openhook for fileinput currently will not be called when the input
is from sys.stdin.  However, if the input contains invalid UTF-8
sequences, a program with a hook that specifies errors='replace' will
not behave as expected:

  $ cat x.py
  import fileinput
  import sys
  
  def hook(filename, mode):
      print('hook called')
      return open(filename, mode, errors='replace')
  
  for line in fileinput.input(openhook=hook):
      sys.stdout.write(line)


  $ echo -e "foo\x80bar" >in.txt

  $ python3 x.py in.txt
  hook called
  foo´┐Żbar

Good.  Hook is called, and replacement character is observed.

  $ python3 x.py <in.txt
  Traceback (most recent call last):
    File "x.py", line 8, in <module>
      for line in fileinput.input(openhook=hook):
    File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/fileinput.py", line 263, in __next__
      line = self.readline()
    File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/fileinput.py", line 363, in readline
      self._buffer = self._file.readlines(self._bufsize)
    File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode
      (result, consumed) = self._buffer_decode(data, self.errors, final)
  UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3: invalid start byte

Hook was not called, and so we get the UnicodeDecodeError.
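The exception comes from decoding, not from fileinput itself. When fileinput reads standard input it uses sys.stdin as-is, and with a UTF-8 locale that wrapper typically uses the 'strict' error handler. Below is a minimal sketch that reproduces both outcomes without a shell. It uses io.BytesIO as a stand-in for the redirected in.txt, which is an assumption made only so the example is self-contained. The helper name decode_like_stdin is hypothetical.

```python
import io

# Same bytes as in.txt in the report: "foo", the invalid UTF-8 byte 0x80, "bar".
DATA = b"foo\x80bar\n"


def decode_like_stdin(data, errors):
    """Hypothetical helper: read data through a UTF-8 text wrapper, as sys.stdin would."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return stream.read()


try:
    decode_like_stdin(DATA, "strict")  # the handler stdin usually has
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # the same failure as in the traceback

# With errors='replace' the bad byte becomes U+FFFD instead of raising.
print("replace ok:", decode_like_stdin(DATA, "replace") == "foo\ufffdbar\n")
```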

Should fileinput attempt to apply the hook code to stdin?
msg263421 - Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-04-14 16:30
While the documentation is not entirely clear, the openhook only applies to named files, not to standard input.

I'm not sure what the logic behind the suggested change is. What would the openhook do in your situation?
msg263422 - Author: Joel Barry (jmb236) Date: 2016-04-14 17:46
I was suggesting that the openhook could somehow be applied to a
*reopening* of sys.stdin.  Something like this:

326c326,329
<                     self._file = sys.stdin
---
>                     if self._openhook:
>                         self._file = self._openhook(self._filename, self._mode)
>                     else:
>                         self._file = sys.stdin

But this won't work on its own, because self._filename here is
'<stdin>', which isn't a real filename.  Combined with a change to my hook:

   def hook(filename, mode):
       if filename == '<stdin>':
           return io.TextIOWrapper(sys.stdin.buffer, errors='replace')
       return open(filename, mode, errors='replace')

things would work, but this is a bit awkward.

This works for me without changing my hook:

326c326,329
<                     self._file = sys.stdin
---
>                     if self._openhook:
>                         self._file = self._openhook('/dev/stdin', self._mode)
>                     else:
>                         self._file = sys.stdin

but I realize that using /dev/stdin is not portable.
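If a hook were to handle standard input, as in the first patch above, a more portable way to reopen it than '/dev/stdin' may be to pass stdin's file descriptor to open(), which accepts an integer descriptor in place of a path. This sketch rests on two assumptions. First, fileinput would have to call the hook with the pseudo-name '<stdin>', which unpatched fileinput never does. Second, sys.stdin must be backed by a real OS descriptor, which is not true in some IDEs or embedded interpreters. Like any hook keyed on that name, it would also misroute a real file called '<stdin>'.

```python
import sys


def hook(filename, mode):
    """openhook variant that also accepts the '<stdin>' pseudo-name.

    For '<stdin>' it reopens standard input's file descriptor instead of
    the Unix-only path '/dev/stdin'.
    """
    if filename == "<stdin>":
        # Assumption: sys.stdin has a real descriptor (fileno() works).
        # closefd=False leaves that descriptor open when this wrapper is
        # closed, so sys.stdin itself keeps working afterwards.
        return open(sys.stdin.fileno(), mode, errors="replace", closefd=False)
    return open(filename, mode, errors="replace")
```

Both objects read from the same descriptor, so any input that sys.stdin had already buffered would not be visible to the new wrapper.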

The desired outcome is really just to control Unicode behavior for
stdin, not necessarily the ability to provide a generic hook.  Adding an
'errors' keyword that applies to stdin would solve my case, but once
'errors' is exposed, someone may also want 'encoding' and the other
open() parameters, which is why it would be nicer if this could somehow
be solved with the existing openhook interface.
msg263605 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-17 10:55
Calling the openhook for stdin would break existing code. Third-party openhooks don't special-case the name '<stdin>', which is a legitimate file name.

Instead, I recommend replacing sys.stdin explicitly in your program:

    import io
    sys.stdin = io.TextIOWrapper(sys.stdin.buffer, errors='replace')
    for line in fileinput.input(openhook=hook):
        ...
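The recommended workaround can also be packaged as a reusable function. In this sketch, copy_lines is a hypothetical name. Unlike the short snippet, it keeps stdin's existing encoding, changes only the error handler, and restores the original sys.stdin afterwards. The detach() call matters because a TextIOWrapper closes its underlying buffer when it is discarded, and here that buffer is shared with the original sys.stdin.

```python
import fileinput
import io
import sys


def hook(filename, mode):
    # openhook from the report; fileinput calls it for named files only.
    return open(filename, mode, errors="replace")


def copy_lines(files=None, out=None):
    """Hypothetical helper: copy lines from files (or stdin) to out.

    sys.stdin is rewrapped with errors='replace' before fileinput starts,
    because fileinput reads sys.stdin directly for '-' and never passes
    it through the openhook.
    """
    out = sys.stdout if out is None else out
    original = sys.stdin
    # Reuse stdin's current encoding; change only the error handler.
    replaced = io.TextIOWrapper(original.buffer,
                                encoding=original.encoding,
                                errors="replace")
    sys.stdin = replaced
    try:
        with fileinput.FileInput(files=files, openhook=hook) as lines:
            for line in lines:
                out.write(line)
    finally:
        sys.stdin = original
        # Detach so that discarding `replaced` does not close the binary
        # buffer it shares with the original sys.stdin.
        replaced.detach()
```

On Python 3.7 and later, sys.stdin.reconfigure(errors='replace') changes the handler in place without creating a second wrapper. That method does not exist in Python 3.5 and 3.6, the versions this issue targets.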
History
Date                 User              Action  Args
2016-05-06 09:24:36  serhiy.storchaka  set     status: open -> closed
                                               resolution: not a bug
                                               stage: resolved
2016-04-17 10:55:15  serhiy.storchaka  set     messages: + msg263605
2016-04-14 17:46:44  jmb236            set     messages: + msg263422
2016-04-14 16:30:54  SilentGhost       set     versions: + Python 3.5, Python 3.6, - Python 3.4
                                               nosy: + SilentGhost, serhiy.storchaka
                                               messages: + msg263421
                                               components: + Library (Lib)
2016-04-14 14:32:27  jmb236            create