This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: socket.makefile(mode = 'r').readline() silently removes carriage return
Type: behavior Stage: resolved
Components: IO, Library (Lib), Unicode Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, kaizhu, pitrou, r.david.murray
Priority: normal Keywords: patch

Created on 2010-10-07 08:12 by kaizhu, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded | Description
socket.makefile.newline.kwarg.patch | kaizhu, 2010-10-08 04:07 | socket.makefile invokes io.TextIOWrapper w/ 'newline' arg in incorrect position
makefile.errors.newline.patch | kaizhu, 2010-10-12 01:46 |
socket.makefile.with.unittest.patch | kaizhu, 2010-10-12 03:50 |
socket.makefile.with.unittest.v2.patch | kaizhu, 2010-10-12 04:50 |
socket.makefile.with.unittest.v3.patch | kaizhu, 2010-10-12 23:53 |
Messages (13)
msg118095 - (view) Author: kai zhu (kaizhu) Date: 2010-10-07 08:12
i'm working on an independent py2to3 utility which directly imports py2x modules, by reverse compiling ast trees (code.google.com/p/asciiporn/source/browse/stable.py)

while forward porting the python2x redis client, this issue came up.
i know it's bad to use strings in sockets, but it seems webapps use it, exploiting the fact that utf8 is becoming a de facto web 'binary' standard



$ python3.1 echo.py
connected <socket.socket object, fd=4, family=2, type=1, proto=0> ('127.0.0.1', 41115)

$ python3.1 client.py 
b'hello\r\n' recv()
b'hello\r\n' makefile(mode = "rb")
'hello\n' makefile(mode = "r")


## echo.py - echo server program
import socket
serv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
  serv.bind(('localhost', 12345))
  serv.listen(1)
  while True:
    conn, addr = serv.accept()
    print( 'connected', conn, addr )
    while True:
      data = conn.recv(4096)
      if not data:
        conn.close()
        break
      conn.send(data)
finally:
  serv.close()



## client.py - client program
data = b'hello\r\n'
import socket
clie = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
  clie.connect(('localhost', 12345))

  clie.send(data)
  data = clie.recv(4096)
  print(repr(data), 'recv()')

  clie.send(data)
  file = clie.makefile('rb')
  data = file.readline()
  print(repr(data), 'makefile(mode = "rb")')

  clie.send(data)
  file = clie.makefile('r')
  data = file.readline()
  print(repr(data), 'makefile(mode = "r")') ## '\r' is silently dropped
finally:
  clie.close()
msg118107 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-10-07 12:35
Isn't this just the normal universal newline handling? When you open it in binary mode you see all of the characters, but in text mode (the absence of "b") you get normalized newlines (that is, they're converted to "\n").
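The behaviour described here can be reproduced without a socket. The following is a minimal sketch, assuming an in-memory `io.BytesIO` buffer stands in for the network stream; the variable names are illustrative only.

```python
import io

raw = b'hello\r\n'

# Binary mode: the bytes come back untouched.
binary_line = io.BytesIO(raw).readline()

# Text mode with the default newline=None: universal newlines are
# enabled, so '\r\n' is translated to '\n' on input.
text_line = io.TextIOWrapper(io.BytesIO(raw), encoding='utf-8').readline()

print(repr(binary_line))
print(repr(text_line))
```

The text-mode result mirrors the `makefile(mode = "r")` line in the client output above.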
msg118149 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-07 22:34
As Eric said. Please see the socket.makefile doc:

“Return a file object associated with the socket. The exact returned type depends on the arguments given to makefile(). These arguments are interpreted the same way as by the built-in open() function.”

(http://docs.python.org/py3k/library/socket.html#socket.socket.makefile)

And in turn, the built-in open() function describes the `newline` parameter, which sets whether newline characters are translated.
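To make the effect of `newline` concrete, here is a small sketch that reads the first line of the same bytes under each documented value. It assumes `io.TextIOWrapper` over an in-memory buffer behaves like a text file opened with `open()`; `first_line` is a hypothetical helper written for this illustration.

```python
import io

raw = b'one\r\ntwo\n'

def first_line(newline):
    """Return the first line of `raw` decoded with the given newline mode."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding='utf-8', newline=newline)
    return text.readline()

# None  -> universal newlines, translated to '\n'
# ''    -> universal newlines recognised, but left untranslated
# other -> only that exact terminator ends a line, nothing is translated
results = {nl: first_line(nl) for nl in (None, '', '\n', '\r', '\r\n')}

for nl, line in results.items():
    print(repr(nl), '->', repr(line))
```

Only `newline=None` drops the carriage return; every other value leaves the `'\r'` in the returned string.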
msg118161 - (view) Author: kai zhu (kaizhu) Date: 2010-10-08 03:24
my bad for not rtfm, but it seems the newline argument has no effect in socket.makefile.

the TextIOWrapper signatures don't seem to match.  a hack to put newline parameter in 4th position or making it a keyword arg doesn't work either (scratch my head...)

socket.py source <line 162>
        text = io.TextIOWrapper(buffer, encoding, newline)

textio.c <line 807>
static int
textiowrapper_init(textio *self, PyObject *args, PyObject *kwds)
{
    char *kwlist[] = {"buffer", "encoding", "errors",
                      "newline", "line_buffering",
                      NULL};




$ python3 echo.py ## from previous example

$ python3 client.py 
b'hello\r\n' recv()
b'hello\r\n' makefile(mode = "rb")
'hello\n' makefile(mode = "r", newline = "")



# echo client program
data = b'hello\r\n'
import socket
clie = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
  clie.connect(('localhost', 12345))

  clie.send(data)
  data = clie.recv(4096)
  print(repr(data), 'recv()')

  clie.send(data)
  file = clie.makefile('rb')
  data = file.readline()
  print(repr(data), 'makefile(mode = "rb")')

  clie.send(data)
  file = clie.makefile('r', newline = '')
  data = file.readline()
  print(repr(data), 'makefile(mode = "r", newline = "")') ## '\r' is still silently dropped
finally:
  clie.close()
msg118162 - (view) Author: kai zhu (kaizhu) Date: 2010-10-08 03:31
my bad again, hacking the newline parameter into the correct argument position works (it's currently in the position where `errors` should be).

a one line patch would be:

socket.py <line 163>
-        text = io.TextIOWrapper(buffer, encoding, newline)
+        text = io.TextIOWrapper(buffer, encoding, newline = newline)
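Why the keyword form matters can be shown directly against `io.TextIOWrapper`. This is a sketch under the assumption that an in-memory buffer is an acceptable stand-in for the socket's buffered reader; `'replace'` is used as the third positional value only because it is a valid error handler name.

```python
import io

raw = b'hello\r\n'

# The third positional argument binds to `errors`, not `newline`.
positional = io.TextIOWrapper(io.BytesIO(raw), 'utf-8', 'replace')
positional_errors = positional.errors    # the value landed in `errors`
positional_line = positional.readline()  # newline is still None

# Passing newline by keyword reaches the intended parameter.
keyword = io.TextIOWrapper(io.BytesIO(raw), 'utf-8', newline='')
keyword_line = keyword.readline()

print(positional_errors, repr(positional_line), repr(keyword_line))
```

With the positional call the newline mode stays at its default, which is why `newline = ''` appeared to have no effect in `socket.makefile`.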
msg118172 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-08 09:55
Ah, sorry for closing, then. You are right, this is a genuine issue.
msg118398 - (view) Author: kai zhu (kaizhu) Date: 2010-10-12 01:46
np antoine :)

this 2 line patch will match socket.makefile() signature with open().
any chance it can be committed before python3.2 beta?

i rely on this patch in order to forward-port redis to python3
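On an interpreter that includes the fix, the `open()`-style keyword arguments can be checked end to end. The sketch below assumes `socket.socketpair()` is available on the platform and uses it in place of the echo server from the earlier messages.

```python
import socket

# socket.socketpair() gives two connected sockets without a server.
left, right = socket.socketpair()
try:
    left.sendall(b'hello\r\n')
    # Same keyword arguments as open(): encoding, errors, newline.
    reader = right.makefile('r', encoding='utf-8', errors='strict', newline='')
    line = reader.readline()
    reader.close()
finally:
    left.close()
    right.close()

print(repr(line))
```

With `newline=''` the carriage return should survive the text-mode read.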
msg118399 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-12 02:30
Kai: could you write a unit test for this?
msg118401 - (view) Author: kai zhu (kaizhu) Date: 2010-10-12 03:50
added unittest to patch
tested test.test_socket on debian colinux running under winxp



i get 2 unrelated errors (in both patched and unpatched version) from testRDM and testStream about socket.AF_TIPC being unsupported:



public@colinux 3 ~/build/py3k.patch: ./python -m unittest test.test_socket
..................................................................s..........................................EE....................
======================================================================
ERROR: testRDM (test.test_socket.TIPCTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/public/build/py3k.patch/Lib/test/test_socket.py", line 1683, in testRDM
    srv = socket.socket(socket.AF_TIPC, socket.SOCK_RDM)
  File "/home/public/build/py3k.patch/Lib/socket.py", line 94, in __init__
    _socket.socket.__init__(self, family, type, proto, fileno)
socket.error: [Errno 97] Address family not supported by protocol

======================================================================
ERROR: testStream (test.test_socket.TIPCThreadableTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/public/build/py3k.patch/Lib/test/test_socket.py", line 131, in _setUp
    self.__setUp()
  File "/home/public/build/py3k.patch/Lib/test/test_socket.py", line 1707, in setUp
    self.srv = socket.socket(socket.AF_TIPC, socket.SOCK_STREAM)
  File "/home/public/build/py3k.patch/Lib/socket.py", line 94, in __init__
    _socket.socket.__init__(self, family, type, proto, fileno)
socket.error: [Errno 97] Address family not supported by protocol

----------------------------------------------------------------------
Ran 131 tests in 15.960s

FAILED (errors=2, skipped=1)
msg118402 - (view) Author: kai zhu (kaizhu) Date: 2010-10-12 04:50
updated unittest patch
msg118466 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-12 20:22
The proposed test case doesn't test a lot, IMHO. It would be better if it sent binary from one end and received unicode on the other end, or vice-versa (with explicit encoding and errors, preferably).
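A test along the lines suggested here might look like the following. It is only a sketch, not the test that was committed: the class name, the payload, and the use of `socket.socketpair()` are assumptions made for illustration.

```python
import socket
import unittest

class MakefileTextTest(unittest.TestCase):
    """Hypothetical test: bytes on one end, text on the other."""

    payload = 'h\u00e9llo\r\n'

    def setUp(self):
        self.sender, self.receiver = socket.socketpair()
        self.addCleanup(self.sender.close)
        self.addCleanup(self.receiver.close)

    def test_bytes_in_text_out(self):
        # Send raw bytes, read them back as text with explicit settings.
        self.sender.sendall(self.payload.encode('utf-8'))
        with self.receiver.makefile('r', encoding='utf-8',
                                    errors='strict', newline='') as reader:
            self.assertEqual(reader.readline(), self.payload)

    def test_text_in_bytes_out(self):
        # Write text through makefile(), receive raw bytes on the other end.
        with self.sender.makefile('w', encoding='utf-8',
                                  errors='strict', newline='') as writer:
            writer.write(self.payload)
            writer.flush()
        self.assertEqual(self.receiver.recv(4096),
                         self.payload.encode('utf-8'))

# Run the suite explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MakefileTextTest)
result = unittest.TextTestRunner(verbosity=1).run(suite)
```

Using a non-ASCII character together with `'\r\n'` exercises both the encoding and the newline handling in one payload.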
msg118488 - (view) Author: kai zhu (kaizhu) Date: 2010-10-12 23:53
done ;p
added separate unicode read, write, and readwrite test cases (which all pass)

keep in mind my issue is specific to truncation of carriage return (imo a priority for py3k migration).  i think this can be resolved by python 3.2

more general unicode issues should be raised in separate bug reports.
msg118537 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-13 16:26
Thank you very much! I've committed the patch in r85420 (3.2) and r85421 (3.1).
History
Date | User | Action | Args
2022-04-11 14:57:07 | admin | set | github: 54250
2010-10-13 16:26:33 | pitrou | set | status: open -> closed; resolution: fixed; messages: + msg118537; stage: resolved
2010-10-12 23:53:21 | kaizhu | set | files: + socket.makefile.with.unittest.v3.patch; messages: + msg118488
2010-10-12 20:22:20 | pitrou | set | messages: + msg118466
2010-10-12 04:50:27 | kaizhu | set | files: + socket.makefile.with.unittest.v2.patch; messages: + msg118402
2010-10-12 03:50:19 | kaizhu | set | files: + socket.makefile.with.unittest.patch; messages: + msg118401
2010-10-12 02:30:07 | r.david.murray | set | nosy: + r.david.murray; messages: + msg118399
2010-10-12 01:46:17 | kaizhu | set | files: + makefile.errors.newline.patch; messages: + msg118398
2010-10-08 09:55:02 | pitrou | set | messages: + msg118172
2010-10-08 04:07:22 | kaizhu | set | files: + socket.makefile.newline.kwarg.patch; keywords: + patch
2010-10-08 03:31:23 | kaizhu | set | messages: + msg118162
2010-10-08 03:24:46 | kaizhu | set | status: closed -> open; resolution: not a bug -> (no value); messages: + msg118161
2010-10-07 22:34:29 | pitrou | set | status: open -> closed; nosy: + pitrou; messages: + msg118149; resolution: not a bug
2010-10-07 12:35:32 | eric.smith | set | nosy: + eric.smith; messages: + msg118107; components: - 2to3 (2.x to 3.x conversion tool)
2010-10-07 08:12:58 | kaizhu | create |