Message356224
I ran into a line count mismatch and narrowed it down to a 9-line file in which the line break handling causes the problem. The attached file, line_break_err.txt, reproduces the behavior shown below.
$ md5sum line_break_err.txt
5dea501b8e299a0ece94d85977728545 line_break_err.txt
# wc says there are 9 lines
$ wc -l line_break_err.txt
9 line_break_err.txt
# if I read from sys.stdin, I get 9 lines
$ python -c 'import sys; print(sum(1 for x in sys.stdin))' < line_break_err.txt
# but... if I use an open() call, I get 18
$ python -c 'import sys; print("Linecount=", sum(1 for x in open(sys.argv[1])))' line_break_err.txt
Linecount= 18
# changing encoding or error handling has no effect
$ python -c 'import sys; print("Linecount=", sum(1 for x in open(sys.argv[1], "r", encoding="utf-8", errors="replace")))' line_break_err.txt
Linecount= 18
$ python -c 'import sys; print("Linecount=", sum(1 for x in open(sys.argv[1], "r", encoding="utf-8", errors="ignore")))' line_break_err.txt
Linecount= 18
# but, not just wc, even awk says there are only 9 lines
$ awk 'END {print "Linecount=", NR}' line_break_err.txt
Linecount= 9
# let's see python 2 using io
$ python2 -c 'import sys,io; print("Linecount=", sum(1 for x in io.open(sys.argv[1], encoding="ascii", errors="ignore")))' line_break_err.txt
('Linecount=', 18)
# but this one, which we no longer use, somehow gets it right
$ python2 -c 'import sys; print("Linecount=", sum(1 for x in open(sys.argv[1])))' line_break_err.txt
('Linecount=', 9)
Tested it on
1. Linux
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0] :: Anaconda, Inc. on linux
2. OSX
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
3. python 2 on OSX
Python 2.7.16 (default, Jun 19 2019, 07:40:37)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)] on darwin
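One possible explanation, which nothing above confirms, is that the file contains bare carriage returns (a "\r" not followed by "\n"). In Python 3, open() in text mode uses universal newlines by default, so "\r", "\n" and "\r\n" each end a line, whereas wc and awk split on "\n" only, and Python 2's built-in open() does not enable universal newlines unless asked. The sketch below is a minimal illustration on made-up sample data, not on line_break_err.txt; the helper names count_lines and count_bare_cr are hypothetical.

```python
import os
import tempfile


def count_lines(path, **open_kwargs):
    """Count lines by iterating over the file object, as the one-liners above do."""
    with open(path, **open_kwargs) as handle:
        return sum(1 for _ in handle)


def count_bare_cr(path):
    """Count carriage returns that are not immediately followed by a line feed."""
    with open(path, "rb") as handle:
        data = handle.read()
    return data.count(b"\r") - data.count(b"\r\n")


# Hypothetical sample: 3 LF-terminated lines, each containing one bare CR.
sample = b"a\rb\nc\rd\ne\rf\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
    sample_path = tmp.name

try:
    # Default text mode: universal newlines, so a bare CR also ends a line.
    universal = count_lines(sample_path, mode="r", encoding="utf-8")
    # newline="\n": only LF ends a line, which matches what wc -l counts.
    lf_only = count_lines(sample_path, mode="r", encoding="utf-8", newline="\n")
    # Binary mode: lines are split on b"\n" only.
    binary = count_lines(sample_path, mode="rb")
    bare_cr = count_bare_cr(sample_path)
finally:
    os.remove(sample_path)

print("universal newlines:", universal)  # 6
print("newline='\\n':", lf_only)         # 3
print("binary mode:", binary)            # 3
print("bare CRs:", bare_cr)              # 3
```

If count_bare_cr reported 9 for the attached file, that would account for the 18 versus 9 difference, and passing newline="\n" to open() should then bring the Python 3 count in line with wc.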
----
P.S.
This is the first issue I have created. If it is a duplicate, I am happy to close it.

Date | User | Action | Args
2019-11-08 06:01:02 | thammegowda | set | recipients: + thammegowda, vstinner, ezio.melotti
2019-11-08 06:01:02 | thammegowda | set | messageid: <1573192862.15.0.176901555432.issue38740@roundup.psfhosted.org>
2019-11-08 06:01:02 | thammegowda | link | issue38740 messages
2019-11-08 06:01:01 | thammegowda | create |