Message31804
This bug is similar to, but not exactly the same as, bug215974. (http://sourceforge.net/tracker/?group_id=5470&atid=105470&aid=215974&func=detail)
In my test, even multiple write() calls within a single open()~close() lifespan do not cause the multi-BOM phenomenon mentioned in bug215974. Maybe bug 215974 was somehow fixed during the past 7 years, although Lemburg classified it as WontFix.
However, if a file is appended to more than once via "codecs.open('file.txt', 'a', 'utf16')", multiple BOMs appear: each append session writes its own BOM.
At the same time, the statement in bug215974 that "(Extra unnecessary) BOM marks are removed from the input stream by the Python UTF-16 codec" is not true even today, on Python 2.4.4 and Python 2.5.1c1 on Windows XP.
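The decoding half of this claim can be checked without touching the file system. The following is only a minimal sketch: it assumes that two append sessions leave the same bytes on disk as two separately encoded chunks joined together, and the variable names are illustrative.

```python
import codecs

# Each chunk is encoded on its own, so each one starts with its own BOM,
# which is what two separate append sessions are assumed to leave on disk.
first = u"ab\n".encode("utf-16")
second = u"ef\n".encode("utf-16")
raw = first + second

# The UTF-16 decoder consumes only the leading BOM; the second one
# survives in the decoded text as the character U+FEFF.
decoded = raw.decode("utf-16")
interior_boms = decoded.count(u"\ufeff")

# Count the BOM byte sequences in the raw data (either byte order).
raw_boms = raw.count(codecs.BOM_UTF16_LE) + raw.count(codecs.BOM_UTF16_BE)
```

Here `raw` contains two BOM byte sequences, while `decoded` still contains one U+FEFF in front of "ef", so the extra mark is not removed on input.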
Iceberg
------------------
PS: I did not find the "File Upload" checkbox mentioned on this web page, so I think I'd better paste the code right here...
import codecs, os

filename = "test.utf-16"
if os.path.exists(filename): os.unlink(filename) # reset

def myOpen():
    return codecs.open(filename, "a", 'UTF-16')

def readThemBack():
    return list( codecs.open(filename, "r", 'UTF-16') )

def clumsyPatch(raw): # you can read it after your first run of this program
    for line in raw:
        if line[0] in (u'\ufffe', u'\ufeff'): # get rid of the BOMs
            yield line[1:]
        else:
            yield line

fout = myOpen()
fout.write(u"ab\n") # to simplify the problem, I only use ASCII chars here
fout.write(u"cd\n")
fout.close()
print readThemBack()
assert readThemBack() == [ u'ab\n', u'cd\n' ]
assert os.stat(filename).st_size == 14 # Only one BOM in the file

fout = myOpen()
fout.write(u"ef\n")
fout.write(u"gh\n")
fout.close()
print readThemBack()
#print list( clumsyPatch( readThemBack() ) ) # later you can enable this fix
assert readThemBack() == [ u'ab\n', u'cd\n', u'ef\n', u'gh\n' ] # fails here
assert os.stat(filename).st_size == 26 # not to mention here: multi BOM appears
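One possible write-side workaround is to let the "utf-16" codec emit a BOM only when the file is new or empty, and otherwise append with the endian-specific codec that matches the BOM already in the file. The sketch below is an assumption about how that could look, not a fix from the tracker: `append_utf16` is a hypothetical helper, and it writes encoded bytes in binary append mode instead of going through codecs.open.

```python
import codecs
import os
import tempfile

def append_utf16(filename, text):
    """Hypothetical helper: append text without writing a second BOM."""
    encoding = "utf-16"  # emits a BOM; right for a new or empty file
    if os.path.exists(filename) and os.path.getsize(filename) >= 2:
        f = open(filename, "rb")
        try:
            head = f.read(2)
        finally:
            f.close()
        # Reuse the byte order announced by the existing BOM.
        if head == codecs.BOM_UTF16_LE:
            encoding = "utf-16-le"
        elif head == codecs.BOM_UTF16_BE:
            encoding = "utf-16-be"
        else:
            raise ValueError("no UTF-16 BOM at start of %r" % filename)
    f = open(filename, "ab")
    try:
        f.write(text.encode(encoding))
    finally:
        f.close()

# Usage: two separate append sessions on a scratch file.
path = os.path.join(tempfile.mkdtemp(), "test.utf-16")
append_utf16(path, u"ab\ncd\n")
append_utf16(path, u"ef\ngh\n")

f = open(path, "rb")
try:
    data = f.read()
finally:
    f.close()
text = data.decode("utf-16")
```

With this approach the scratch file should hold a single BOM followed by the four lines, which is the 26-byte size the last assertion in the script above expects. The explicit-endian codecs ("utf-16-le", "utf-16-be") never write a BOM, which is why they are used for the second and later sessions.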
| Date | User | Action | Args |
| 2007-08-23 14:53:09 | admin | link | issue1701389 messages |
| 2007-08-23 14:53:09 | admin | create | |