Author iceberg4ever
Date 2007-04-16.10:05:22
Content
This bug is similar to, but not exactly the same as, bug 215974 (http://sourceforge.net/tracker/?group_id=5470&atid=105470&aid=215974&func=detail).

In my test, even multiple write() calls within a single open()/close() lifespan do not cause the multiple-BOM phenomenon described in bug 215974. Perhaps bug 215974 was somehow fixed during the past 7 years, although Lemburg classified it as WontFix.

However, if a file is opened for appending more than once with "codecs.open('file.txt', 'a', 'utf16')", multiple BOMs appear: each append session writes its own BOM.

At the same time, the claim in bug 215974 that "(Extra unnecessary) BOM marks are removed from the input stream by the Python UTF-16 codec" is not true even today, on Python 2.4.4 and Python 2.5.1c1 on Windows XP.
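
A minimal sketch of what the decoder does with a second BOM, assuming each append session writes its own BOM as described above. It works on byte strings only, so no file is involved; the names first, second and decoded are just for illustration.

```python
# Simulate two append sessions: each encode() call emits its own BOM.
first = u"ab\n".encode("utf-16")    # BOM + payload from the first session
second = u"ef\n".encode("utf-16")   # another BOM + payload from the second session

# Decode the concatenation the way a reader of the file would.
decoded = (first + second).decode("utf-16")

# Only the leading BOM is consumed; the second one survives as U+FEFF.
print(repr(decoded))
```

The second line of the decoded text therefore starts with u'\ufeff' instead of u'e'.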

Iceberg
------------------

PS: I could not find the "File Upload" checkbox mentioned on this web page, so I think I'd better paste the code right here...

import codecs, os

filename = "test.utf-16"
if os.path.exists(filename): os.unlink(filename)  # reset

def myOpen():
    # every call creates a fresh UTF-16 writer in append mode
    return codecs.open(filename, "a", 'UTF-16')
def readThemBack():
    fin = codecs.open(filename, "r", 'UTF-16')
    try:
        return list(fin)
    finally:
        fin.close()  # do not leave the file handle open
def clumsyPatch(raw): # workaround: strips stray BOMs from lines that were read back
    for line in raw:
        if line[0] in (u'\ufffe', u'\ufeff'): # get rid of the BOMs
            yield line[1:]
        else:
            yield line

fout = myOpen()
fout.write(u"ab\n") # to simplify the problem, I only use ASCII chars here
fout.write(u"cd\n")
fout.close()
print readThemBack()
assert readThemBack() == [ u'ab\n', u'cd\n' ]
assert os.stat(filename).st_size == 14  # Only one BOM in the file

fout = myOpen()
fout.write(u"ef\n")
fout.write(u"gh\n")
fout.close()
print readThemBack()
#print list( clumsyPatch( readThemBack() ) )  # later you can enable this fix
assert readThemBack() == [ u'ab\n', u'cd\n', u'ef\n', u'gh\n' ] # fails here
assert os.stat(filename).st_size == 26  # not reached; it would fail too, because the second session wrote another BOM
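
A possible workaround on the writing side, as a rough sketch only: when the file already starts with a BOM, append with the explicit 'utf-16-le' or 'utf-16-be' codec, which never writes a BOM. The helper name open_utf16_append is hypothetical (it is not part of the codecs module), and the sketch assumes an existing non-empty file was written with a leading BOM.

```python
import codecs
import os
import tempfile


def open_utf16_append(path):
    """Hypothetical helper: append UTF-16 text without writing a second BOM."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        # New or empty file: the plain codec writes the one BOM we want.
        return codecs.open(path, "a", "utf-16")
    f = open(path, "rb")
    try:
        head = f.read(2)
    finally:
        f.close()
    # Existing file: reuse its byte order; these codecs write no BOM.
    if head == codecs.BOM_UTF16_LE:
        return codecs.open(path, "a", "utf-16-le")
    if head == codecs.BOM_UTF16_BE:
        return codecs.open(path, "a", "utf-16-be")
    raise ValueError("no UTF-16 BOM at the start of %r" % path)


# Two separate append sessions against a scratch file.
path = os.path.join(tempfile.mkdtemp(), "test.utf-16")
for chunk in (u"ab\ncd\n", u"ef\ngh\n"):
    fout = open_utf16_append(path)
    fout.write(chunk)
    fout.close()

print(os.path.getsize(path))  # 26: one 2-byte BOM plus 12 characters * 2 bytes
```

With this helper the two sessions leave a single BOM in the file, so reading it back with the plain 'UTF-16' codec gives the four expected lines.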
History
Date                 User   Action  Args
2007-08-23 14:53:09  admin  link    issue1701389 messages
2007-08-23 14:53:09  admin  create