
Author: tungwaiyip
Date: 2006-06-23.01:31:10

It turns out the code is already written but disabled. Simply turning it on would work.

tokenizer.c(321):
#if 0
	/* Disable support for UTF-16 BOMs until a decision
	   is made whether this needs to be supported.  */
	} else if (ch == 0xFE) {
		ch = get_char(tok); if (ch != 0xFF) goto NON_BOM;
		if (!set_readline(tok, "utf-16-be")) return 0;
		tok->decoding_state = -1;
	} else if (ch == 0xFF) {
		ch = get_char(tok); if (ch != 0xFE) goto NON_BOM;
		if (!set_readline(tok, "utf-16-le")) return 0;
		tok->decoding_state = -1;
#endif
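
For reference, the two byte sequences being checked are the UTF-16 byte order
marks: FE FF for big-endian and FF FE for little-endian. A rough Python sketch
of the same check (purely illustrative; the hypothetical sniff_utf16_bom below
is not the actual tokenizer machinery) would be:

  def sniff_utf16_bom(first_two_bytes):
      # FE FF marks UTF-16 big-endian, FF FE marks UTF-16 little-endian;
      # anything else is treated as "no UTF-16 BOM".
      if first_two_bytes == b"\xfe\xff":
          return "utf-16-be"
      if first_two_bytes == b"\xff\xfe":
          return "utf-16-le"
      return None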


Executing a UTF-16 text file with a BOM would then work. However, if I also include an encoding declaration along with the BOM, like this:

  # -*- coding: UTF-16le -*-


it would result in this error, due to some logic in the code (tokenizer.c(291)) that I couldn't sort out:


  g:\bin\py_repos\python-svn\PCbuild>python_d.exe test16le.py
    File "test16le.py", line 1
  SyntaxError: encoding problem: utf-8
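
For what it's worth, a minimal way to reproduce this, assuming the test16le.py
name used above and a small hypothetical helper script, is to write the file
out as UTF-16-LE with an explicit BOM:

  import codecs

  # Write test16le.py as UTF-16-LE with a BOM; the script body is just
  # the coding declaration plus a print statement.
  source = u"# -*- coding: UTF-16le -*-\nprint('hello')\n"
  f = open("test16le.py", "wb")
  f.write(codecs.BOM_UTF16_LE)           # the FF FE byte order mark
  f.write(source.encode("utf-16-le"))    # script body, little-endian, no BOM
  f.close()

Running the interpreter on the resulting file should give the SyntaxError
above, while removing the coding line should let it run cleanly, per the
behaviour described earlier.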


If you need a justification for checking the UTF-16 BOM, it is Microsoft. As an early adopter of Unicode, before UTF-8 became popular, Microsoft ships some software that generates UTF-16 by default. Not a fatal issue, but I see no reason not to support it either.
History
Date                 User   Action  Args
2007-08-23 14:40:30  admin  link    issue1503789 messages
2007-08-23 14:40:30  admin  create