
Author: tungwaiyip
Date: 2006-06-23.01:31:10

It turns out the code is already written but disabled. Simply turning it on would work.

tokenizer.c(321):
#if 0
	/* Disable support for UTF-16 BOMs until a decision
	   is made whether this needs to be supported.  */
	} else if (ch == 0xFE) {
		ch = get_char(tok); if (ch != 0xFF) goto NON_BOM;
		if (!set_readline(tok, "utf-16-be")) return 0;
		tok->decoding_state = -1;
	} else if (ch == 0xFF) {
		ch = get_char(tok); if (ch != 0xFE) goto NON_BOM;
		if (!set_readline(tok, "utf-16-le")) return 0;
		tok->decoding_state = -1;
#endif
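
For reference, the two byte sequences being checked are the UTF-16 byte order
marks: FE FF for big-endian and FF FE for little-endian. A rough Python sketch
of the same check (purely illustrative; the hypothetical sniff_utf16_bom below
is not the actual tokenizer machinery) would be:

  def sniff_utf16_bom(first_two_bytes):
      # FE FF marks UTF-16 big-endian, FF FE marks UTF-16 little-endian;
      # anything else is treated as "no UTF-16 BOM".
      if first_two_bytes == b"\xfe\xff":
          return "utf-16-be"
      if first_two_bytes == b"\xff\xfe":
          return "utf-16-le"
      return None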


Executing a UTF-16 text file with a BOM would then work. However, if I also include an encoding declaration along with the BOM, like this:

  # -*- coding: UTF-16le -*-


it would result in this error, due to some logic in the code (tokenizer.c(291)) that I couldn't sort out:


  g:\bin\py_repos\python-svn\PCbuild>python_d.exe test16le.py
    File "test16le.py", line 1
  SyntaxError: encoding problem: utf-8
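
For what it's worth, a minimal way to reproduce this, assuming the test16le.py
name used above and a small hypothetical helper script, is to write the file
out as UTF-16-LE with an explicit BOM:

  import codecs

  # Write test16le.py as UTF-16-LE with a BOM; the script body is just
  # the coding declaration plus a print statement.
  source = u"# -*- coding: UTF-16le -*-\nprint('hello')\n"
  f = open("test16le.py", "wb")
  f.write(codecs.BOM_UTF16_LE)           # the FF FE byte order mark
  f.write(source.encode("utf-16-le"))    # script body, little-endian, no BOM
  f.close()

Running the interpreter on the resulting file should give the SyntaxError
above, while removing the coding line should let it run cleanly, per the
behaviour described earlier.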


If you need a justification for checking the UTF-16 BOM, it is Microsoft. As an early adopter of Unicode, before UTF-8 became popular, Microsoft ships some software that generates UTF-16 by default. Not a fatal issue, but I see no reason not to support it either.
History
Date                 User   Action  Args
2007-08-23 14:40:30  admin  link    issue1503789 messages
2007-08-23 14:40:30  admin  create