Classification
Title: UTF-8-Sig codec
Type: (none)
Stage: (none)
Components: Library (Lib)
Versions: (none)

Process
Status: closed
Resolution: accepted
Dependencies: (none)
Superseder: (none)
Assigned To: loewis
Nosy List: doerwalter, loewis
Priority: normal
Keywords: patch

Created on 2005-04-05 19:26 by doerwalter, last changed 2006-01-08 10:46 by loewis. This issue is now closed.

Files
File name       Uploaded                        Description
diff.txt        doerwalter, 2005-04-05 19:26
diff2.txt       doerwalter, 2005-04-05 20:28    Better handling of partial BOMs
diff3.txt       doerwalter, 2005-08-09 13:41
UnicodeBOM.txt  doerwalter, 2005-12-26 16:51
Messages (7)
msg48154 - Author: Walter Dörwald (doerwalter) (Python committer) - Date: 2005-04-05 19:26
This patch implements a UTF-8-Sig codec. This codec
works like UTF-8 but adds a BOM on writing and skips
(at most) one BOM on reading.
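The behavior described above can be checked against the `utf-8-sig` codec that ships with current Python. This is a sketch, not the code in diff.txt. It assumes Python 3 `str`/`bytes` semantics rather than the Python 2 of 2005, and it exercises the standard-library codec, not the patch itself:

```python
import codecs

text = "abc"

# Encoding: utf-8-sig writes the UTF-8 BOM (EF BB BF) before the data.
encoded = text.encode("utf-8-sig")
assert encoded == codecs.BOM_UTF8 + b"abc"

# Decoding: a single leading BOM is stripped ...
assert encoded.decode("utf-8-sig") == "abc"
# ... input without a BOM decodes exactly like plain UTF-8 ...
assert b"abc".decode("utf-8-sig") == "abc"
# ... and at most one BOM is skipped; a second one survives as U+FEFF.
assert (codecs.BOM_UTF8 * 2 + b"abc").decode("utf-8-sig") == "\ufeffabc"

# Plain utf-8, by contrast, neither writes nor strips a BOM.
assert text.encode("utf-8") == b"abc"
assert encoded.decode("utf-8") == "\ufeffabc"
```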
msg48155 - Author: Walter Dörwald (doerwalter) (Python committer) - Date: 2005-04-05 20:28

This second version of the patch returns leading bytes
immediately if they don't look like the start of a BOM.
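A minimal sketch of this partial-BOM handling, as seen through Python 3's `codecs.getincrementaldecoder("utf-8-sig")`. That interface may differ from the one diff2.txt actually modified, so treat this as an illustration of the idea, not of the patch. `feed` is a hypothetical helper introduced only for this example:

```python
import codecs

def feed(chunks):
    """Feed byte chunks to a fresh utf-8-sig incremental decoder (hypothetical helper).

    Returns the string produced by each call, followed by the final flush.
    """
    decoder = codecs.getincrementaldecoder("utf-8-sig")()
    outputs = [decoder.decode(chunk) for chunk in chunks]
    outputs.append(decoder.decode(b"", final=True))
    return outputs

# Leading bytes that cannot begin a BOM are returned immediately.
assert feed([b"ab", b"cd"]) == ["ab", "cd", ""]

# A partial BOM is held back until it can be decided; a complete BOM is dropped.
assert feed([b"\xef", b"\xbb\xbf", b"hi"]) == ["", "", "hi", ""]

# An EF byte that turns out to start U+FF01 (not a BOM) is decoded once complete.
assert feed([b"\xef", b"\xbc\x81"]) == ["", "\uff01", ""]
```

Buffering is only needed while the input seen so far is still a prefix of `EF BB BF`. Anything else can be passed straight to the ordinary UTF-8 decoder.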
msg48156 - Author: Martin v. Löwis (loewis) (Python committer) - Date: 2005-08-07 21:51

The patch looks fine, but lacks documentation changes.
msg48157 - Author: Walter Dörwald (doerwalter) (Python committer) - Date: 2005-08-09 13:41

This version (diff3.txt) of the patch adds a note to
Misc/NEWS and a section to Doc/lib/libcodecs.tex. Is this
the correct place to add the documentation?
msg48158 - Author: Martin v. Löwis (loewis) (Python committer) - Date: 2005-08-09 18:53

The place is right, but I feel this documentation is
still incomplete. The library reference should explain
the difference between utf-8 and utf-8-sig somewhere.
Perhaps a footnote could be added, but I think I would
prefer a separate subsection on the BOM, explaining byte
order in UTF-16 and UTF-32, and how the BOM can be used as a
magic signature for UTF-8.
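The two roles of the BOM mentioned above can be sketched in Python 3. The sketch shows three things:

- U+FEFF encoded in each UTF form is exactly that form's BOM.
- Reading the BOM in the wrong byte order yields U+FFFE.
- A leading BOM can act as a signature for choosing an encoding.

Only the `codecs.BOM_*` constants come from the standard library. `sniff_bom` and `decode_with_signature` are hypothetical helpers, not stdlib functions:

```python
import codecs

# U+FEFF encoded in each form is that form's BOM.
assert "\ufeff".encode("utf-16-le") == codecs.BOM_UTF16_LE  # FF FE
assert "\ufeff".encode("utf-16-be") == codecs.BOM_UTF16_BE  # FE FF
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8          # EF BB BF

# Read in the wrong byte order, the BOM becomes U+FFFE, a noncharacter,
# which is how a UTF-16 reader can tell the byte order is swapped.
assert codecs.BOM_UTF16_LE.decode("utf-16-be") == "\ufffe"

# Signatures, checked longest first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE), so it must be tested first.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding, BOM length) for a leading BOM, or (None, 0). Hypothetical helper."""
    for bom, encoding in _SIGNATURES:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0

def decode_with_signature(data: bytes, default: str = "utf-8") -> str:
    """Decode using the BOM as a signature, else fall back to default. Hypothetical helper."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)
```

One ambiguity is inherent in this approach. UTF-16-LE data whose first character is U+0000 also starts with FF FE 00 00, so this check would treat it as UTF-32-LE.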
msg48159 - Author: Walter Dörwald (doerwalter) (Python committer) - Date: 2005-12-26 16:51

OK, here's a text (UnicodeBOM.txt) that explains what the BOM
is used for in various Unicode encodings. I hope that this can
be turned into something useful.
msg48160 - Author: Martin v. Löwis (loewis) (Python committer) - Date: 2006-01-08 10:46

Thanks for the patch. Committed as revision 41977.
History
Date                 User        Action  Args
2005-04-05 19:26:11  doerwalter  create