Author vinay.sajip
Recipients vinay.sajip, zmk
Date 2012-04-11.13:12:22
Message-id <1334149943.11.0.215406763651.issue14452@psf.upfronthosting.co.za>
Content
I have a possible suggestion about how to resolve this issue:

The SysLogHandler will not do BOM insertion unless the message is Unicode. If it is Unicode, it will add the attribute 'UTF8BOM' to the LogRecord, with the value u'\ufeff'. The record will then be formatted; if the format string contains the UTF8BOM placeholder, it will be replaced with the value u'\ufeff', which, when encoded, results in the UTF-8 BOM value '\xef\xbb\xbf'. The user of the format string is responsible for ensuring that:

1. If there's no UTF8BOM placeholder in the format string, everything in the formatted result must always encode to plain ASCII when encoded using UTF-8.

2. If there is a UTF8BOM placeholder in the format string, everything in the formatted result prior to the placeholder must always encode to plain ASCII when encoded using UTF-8. Whatever follows the placeholder is, of course, free of that restriction.

3. The end result of encoding should be a prefix which is bytes of pure ASCII, then the BOM (if the placeholder is present in the format string), then bytes of UTF-8 encoded Unicode.
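
To make the three conditions concrete, here is a minimal sketch of a check on the encoded bytes. The helper name `check_syslog_payload` is hypothetical and not part of the logging module; it only illustrates the rule that the payload is either pure ASCII, or an ASCII prefix followed by the BOM and then UTF-8.

```python
BOM_BYTES = '\ufeff'.encode('utf-8')  # b'\xef\xbb\xbf'

def check_syslog_payload(payload):
    """Hypothetical helper: return True if payload satisfies conditions 1-3.

    With no BOM, the whole payload must be ASCII. With a BOM, everything
    before the first BOM must be ASCII and everything after it must be
    valid UTF-8.
    """
    prefix, _bom, rest = payload.partition(BOM_BYTES)
    try:
        prefix.decode('ascii')   # conditions 1 and 2: ASCII-only prefix
        rest.decode('utf-8')     # condition 3: UTF-8 after the BOM
    except UnicodeDecodeError:
        return False
    return True
```

Note that when no BOM is present, `partition` returns the whole payload as the prefix, so the same ASCII check covers condition 1.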

In any case, a Unicode string will be encoded using UTF-8. If no UTF8BOM placeholder was present, no BOM will be inserted; the message can be considered to just be a set of octets, which just happens to be UTF-8 encoded Unicode. If the placeholder was present, the BOM should appear at the appropriate place to comply with RFC 5424.
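
A rough sketch of the two cases, written for Python 3. It sets the proposed `UTF8BOM` attribute on the record by hand, which is what the handler would do under this suggestion; the logger name, format strings and message are made up for illustration, and this is not the actual SysLogHandler code.

```python
import logging

BOM = '\ufeff'  # encodes to b'\xef\xbb\xbf' in UTF-8

def format_and_encode(fmt, msg):
    """Illustrative only: format a record carrying UTF8BOM, then encode it."""
    record = logging.LogRecord(
        name='myapp', level=logging.INFO, pathname='example.py',
        lineno=1, msg=msg, args=None, exc_info=None)
    record.UTF8BOM = BOM  # the attribute the handler would add in this proposal
    formatted = logging.Formatter(fmt).format(record)
    return formatted.encode('utf-8')

# Placeholder present: ASCII prefix, then the BOM, then UTF-8.
with_bom = format_and_encode('%(name)s: %(UTF8BOM)s%(message)s', 'caf\u00e9')

# Placeholder absent: no BOM, just UTF-8 encoded octets.
without_bom = format_and_encode('%(name)s: %(message)s', 'caf\u00e9')
```

Here `with_bom` is `b'myapp: \xef\xbb\xbfcaf\xc3\xa9'` and `without_bom` is `b'myapp: caf\xc3\xa9'`, so the only difference between the two is the format string.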

On 3.2, the message will always be Unicode, and the above processing will take place (whereas on 2.x it will be conditional on the type of the formatted message string being Unicode).

This seems to resolve the issue without API changes; the only change needed is to the format string, and only if BOM insertion is wanted. With no UTF8BOM placeholder, the BOM will simply not be inserted. Can you comment on this suggestion?