Created on 2011-05-24 11:44 by Carl.Crowder, last changed 2011-06-10 17:54 by python-dev. This issue is now closed.
|msg136743 - (view)||Author: Carl Crowder (Carl.Crowder)||Date: 2011-05-24 11:44|
logging.handlers.SysLogHandler contains this class variable and comment: # curious: when talking to the unix-domain '/dev/log' socket, a # zero-terminator seems to be required. this string is placed # into a class variable so that it can be overridden if # necessary. log_format_string = '<%d>%s\000' And separately, in emit: msg = self.format(record) + '\000' The assumption here is that a null character must be appended to delimit the syslog message. In RFC5424, there is no mention of a message delimiter, and in fact the previous syslog RFC, RFC3164, specifically states: > The MSG part will fill the remainder of the syslog packet. This will usually contain some additional information of the process that generated the message, and then the text of the message. There is no ending delimiter to this part. I believe this comment and behaviour is due to an older version of syslogd. Checking the manpage for an older version of rsyslog for example includes this piece of information : > There is probably one important consideration when installing rsyslogd. It is dependent on proper formatting of messages by the syslog function. The functioning of the syslog function in the shared libraries changed somewhere in the region of libc.so.4.[2-4].n. The specific change was to null-terminate the message before transmitting it to the /dev/log socket. Proper functioning of this version of rsyslogd is dependent on null-termination of the message. I'm running Ubuntu 11.04 with rsyslogd 4.6.4 (that is, the standard version). In the manpage for this version of rsyslogd, there is no reference to null-terminators. Removing "+ '\000'" from the SysLogHandler results in messages still being received correctly. Problem behaviour: 1) When running any RFC compliant syslog receiver that handles syslog messages, such as flume, this null character is not stripped as it is not expected to be present. Current versions of syslog cope because previously they assumed it existed. 2) The log_format_string class variable is not actually used anywhere, so it cannot be overridden usefully. Removing the null terminator will cause older typical versions of syslogd to fail to receive messages, however including it causes any normal receiver that does not implement the non-standard behaviour to receive an additional unwanted null. Suggestion for a fix is either to properly use the log_format_string class variable, or to allow an optional "append_null" argument to the SysLogHandler constructor. By default, this should be True, as it will continue to work with the main use case, which is unix syslog demons. Having the option will allow other use cases to also use the SysLogHandler.  http://manpages.ubuntu.com/manpages/hardy/man8/rsyslogd.8.html#contenttoc8  http://www.cloudera.com/blog/category/flume/
|msg136767 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-05-24 16:37|
While I agree that it should ideally be possible to actually control this behavior as indicated by the comment, any syslog handler that does not sanitize the messages it receives is broken.
|msg136768 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-05-24 16:38|
|msg136771 - (view)||Author: Vinay Sajip (vinay.sajip) *||Date: 2011-05-24 17:06|
Even though expecting the NUL-terminator is legacy behaviour, newer versions of the syslog daemons presumably follow Postel's rule of "be conservative in what you do, be liberal in what you accept from others". Can flume not be changed to follow this principle, given that ordinarily you wouldn't expect a NUL terminator byte on syslog messages? When the implementation of SysLogHandler was changed to comply (more closely) with RFC5424, the log_format_string value was inadvertently left behind in Python 2.x, and it makes sense to remove it since changing it doesn't do anything useful. Having an append_nul parameter to control NUL-termination is certainly doable, and this approach will keep the 2.x and 3.x implementations more or less the same (as 3.x doesn't have a log_format_string attribute). I suggest "nul" rather than "null" since we're talking about appending a NUL byte to the message. However, having it as an additional keyword parameter to the constructor could cause dictConfig failures if configurations specifying it are used with an old version of Python (e.g. 3.2.0) which doesn't have the parameter. For this reason, it might be better to have an append_nul class attribute (defaulting to True) which can be overridden with an instance attribute set to False. (This approach can still be used with dictConfig, as you can use a factory function other than the SysLogHandler class to construct a SysLogHandler instance.)
|msg136847 - (view)||Author: Carl Crowder (Carl.Crowder)||Date: 2011-05-25 14:12|
Flume certainly could avoid parsing certain values. However, while a syslog application "should avoid octet values below 32", they are still "legal" . I don't think that adjusting flume to reject legal values due to legacy behaviour in some unix syslog daemons is the Right Thing™ here.  http://tools.ietf.org/html/rfc5424#section-6.4
|msg136856 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-05-25 15:28|
And it goes on to say: "[...] but a syslog application MAY modify these characters upon reception. For example, it might change them into an escape sequence (e.g., value 0 may be changed to "\0")" Flume should not break if it receives NULs. But there should be a way to prevent generating them, and Vinay has agreed with that. Now we just need a patch :) (he's probably working on one...)
|msg136857 - (view)||Author: Carl Crowder (Carl.Crowder)||Date: 2011-05-25 15:31|
Oh, I understand. Flume doesn't break, it handles the \0 just fine, the problem is that I ended up with a message with that additional byte on the end. Sorry for the confusion!
|msg136858 - (view)||Author: Petri Lehtinen (petri.lehtinen) *||Date: 2011-05-25 15:35|
FWIW, I once stumbled on this problem, and solved it by making my own log handler that uses functions from the syslog module.
|msg136859 - (view)||Author: Vinay Sajip (vinay.sajip) *||Date: 2011-05-25 16:02|
> Petri Lehtinen <email@example.com> added the comment: > > FWIW, I once stumbled on this problem, and solved it by making my own log >handler that uses functions from the syslog module. @Petri: It's great that it worked for you, but IIRC the syslog module has had thread-safety issues at least on some platforms, which is why it's not used in the SysLogHandler implementation (disclaimer: this was a while ago, I don't know if the syslog module is thread-safe now - I believe the thread-safety issues were in the underlying C APIs). I am working on a patch, watch this space :-)
|msg136860 - (view)||Author: Petri Lehtinen (petri.lehtinen) *||Date: 2011-05-25 16:06|
Vinay Sajip wrote: > @Petri: It's great that it worked for you, but IIRC the syslog module has had > thread-safety issues at least on some platforms, which is why it's not used in > the SysLogHandler implementation (disclaimer: this was a while ago, I don't know > if the syslog module is thread-safe now - I believe the thread-safety issues > were in the underlying C APIs). Ah, I was not aware of thread safety issues. A quick search revealed that there has been a non-standard syslog_r() function that's thread safe, but nowadays POSIX requires syslog() to be thread safe, too.
|msg136875 - (view)||Author: Vinay Sajip (vinay.sajip) *||Date: 2011-05-25 17:16|
Please note: this change is too late for inclusion in Python 3.2.1 :-( as it has already rc status. So I will probably push this change once 3.2.1 is out.
|msg137996 - (view)||Author: Roundup Robot (python-dev)||Date: 2011-06-09 15:55|
New changeset 260b84851d1f by Vinay Sajip in branch '3.2': Issue #12168: SysLogHandler now allows NUL termination to be controlled using a new 'append_nul' attribute on the handler. http://hg.python.org/cpython/rev/260b84851d1f New changeset ac1217099b3f by Vinay Sajip in branch 'default': Merged fix for issue #12168 from 3.2. http://hg.python.org/cpython/rev/ac1217099b3f
|msg138112 - (view)||Author: Éric Araujo (eric.araujo) *||Date: 2011-06-10 17:02|
Should this new attribute be documented?
|msg138127 - (view)||Author: Roundup Robot (python-dev)||Date: 2011-06-10 17:54|
New changeset 3cf5d61fd6d7 by Vinay Sajip in branch '3.2': Documented change for Issue #12168. http://hg.python.org/cpython/rev/3cf5d61fd6d7 New changeset 6658b9b9f5f3 by Vinay Sajip in branch 'default': Merged documentation change for Issue #12168. http://hg.python.org/cpython/rev/6658b9b9f5f3
|2011-06-10 17:54:27||python-dev||set||messages: + msg138127|
messages: + msg138112
|2011-06-09 17:38:19||vinay.sajip||set||status: open -> closed|
messages: + msg137996
|2011-05-25 17:16:34||vinay.sajip||set||messages: + msg136875|
|2011-05-25 16:06:56||petri.lehtinen||set||messages: + msg136860|
|2011-05-25 16:02:11||vinay.sajip||set||messages: + msg136859|
messages: + msg136858
|2011-05-25 15:31:21||Carl.Crowder||set||messages: + msg136857|
|2011-05-25 15:28:39||r.david.murray||set||messages: + msg136856|
title: SysLogHandler incorrectly appents \000 to messages -> SysLogHandler incorrectly appends \000 to messages
|2011-05-24 17:06:35||vinay.sajip||set||messages: + msg136771|
|2011-05-24 16:38:13||r.david.murray||set||messages: + msg136768|
messages: + msg136767
|2011-05-24 16:35:24||eric.araujo||set||assignee: vinay.sajip|
nosy: + vinay.sajip
versions: + Python 3.1, Python 3.2, Python 3.3