Return-Path: ralph@inputplus.co.uk
Delivery-Date: Sun May 8 14:57:15 2011
Return-Path:
X-Original-To: ralph
Delivered-To: ralph@inputplus.co.uk
Received: by orac.inputplus.co.uk (Postfix, from userid 1000)
    id 5110D2BC6D; Sun, 8 May 2011 14:57:15 +0100 (BST)
Received: from orac.inputplus.co.uk (localhost [127.0.0.1])
    by orac.inputplus.co.uk (Postfix) with ESMTP id 37A3228872;
    Sun, 8 May 2011 14:57:15 +0100 (BST)
To: Python tracker
from: ralph-pythonbugs@inputplus.co.uk
Subject: Re: [issue10713] re module doesn't describe string boundaries for \b
In-reply-to: <1304780986.97.0.366935592017.issue10713@psf.upfronthosting.co.za>
References: <1304780986.97.0.366935592017.issue10713@psf.upfronthosting.co.za>
Comments: In-reply-to =?utf-8?q?=C3=89ric_Araujo?=
    message dated "Sat, 07 May 2011 15:09:47 -0000."
Date: Sun, 08 May 2011 14:57:15 +0100
Sender: ralph@inputplus.co.uk
Message-Id: <20110508135715.5110D2BC6D@orac.inputplus.co.uk>

Examining the source of Ubuntu's python2.6 2.6.6-5ubuntu1 package
suggests that anything beyond the limits of the string is considered
\W, as in Perl.

Modules/_sre.c:

    336 LOCAL(int)
    337 SRE_AT(SRE_STATE* state, SRE_CHAR* ptr, SRE_CODE at)
    338 {
    339     /* check if pointer is at given position */
    340
    341     Py_ssize_t thisp, thatp;
    ...
    365     case SRE_AT_BOUNDARY:
    366         if (state->beginning == state->end)
    367             return 0;
    368         thatp = ((void*) ptr > state->beginning) ?
    369             SRE_IS_WORD((int) ptr[-1]) : 0;
    370         thisp = ((void*) ptr < state->end) ?
    371             SRE_IS_WORD((int) ptr[0]) : 0;
    372         return thisp != thatp;

SRE_IS_WORD() returns 16 for the 63 \w characters, 0 otherwise.  This is
borne out by tests.  Note that line 366 above confirms \b is never true
for an empty string.  The documentation states that \B "is just the
opposite of \b", yet re.match(r'\b', '') returns None and so does the
same call with \B, so \B isn't the opposite of \b in all cases.
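For anyone wanting to check the boundary behaviour without reading the C, here is a minimal sketch in Python of the SRE_AT_BOUNDARY logic quoted above, compared against what the re module itself reports. The names at_boundary(), model_boundaries() and re_boundaries() are mine, not anything in _sre.c, and the sketch assumes Python 3 with the re.ASCII flag so that \w is limited to the 63 characters mentioned. It deliberately doesn't assert anything about \B on an empty string, since that result may differ between Python versions.

```python
import re
import string

# The 63 ASCII \w characters: letters, digits and underscore.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def at_boundary(s, pos):
    """Hypothetical Python model of the SRE_AT_BOUNDARY case quoted above."""
    if not s:
        # Mirrors line 366: an empty string never has a boundary.
        return False
    # Positions outside the string count as non-word, i.e. like \W.
    thatp = pos > 0 and s[pos - 1] in WORD_CHARS
    thisp = pos < len(s) and s[pos] in WORD_CHARS
    return thisp != thatp


def model_boundaries(s):
    """All offsets in s (including len(s)) where the model says \\b holds."""
    return [i for i in range(len(s) + 1) if at_boundary(s, i)]


def re_boundaries(s):
    """All offsets where the re module itself matches \\b."""
    return [m.start() for m in re.finditer(r"\b", s, re.ASCII)]


if __name__ == "__main__":
    for sample in ["", "ab cd", "-", "a", " a "]:
        print(repr(sample), model_boundaries(sample), re_boundaries(sample))
```

If the model is right, the two lists printed for each sample should agree, with a boundary reported at offset 0 or at the end of the string only when the adjacent character is a word character.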