classification
Title: regex matching on bytes considers zero byte as end
Type: behavior Stage: resolved
Components: Regular Expressions Versions: Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Simmo Saan, ezio.melotti, mrabarnett, serhiy.storchaka, tim.peters
Priority: normal Keywords:

Created on 2016-04-30 15:02 by Simmo Saan, last changed 2016-04-30 15:38 by tim.peters. This issue is now closed.

Messages (3)
msg264561 - Author: Simmo Saan (Simmo Saan) Date: 2016-04-30 15:02
Regex functions on bytes treat a zero byte as the end of the input and stop matching at that point. This is completely nonsensical, since Python has no problems working with zero bytes otherwise.

For example:
  Matches as expected: re.match(b'a', b'abc')
  Unexpectedly does not match: re.match(b'a', b'\x00abc')
msg264562 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-30 15:23
There is no bug.

The pattern b'a' matches a bytes object that starts with byte 97 (ord(b'a')), but b'\x00abc' starts with byte 0.
msg264565 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-04-30 15:38
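As a minimal sketch of the point above (the variable names are illustrative and not part of the original report), the snippet below checks the byte values involved and shows that a pattern which accounts for the leading zero byte matches straight through it, so the zero byte is not acting as an end marker:

```python
import re

data = b'\x00abc'

# The subject starts with byte 0, while the pattern b'a' requires byte 97.
first_byte = data[0]
pattern_byte = ord(b'a')

# re.match() only tries position 0, so b'a' cannot match here.
anchored = re.match(b'a', data)

# A pattern that includes the zero byte matches past it without trouble.
through_zero = re.match(b'\x00a', data)
```

Here `anchored` is None, while `through_zero` is a match object covering the first two bytes.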
Do note that `.match()` is constrained to match starting at the first byte.  `.search()` is not (it can start matching at any position), and your example works fine if `.search()` is used instead.

This is all expected, and intended, and documented.
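The difference between the two calls can be sketched as follows, using the bytes from the original report (the variable names are only illustrative):

```python
import re

data = b'\x00abc'

# .match() is anchored at the first byte, which is 0 here, so it fails.
anchored = re.match(b'a', data)

# .search() may start matching at any position, so it finds the b'a' at index 1.
scanned = re.search(b'a', data)

if scanned is not None:
    print(scanned.start(), scanned.group())
```

With this input, `anchored` is None and `scanned` reports a match starting at index 1, directly after the zero byte.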
History
Date                 User              Action  Args
2016-04-30 15:38:11  tim.peters        set     nosy: + tim.peters; messages: + msg264565
2016-04-30 15:23:39  serhiy.storchaka  set     status: open -> closed; nosy: + serhiy.storchaka; messages: + msg264562; resolution: not a bug; stage: resolved
2016-04-30 15:02:11  Simmo Saan        create