classification
Title: re library matching problem
Type: security Stage: resolved
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: andy.ye.jx, hongweipeng, malin, xtreak
Priority: normal Keywords:

Created on 2020-12-03 02:38 by andy.ye.jx, last changed 2020-12-03 06:41 by andy.ye.jx. This issue is now closed.

Messages (9)
msg382367 - (view) Author: ye andy (andy.ye.jx) Date: 2020-12-03 02:38
import re
a = """0xd26935a5ee4cd542e8a3a7e74fb7a99855975b59\n"""

eth_re = re.compile(r'^0x[0-9a-fA-F]{40}$')

print(eth_re.match(a))
print(len(a))  # length is 43
msg382377 - (view) Author: Ma Lin (malin) * Date: 2020-12-03 05:31
This issue can be closed.

'0x'  2
'd26935a5ee4cd542e8a3a7e74fb7a99855975b59'  40
'\n'  1

2+40+1 = 43
msg382378 - (view) Author: ye andy (andy.ye.jx) Date: 2020-12-03 05:53
What I mean is that the regex I wrote should only match a 42-character string, but it also matches when a trailing newline is added.
msg382379 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python triager) Date: 2020-12-03 05:57
Maybe you're looking for re.fullmatch:

https://docs.python.org/3/library/re.html#re.fullmatch
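
As a minimal sketch of this suggestion, reusing the pattern and string from the original report (the variable names `loose`, `strict`, and `strict_stripped` are only illustrative): `fullmatch` succeeds only when the match spans the entire string, so the trailing newline makes it fail even though the pattern still ends in `$`.

```python
import re

a = "0xd26935a5ee4cd542e8a3a7e74fb7a99855975b59\n"
eth_re = re.compile(r'^0x[0-9a-fA-F]{40}$')

# match() accepts the string: `$` may match just before a final newline.
loose = eth_re.match(a)

# fullmatch() requires the match to cover the whole string, newline included.
strict = eth_re.fullmatch(a)

# Without the trailing newline, fullmatch() succeeds.
strict_stripped = eth_re.fullmatch(a.rstrip("\n"))

print(loose is not None, strict is None, strict_stripped is not None)
```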
msg382380 - (view) Author: ye andy (andy.ye.jx) Date: 2020-12-03 06:02
My regex requires the string to end with a character in 0-9a-fA-F, but my string ends with a line break, which in theory should not match. Why did it match?
msg382381 - (view) Author: ye andy (andy.ye.jx) Date: 2020-12-03 06:07
My regex requires the string to begin with 0x and end with a character in 0-9a-fA-F. My string ends with \n, yet the match still succeeds. I expected it to fail. Is my pattern wrong?
msg382383 - (view) Author: hongweipeng (hongweipeng) * Date: 2020-12-03 06:25
Maybe you need to use `eth_re.match(a, re.MULTILINE)` or `eth_re.fullmatch(a)`.
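
One caveat about the first option, offered as a sketch rather than a definitive reading: on a compiled pattern, the second positional argument of `match` is `pos` (a start offset), not `flags`, so flags need to be passed to `re.compile` instead. Also, `re.MULTILINE` lets `$` match before every newline, so by itself it would not be expected to reject the trailing newline. The names `multiline_re`, `multiline_result`, `fullmatch_result`, and `pos_value` are only illustrative.

```python
import re

a = "0xd26935a5ee4cd542e8a3a7e74fb7a99855975b59\n"
pattern = r'^0x[0-9a-fA-F]{40}$'

# Flags are given at compile time; with MULTILINE the newline is still tolerated.
multiline_re = re.compile(pattern, re.MULTILINE)
multiline_result = multiline_re.match(a)

# fullmatch() is the variant that rejects the trailing newline.
fullmatch_result = re.compile(pattern).fullmatch(a)

# re.MULTILINE is an integer flag; passed positionally to Pattern.match()
# it would be interpreted as the start offset `pos`.
pos_value = int(re.MULTILINE)

print(multiline_result is not None, fullmatch_result is None, pos_value)
```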
msg382384 - (view) Author: ye andy (andy.ye.jx) Date: 2020-12-03 06:29
Okay, I just thought it was weird
msg382385 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-12-03 06:30
https://docs.python.org/3/howto/regex.html#more-metacharacters

$
Matches at the end of a line, which is defined as either the end of the string, or any location followed by a newline character.

\Z
Matches only at the end of the string.

>>> eth_re = re.compile(r'^0x[0-9a-fA-F]{40}\Z')
>>> print(eth_re.match(a))
None
>>> eth_re = re.compile(r'^0x[0-9a-fA-F]{40}$')
>>> print(eth_re.match(a))
<re.Match object; span=(0, 42), match='0xd26935a5ee4cd542e8a3a7e74fb7a99855975b59'>
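
To make the difference concrete, here is a small sketch comparing the two anchors on the same address with zero, one, and two trailing newlines (the names `addr`, `dollar_re`, `end_re`, and `results` are only illustrative). Without `re.MULTILINE`, `$` tolerates a single newline at the very end of the string but not more, while `\Z` tolerates none.

```python
import re

addr = "0xd26935a5ee4cd542e8a3a7e74fb7a99855975b59"
dollar_re = re.compile(r'^0x[0-9a-fA-F]{40}$')
end_re = re.compile(r'^0x[0-9a-fA-F]{40}\Z')

cases = [
    ("no newline", addr),
    ("one newline", addr + "\n"),
    ("two newlines", addr + "\n\n"),
]

# For each input, record whether `$` and `\Z` produce a match.
results = {
    label: (dollar_re.match(text) is not None, end_re.match(text) is not None)
    for label, text in cases
}

for label, (with_dollar, with_end) in results.items():
    print(f"{label}: $ -> {with_dollar}, \\Z -> {with_end}")
```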

You can also use re.DEBUG to see the difference:


>>> re.match(r'^0x[0-9a-fA-F]{40}$', a, re.DEBUG)
AT AT_BEGINNING
LITERAL 48
LITERAL 120
MAX_REPEAT 40 40
  IN
    RANGE (48, 57)
    RANGE (97, 102)
    RANGE (65, 70)
AT AT_END

 0. INFO 4 0b0 42 42 (to 5)
 5: AT BEGINNING
 7. LITERAL 0x30 ('0')
 9. LITERAL 0x78 ('x')
11. REPEAT_ONE 16 40 40 (to 28)
15.   IN 11 (to 27)
17.     CHARSET [0x00000000, 0x03ff0000, 0x0000007e, 0x0000007e, 0x00000000, 0x00000000, 0x00000000, 0x00000000]
26.     FAILURE
27:   SUCCESS
28: AT END
30. SUCCESS
<re.Match object; span=(0, 42), match='0xd26935a5ee4cd542e8a3a7e74fb7a99855975b59'>


>>> re.match(r'^0x[0-9a-fA-F]{40}\Z', a, re.DEBUG)
AT AT_BEGINNING
LITERAL 48
LITERAL 120
MAX_REPEAT 40 40
  IN
    RANGE (48, 57)
    RANGE (97, 102)
    RANGE (65, 70)
AT AT_END_STRING

 0. INFO 4 0b0 42 42 (to 5)
 5: AT BEGINNING
 7. LITERAL 0x30 ('0')
 9. LITERAL 0x78 ('x')
11. REPEAT_ONE 16 40 40 (to 28)
15.   IN 11 (to 27)
17.     CHARSET [0x00000000, 0x03ff0000, 0x0000007e, 0x0000007e, 0x00000000, 0x00000000, 0x00000000, 0x00000000]
26.     FAILURE
27:   SUCCESS
28: AT END_STRING
30. SUCCESS
History
Date                 User            Action  Args
2020-12-03 06:41:40  andy.ye.jx      set     status: open -> closed; stage: resolved
2020-12-03 06:30:40  xtreak          set     nosy: + xtreak; messages: + msg382385
2020-12-03 06:29:37  andy.ye.jx      set     messages: + msg382384
2020-12-03 06:25:51  hongweipeng     set     nosy: + hongweipeng; messages: + msg382383
2020-12-03 06:07:15  andy.ye.jx      set     messages: + msg382381
2020-12-03 06:02:20  andy.ye.jx      set     nosy: - Dennis Sweeney; messages: + msg382380
2020-12-03 05:57:47  Dennis Sweeney  set     nosy: + Dennis Sweeney; messages: + msg382379
2020-12-03 05:53:19  andy.ye.jx      set     messages: + msg382378
2020-12-03 05:31:20  malin           set     nosy: + malin; messages: + msg382377
2020-12-03 02:38:33  andy.ye.jx      create