Message363234
Relevant base Python library: C:\Users\User\AppData\Local\Programs\Python\Python38\lib\_markupbase.py
The issue: after parsing over 900 SEC filings using beautifulsoup4, I get this user warning:
UserWarning: unknown status keyword 'ERF' in marked section
  warnings.warn(msg)
Followed by a traceback:
....
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python38\lib\site-packages\bs4\__init__.py", line 325, in __init__
    self._feed()
....
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python38\lib\_markupbase.py", line 160, in parse_marked_section
    if not match:
UnboundLocalError: local variable 'match' referenced before assignment
It's probably due to malformed input from one of the docs.
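One way to narrow down which filing triggers this is to parse each document separately and record the ones that raise. The following is only a sketch: `find_failing_docs` and `toy_parse` are hypothetical names, and `toy_parse` is a stand-in that mimics the failure on a `<![ERF` marker (inferred from the warning text) so the example runs without bs4. In real use the `parse` argument would be something like `lambda m: BeautifulSoup(m, "html.parser")`.

```python
def find_failing_docs(docs, parse):
    """Return (name, exception) pairs for documents that `parse` cannot handle."""
    failures = []
    for name, markup in docs.items():
        try:
            parse(markup)
        except Exception as exc:  # broad on purpose: the goal is only to locate the input
            failures.append((name, exc))
    return failures


def toy_parse(markup):
    # Hypothetical stand-in for the real parser call; it imitates the
    # reported failure whenever the markup contains "<![ERF".
    if "<![ERF" in markup:
        raise UnboundLocalError("local variable 'match' referenced before assignment")
    return markup


docs = {"a.htm": "<p>ok</p>", "b.htm": "<p>bad <![ERF stuff</p>"}
failures = find_failing_docs(docs, toy_parse)
failing_names = [name for name, _ in failures]
```

Each failing name can then be opened and searched for the `<![` sequence that precedes the unknown keyword.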
Around line 144 of _markupbase.py we have:
# Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
def parse_marked_section(self, i, report=1):
    rawdata= self.rawdata
    assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
    sectName, j = self._scan_name( i+3, i )
    if j < 0:
        return j
    if sectName in {"temp", "cdata", "ignore", "include", "rcdata"}:
        # look for standard ]]> ending
        match= _markedsectionclose.search(rawdata, i+3)
    elif sectName in {"if", "else", "endif"}:
        # look for MS Office ]> ending
        match= _msmarkedsectionclose.search(rawdata, i+3)
    else:
        self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
    if not match:
        return -1
    if report:
        j = match.start(0)
        self.unknown_decl(rawdata[i+3: j])
    return match.end(0)
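`match` should be set to None in the fall-through else branch, right before `if not match`. Judging by the warning above, `self.error()` only warns here instead of raising, so execution reaches `if not match` with `match` never assigned.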
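A minimal, standalone sketch of the suggested change, not a patch against the real module: `find_section_close` is a hypothetical function that copies only the branch logic, the two regexes are meant to mirror the closing patterns in `_markupbase`, and `warnings.warn` stands in for an `error()` override that warns rather than raises. Binding `match` to None up front has the same effect as assigning it in the else branch.

```python
import re
import warnings

# Intended to mirror the closing patterns used by _markupbase (assumption).
_markedsectionclose = re.compile(r']\s*]\s*>')
_msmarkedsectionclose = re.compile(r']\s*>')


def find_section_close(rawdata, i, sect_name):
    """Return the index just past the section close, or -1 if none is found."""
    match = None  # proposed fix: make sure match is always bound
    if sect_name in {"temp", "cdata", "ignore", "include", "rcdata"}:
        # look for standard ]]> ending
        match = _markedsectionclose.search(rawdata, i + 3)
    elif sect_name in {"if", "else", "endif"}:
        # look for MS Office ]> ending
        match = _msmarkedsectionclose.search(rawdata, i + 3)
    else:
        # stand-in for an error() override that only emits a warning
        warnings.warn('unknown status keyword %r in marked section' % sect_name)
    if not match:
        return -1
    return match.end(0)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unknown_result = find_section_close("<![ERF junk", 0, "erf")

cdata_result = find_section_close("<![CDATA[x]]>", 0, "cdata")
msword_result = find_section_close("<![if word]>", 0, "if")
```

With this change the unknown keyword still produces the warning, but the function returns -1 instead of raising UnboundLocalError.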
Date | User | Action | Args
2020-03-03 02:16:06 | SanJacintoJoe | set | recipients: + SanJacintoJoe
2020-03-03 02:16:05 | SanJacintoJoe | set | messageid: <1583201765.99.0.950637389492.issue39833@roundup.psfhosted.org>
2020-03-03 02:16:05 | SanJacintoJoe | link | issue39833 messages
2020-03-03 02:16:05 | SanJacintoJoe | create |