getpos() for sgmllib #39750

Closed
d98dzone mannequin opened this issue Jan 1, 2004 · 2 comments
Labels
type-feature A feature request or enhancement

Comments

@d98dzone
Mannequin

d98dzone mannequin commented Jan 1, 2004

BPO 868908
Nosy @devdanzin
Superseder
  • bpo-849097: Request: getpos() for sgmllib
Files
  • diff.txt: Unix diff between the updated version and the CVS version (1.46)
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2009-02-13.05:19:37.787>
    created_at = <Date 2004-01-01.20:01:48.000>
    labels = ['type-feature']
    title = 'getpos() for sgmllib'
    updated_at = <Date 2009-02-13.05:19:37.757>
    user = 'https://bugs.python.org/d98dzone'

    bugs.python.org fields:

    activity = <Date 2009-02-13.05:19:37.757>
    actor = 'ajaksu2'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-02-13.05:19:37.787>
    closer = 'ajaksu2'
    components = ['None']
    creation = <Date 2004-01-01.20:01:48.000>
    creator = 'd98dzone'
    dependencies = []
    files = ['8236']
    hgrepos = []
    issue_num = 868908
    keywords = []
    message_count = 2.0
    messages = ['54083', '81882']
    nosy_count = 2.0
    nosy_names = ['d98dzone', 'ajaksu2']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = None
    status = 'closed'
    superseder = '849097'
    type = 'enhancement'
    url = 'https://bugs.python.org/issue868908'
    versions = []

    @d98dzone
    Mannequin Author

    d98dzone mannequin commented Jan 1, 2004

    Placed here instead of in Bugs since it really isn't a bug.

    While working on my master's thesis I
    discovered the need for a working getpos() in
    sgmllib.py. As it is now, you can call it successfully,
    since it is inherited from markupbase.py, but you will
    always get the answer (1, 0) because the position is never updated.
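
To illustrate why a parser that never calls updatepos() keeps reporting (1, 0), here is a minimal sketch of the line/offset bookkeeping. PositionTracker is a hypothetical stand-in written for this example and modeled on the behaviour described above; it is not the actual markupbase.py source.

```python
class PositionTracker:
    """Hypothetical stand-in for the position bookkeeping a parser inherits."""

    def __init__(self, rawdata):
        self.rawdata = rawdata
        self.lineno = 1   # lines are counted from 1
        self.offset = 0   # columns are counted from 0

    def getpos(self):
        """Return the current (line, column) position."""
        return self.lineno, self.offset

    def updatepos(self, i, j):
        """Advance the position past rawdata[i:j] and return j."""
        if i >= j:
            return j
        nlines = self.rawdata.count("\n", i, j)
        if nlines:
            self.lineno += nlines
            # The column restarts after the last newline in the consumed span.
            last_newline = self.rawdata.rindex("\n", i, j)
            self.offset = j - (last_newline + 1)
        else:
            self.offset += j - i
        return j


text = "<a>\n<b>text"

stale = PositionTracker(text)   # updatepos() is never called on this one
fresh = PositionTracker(text)
i = fresh.updatepos(0, 3)       # consume "<a>"
i = fresh.updatepos(i, 4)       # consume the newline
i = fresh.updatepos(i, 7)       # consume "<b>"

print(stale.getpos())           # still at the initial position
print(fresh.getpos())           # position after the consumed spans
```

The point of the sketch is only that getpos() reads two counters; unless the parsing loop reports each consumed span through updatepos(), the counters keep their initial values.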

    To fix this, one needs to change the goahead function.
    This is my own implementation of the change, in part
    influenced by the "sister" goahead function in
    HTMLParser.py:


    def goahead(self, end):
        rawdata = self.rawdata
        i = 0
        k = 0
        n = len(rawdata)
        tmp=0
        while i < n:
            if self.nomoretags:
                self.handle_data(rawdata[i:n])
                i = n
                break
            match = interesting.search(rawdata, i)
            if match: j = match.start()
            else: j = n
            if i < j:
                self.handle_data(rawdata[i:j])
                tmp = self.updatepos(i, j)
            i = j
            if i == n: break
            startswith = rawdata.startswith
            if rawdata[i] == '<':
                if starttagopen.match(rawdata, i):
                    if self.literal:
                        self.handle_data(rawdata[i])
                        tmp = self.updatepos(i, i+1)
                        i = i+1
                        continue
                    k = self.parse_starttag(i)
                    if k < 0: break
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
                if rawdata.startswith("</", i):
                    k = self.parse_endtag(i)
                    if k < 0: break
                    tmp = self.updatepos(i, k)
                    i = k
                    self.literal = 0
                    continue
                if self.literal:
                    if n > (i + 1):
                        self.handle_data("<")
                        i = i+1
                        tmp = self.updatepos(i, k)
                    else:
                        # incomplete
                        break
                    continue
                if rawdata.startswith("<!--", i):
                    # Strictly speaking, a comment is --.*--
                    # within a declaration tag <!...>.
                    # This should be removed,
                    # and comments handled only in parse_declaration.
                    k = self.parse_comment(i)
                    if k < 0: break
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
                if rawdata.startswith("<?", i):
                    k = self.parse_pi(i)
                    if k < 0: break
                    tmp = self.updatepos(i, k)
                    i = i+k
                    continue
                if rawdata.startswith("<!", i):
                    # This is some sort of declaration; in "HTML as
                    # deployed," this should only be the document type
                    # declaration ("<!DOCTYPE html...>").
                    k = self.parse_declaration(i)
                    if k < 0: break
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
                tmp = self.updatepos(i, k)
            elif rawdata[i] == '&':
                if self.literal:
                    self.handle_data(rawdata[i])
                    #tmp = self.updatepos(i,i+1)#added
                    i = i+1
                    continue
                match = charref.match(rawdata, i)
                if match:
                    name = match.group()[2:-1]
                    self.handle_charref(name)
                    k = match.end()
                    if not startswith(';', k-1):
                        k = k - 1
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
                match = entityref.match(rawdata, i)
                if match:
                    name = match.group(1)
                    self.handle_entityref(name)
                    k = match.end()
                    if not startswith(';', k-1):
                        k = k - 1
                    tmp = self.updatepos(i, k)
                    i = k
                    continue
            else:
                self.error('neither < nor & ??')
            # We get here only if incomplete matches but
            # nothing else
            match = incomplete.match(rawdata, i)
            if not match:
                self.handle_data(rawdata[i])
                i = i+1
                continue
            j = match.end(0)
            if j == n:
                break # Really incomplete
            self.handle_data(rawdata[i:j])
            i = j
        # end while
        if end and i < n:
            self.handle_data(rawdata[i:n])
            tmp = self.updatepos(i, n)
            i = n
        self.rawdata = rawdata[i:]
        # XXX if end: check for empty stack

    # Extensions for the DOCTYPE scanner:
    _decl_otherchars = '='


    The major difference is the updatepos calls. It
    seems to work fine, or at least it has worked fine for
    me so far.

    Posted a diff taken against the CVS version (1.46).
    It has three parts. The first is just updatepos
    inserted at the correct places in the goahead function. The
    second is the part of the goahead function that
    handles the & characters. I had a hard time making it work
    with the current model and changed it to a version inspired
    by the same part of the goahead function in HTMLParser.py.
    The last is the printouts in the test function to check whether
    the function performs OK.
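
For comparison, a check of this kind can be sketched against the "sister" parser, which already tracks positions. The example below assumes Python 3, where HTMLParser lives in html.parser (sgmllib is not available there); PosParser and the sample input are made up for illustration and are not taken from the attached diff.

```python
from html.parser import HTMLParser


class PosParser(HTMLParser):
    """Hypothetical test parser that records where each start tag begins."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line, column); lines start at 1, columns at 0.
        self.positions.append((tag, self.getpos()))


parser = PosParser()
parser.feed("<html>\n  <body>\n<p>hi</p>")
parser.close()

for tag, (line, column) in parser.positions:
    print(f"<{tag}> starts at line {line}, column {column}")
```

A parser whose goahead never calls updatepos would report line 1, column 0 for every tag in the same loop, which is the symptom described in this request.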

    @d98dzone d98dzone mannequin added type-feature A feature request or enhancement labels Jan 1, 2004
    @devdanzin
    Copy link
    Mannequin

    devdanzin mannequin commented Feb 13, 2009

    Duplicate of bpo-849097.

    @devdanzin devdanzin mannequin closed this as completed Feb 13, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022