csv.Sniffer.sniff() regex error #74343

Closed
jake-jake-jake mannequin opened this issue Apr 25, 2017 · 13 comments
Labels
3.7 (EOL): end of life; 3.8: only security fixes; stdlib: Python modules in the Lib dir

Comments

@jake-jake-jake
Mannequin

jake-jake-jake mannequin commented Apr 25, 2017

BPO 30157
Nosy @vstinner, @bitdancer, @serhiy-storchaka, @jake-jake-jake
PRs
  • bpo-30157: Fix csv.Sniffer.sniff() regex pattern #1273
  • bpo-30157: Fix csv.Sniffer.sniff() regex pattern. #5601
  • [3.7] bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601) #5602
  • [3.6] bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601) #5603
  • [2.7] bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601) #5604
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2018-02-09.22:03:21.935>
    created_at = <Date 2017-04-25.01:51:06.156>
    labels = ['3.8', '3.7', 'library']
    title = 'csv.Sniffer.sniff() regex error'
    updated_at = <Date 2018-02-09.22:03:21.934>
    user = 'https://github.com/jake-jake-jake'

    bugs.python.org fields:

    activity = <Date 2018-02-09.22:03:21.934>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-02-09.22:03:21.935>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2017-04-25.01:51:06.156>
    creator = 'jcdavis1983'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30157
    keywords = ['patch']
    message_count = 13.0
    messages = ['292249', '292254', '292267', '292282', '292290', '292294', '292434', '311898', '311902', '311908', '311909', '311910', '311911']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'mrabarnett', 'r.david.murray', 'serhiy.storchaka', 'jcdavis1983']
    pr_nums = ['1273', '5601', '5602', '5603', '5604']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue30157'
    versions = ['Python 2.7', 'Python 3.6', 'Python 3.7', 'Python 3.8']

    @jake-jake-jake
    Mannequin Author

    jake-jake-jake mannequin commented Apr 25, 2017

    Line 220 of Lib/csv.py has an extra > in the first group:

    r'(?P<delim>>[^\w\n"\'])

    @jake-jake-jake jake-jake-jake mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir labels Apr 25, 2017
    @mlouielu mlouielu mannequin changed the title from "csn.Sniffer.sniff() regex error" to "csv.Sniffer.sniff() regex error" Apr 25, 2017
    @vstinner
    Member

    What is the consequence of this change? Does it change the syntax of the parser? Which kind of format wasn't parsed correctly?

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Apr 25, 2017

    There are 4 patterns. They try to determine the delimiter and quote by looking for matches. Each pattern supposedly covers one of 4 cases:

    1. Delimiter, quote, value, quote, delimiter.

    2. Start of line/text, quote, value, quote, delimiter.

    3. Delimiter, quote, value, quote, end of line/text.

    4. Start of line/text, quote, value, quote, end of line/text.

    On that basis, case 3 looks wrong because the pattern for delimiter is:

    >[^\w\n"\']
    

    instead of the expected:

    [^\w\n"\']
    

    Looks like a bug to me.
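
To make the effect concrete, here is a minimal sketch of case 3 with and without the stray `>`. Only the delimiter fragment is quoted in this thread; the rest of the pattern (`TAIL`), the flags, and the sample strings are assumptions made for illustration, not a verbatim copy of Lib/csv.py.

```python
import re

# Delimiter fragment as quoted in this thread: any character that is not a
# word character, a newline, or a quote.
DELIM_FIXED = r'(?P<delim>[^\w\n"\'])'
# Same fragment with the stray '>': the group name ends at the first '>',
# so the second '>' becomes a literal character that must be matched.
DELIM_BUGGY = r'(?P<delim>>[^\w\n"\'])'

# Assumed remainder of case 3 (not quoted in the thread): optional space,
# quoted value, then end of line/text.
TAIL = r'(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'

FLAGS = re.DOTALL | re.MULTILINE  # assumed flags
fixed = re.compile(DELIM_FIXED + TAIL, FLAGS)
buggy = re.compile(DELIM_BUGGY + TAIL, FLAGS)

# Hypothetical sample: the last field is quoted and ends the line (case 3).
sample = "a,b,'c d'\n"
m_fixed = fixed.search(sample)
m_buggy = buggy.search(sample)

# Hypothetical sample where a literal '>' happens to precede the delimiter.
odd_sample = "a>,'c d'\n"
m_buggy_odd = buggy.search(odd_sample)

print(m_fixed.group('delim') if m_fixed else None)
print(m_buggy)
print(m_buggy_odd.group('delim') if m_buggy_odd else None)
```

With the stray `>`, the pattern can only match when a literal `>` precedes the delimiter, so ordinary input that ends in a quoted field is not detected by case 3.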

    @vstinner
    Member

    Can you please try to write a unit test to check for non-regression? Or at least give an example?

    @bitdancer
    Member

    If it is a bug that indicates there is at least one missing unit test :) Maybe the OP will contribute a test.

    @jake-jake-jake
    Mannequin Author

    jake-jake-jake mannequin commented Apr 26, 2017

    Will do! I will try to get a regression-proof test into test_csv.py in the next 24 hours. Essentially, I will make sure that the sniffer returns a positive match for each of the patterns that the regex is intended to hit.

    @jake-jake-jake
    Mannequin Author

    jake-jake-jake mannequin commented Apr 27, 2017

    I've added some unit tests for Sniffer._guess_quote_and_delimiter(); they should prevent regression.
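
For readers who want to try this locally, the following is a rough sketch of a check for case 3 (delimiter, quote, value, quote, end of text) through the public `csv.Sniffer.sniff()` API. It is not the test code from the PR, which is not shown in this thread; the helper name `sniff_case3` and the sample string are hypothetical.

```python
import csv

def sniff_case3(sample=";'123;4'", delimiters=",;"):
    """Return (delimiter, quotechar) guessed by csv.Sniffer for `sample`.

    The default sample is a hypothetical case-3 input: a delimiter, then a
    quoted value that runs to the end of the text.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters)
    return dialect.delimiter, dialect.quotechar

# On an interpreter that includes the fix, the quoted field should be
# recognized, giving ';' as the delimiter and "'" as the quote character.
result = sniff_case3()
print(result)
```

On an interpreter that still has the stray `>`, the result may differ, since the case 3 pattern would not match this sample.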

    @serhiy-storchaka
    Member

    Since the original author didn't respond for a long time, I have recreated PR 1273 as PR 5601.

    @serhiy-storchaka serhiy-storchaka added the 3.8 only security fixes label Feb 9, 2018
    @serhiy-storchaka
    Member

    New changeset 2411292 by Serhiy Storchaka in branch 'master':
    bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601)
    2411292

    @serhiy-storchaka
    Member

    New changeset 2ef69a1 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.7':
    bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601) (GH-5602)
    2ef69a1

    @serhiy-storchaka
    Member

    New changeset 504f191 by Serhiy Storchaka in branch '3.6':
    [3.6] bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601) (GH-5603)
    504f191

    @serhiy-storchaka
    Member

    New changeset e719793 by Serhiy Storchaka in branch '2.7':
    [2.7] bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601) (GH-5604)
    e719793

    @serhiy-storchaka
    Member

    Thank you for your contribution, Jake!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022