[doc] CSV DictReader default dialect name 'excel' is misleading, as MS Excel doesn't actually use ',' as a separator. #71939

Closed
lockywolf mannequin opened this issue Aug 13, 2016 · 7 comments
Labels
3.11 only security fixes docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@lockywolf
Mannequin

lockywolf mannequin commented Aug 13, 2016

BPO 27752
Nosy @ambv, @ztane, @lockywolf, @miss-islington, @jdevries3133
PRs
  • bpo-27752: improve documentation of csv.Dialect #26795
  • [3.10] bpo-27752: improve documentation of csv.Dialect (GH-26795) #27643
  • [3.9] bpo-27752: improve documentation of csv.Dialect (GH-26795) #27644
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-08-06.20:34:41.107>
    created_at = <Date 2016-08-13.06:49:56.128>
    labels = ['3.11', 'type-bug', 'library', 'docs']
    title = "[doc] CSV DictReader default dialect name 'excel' is misleading, as MS Excel doesn't actually use ',' as a separator."
    updated_at = <Date 2021-08-06.20:34:41.106>
    user = 'https://github.com/lockywolf'

    bugs.python.org fields:

    activity = <Date 2021-08-06.20:34:41.106>
    actor = 'lukasz.langa'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2021-08-06.20:34:41.107>
    closer = 'lukasz.langa'
    components = ['Documentation', 'Library (Lib)']
    creation = <Date 2016-08-13.06:49:56.128>
    creator = 'lockywolf'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 27752
    keywords = ['patch']
    message_count = 7.0
    messages = ['272579', '272580', '396103', '399137', '399141', '399142', '399143']
    nosy_count = 5.0
    nosy_names = ['docs@python', 'lukasz.langa', 'ztane', 'lockywolf', 'miss-islington', 'jack__d']
    pr_nums = ['26795', '27643', '27644']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27752'
    versions = ['Python 3.11']

    @lockywolf
    Mannequin Author

    lockywolf mannequin commented Aug 13, 2016

    Hello, everyone.

    I want to report a minor usability issue:

    I wanted to use the csv module to load CSV files, and the documentation says that the default dialect for reading CSVs is 'excel'.

    However, the delimiter used with this dialect in Python is a comma (','), whereas in fact (even though it's called _comma_ separated values) MS Excel (2016) uses a semicolon (';') as a delimiter.
    Therefore, Python's 'excel' dialect doesn't actually read Excel-generated files.
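
A minimal sketch of the behaviour described above, assuming a small made-up sample (`semicolon_data` is hypothetical, standing in for a file exported by Excel in a semicolon-using locale). It only shows that the built-in 'excel' dialect splits on commas, so a semicolon-separated line comes back as a single field:

```python
import csv
import io

# Hypothetical sample: what a semicolon-separated export might look like.
semicolon_data = "name;price\nwidget;3\n"

# The built-in 'excel' dialect uses a comma as its delimiter.
excel_delimiter = csv.excel.delimiter

# Reading with the default dialect ('excel') does not split on ';',
# so every line is returned as one field.
default_rows = list(csv.reader(io.StringIO(semicolon_data)))

print(repr(excel_delimiter))
print(default_rows)
```

No error is raised in this case; the data is simply parsed into the wrong number of columns.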

    @lockywolf lockywolf mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Aug 13, 2016
    @ztane
    Mannequin

    ztane mannequin commented Aug 13, 2016

    Excel's behaviour has always been locale-dependent. If the user's locale uses , as the decimal mark, then ; is used as the column separator in "C"SV. However, even if you use autodetection with sniff, it is impossible to detect the delimiter with 100% accuracy. For example, is the following CSV row comma- or semicolon-separated?

    1,2;3;4,5;6,7;8;9
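
To make the ambiguity concrete, here is a small sketch that parses that same row twice with `csv.reader`, once per candidate delimiter. Both readings are well-formed, which is why no detector can be certain which one was intended:

```python
import csv

row = "1,2;3;4,5;6,7;8;9"

# csv.reader accepts any iterable of lines, so a one-element list works.
as_comma = next(csv.reader([row], delimiter=","))
as_semicolon = next(csv.reader([row], delimiter=";"))

print(as_comma)      # four fields, some containing ';'
print(as_semicolon)  # six fields, some containing ',' (decimal commas?)
```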
    

    The dialect could be documented better though, as currently it simply says:

    The excel class defines the usual properties of an Excel-generated CSV file. It is registered with the dialect name 'excel'.
    

    And there really should be a separate dialect for Excel-semicolon separated values, as a couple billion people would see ; in their CSV.
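
For illustration only, such a dialect can already be defined in user code. The sketch below is an assumption about what it might look like: the class name `ExcelSemicolon` and the registered name `'excel-semicolon'` are hypothetical and are not part of the standard library.

```python
import csv
import io


class ExcelSemicolon(csv.excel):
    """Hypothetical dialect: identical to 'excel' except for the delimiter."""
    delimiter = ";"


# Register under a made-up name so it can be referenced as a string.
csv.register_dialect("excel-semicolon", ExcelSemicolon)

# Made-up sample with a decimal comma in the price column.
data = "name;price\nwidget;1,50\n"
rows = list(csv.DictReader(io.StringIO(data), dialect="excel-semicolon"))

print(rows)
```

Subclassing `csv.excel` keeps its quoting and line-terminator settings and overrides only the delimiter.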

    @iritkatriel iritkatriel added 3.11 only security fixes docs Documentation in the Doc dir labels Jun 18, 2021
    @iritkatriel iritkatriel changed the title from "CSV DictReader default dialect name 'excel' is misleading, as MS Excel doesn't actually use ',' as a separator." to "[doc] CSV DictReader default dialect name 'excel' is misleading, as MS Excel doesn't actually use ',' as a separator." Jun 18, 2021
    @jdevries3133
    Mannequin

    jdevries3133 mannequin commented Jun 18, 2021

    If you need semicolon delimiters, can't you just pass delimiter=';' to the reader or writer? I don't think there's a need for a separate dialect class for that, since dialect classes should only provide a baseline for the most broad use cases. Users have plenty of options for extending or customizing behavior without adding more dialect classes.
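
A short sketch of that suggestion, using made-up sample data: `delimiter=';'` is passed as a keyword argument to both `csv.reader` and `csv.writer`, overriding the default 'excel' dialect's comma without defining a new dialect class.

```python
import csv
import io

# Made-up semicolon-separated input.
data = "a;b;c\n1;2;3\n"

# Override only the delimiter; everything else comes from 'excel'.
rows = list(csv.reader(io.StringIO(data), delimiter=";"))

# The same keyword works when writing.
out = io.StringIO()
csv.writer(out, delimiter=";").writerows(rows)

print(rows)
print(repr(out.getvalue()))
```

The writer still uses the 'excel' dialect's `'\r\n'` line terminator, since only the delimiter was overridden.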

    I also think the docs around dialects are confusing. I remember being confused by them when I was learning! I made quite a few changes to try to add clarity around dialects to the documentation. Let me know if anybody has feedback!

    @ambv
    Contributor

    ambv commented Aug 6, 2021

    New changeset 0ffdced by Jack DeVries in branch 'main':
    bpo-27752: improve documentation of csv.Dialect (GH-26795)
    0ffdced

    @miss-islington
    Contributor

    New changeset 2fd1f21 by Miss Islington (bot) in branch '3.10':
    bpo-27752: improve documentation of csv.Dialect (GH-26795)
    2fd1f21

    @ambv
    Contributor

    ambv commented Aug 6, 2021

    New changeset 62bce24 by Miss Islington (bot) in branch '3.9':
    bpo-27752: improve documentation of csv.Dialect (GH-26795) (GH-27644)
    62bce24

    @ambv
    Contributor

    ambv commented Aug 6, 2021

    Thanks for the patch, Jack! ✨ 🍰 ✨

    @ambv ambv closed this as completed Aug 6, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022