CSV Dataset

Overview

The CSV dataset of ArXiv and OpenReview data provides a simple, accessible data interface. It is designed to load directly into pandas DataFrames, enabling quick exploratory analysis and data manipulation in Python.

Best for: Quick data exploration; pandas-based analysis; portability across tools; simpler workflows without database setup.

Key Features

Pandas Compatible

Loads directly into pandas DataFrames, with column data types set at read time and no dependencies beyond pandas itself.
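As a minimal sketch of loading one file with explicit types: CSV stores no type information, so identifiers such as `2101.10000` can be parsed as floats and lose trailing zeros unless they are read as strings. The column names below come from the `arxiv_papers.csv` listing; the sample rows, the integer `version`, and the ISO date format are assumptions made so the snippet runs without the real file.

```python
import io
import pandas as pd

# Toy stand-in for arxiv_papers.csv; the rows are invented for illustration.
SAMPLE = io.StringIO(
    "id,arxiv_id,base_arxiv_id,version,title,submit_date\n"
    "1,2101.10000v1,2101.10000,1,Example Paper,2021-01-01\n"
    "2,2101.10000v2,2101.10000,2,Example Paper,2021-02-15\n"
)

papers = pd.read_csv(
    SAMPLE,  # replace with the path to arxiv_papers.csv
    # Assumed types: keep identifiers as text, versions as nullable integers.
    dtype={"arxiv_id": "string", "base_arxiv_id": "string", "version": "Int64"},
    parse_dates=["submit_date"],
)
print(papers.dtypes)
```

Without the `dtype` argument, pandas would infer `base_arxiv_id` as a float column here, so it is worth checking `papers.dtypes` after the first load of each file.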

Easy to Share

Portable format that works across programming languages without database infrastructure.

Human Readable

Plain text format that can be inspected directly or opened in spreadsheet applications.

Low Dependency

No database setup is required. The CSV dataset can be loaded by any library that reads CSV.
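To illustrate the low-dependency point, the sketch below reads a file with only the Python standard library. The column names follow the `arxiv_categories.csv` listing; the sample rows and the UTF-8 encoding mentioned in the comment are assumptions.

```python
import csv
import io

# Toy stand-in for arxiv_categories.csv; the rows are invented for illustration.
SAMPLE = io.StringIO(
    "id,name,description\n"
    "1,cs.CL,Computation and Language\n"
    "2,cs.LG,Machine Learning\n"
)

# With a real file, use (encoding assumed):
#   open("arxiv_categories.csv", newline="", encoding="utf-8")
reader = csv.DictReader(SAMPLE)

# Map each category name to its description.
categories = {row["name"]: row["description"] for row in reader}
print(categories["cs.CL"])
```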

File Structure

The CSV dataset represents entities and relationships as separate CSV files. Each file corresponds to one entity or relationship type defined in the linked documentation. Each row represents a single entity or relationship instance, and every file has consistent column headers.

ArXiv Entity Files

File Name	Description	Key Columns
`arxiv_papers.csv`	Academic papers from ArXiv with version tracking	id, arxiv_id, base_arxiv_id, version, title, abstract, submit_date, metadata
`arxiv_authors.csv`	Authors with Semantic Scholar integration	id, semantic_scholar_id, name, homepage
`arxiv_categories.csv`	ArXiv subject categories	id, name, description
`arxiv_sections.csv`	Paper sections with full content	id, content, title, appendix, paper_arxiv_id, section_in_paper_id
`arxiv_paragraphs.csv`	Individual paragraphs within sections	id, section_id, content, paragraph_in_section_id
`arxiv_figures.csv`	Paper figures with captions and labels	id, paper_arxiv_id, path, caption, label, name
`arxiv_tables.csv`	Paper tables with content	id, paper_arxiv_id, path, caption, label, table_text
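As a rough sketch of how the section and paragraph files might be combined, the example below rebuilds each section's text from its paragraphs. It assumes that `section_id` in `arxiv_paragraphs.csv` refers to `id` in `arxiv_sections.csv`, and that `section_in_paper_id` and `paragraph_in_section_id` give reading order; neither is confirmed by the table above. The rows are invented.

```python
import pandas as pd

# Toy frames mirroring a subset of the listed columns; values are invented.
sections = pd.DataFrame({
    "id": [10, 11],
    "title": ["Introduction", "Method"],
    "paper_arxiv_id": ["2101.00001v1", "2101.00001v1"],
    "section_in_paper_id": [1, 2],
})
paragraphs = pd.DataFrame({
    "id": [100, 101, 102],
    "section_id": [10, 10, 11],
    "content": ["Second paragraph.", "First paragraph.", "Only paragraph."],
    "paragraph_in_section_id": [2, 1, 1],
})

# Assumption: arxiv_paragraphs.section_id refers to arxiv_sections.id.
merged = paragraphs.merge(
    sections, left_on="section_id", right_on="id",
    suffixes=("_paragraph", "_section"),
)

# Put paragraphs in reading order, then join them per section.
merged = merged.sort_values(["section_in_paper_id", "paragraph_in_section_id"])
section_text = merged.groupby("title", sort=False)["content"].agg(
    lambda s: "\n".join(s)
)
```

Grouping by `title` is only safe within a single paper; across papers, group by `paper_arxiv_id` and the section `id` instead.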

OpenReview Entity Files

File Name	Description	Key Columns
`openreview_papers.csv`	Papers submitted to OpenReview venues	venue, paper_openreview_id, title, abstract, paper_decision, paper_pdf_link
`openreview_authors.csv`	Authors with affiliation information	venue, author_openreview_id, author_full_name, email, affiliation, homepage, dblp
`openreview_paragraphs.csv`	Paragraph-level content from papers	venue, paper_openreview_id, paragraph_idx, section, content
`openreview_reviews.csv`	Peer reviews with structured content	venue, review_openreview_id, replyto_openreview_id, writer, title, content, time
`openreview_revisions.csv`	Paper revision history	venue, original_openreview_id, revision_openreview_id, content, time
`openreview_arxiv.csv`	Cross-platform links to ArXiv papers	venue, paper_openreview_id, arxiv_id, title
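The cross-platform link file can be used to flag which OpenReview papers have an ArXiv counterpart. The following is a sketch under assumptions: it treats (`venue`, `paper_openreview_id`) as the join key, and the venue label, IDs, and decisions are invented. Whether `arxiv_id` here matches `arxiv_id` or `base_arxiv_id` in `arxiv_papers.csv` should be checked against the real data.

```python
import pandas as pd

# Toy frames mirroring a subset of the listed columns; values are invented.
or_papers = pd.DataFrame({
    "venue": ["EXAMPLE-VENUE", "EXAMPLE-VENUE"],
    "paper_openreview_id": ["abc123", "def456"],
    "paper_decision": ["Accept", "Reject"],
})
or_arxiv = pd.DataFrame({
    "venue": ["EXAMPLE-VENUE"],
    "paper_openreview_id": ["abc123"],
    "arxiv_id": ["2101.00001"],
})

# A left join keeps every OpenReview paper; indicator=True adds a "_merge"
# column recording whether a matching link row was found.
linked = or_papers.merge(
    or_arxiv, on=["venue", "paper_openreview_id"], how="left", indicator=True
)
linked["has_arxiv"] = linked["_merge"] == "both"
```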

ArXiv Relationship Files

File Name	Description	Key Columns
`arxiv_citations.csv`	Paper-to-paper citations with context	id, citing_arxiv_id, cited_arxiv_id, bib_title, bib_key, citing_sections, citing_paragraphs
`arxiv_paper_authors.csv`	Paper-author relationships with ordering	paper_arxiv_id, author_id, author_sequence
`arxiv_paper_category.csv`	Paper categorization	paper_arxiv_id, category_id
`arxiv_paper_figures.csv`	Paper-figure containment	paper_arxiv_id, figure_id
`arxiv_paper_tables.csv`	Paper-table containment	paper_arxiv_id, table_id
`arxiv_paragraph_references.csv`	Internal references to figures/tables	id, paragraph_id, paper_section, paper_arxiv_id, reference_label, reference_type
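Relationship files are meant to be joined against entity files. As an illustrative sketch, the example below builds an ordered author list per paper from `arxiv_paper_authors.csv` and `arxiv_authors.csv`. It assumes `author_id` refers to `id` in the authors file and that a lower `author_sequence` means an earlier position; the rows are invented.

```python
import pandas as pd

# Toy frames mirroring a subset of the listed columns; values are invented.
authors = pd.DataFrame({"id": [1, 2], "name": ["A. Author", "B. Author"]})
paper_authors = pd.DataFrame({
    "paper_arxiv_id": ["2101.00001v1", "2101.00001v1"],
    "author_id": [2, 1],
    "author_sequence": [1, 2],
})

# Assumption: arxiv_paper_authors.author_id refers to arxiv_authors.id.
ordered = (
    paper_authors.merge(authors, left_on="author_id", right_on="id")
    .sort_values(["paper_arxiv_id", "author_sequence"])
)

# One list of author names per paper, in author_sequence order.
author_lists = ordered.groupby("paper_arxiv_id")["name"].agg(list)
```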

OpenReview Relationship Files

File Name	Description	Key Columns
`openreview_papers_authors.csv`	Paper-author associations	venue, paper_openreview_id, author_openreview_id
`openreview_papers_reviews.csv`	Paper-review relationships	venue, paper_openreview_id, review_openreview_id, title, time
`openreview_papers_revisions.csv`	Paper revision tracking	venue, paper_openreview_id, revision_openreview_id, title, time
`openreview_revisions_reviews.csv`	Links revisions to reviews	venue, revision_openreview_id, review_openreview_id
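A simple use of the OpenReview relationship files is counting reviews per paper. The sketch below uses only columns listed for `openreview_papers_reviews.csv`; the venue label and IDs are invented, and it assumes each `review_openreview_id` identifies one review.

```python
import pandas as pd

# Toy frame mirroring a subset of the listed columns; values are invented.
papers_reviews = pd.DataFrame({
    "venue": ["EXAMPLE-VENUE"] * 3,
    "paper_openreview_id": ["abc123", "abc123", "def456"],
    "review_openreview_id": ["r1", "r2", "r3"],
})

# Count distinct reviews for each (venue, paper) pair.
review_counts = (
    papers_reviews
    .groupby(["venue", "paper_openreview_id"])["review_openreview_id"]
    .nunique()
    .rename("n_reviews")
    .reset_index()
)
```

The resulting `review_counts` frame can then be merged back onto `openreview_papers.csv` using the same two key columns.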

Dataset Statistics

Metric	Value	Notes
Papers	40,000+	ArXiv & OpenReview
Paragraphs	5M+	Fine-grained content
Reviews	Comprehensive	OpenReview venues
Citations	Tracked	With context

Format Conversion

Need to convert data to other formats? See our format conversion guides:

Import to SQL

Load CSV data into relational databases

Format Conversion

Convert between SQL, CSV, and JSON formats
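As one possible way to load CSV data into a relational database, the sketch below writes a DataFrame to SQLite with `DataFrame.to_sql`. The choice of SQLite and the table name are assumptions made for illustration, not necessarily what the linked guides use, and the rows are invented.

```python
import sqlite3
import pandas as pd

# Toy stand-in for a frame loaded from arxiv_categories.csv; values are invented.
categories = pd.DataFrame({"id": [1, 2], "name": ["cs.CL", "cs.LG"]})

# In-memory database for the example; pass a file path to persist it.
conn = sqlite3.connect(":memory:")

# Table name is an assumption; if_exists="replace" overwrites any earlier copy.
categories.to_sql("arxiv_categories", conn, index=False, if_exists="replace")

row_count = conn.execute("SELECT COUNT(*) FROM arxiv_categories").fetchone()[0]
conn.close()
```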

Related Resources

Entity Reference

Relationship Reference

CSV Storage Details