CSV Dataset

ArXiv and OpenReview data stored as CSV files that load directly into pandas DataFrames.

Overview

The CSV Dataset provides the ArXiv and OpenReview data in a simple, accessible format. Because each file loads directly into a pandas DataFrame, it supports quick exploratory analysis and data manipulation in Python.

Best for: Quick data exploration; pandas-based analysis; portability across tools; simpler workflows without database setup.

Key Features

Pandas Compatible

Each file loads directly into a pandas DataFrame with pandas.read_csv, which infers column types on load; no dependencies beyond pandas are needed.
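A minimal loading sketch follows. The two rows are made-up placeholders that use the documented arxiv_papers.csv columns, and load_papers is a hypothetical helper. It assumes submit_date uses a format pandas can parse, and it reads ArXiv identifiers as strings, since type inference would otherwise turn an ID such as 2101.10000 into the float 2101.1.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt with the documented arxiv_papers.csv columns.
# With the real dataset, pass a path such as "arxiv_papers.csv" instead.
SAMPLE = io.StringIO(
    "id,arxiv_id,base_arxiv_id,version,title,abstract,submit_date,metadata\n"
    "1,2101.10000v2,2101.10000,2,Paper A,Abstract A,2021-01-25,{}\n"
    "2,2310.01230v1,2310.01230,1,Paper B,Abstract B,2023-10-02,{}\n"
)


def load_papers(source) -> pd.DataFrame:
    """Load arxiv_papers.csv, keeping identifiers as text and parsing dates."""
    return pd.read_csv(
        source,
        # Read as floats, "2101.10000" would become 2101.1 and lose its zeros.
        dtype={"arxiv_id": str, "base_arxiv_id": str},
        # Assumes submit_date uses a format pandas can parse (e.g. ISO 8601).
        parse_dates=["submit_date"],
    )


papers = load_papers(SAMPLE)
print(papers.dtypes)
```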

Easy to Share

Portable format that works across programming languages without database infrastructure.

Human Readable

Plain text format that can be inspected directly or opened in spreadsheet applications.

Low Dependency

No database setup is required; the files can be read by any CSV-capable library or tool.
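Because the files are plain CSV, Python's standard-library csv module can read them without pandas. The sketch below uses made-up rows with the documented arxiv_categories.csv columns; the category names are placeholders, not values taken from the dataset. It also raises the module's per-field size limit, since long text fields (such as full section content) may exceed the default.

```python
import csv
import io

# Fields such as full section content can exceed the csv module's default
# limit of 131,072 characters per field, so raise it before reading.
csv.field_size_limit(2**31 - 1)

# Hypothetical excerpt with the documented arxiv_categories.csv columns.
SAMPLE = io.StringIO(
    "id,name,description\n"
    "1,cs.LG,Machine Learning\n"
    '2,cs.CL,"Computation and Language"\n'
)

# DictReader keys each row by the header line; every value is a string.
rows = list(csv.DictReader(SAMPLE))
categories = {row["name"]: row["description"] for row in rows}
print(categories)
```

With the real files, open them with open("arxiv_categories.csv", newline="", encoding="utf-8") and pass the file object to csv.DictReader; UTF-8 encoding is an assumption here.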

File Structure

The CSV dataset stores each entity type and each relation type in its own CSV file, following the entity and relation definitions in the linked documentation. Each file starts with a header row of column names, and every subsequent row represents a single entity or relationship instance.
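Since each entity or relation lives in its own file, one convenient pattern is to load a whole directory into a dictionary of DataFrames keyed by file name. load_dataset and read_table below are hypothetical helpers, and the demo writes two tiny placeholder files to a temporary directory instead of using the real data. Any column whose name contains arxiv_id is read as text so identifiers are not parsed as floats.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

import pandas as pd


def read_table(path: Path) -> pd.DataFrame:
    """Read one CSV file, keeping ArXiv identifier columns as strings."""
    header = pd.read_csv(path, nrows=0).columns  # column names only
    # Keep IDs like 2101.10000 as text instead of letting pandas infer floats.
    id_cols = {col: str for col in header if "arxiv_id" in col}
    return pd.read_csv(path, dtype=id_cols)


def load_dataset(directory: str | Path) -> dict[str, pd.DataFrame]:
    """Load every *.csv file in `directory`, keyed by file name without extension."""
    return {path.stem: read_table(path) for path in sorted(Path(directory).glob("*.csv"))}


# Demo with two tiny hypothetical files standing in for the dataset directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "arxiv_categories.csv").write_text("id,name,description\n1,cs.LG,Machine Learning\n")
    Path(tmp, "arxiv_paper_category.csv").write_text("paper_arxiv_id,category_id\n2101.10000,1\n")
    tables = load_dataset(tmp)

print(sorted(tables))  # ['arxiv_categories', 'arxiv_paper_category']
```

Loading every file at once is convenient for small subsets; for the large content files, reading only the columns you need (the usecols parameter of pandas.read_csv) keeps memory use down.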

ArXiv Entity Files

File Name | Description | Key Columns
arxiv_papers.csv | Academic papers from ArXiv with version tracking | id, arxiv_id, base_arxiv_id, version, title, abstract, submit_date, metadata
arxiv_authors.csv | Authors with Semantic Scholar integration | id, semantic_scholar_id, name, homepage
arxiv_categories.csv | ArXiv subject categories | id, name, description
arxiv_sections.csv | Paper sections with full content | id, content, title, appendix, paper_arxiv_id, section_in_paper_id
arxiv_paragraphs.csv | Individual paragraphs within sections | id, section_id, content, paragraph_in_section_id
arxiv_figures.csv | Paper figures with captions and labels | id, paper_arxiv_id, path, caption, label, name
arxiv_tables.csv | Paper tables with content | id, paper_arxiv_id, path, caption, label, table_text
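Sections and paragraphs are stored in separate files, so reconstructing a section's text means joining them and ordering by paragraph_in_section_id. This sketch assumes arxiv_paragraphs.section_id refers to arxiv_sections.id; all rows are placeholders.

```python
import io

import pandas as pd

# Hypothetical excerpts with the documented column names. Assumes
# arxiv_paragraphs.section_id refers to arxiv_sections.id.
sections = pd.read_csv(
    io.StringIO(
        "id,content,title,appendix,paper_arxiv_id,section_in_paper_id\n"
        "10,Full section text,Introduction,False,2101.10000,1\n"
    ),
    dtype={"paper_arxiv_id": str},
)
paragraphs = pd.read_csv(
    io.StringIO(
        "id,section_id,content,paragraph_in_section_id\n"
        "101,10,Second paragraph.,2\n"
        "100,10,First paragraph.,1\n"
    )
)

# Attach section metadata to each paragraph; only the needed section columns
# are kept so the two `content` columns do not collide.
merged = paragraphs.merge(
    sections.rename(columns={"id": "section_id"})[["section_id", "title", "paper_arxiv_id"]],
    on="section_id",
)

# Rebuild each section's text with its paragraphs in order.
section_text = (
    merged.sort_values("paragraph_in_section_id")
    .groupby(["paper_arxiv_id", "section_id", "title"], sort=False)["content"]
    .agg("\n\n".join)
)
print(section_text)
```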

OpenReview Entity Files

File Name | Description | Key Columns
openreview_papers.csv | Papers submitted to OpenReview venues | venue, paper_openreview_id, title, abstract, paper_decision, paper_pdf_link
openreview_authors.csv | Authors with affiliation information | venue, author_openreview_id, author_full_name, email, affiliation, homepage, dblp
openreview_paragraphs.csv | Paragraph-level content from papers | venue, paper_openreview_id, paragraph_idx, section, content
openreview_reviews.csv | Peer reviews with structured content | venue, review_openreview_id, replyto_openreview_id, writer, title, content, time
openreview_revisions.csv | Paper revision history | venue, original_openreview_id, revision_openreview_id, content, time
openreview_arxiv.csv | Cross-platform links to ArXiv papers | venue, paper_openreview_id, arxiv_id, title
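As a quick exploratory example, pd.crosstab tabulates decisions per venue from openreview_papers.csv. The venue names and decision labels below are hypothetical placeholders; the real files may use different strings.

```python
import io

import pandas as pd

# Placeholder rows with the documented openreview_papers.csv columns;
# the venue strings and decision labels are hypothetical.
papers = pd.read_csv(io.StringIO(
    "venue,paper_openreview_id,title,abstract,paper_decision,paper_pdf_link\n"
    "venue_a,abc123,Paper A,Abstract A,Accept,\n"
    "venue_a,def456,Paper B,Abstract B,Reject,\n"
    "venue_b,ghi789,Paper C,Abstract C,Accept,\n"
))

# Rows are venues, columns are decision labels, cells are paper counts
# (combinations that never occur are filled with 0).
decisions = pd.crosstab(papers["venue"], papers["paper_decision"])
print(decisions)
```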

ArXiv Relationship Files

File Name | Description | Key Columns
arxiv_citations.csv | Paper-to-paper citations with context | id, citing_arxiv_id, cited_arxiv_id, bib_title, bib_key, citing_sections, citing_paragraphs
arxiv_paper_authors.csv | Paper-author relationships with ordering | paper_arxiv_id, author_id, author_sequence
arxiv_paper_category.csv | Paper categorization | paper_arxiv_id, category_id
arxiv_paper_figures.csv | Paper-figure containment | paper_arxiv_id, figure_id
arxiv_paper_tables.csv | Paper-table containment | paper_arxiv_id, table_id
arxiv_paragraph_references.csv | Internal references to figures/tables | id, paragraph_id, paper_section, paper_arxiv_id, reference_label, reference_type
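Relationship files are joined to entity files with ordinary merges. The sketch below builds ordered author lists, assuming arxiv_paper_authors.author_id refers to arxiv_authors.id and that a lower author_sequence means an earlier position in the author list; all rows and names are placeholders.

```python
import io

import pandas as pd

# Hypothetical excerpts with the documented column names.
authors = pd.read_csv(io.StringIO(
    "id,semantic_scholar_id,name,homepage\n"
    "1,1001,Author One,\n"
    "2,1002,Author Two,\n"
))
paper_authors = pd.read_csv(
    io.StringIO(
        "paper_arxiv_id,author_id,author_sequence\n"
        "2101.10000,1,2\n"
        "2101.10000,2,1\n"
    ),
    dtype={"paper_arxiv_id": str},
)

# Join the relation to the entity table, then order authors within each paper.
bylines = (
    paper_authors.merge(authors, left_on="author_id", right_on="id")
    .sort_values(["paper_arxiv_id", "author_sequence"])
    .groupby("paper_arxiv_id")["name"]
    .agg(list)
)
print(bylines["2101.10000"])  # ['Author Two', 'Author One']
```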

OpenReview Relationship Files

File Name | Description | Key Columns
openreview_papers_authors.csv | Paper-author associations | venue, paper_openreview_id, author_openreview_id
openreview_papers_reviews.csv | Paper-review relationships | venue, paper_openreview_id, review_openreview_id, title, time
openreview_papers_revisions.csv | Paper revision tracking | venue, paper_openreview_id, revision_openreview_id, title, time
openreview_revisions_reviews.csv | Links revisions to reviews | venue, revision_openreview_id, review_openreview_id
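The OpenReview relationship files can be aggregated the same way. This sketch counts reviews per paper from openreview_papers_reviews.csv and keeps papers that have no reviews. Keying on both venue and paper_openreview_id is a conservative choice, not a documented requirement, and the rows are placeholders.

```python
import io

import pandas as pd

# Placeholder rows with the documented column names.
papers = pd.read_csv(io.StringIO(
    "venue,paper_openreview_id,title,abstract,paper_decision,paper_pdf_link\n"
    "venue_a,abc123,Paper A,Abstract A,Accept,\n"
    "venue_a,def456,Paper B,Abstract B,Reject,\n"
))
papers_reviews = pd.read_csv(io.StringIO(
    "venue,paper_openreview_id,review_openreview_id,title,time\n"
    "venue_a,abc123,r1,Review 1,2023-11-01\n"
    "venue_a,abc123,r2,Review 2,2023-11-03\n"
))

keys = ["venue", "paper_openreview_id"]
# One row per (venue, paper) with its number of linked reviews.
counts = papers_reviews.groupby(keys).size().rename("n_reviews").reset_index()
# Left join keeps papers without reviews; their missing counts become 0.
summary = papers[keys + ["title"]].merge(counts, on=keys, how="left")
summary["n_reviews"] = summary["n_reviews"].fillna(0).astype(int)
print(summary)
```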

Dataset Statistics

Papers: 40,000+ (ArXiv & OpenReview)
Paragraphs: 5M+ (fine-grained content)
Reviews: comprehensive coverage of OpenReview venues
Citations: tracked, with context
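At the scale of millions of paragraphs, loading all of arxiv_paragraphs.csv at once may be memory-intensive. A streaming sketch reads the file in chunks and loads only the section_id column; count_paragraphs_per_section is a hypothetical helper, and the demo uses a tiny placeholder file with a small chunk size so that more than one chunk is processed.

```python
import io
from collections import Counter

import pandas as pd


def count_paragraphs_per_section(source, chunksize: int = 100_000) -> Counter:
    """Stream a paragraphs CSV and count rows per section_id."""
    counts: Counter = Counter()
    # usecols skips the large `content` column; chunksize bounds memory use.
    with pd.read_csv(source, usecols=["section_id"], chunksize=chunksize) as reader:
        for chunk in reader:
            counts.update(chunk["section_id"].value_counts().to_dict())
    return counts


# Placeholder rows with the documented arxiv_paragraphs.csv columns.
SAMPLE = io.StringIO(
    "id,section_id,content,paragraph_in_section_id\n"
    "100,10,First.,1\n"
    "101,10,Second.,2\n"
    "102,11,Other.,1\n"
)
per_section = count_paragraphs_per_section(SAMPLE, chunksize=2)
print(per_section)
```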