SQL Dataset

ArXiv and OpenReview data stored in a PostgreSQL relational database with normalized schemas.

Overview

The SQL Dataset stores ArXiv and OpenReview data in a normalized relational database. Built on PostgreSQL, it supports efficient querying through foreign key relationships and maintains data integrity across entities.

Best for: Complex queries with multiple joins; transactional consistency; production applications requiring data integrity; advanced analytics with SQL.

Key Features

Normalized Schema

Platform-specific schemas (arxiv.*, openreview.*) with proper foreign key constraints and referential integrity.

Rich Relationships

Models complex academic relationships including citations, authorship, reviews, and revisions with proper cardinality.

Cross-Platform Links

Direct connections between OpenReview and ArXiv papers through the openreview.arxiv mapping table.

Optimized Queries

Efficient indexing on primary keys, foreign keys, and frequently queried fields for fast data retrieval.

Database Schema

The SQL dataset represents entities and relations in separate PostgreSQL schemas. Each table corresponds to one specific entity or relation type defined in the documentation. Tables use proper primary keys and foreign key constraints.

ArXiv Schema (arxiv.*)

| Table Name | Description | Primary Key | Key Columns |
| --- | --- | --- | --- |
| arxiv.papers | ArXiv papers with version tracking | id | arxiv_id (unique), base_arxiv_id, version, title, abstract, submit_date, metadata (jsonb) |
| arxiv.authors | Authors with Semantic Scholar IDs | id | semantic_scholar_id (unique), name, homepage |
| arxiv.categories | ArXiv subject categories | id | name (unique), description |
| arxiv.sections | Paper sections with full text | id | paper_arxiv_id (FK), content, title, appendix, section_in_paper_id |
| arxiv.paragraphs | Individual paragraphs | id | section_id (FK), content, paragraph_in_section_id |
| arxiv.figures | Paper figures | id | paper_arxiv_id (FK), path, caption, label, name |
| arxiv.tables | Paper tables | id | paper_arxiv_id (FK), path, caption, label, table_text |
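
As an illustration of how the sections and paragraphs tables fit together, the following sketch reassembles a paper's text in reading order. It runs against Python's built-in sqlite3 module as a stand-in for PostgreSQL (ATTACH is used only to obtain the arxiv. prefix), and all rows are invented toy data. It assumes that sections.paper_arxiv_id holds the paper's arxiv_id, that paragraphs.section_id points at sections.id, and that section_in_paper_id and paragraph_in_section_id are ordinal positions; none of this is confirmed beyond the column names listed above.

```python
import sqlite3

# Stand-in for PostgreSQL: attaching an in-memory database named "arxiv"
# lets the query keep the schema-qualified table names used in this page.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS arxiv")
conn.executescript("""
CREATE TABLE arxiv.sections (
    id INTEGER PRIMARY KEY, paper_arxiv_id TEXT, title TEXT,
    section_in_paper_id INTEGER);
CREATE TABLE arxiv.paragraphs (
    id INTEGER PRIMARY KEY, section_id INTEGER, content TEXT,
    paragraph_in_section_id INTEGER);
-- Toy rows, invented for illustration only.
INSERT INTO arxiv.sections VALUES
    (1, '0000.00001v1', 'Method', 2),
    (2, '0000.00001v1', 'Introduction', 1);
INSERT INTO arxiv.paragraphs VALUES
    (10, 1, 'Method paragraph.', 1),
    (11, 2, 'Second intro paragraph.', 2),
    (12, 2, 'First intro paragraph.', 1);
""")

# sqlite3 uses "?" placeholders; psycopg would use "%s" instead.
FULL_TEXT_SQL = """
SELECT s.title, p.content
FROM arxiv.sections AS s
JOIN arxiv.paragraphs AS p ON p.section_id = s.id
WHERE s.paper_arxiv_id = ?
ORDER BY s.section_in_paper_id, p.paragraph_in_section_id
"""

rows = conn.execute(FULL_TEXT_SQL, ("0000.00001v1",)).fetchall()
for section_title, content in rows:
    print(f"[{section_title}] {content}")
```

Ordering by both position columns matters here: without it, the database is free to return paragraphs in any order.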

ArXiv Relationship Tables

| Table Name | Description | Primary / Composite Key | Foreign Keys and Other Columns |
| --- | --- | --- | --- |
| arxiv.citations | Paper citations with context | id | citing_arxiv_id, cited_arxiv_id, bib_title, bib_key, citing_sections (jsonb), citing_paragraphs (jsonb) |
| arxiv.paper_authors | Paper-author relationships | paper_arxiv_id, author_id | paper_arxiv_id (FK), author_id (FK), author_sequence |
| arxiv.paper_category | Paper categorization | paper_arxiv_id, category_id | paper_arxiv_id (FK), category_id (FK) |
| arxiv.paper_figures | Paper-figure links | paper_arxiv_id, figure_id | paper_arxiv_id (FK), figure_id (FK) |
| arxiv.paper_tables | Paper-table links | paper_arxiv_id, table_id | paper_arxiv_id (FK), table_id (FK) |
| arxiv.paragraph_references | Internal figure/table references | id | paragraph_id (FK), paper_arxiv_id (FK), paper_section, reference_label, reference_type |
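
The relationship tables above are typically resolved with joins. The sketch below shows two such queries: an ordered author list for one paper, and a count of incoming citations per cited paper. As before, it uses Python's sqlite3 as a runnable stand-in for PostgreSQL with invented toy rows. It assumes that paper_authors.author_id references authors.id and that author_sequence gives the byline order; the real constraints may differ.

```python
import sqlite3

# Stand-in for PostgreSQL; ATTACH supplies the "arxiv." schema prefix.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS arxiv")
conn.executescript("""
CREATE TABLE arxiv.authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE arxiv.paper_authors (
    paper_arxiv_id TEXT, author_id INTEGER, author_sequence INTEGER,
    PRIMARY KEY (paper_arxiv_id, author_id));
CREATE TABLE arxiv.citations (
    id INTEGER PRIMARY KEY, citing_arxiv_id TEXT, cited_arxiv_id TEXT,
    bib_title TEXT);
-- Toy rows, invented for illustration only.
INSERT INTO arxiv.authors VALUES (1, 'Author B'), (2, 'Author A');
INSERT INTO arxiv.paper_authors VALUES
    ('0000.00001v1', 1, 2),
    ('0000.00001v1', 2, 1);
INSERT INTO arxiv.citations VALUES
    (1, '0000.00001v1', '0000.00002v1', 'Toy cited paper'),
    (2, '0000.00003v1', '0000.00002v1', 'Toy cited paper');
""")

# Many-to-many join: paper -> paper_authors -> authors, in byline order.
AUTHORS_SQL = """
SELECT a.name
FROM arxiv.paper_authors AS pa
JOIN arxiv.authors AS a ON a.id = pa.author_id
WHERE pa.paper_arxiv_id = ?
ORDER BY pa.author_sequence
"""

# Incoming citation count per cited paper.
CITED_BY_SQL = """
SELECT cited_arxiv_id, COUNT(*) AS n_citing
FROM arxiv.citations
GROUP BY cited_arxiv_id
ORDER BY n_citing DESC
"""

authors = [name for (name,) in conn.execute(AUTHORS_SQL, ("0000.00001v1",))]
cited_counts = conn.execute(CITED_BY_SQL).fetchall()
print(authors)
print(cited_counts)
```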

OpenReview Schema (openreview.*)

| Table Name | Description | Composite Key | Key Columns |
| --- | --- | --- | --- |
| openreview.papers | OpenReview papers | venue, paper_openreview_id | title, abstract, paper_decision, paper_pdf_link |
| openreview.authors | Authors with profiles | venue, author_openreview_id | author_full_name, email, affiliation, homepage, dblp |
| openreview.paragraphs | Paragraph-level content | venue, paper_openreview_id, paragraph_idx | section, content |
| openreview.reviews | Peer reviews | venue, review_openreview_id | replyto_openreview_id (FK), writer, title, content (jsonb), time |
| openreview.revisions | Paper revisions | venue, revision_openreview_id | original_openreview_id (FK), content (jsonb), time |
| openreview.arxiv | OpenReview-ArXiv links | venue, paper_openreview_id | arxiv_id, title |
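
To make the cross-platform link concrete, the sketch below follows an OpenReview paper through openreview.arxiv to its ArXiv record. It again runs on Python's sqlite3 as a stand-in for PostgreSQL, with invented toy rows and placeholder venue names. One assumption deserves emphasis: the join treats openreview.arxiv.arxiv_id as matching arxiv.papers.arxiv_id. If the mapping table stores unversioned identifiers, the join would need to target base_arxiv_id instead.

```python
import sqlite3

# Stand-in for PostgreSQL; two attached in-memory databases play the role
# of the arxiv and openreview schemas.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS arxiv")
conn.execute("ATTACH DATABASE ':memory:' AS openreview")
conn.executescript("""
CREATE TABLE arxiv.papers (
    id INTEGER PRIMARY KEY, arxiv_id TEXT UNIQUE, title TEXT);
CREATE TABLE openreview.papers (
    venue TEXT, paper_openreview_id TEXT, title TEXT, paper_decision TEXT,
    PRIMARY KEY (venue, paper_openreview_id));
CREATE TABLE openreview.arxiv (
    venue TEXT, paper_openreview_id TEXT, arxiv_id TEXT, title TEXT,
    PRIMARY KEY (venue, paper_openreview_id));
-- Toy rows, invented for illustration only.
INSERT INTO arxiv.papers VALUES (1, '0000.00001v1', 'Toy paper');
INSERT INTO openreview.papers VALUES
    ('VENUE_A', 'paper1', 'Toy paper', 'Accept'),
    ('VENUE_A', 'paper2', 'Unlinked toy paper', 'Reject');
INSERT INTO openreview.arxiv VALUES
    ('VENUE_A', 'paper1', '0000.00001v1', 'Toy paper');
""")

# The OpenReview side is joined on the full composite key (venue, paper id);
# the ArXiv side is joined on arxiv_id (an assumption, see the text above).
LINK_SQL = """
SELECT op.venue, op.paper_openreview_id, op.paper_decision, ap.arxiv_id
FROM openreview.papers AS op
JOIN openreview.arxiv AS oa
  ON oa.venue = op.venue
 AND oa.paper_openreview_id = op.paper_openreview_id
JOIN arxiv.papers AS ap ON ap.arxiv_id = oa.arxiv_id
ORDER BY op.venue, op.paper_openreview_id
"""

linked = conn.execute(LINK_SQL).fetchall()
print(linked)
```

Because these are inner joins, OpenReview papers without an ArXiv counterpart drop out of the result; a LEFT JOIN would keep them with a NULL arxiv_id.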

OpenReview Relationship Tables

| Table Name | Description | Composite Key | Columns |
| --- | --- | --- | --- |
| openreview.papers_authors | Paper-author associations | venue, paper_openreview_id, author_openreview_id | All three columns (composite FK) |
| openreview.papers_reviews | Paper-review links | venue, paper_openreview_id, review_openreview_id | title, time |
| openreview.papers_revisions | Paper revision tracking | venue, paper_openreview_id, revision_openreview_id | title, time |
| openreview.revisions_reviews | Revision-review temporal links | venue, revision_openreview_id, review_openreview_id | All three columns (composite FK) |
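
Since every OpenReview key includes venue, joins through these relationship tables need to match on venue as well as on the identifier. The sketch below lists the reviews of one paper in chronological order, using Python's sqlite3 as a stand-in for PostgreSQL. The rows, venue names, and the text type used for the time column are invented for illustration; the toy data deliberately reuses a review identifier across two venues to show why the venue condition matters.

```python
import sqlite3

# Stand-in for PostgreSQL; ATTACH supplies the "openreview." schema prefix.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS openreview")
conn.executescript("""
CREATE TABLE openreview.reviews (
    venue TEXT, review_openreview_id TEXT, writer TEXT, title TEXT,
    time TEXT,
    PRIMARY KEY (venue, review_openreview_id));
CREATE TABLE openreview.papers_reviews (
    venue TEXT, paper_openreview_id TEXT, review_openreview_id TEXT,
    title TEXT, time TEXT,
    PRIMARY KEY (venue, paper_openreview_id, review_openreview_id));
-- Toy rows, invented for illustration only. Note that 'rev1' exists in
-- both venues.
INSERT INTO openreview.reviews VALUES
    ('VENUE_A', 'rev1', 'Reviewer 1', 'Review 1', '2024-01-02'),
    ('VENUE_A', 'rev2', 'Reviewer 2', 'Review 2', '2024-01-01'),
    ('VENUE_B', 'rev1', 'Reviewer 9', 'Other venue', '2024-01-03');
INSERT INTO openreview.papers_reviews VALUES
    ('VENUE_A', 'paperX', 'rev1', 'Review 1', '2024-01-02'),
    ('VENUE_A', 'paperX', 'rev2', 'Review 2', '2024-01-01'),
    ('VENUE_B', 'paperY', 'rev1', 'Other venue', '2024-01-03');
""")

# Join on the full composite key; matching on review_openreview_id alone
# would also pull in the VENUE_B review that shares the id 'rev1'.
REVIEWS_SQL = """
SELECT r.review_openreview_id, r.writer, r.time
FROM openreview.papers_reviews AS pr
JOIN openreview.reviews AS r
  ON r.venue = pr.venue
 AND r.review_openreview_id = pr.review_openreview_id
WHERE pr.venue = ? AND pr.paper_openreview_id = ?
ORDER BY r.time
"""

reviews = conn.execute(REVIEWS_SQL, ("VENUE_A", "paperX")).fetchall()
print(reviews)
```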

Dataset Statistics

Papers: 40,000+ (ArXiv & OpenReview)

Paragraphs: 5M+ (fine-grained content)

Reviews: comprehensive (multiple venues)

Citations: tracked (with context)

Export & Migration

Need to export data to other formats? See our format conversion guides: