SQL Dataset

ArXiv and OpenReview data stored as a PostgreSQL relational dataset, organized into schemas.

Overview

The SQL Dataset stores ArXiv and OpenReview data in a normalized relational database. Built on PostgreSQL, it supports efficient querying and uses foreign keys to preserve the links between entities such as papers, authors, reviews, and citations.

Best for: complex queries with multiple conditions, and joins across different tables.

Key Features

Normalized Schema

Fully normalized tables that follow third normal form (3NF), with foreign key constraints enforcing referential integrity.

Rich Relationships

Models academic relationships such as authorship, citations, paper revisions, and reviews with their author rebuttals.

Database Schema

The SQL dataset stores entities and relations as PostgreSQL tables. Each table below corresponds to one specific entity or relation, as defined in the link. Every table has a primary key, foreign keys that reference other tables, or both.

Core Entity Tables

Table Name | Description | Primary Key | Key Columns
arxiv.papers | Academic papers from ArXiv | paper_id | title, abstract, publish_date, source
arxiv.authors | Paper authors and their affiliations | author_id | name, email, affiliation, orcid
arxiv.sections | Hierarchical paper sections | section_id | paper_id, title, section_type, depth
arxiv.paragraphs | Text content at paragraph level | paragraph_id | section_id, text, position, word_count
arxiv.figures | Paper figures and images | figure_id | paper_id, caption, file_path, position
arxiv.tables | Paper tables and data | table_id | paper_id, caption, content, position
openreview.reviews | Peer reviews from OpenReview | review_id | paper_id, reviewer_id, rating, confidence
openreview.decisions | Editorial decisions on papers | decision_id | paper_id, decision_type, decision_date
openreview.venues | Publication venues and conferences | venue_id | name, abbreviation, venue_type, year

Relationship Tables

Table Name | Description | Key Columns (including foreign keys)
authorship | Links papers to their authors | paper_id, author_id, position
citations | Paper-to-paper citation relationships | citing_paper_id, cited_paper_id
references | Paragraph-level citation contexts | paragraph_id, cited_paper_id, context
revisions | Paper revision history | paper_id, version, revision_date
rebuttals | Author responses to reviews | review_id, rebuttal_text
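The relationship tables point back to the entity tables through foreign keys. The sketch below illustrates that pattern for authorship under stated assumptions: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostgreSQL, drops the arxiv. schema prefix so all three tables live in one SQLite database, and declares only a subset of the documented columns with guessed types. The dataset's real DDL may differ.

```python
import sqlite3

# Stand-in DDL (SQLite, not the dataset's actual PostgreSQL DDL).
# Column types are assumptions; only a few documented columns are declared.
DDL = """
CREATE TABLE papers (
    paper_id     TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    publish_date TEXT
);
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Relationship table: links a paper to an author, with the author's position.
CREATE TABLE authorship (
    paper_id  TEXT    NOT NULL REFERENCES papers (paper_id),
    author_id INTEGER NOT NULL REFERENCES authors (author_id),
    position  INTEGER
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
    conn.executescript(DDL)
    return conn


def add_authorship(conn: sqlite3.Connection, paper_id: str, author_id: int, position: int) -> bool:
    """Insert one authorship row; return False if a foreign key check rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO authorship (paper_id, author_id, position) VALUES (?, ?, ?)",
                (paper_id, author_id, position),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
with conn:
    conn.execute("INSERT INTO papers VALUES ('p1', 'Toy paper', '2024-01-01')")
    conn.execute("INSERT INTO authors VALUES (1, 'Toy Author')")

print(add_authorship(conn, "p1", 1, 1))       # True: both referenced rows exist
print(add_authorship(conn, "missing", 1, 1))  # False: there is no paper 'missing'
```

The toy rows are made up for illustration. In PostgreSQL foreign keys are always enforced, so no equivalent of the PRAGMA is needed.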

Access Methods

Python Connection Example
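A minimal connection sketch, assuming the psycopg2 driver is installed and the database is reachable. The fallback host, database name, and role are hypothetical placeholders. PGHOST, PGPORT, PGDATABASE, PGUSER, and PGPASSWORD are the standard libpq environment variable names, reused here so credentials stay out of source code.

```python
import os


def connection_params() -> dict:
    """Collect libpq-style settings; the fallback values are hypothetical."""
    return {
        "host": os.environ.get("PGHOST", "localhost"),
        "port": int(os.environ.get("PGPORT", "5432")),
        "dbname": os.environ.get("PGDATABASE", "sql_dataset"),  # hypothetical name
        "user": os.environ.get("PGUSER", "reader"),             # hypothetical role
        "password": os.environ.get("PGPASSWORD", ""),
    }


def fetch_recent_papers(limit: int = 5) -> list:
    """Return (paper_id, title, publish_date) rows, newest first."""
    import psycopg2  # imported here so connection_params() works without the driver

    conn = psycopg2.connect(**connection_params())
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT paper_id, title, publish_date "
                "FROM arxiv.papers ORDER BY publish_date DESC NULLS LAST LIMIT %s",
                (limit,),
            )
            return cur.fetchall()
    finally:
        conn.close()  # "with conn" in psycopg2 ends the transaction but does not close
```

With the environment variables set, fetch_recent_papers(10) returns up to ten rows as tuples.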

SQLAlchemy Example
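A hedged SQLAlchemy sketch, assuming SQLAlchemy 1.4 or later with the psycopg2 dialect. The credentials in any call are hypothetical. Percent-encoding the user name and password keeps characters such as @ or / from breaking the URL; the query uses bound parameters (:pattern, :limit) rather than string interpolation.

```python
from urllib.parse import quote


def database_url(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Build a SQLAlchemy URL for the psycopg2 dialect, escaping special characters."""
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


def papers_matching_title(url: str, pattern: str, limit: int = 10) -> list:
    """Return papers whose title matches a case-insensitive ILIKE pattern."""
    from sqlalchemy import create_engine, text  # needs SQLAlchemy 1.4+ and psycopg2

    engine = create_engine(url)
    try:
        with engine.connect() as conn:
            result = conn.execute(
                text(
                    "SELECT paper_id, title FROM arxiv.papers "
                    "WHERE title ILIKE :pattern ORDER BY paper_id LIMIT :limit"
                ),
                {"pattern": pattern, "limit": limit},
            )
            return [dict(row) for row in result.mappings()]
    finally:
        engine.dispose()  # release pooled connections
```

For example, papers_matching_title(database_url("reader", "s3cret", "localhost", 5432, "sql_dataset"), "%transformer%") would list matching papers as dictionaries; all connection values there are placeholders.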

Example Queries

Find Papers with Reviews
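One way to find reviewed papers is to join arxiv.papers to openreview.reviews on paper_id and aggregate per paper. This sketch assumes that reviews.paper_id uses the same identifiers as papers.paper_id and that rating is numeric; neither detail is stated above. To stay runnable without a server, the demo attaches two in-memory SQLite databases named arxiv and openreview as stand-ins for the PostgreSQL schemas and fills them with made-up toy rows.

```python
import sqlite3

# Runs on PostgreSQL and SQLite; only the parameter placeholder differs
# ("%s" for psycopg2, "?" for sqlite3), so format() fills it in.
PAPERS_WITH_REVIEWS = """
SELECT p.paper_id,
       p.title,
       COUNT(r.review_id) AS n_reviews,
       AVG(r.rating)      AS avg_rating
FROM arxiv.papers AS p
JOIN openreview.reviews AS r ON r.paper_id = p.paper_id
GROUP BY p.paper_id, p.title
HAVING COUNT(r.review_id) >= {ph}
ORDER BY avg_rating DESC, p.paper_id
"""


def papers_with_reviews(conn, min_reviews: int = 1, placeholder: str = "?") -> list:
    cur = conn.cursor()
    cur.execute(PAPERS_WITH_REVIEWS.format(ph=placeholder), (min_reviews,))
    return cur.fetchall()


# Toy stand-in for the two PostgreSQL schemas.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS arxiv")
conn.execute("ATTACH DATABASE ':memory:' AS openreview")
conn.execute("CREATE TABLE arxiv.papers (paper_id TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE openreview.reviews (review_id INTEGER PRIMARY KEY, paper_id TEXT, rating REAL)"
)
conn.executemany(
    "INSERT INTO arxiv.papers VALUES (?, ?)",
    [("p1", "Toy A"), ("p2", "Toy B"), ("p3", "Unreviewed")],
)
conn.executemany(
    "INSERT INTO openreview.reviews VALUES (?, ?, ?)",
    [(1, "p1", 6), (2, "p1", 8), (3, "p2", 5)],
)
print(papers_with_reviews(conn, min_reviews=2))  # [('p1', 'Toy A', 2, 7.0)]
```

Against the real database, pass a psycopg2 connection and placeholder="%s". The inner JOIN drops papers with no reviews; a LEFT JOIN would keep them with a count of zero.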

Citation Network Analysis
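A sketch of one citation-network query: collecting every paper reachable from a seed paper by following the citations table for up to a fixed number of hops, using a recursive common table expression. WITH RECURSIVE is supported by both PostgreSQL and SQLite, so the demo runs on a toy in-memory SQLite citations table with made-up rows. The depth bound is a design choice that also guarantees termination when citations form a cycle.

```python
import sqlite3

# Papers reachable from a seed by following citations up to max_depth hops.
CITATION_REACH = """
WITH RECURSIVE reach(paper_id, depth) AS (
    SELECT cited_paper_id, 1
    FROM citations
    WHERE citing_paper_id = {ph}
    UNION
    SELECT c.cited_paper_id, r.depth + 1
    FROM citations AS c
    JOIN reach AS r ON c.citing_paper_id = r.paper_id
    WHERE r.depth < {ph}
)
SELECT paper_id, MIN(depth) AS hops
FROM reach
GROUP BY paper_id
ORDER BY hops, paper_id
"""


def citation_reach(conn, seed: str, max_depth: int = 2, placeholder: str = "?") -> list:
    """Return (paper_id, hops) pairs, where hops is the shortest citation distance."""
    cur = conn.cursor()
    cur.execute(CITATION_REACH.format(ph=placeholder), (seed, max_depth))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citations (citing_paper_id TEXT, cited_paper_id TEXT)")
conn.executemany(
    "INSERT INTO citations VALUES (?, ?)",
    [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p4"), ("p4", "p1")],
)
print(citation_reach(conn, "p1", max_depth=2))  # [('p2', 1), ('p3', 1), ('p4', 2)]
```

On PostgreSQL, pass placeholder="%s" with a psycopg2 connection. Because of the p4 to p1 cycle in the toy data, a depth of 3 also reports the seed p1 at three hops.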

Full-Text Search
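A sketch of full-text search using PostgreSQL's built-in functions to_tsvector, plainto_tsquery, and ts_rank over the title and abstract columns of arxiv.papers. The 'english' text search configuration and the choice of columns are assumptions. The function only builds the SQL and its parameters (psycopg2 named %(...)s placeholders), so it can be checked without a database; search terms always travel as parameters, never inside the SQL string.

```python
def fulltext_query(terms: str, limit: int = 10) -> tuple:
    """Build (sql, params) for a ranked full-text search over titles and abstracts."""
    if limit < 1:
        raise ValueError("limit must be positive")
    # Fixed SQL fragments only; user input goes into params.
    document = "to_tsvector('english', coalesce(title, '') || ' ' || coalesce(abstract, ''))"
    query = "plainto_tsquery('english', %(terms)s)"
    sql = (
        f"SELECT paper_id, title, ts_rank({document}, {query}) AS rank "
        f"FROM arxiv.papers "
        f"WHERE {document} @@ {query} "
        f"ORDER BY rank DESC "
        f"LIMIT %(limit)s"
    )
    return sql, {"terms": terms.strip(), "limit": limit}
```

With a psycopg2 cursor this would run as cur.execute(*fulltext_query("graph neural networks")). Computing to_tsvector on every row is slow on large tables; an expression GIN index over the same to_tsvector(...) expression can speed it up, though whether the dataset ships one is not stated.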

Author Collaboration Network
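A sketch of a co-authorship network built by self-joining the authorship table on paper_id: each pair of authors on the same paper becomes an edge, weighted by the number of shared papers. It assumes authorship holds one row per (paper, author) pair, since duplicates would inflate the counts. The demo uses a toy in-memory SQLite table with made-up rows in place of the PostgreSQL one; the same SQL runs on PostgreSQL with a "%s" placeholder, and joining arxiv.authors on author_id would add names.

```python
import sqlite3
from collections import defaultdict

# The "<" condition keeps each unordered author pair once and drops self-pairs.
COAUTHOR_PAIRS = """
SELECT a1.author_id AS author_a,
       a2.author_id AS author_b,
       COUNT(*)     AS shared_papers
FROM authorship AS a1
JOIN authorship AS a2
  ON a1.paper_id = a2.paper_id
 AND a1.author_id < a2.author_id
GROUP BY a1.author_id, a2.author_id
HAVING COUNT(*) >= {ph}
ORDER BY shared_papers DESC, author_a, author_b
"""


def coauthor_edges(conn, min_shared: int = 1, placeholder: str = "?") -> list:
    cur = conn.cursor()
    cur.execute(COAUTHOR_PAIRS.format(ph=placeholder), (min_shared,))
    return cur.fetchall()


def to_adjacency(edges) -> dict:
    """Turn (a, b, weight) rows into an undirected weighted adjacency map."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authorship (paper_id TEXT, author_id INTEGER, position INTEGER)")
conn.executemany(
    "INSERT INTO authorship VALUES (?, ?, ?)",
    [("p1", 1, 1), ("p1", 2, 2), ("p1", 3, 3), ("p2", 1, 1), ("p2", 2, 2), ("p3", 2, 1), ("p3", 3, 2)],
)
print(coauthor_edges(conn))  # [(1, 2, 2), (2, 3, 2), (1, 3, 1)]
```

The adjacency map returned by to_adjacency can be handed to any graph library or traversed directly.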

Dataset Statistics

Papers: 40,000+

Authors: not yet available

Citations: not yet available

Reviews: not yet available

Export & Migration

Need to export data to other formats? See our format conversion guides: