Overview
ResearchArcade manages two distinct but interconnected datasets: ArXiv and OpenReview. To maintain clarity and enable efficient querying, these datasets are organized using separate prefixes in SQL databases or separate directories in file-based storage. This page explains the organizational structure, the naming conventions, and how data from both sources can be linked together.
Key Concept:
Separate prefixes preserve data provenance while enabling cross-source queries and analysis.
Data Organization
ResearchArcade uses two parallel prefixes to organize data from different sources while maintaining a consistent data model.
ArXiv Data
Contains papers from ArXiv.org with full-text parsing, sections, paragraphs, figures, tables, and citation data.
- SQL: arxiv data
- CSV: arxiv/ directory
- Focus: Content structure and citations
OpenReview Data
Contains papers from OpenReview with peer reviews, decisions, rebuttals, venues, and submission metadata.
- SQL: openreview data
- CSV: openreview/ directory
- Focus: Peer review process and decisions
SQL Data Structure
In SQL databases (PostgreSQL), data is organized into two schemas, arxiv and openreview. Both schemas share core tables (papers, authors, authorship, revisions), and each adds tables specific to its source.
ArXiv Tables
arxiv.papers
arxiv.authors
arxiv.sections
arxiv.paragraphs
arxiv.figures
arxiv.tables
arxiv.authorship
arxiv.citations
arxiv.references
arxiv.revisions
OpenReview Tables
openreview.papers
openreview.authors
openreview.reviews
openreview.decisions
openreview.rebuttals
openreview.venues
openreview.revisions
openreview.authorship
openreview.submitted_to
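The schema-qualified names above make it possible to query both sources in one statement while keeping track of where each row came from. The following is a minimal sketch, not ResearchArcade's own API: it uses Python's built-in sqlite3 module with attached in-memory databases as a stand-in for the two PostgreSQL schemas, and the id and title columns and the sample rows are assumptions made for illustration.

```python
import sqlite3


def build_demo_connection():
    """Create an in-memory stand-in with arxiv and openreview namespaces."""
    conn = sqlite3.connect(":memory:")
    # ATTACH gives SQLite schema-like qualified names (arxiv.papers, ...),
    # imitating the two PostgreSQL schemas described above.
    conn.execute("ATTACH DATABASE ':memory:' AS arxiv")
    conn.execute("ATTACH DATABASE ':memory:' AS openreview")
    # Hypothetical columns: the real tables may be defined differently.
    conn.execute("CREATE TABLE arxiv.papers (id TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE openreview.papers (id TEXT PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO arxiv.papers VALUES ('a1', 'Example ArXiv paper')")
    conn.execute("INSERT INTO openreview.papers VALUES ('o1', 'Example OpenReview paper')")
    return conn


def papers_with_source(conn):
    """Return papers from both schemas, labelled with their source."""
    query = """
        SELECT 'arxiv' AS source, id, title FROM arxiv.papers
        UNION ALL
        SELECT 'openreview' AS source, id, title FROM openreview.papers
        ORDER BY source, id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in papers_with_source(build_demo_connection()):
        print(row)
```

Because the two papers tables share a name but live in different schemas, the source label has to be added explicitly in the query; the schema prefix itself is what preserves provenance at storage time.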
CSV Directory Structure
In file-based storage, files from both sources are stored in the same directory and are distinguished by their filename prefix.
Directory Layout
data/
└── dataset_name/
    ├── arxiv_papers.csv
    ├── arxiv_authors.csv
    ├── arxiv_sections.csv
    ├── arxiv_paragraphs.csv
    ├── arxiv_figures.csv
    ├── arxiv_tables.csv
    ├── arxiv_citations.csv
    ├── arxiv_categories.csv
    ├── arxiv_paper_authors.csv
    ├── arxiv_paper_figures.csv
    ├── arxiv_paper_tables.csv
    ├── arxiv_paper_categories.csv
    ├── arxiv_paragraph_references.csv
    ├── openreview_papers.csv
    ├── openreview_paragraphs.csv
    ├── openreview_authors.csv
    ├── openreview_reviews.csv
    ├── openreview_decisions.csv
    ├── openreview_arxiv.csv
    ├── openreview_revision.csv
    ├── openreview_paper_author.csv
    ├── openreview_paper_reviews.csv
    └── openreview_revision_reviews.csv
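The filename prefix convention in this layout can be used to recover each file's source programmatically. The sketch below is illustrative only: split_source and group_by_source are hypothetical helper names, not part of ResearchArcade, and the code assumes every CSV file starts with either arxiv_ or openreview_ as in the listing.

```python
from collections import defaultdict
from pathlib import Path

# Source prefixes taken from the directory layout above.
SOURCES = ("arxiv", "openreview")


def split_source(filename):
    """Split 'arxiv_paper_authors.csv' into ('arxiv', 'paper_authors')."""
    stem = Path(filename).stem  # drop any directory part and the .csv suffix
    for source in SOURCES:
        prefix = source + "_"
        if stem.startswith(prefix):
            return source, stem[len(prefix):]
    raise ValueError(f"no known source prefix in {filename!r}")


def group_by_source(filenames):
    """Group table names by the source encoded in each filename."""
    groups = defaultdict(list)
    for name in filenames:
        source, table = split_source(name)
        groups[source].append(table)
    return dict(groups)


if __name__ == "__main__":
    files = ["arxiv_papers.csv", "openreview_reviews.csv", "openreview_arxiv.csv"]
    print(group_by_source(files))
```

Matching on the leading prefix rather than searching the whole name matters for files such as openreview_arxiv.csv, which contains both source names but belongs to the openreview group.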