Import from CSV • ResearchArcade Tutorials

Overview

This tutorial covers importing data from CSV (Comma-Separated Values) files into your ResearchArcade graph database. You'll learn how to prepare CSV files, map columns to graph entities, handle data validation, manage relationships, and implement error handling for robust bulk data imports.

CSV File Structure

Basic CSV Format

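Node and edge data live in separate CSV files. A node file has one row per entity and an identifier column; an edge file has one row per relationship, with source and target columns that hold node identifiers: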

# Example CSV structure for nodes (papers.csv)
paper_id,title,authors,year,doi
1,"Deep Learning Fundamentals","Smith, J.; Jones, A.",2023,10.1234/example.001
2,"Graph Neural Networks","Brown, K.; Davis, L.",2024,10.1234/example.002

# Example CSV structure for edges (citations.csv)
source_id,target_id,citation_type,context
1,2,direct,"Building on the work of Smith et al."
2,1,reference,"As discussed in previous research"

Encoding and Delimiters

Handle different file encodings and delimiter types:

# Code example placeholder
# Add your Python/API code here for handling encodings and delimiters
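This tutorial does not show ResearchArcade's own import API, so the sketch below uses only Python's standard `csv` module. The function name `read_csv_rows` and the demo file are hypothetical. It opens a file with an explicit encoding and, when no delimiter is given, guesses one from a sample of the file.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="utf-8-sig", delimiter=None):
    """Return the rows of a CSV file as a list of dicts keyed by header name.

    Hypothetical helper: "utf-8-sig" reads plain UTF-8 and also drops a
    leading byte-order mark if one is present.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, encoding=encoding, newline="") as handle:
        if delimiter is None:
            # Guess the delimiter from the start of the file, then rewind.
            sample = handle.read(4096)
            handle.seek(0)
            delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
        return list(csv.DictReader(handle, delimiter=delimiter))


# Demo: a semicolon-delimited Latin-1 file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "papers.csv")
with open(demo_path, "w", encoding="latin-1", newline="") as handle:
    handle.write("paper_id;title;year\n1;Donn\u00e9es ouvertes;2023\n2;Graph Networks;2024\n")

rows = read_csv_rows(demo_path, encoding="latin-1")
```

Delimiter sniffing is a heuristic and can fail on small or irregular files, so passing `delimiter` explicitly is the safer choice when the format is known.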

Header Requirements

Specify whether your CSV file has a header row and how its column names map to property names:

# Code example placeholder
# Add your Python/API code here for header configuration
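As a minimal sketch using the standard `csv` module (not ResearchArcade's API), the hypothetical `read_rows` helper below covers both cases: a file with a header row whose column names are renamed, and a headerless file for which you supply the column names yourself.

```python
import csv
import io


def read_rows(stream, has_header=True, fieldnames=None, rename=None):
    """Read CSV rows as dicts, with or without a header row (hypothetical helper)."""
    if has_header:
        # The first row supplies the column names.
        reader = csv.DictReader(stream)
    elif fieldnames:
        # No header row: every row is data, and names come from the caller.
        reader = csv.DictReader(stream, fieldnames=fieldnames)
    else:
        raise ValueError("fieldnames is required when the file has no header row")
    rename = rename or {}
    # Columns absent from `rename` keep their original name.
    return [{rename.get(name, name): value for name, value in row.items()}
            for row in reader]


with_header = read_rows(
    io.StringIO("Paper ID,Title\n1,Graph Neural Networks\n"),
    rename={"Paper ID": "paper_id", "Title": "title"},
)
no_header = read_rows(
    io.StringIO("1,Graph Neural Networks\n"),
    has_header=False,
    fieldnames=["paper_id", "title"],
)
```

Both calls produce rows with the same keys, so later steps do not need to know which kind of file the data came from.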

Importing Nodes from CSV

Basic Node Import

Load nodes from a CSV file into your database:

# Code example placeholder
# Add your Python/API code here for basic node import
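The following sketch reads the `papers.csv` example from earlier into an in-memory dictionary that stands in for the database. The `load_nodes` function, the `"Paper"` label, and the node layout are assumptions made for illustration; replace the dictionary write with the ResearchArcade call you actually use.

```python
import csv
import io

PAPERS_CSV = """paper_id,title,authors,year,doi
1,"Deep Learning Fundamentals","Smith, J.; Jones, A.",2023,10.1234/example.001
2,"Graph Neural Networks","Brown, K.; Davis, L.",2024,10.1234/example.002
"""


def load_nodes(stream, id_column, label):
    """Read CSV rows into {node_id: node}; each node is a plain dict."""
    nodes = {}
    for row in csv.DictReader(stream):
        # The identifier column becomes the key; the rest become properties.
        node_id = row.pop(id_column)
        nodes[node_id] = {"label": label, "properties": row}
    return nodes


papers = load_nodes(io.StringIO(PAPERS_CSV), id_column="paper_id", label="Paper")
```

All values are still strings at this point; converting `year` to an integer is a separate mapping step.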

Mapping CSV Columns to Node Properties

Define how CSV columns correspond to node attributes:

# Code example placeholder
# Add your Python/API code here for column mapping
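One way to express a column mapping is a dictionary from CSV column name to a property name and a converter function. The sketch below assumes that approach; `COLUMN_MAP`, `map_row`, and the property names are hypothetical and not taken from ResearchArcade.

```python
import csv
import io

# Hypothetical mapping: CSV column -> (node property name, converter).
COLUMN_MAP = {
    "paper_id": ("id", int),
    "title": ("title", str.strip),
    "year": ("publication_year", int),
    "doi": ("doi", str.lower),
}


def map_row(row, column_map):
    """Build a node property dict from one CSV row; unmapped columns are dropped."""
    return {prop: convert(row[column])
            for column, (prop, convert) in column_map.items()}


SAMPLE = (
    "paper_id,title,authors,year,doi\n"
    '1, Deep Learning Fundamentals ,"Smith, J.; Jones, A.",2023,10.1234/EXAMPLE.001\n'
)
nodes = [map_row(row, COLUMN_MAP) for row in csv.DictReader(io.StringIO(SAMPLE))]
```

Keeping the mapping in data rather than in code makes it easy to document and to reuse for another file with the same layout.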

Handling Multiple Node Types

Import different types of entities from separate CSV files:

# Code example placeholder
# Add your Python/API code here for multiple node types

Importing Edges from CSV

Basic Edge Import

Create relationships between nodes using CSV data:

# Code example placeholder
# Add your Python/API code here for basic edge import

Referencing Nodes by ID

Link edges to existing nodes using identifiers:

# Code example placeholder
# Add your Python/API code here for node referencing
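The sketch below reads the `citations.csv` layout shown earlier and keeps only edges whose `source_id` and `target_id` both refer to nodes that have already been imported. The `load_edges` function and the third sample row are hypothetical, and the set of known identifiers stands in for a lookup against the database.

```python
import csv
import io

CITATIONS_CSV = """source_id,target_id,citation_type,context
1,2,direct,"Building on the work of Smith et al."
2,1,reference,"As discussed in previous research"
2,99,direct,"Points at a paper that was never imported"
"""


def load_edges(stream, known_node_ids):
    """Return (edges, dangling); dangling rows reference an unknown node."""
    edges, dangling = [], []
    reader = csv.DictReader(stream)
    for row in reader:
        source, target = row.pop("source_id"), row.pop("target_id")
        if source in known_node_ids and target in known_node_ids:
            # Remaining columns travel with the edge as its properties.
            edges.append({"source": source, "target": target, "properties": row})
        else:
            # line_num is the physical line in the file (the header is line 1).
            dangling.append((reader.line_num, source, target))
    return edges, dangling


edges, dangling = load_edges(io.StringIO(CITATIONS_CSV), known_node_ids={"1", "2"})
```

Collecting dangling references instead of raising lets you import the missing nodes and retry only the affected rows.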

Edge Properties and Attributes

Include additional data in relationship imports:

# Code example placeholder
# Add your Python/API code here for edge properties

Data Validation

Schema Validation

Ensure CSV data matches expected schema before import:

# Code example placeholder
# Add your Python/API code here for schema validation
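A simple form of schema validation is to compare the header row with the columns you expect before reading any data. This sketch assumes the `papers.csv` columns from earlier; `check_header` is a hypothetical helper.

```python
import csv
import io

EXPECTED_COLUMNS = ["paper_id", "title", "authors", "year", "doi"]


def check_header(stream, expected):
    """Return (missing, unexpected) column names found in the header row."""
    # An empty file yields an empty header, so every column counts as missing.
    header = next(csv.reader(stream), [])
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


missing, unexpected = check_header(
    io.StringIO("paper_id,title,year,venue\n1,Graph Neural Networks,2024,ICML\n"),
    EXPECTED_COLUMNS,
)
```

Stopping the import when `missing` is non-empty avoids discovering the problem halfway through a large file.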

Data Type Checking

Validate that column values match expected data types:

# Code example placeholder
# Add your Python/API code here for type checking
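Because every CSV value arrives as a string, type checking amounts to trying the conversion and recording failures. The sketch below assumes that `paper_id` and `year` should be integers; `COLUMN_TYPES` and `type_errors` are hypothetical names.

```python
# Hypothetical type expectations for papers.csv.
COLUMN_TYPES = {"paper_id": int, "year": int}


def type_errors(row, column_types):
    """Return one message per column whose value cannot be converted."""
    errors = []
    for column, caster in column_types.items():
        value = row.get(column)
        try:
            caster(value)
        except (ValueError, TypeError):
            # TypeError covers a column that is absent from the row (value is None).
            errors.append(f"{column}: {value!r} is not a valid {caster.__name__}")
    return errors


good = type_errors({"paper_id": "1", "year": "2023"}, COLUMN_TYPES)
bad = type_errors({"paper_id": "1", "year": "20x3"}, COLUMN_TYPES)
```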

Required Fields Validation

Check for missing or null values in mandatory columns:

# Code example placeholder
# Add your Python/API code here for required fields validation
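A required-field check can treat an absent column, an empty string, and a whitespace-only string as missing. The sketch below does so; which columns are mandatory is an assumption, and `missing_required` is a hypothetical helper.

```python
# Hypothetical choice of mandatory columns for papers.csv.
REQUIRED_COLUMNS = ("paper_id", "title")


def missing_required(row, required):
    """Return the required columns that are absent or blank in this row."""
    # csv.DictReader fills short rows with None, so None is handled as well.
    return [name for name in required if not (row.get(name) or "").strip()]


complete = missing_required({"paper_id": "1", "title": "Graph Neural Networks"},
                            REQUIRED_COLUMNS)
blank_title = missing_required({"paper_id": "3", "title": "   ", "doi": ""},
                               REQUIRED_COLUMNS)
```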

Error Handling

Handling Import Errors

Manage errors during the import process gracefully:

# Code example placeholder
# Add your Python/API code here for error handling

Partial Import Recovery

Continue importing valid rows when errors occur:

# Code example placeholder
# Add your Python/API code here for partial recovery
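The sketch below shows one way to keep importing after a bad row: convert each row inside a `try` block, collect the failures with their line numbers, and carry on. The `import_valid_rows` and `to_paper` functions are hypothetical, and the `imported` list stands in for writes to the database.

```python
import csv
import io


def import_valid_rows(stream, convert):
    """Convert every row; return (imported, rejected) without stopping on errors."""
    imported, rejected = [], []
    reader = csv.DictReader(stream)
    for row in reader:
        try:
            imported.append(convert(row))
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the line number and the raw row for later review.
            rejected.append({"line": reader.line_num, "error": str(exc), "row": row})
    return imported, rejected


def to_paper(row):
    """Hypothetical converter that raises ValueError on bad input."""
    if not row["title"]:
        raise ValueError("title is empty")
    return {"id": int(row["paper_id"]), "title": row["title"], "year": int(row["year"])}


SAMPLE = (
    "paper_id,title,year\n"
    "1,Deep Learning Fundamentals,2023\n"
    "x2,Graph Neural Networks,2024\n"
    "3,,2022\n"
    "4,Network Science,2021\n"
)
imported, rejected = import_valid_rows(io.StringIO(SAMPLE), to_paper)
```

The `rejected` list can be written to a separate CSV or log file so the problematic rows can be fixed and re-imported on their own.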

Error Logging and Reporting

Track and report problematic rows for review:

# Code example placeholder
# Add your Python/API code here for error logging

Performance Optimization

Batch Processing

Import data in batches for improved performance:

# Code example placeholder
# Add your Python/API code here for batch processing
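Batching can be sketched with a generator that pulls a fixed number of rows at a time from the CSV reader, so only one batch is held in memory. `batched_rows` is a hypothetical helper; what you do with each batch (for example, one database transaction per batch) depends on the ResearchArcade calls available to you.

```python
import csv
import io
from itertools import islice


def batched_rows(stream, batch_size):
    """Yield lists of at most batch_size rows, reading the file lazily."""
    reader = csv.DictReader(stream)
    while True:
        # islice consumes the next batch_size rows from the same reader.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


SAMPLE = "paper_id,title\n" + "".join(f"{i},Paper {i}\n" for i in range(1, 6))
batch_sizes = [len(batch) for batch in batched_rows(io.StringIO(SAMPLE), batch_size=2)]
```

With five data rows and a batch size of two, the last batch is smaller than the others, so downstream code should not assume full batches.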

Parallel Import

Use multiple threads or processes for faster imports:

# Code example placeholder
# Add your Python/API code here for parallel import

Memory Management

Handle large CSV files efficiently without memory issues:

# Code example placeholder
# Add your Python/API code here for memory management

Duplicate Handling

Detecting Duplicates

Identify duplicate entries during import:

# Code example placeholder
# Add your Python/API code here for duplicate detection
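Duplicates can be found by grouping rows under a normalized key. The sketch below assumes the DOI identifies a paper and normalizes it by trimming whitespace and lowercasing; that choice of key and the `find_duplicates` helper are assumptions.

```python
from collections import defaultdict


def find_duplicates(rows, key):
    """Map each key that occurs more than once to the indexes of its rows."""
    seen = defaultdict(list)
    for index, row in enumerate(rows):
        seen[key(row)].append(index)
    return {value: indexes for value, indexes in seen.items() if len(indexes) > 1}


rows = [
    {"paper_id": "1", "doi": "10.1234/example.001"},
    {"paper_id": "7", "doi": "10.1234/EXAMPLE.001 "},
    {"paper_id": "2", "doi": "10.1234/example.002"},
]
duplicates = find_duplicates(rows, key=lambda row: row["doi"].strip().lower())
```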

Merge or Skip Strategy

Choose how to handle duplicate records:

# Code example placeholder
# Add your Python/API code here for duplicate strategies
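The sketch below illustrates three possible strategies for a record whose identifier already exists: skip it, merge it by filling only empty properties, or overwrite the existing values. The strategy names and the `upsert` function are hypothetical, and the `nodes` dictionary stands in for the database.

```python
def upsert(nodes, node_id, properties, strategy="skip"):
    """Insert a node or resolve a duplicate; return what happened."""
    if node_id not in nodes:
        nodes[node_id] = dict(properties)
        return "created"
    existing = nodes[node_id]
    if strategy == "skip":
        # Leave the existing node untouched.
        return "skipped"
    if strategy == "merge":
        # Fill only properties that are missing or empty on the existing node.
        for name, value in properties.items():
            if value not in (None, "") and existing.get(name) in (None, ""):
                existing[name] = value
        return "merged"
    if strategy == "overwrite":
        # Incoming values replace existing ones.
        existing.update(properties)
        return "overwritten"
    raise ValueError(f"unknown strategy: {strategy}")


nodes = {}
results = [
    upsert(nodes, "1", {"title": "Deep Learning Fundamentals", "doi": ""}),
    upsert(nodes, "1", {"title": "Changed", "doi": "10.1234/example.001"}, "skip"),
    upsert(nodes, "1", {"title": "Changed", "doi": "10.1234/example.001"}, "merge"),
]
```

After these three calls the node keeps its original title and gains the DOI, because merging never replaces a non-empty value.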

Update Existing Nodes

Update existing entities instead of creating duplicates:

# Code example placeholder
# Add your Python/API code here for updating nodes

Data Transformation

Preprocessing CSV Data

Clean and transform data before import:

# Code example placeholder
# Add your Python/API code here for preprocessing

Splitting Delimited Fields

Parse multi-value fields like author lists:

# Code example placeholder
# Add your Python/API code here for field splitting
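The `authors` column in the `papers.csv` example separates names with semicolons, while each name contains a comma. A minimal sketch for splitting such a field follows; `split_authors` is a hypothetical helper, and the semicolon separator is taken from the sample data.

```python
def split_authors(value, separator=";"):
    """Split a multi-value field into trimmed, non-empty parts."""
    return [part.strip() for part in value.split(separator) if part.strip()]


authors = split_authors("Smith, J.; Jones, A.")
```

Each resulting name can then become its own author node, with an edge from the paper to the author.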

Custom Data Transformations

Apply custom logic during import:

# Code example placeholder
# Add your Python/API code here for custom transformations

Best Practices

Always validate CSV structure and data types before importing
Use unique identifiers for nodes to prevent duplicates
Process large files in batches to avoid memory issues
Implement comprehensive error logging for debugging
Test imports on a small subset of data first
Use transactions to ensure data consistency
Create backups before large-scale imports
Document your CSV schema and mapping rules
Decide how missing values are represented (null versus empty string) and apply that rule consistently
Use UTF-8 encoding to support international characters

Common Import Scenarios

Importing Research Papers

# Code example placeholder
# Import papers with metadata from CSV

Building Citation Networks

# Code example placeholder
# Create citation relationships from CSV

Loading Author Collaborations

# Code example placeholder
# Import author collaboration data

Troubleshooting

Common Import Errors

Solutions for frequent CSV import issues:

Encoding errors: Save the file as UTF-8, or specify the file's actual encoding when opening it
Column mismatch: Verify CSV headers match expected schema
Memory errors: Use batch processing for large files
Missing references: Import nodes before edges that reference them

Validation Failures

# Code example placeholder
# Handle and debug validation errors

Next Steps

Continue learning about data import with other formats:

Import from JSON

Load structured data from JSON files.

Import from API

Fetch and import data from external APIs.