Edit Nodes & Edges • ResearchArcade Tutorials

Overview

This tutorial demonstrates how to update existing nodes and edges in your ResearchArcade graph database. You'll learn how to modify properties, update metadata, handle versioning, and maintain data integrity when making changes to your research entities and relationships.

Prerequisites

ResearchArcade installed and configured (see Getting Started)
Existing nodes and edges in the database
Familiarity with reading nodes (see Read Nodes & Edges)

Editing Nodes

The examples below show how to update each node type.

Updating an ArXiv Papers Node

Example code for updating an existing arxiv papers entry:

# Update metadata for a paper
updated_paper = {
    'arxiv_id': '1706.03762v7',
    'metadata': {
        'venue': 'NeurIPS 2017',
        'pdf_url': 'https://arxiv.org/pdf/1706.03762.pdf',
        'citations': 75000,
        'influential': True
    }
}

research_arcade.update_node("arxiv_papers", node_features=updated_paper)
print("Paper updated successfully!")
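The example above sends a complete metadata dictionary. This tutorial does not say whether update_node merges nested fields or overwrites the whole metadata value, so if you only want to change one key while keeping the others, a read-modify-write merge is a safe pattern. A minimal sketch, assuming you already hold the stored metadata as a plain dict; the helper merge_metadata is hypothetical and the new citation count is an illustrative value:

```python
import copy


def merge_metadata(existing: dict, changes: dict) -> dict:
    """Return a new dict with `changes` recursively merged into `existing`.

    Nested dicts are merged key by key; any other value in `changes`
    replaces the stored one. Neither input is modified.
    """
    merged = copy.deepcopy(existing)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metadata(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stored metadata (taken from the example above) and an illustrative change.
stored = {'venue': 'NeurIPS 2017', 'citations': 75000, 'influential': True}
new_metadata = merge_metadata(stored, {'citations': 76000})
print(new_metadata)
# {'venue': 'NeurIPS 2017', 'citations': 76000, 'influential': True}
```

The merged dict would then go into the payload as {'arxiv_id': '1706.03762v7', 'metadata': new_metadata}.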

Updating an ArXiv Authors Node

Example code for updating an existing arxiv authors entry:

updated_author = {
    'semantic_scholar_id': 'ss_ashish_vaswani',
    'homepage': 'https://ashishvaswani.com'
}

research_arcade.update_node("arxiv_authors", node_features=updated_author)
print("Author updated successfully!")
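The author example sends only the identifier and the new homepage. If you start from a full stored record, computing which fields actually differ keeps the payload small and avoids rewriting unchanged values. A sketch with a hypothetical helper, build_partial_update, assuming (as the example suggests) that update_node accepts a partial dict as long as the identifying field is present; the extra name field in the stored record is illustrative:

```python
def build_partial_update(current: dict, desired: dict, key_field: str) -> dict:
    """Return {key_field: ..., <changed fields>}, or {} if nothing changed."""
    if current.get(key_field) != desired.get(key_field):
        raise ValueError(f"{key_field} must match between current and desired records")
    changes = {
        field: value
        for field, value in desired.items()
        if field != key_field and current.get(field) != value
    }
    if not changes:
        return {}
    return {key_field: desired[key_field], **changes}


current = {'semantic_scholar_id': 'ss_ashish_vaswani', 'homepage': None, 'name': 'Ashish Vaswani'}
desired = {'semantic_scholar_id': 'ss_ashish_vaswani',
           'homepage': 'https://ashishvaswani.com', 'name': 'Ashish Vaswani'}
print(build_partial_update(current, desired, 'semantic_scholar_id'))
# {'semantic_scholar_id': 'ss_ashish_vaswani', 'homepage': 'https://ashishvaswani.com'}
```

An empty result means there is nothing to send, so the update_node call can be skipped.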

Updating an ArXiv Sections Node

Example code for updating an existing arxiv sections entry:

updated_section = {
    'id': 544409,
    'title': 'Updated Section Title',
    'content': 'Updated content'
}
research_arcade.update_node("arxiv_sections", node_features=updated_section)
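When several sections need the same kind of change, you can loop over a list of payloads and collect failures instead of stopping at the first one. The sketch below uses a stand-in class, StubArcade, that only mimics the update_node(table, node_features=...) call shape used in this tutorial. It is not part of ResearchArcade, and the real method may signal a missing row differently (the KeyError here is an assumption); the second section id is illustrative:

```python
class StubArcade:
    """Hypothetical stand-in that mimics the update_node call shape used above."""

    def __init__(self, existing_ids):
        self.existing_ids = set(existing_ids)
        self.updated = []

    def update_node(self, table, node_features):
        # Pretend the update fails when the target row does not exist.
        if node_features.get('id') not in self.existing_ids:
            raise KeyError(f"{table}: no row with id {node_features.get('id')}")
        self.updated.append((table, node_features['id']))


def update_many(arcade, table, payloads):
    """Apply each payload in turn; return (payload, error) pairs for the failures."""
    failures = []
    for payload in payloads:
        try:
            arcade.update_node(table, node_features=payload)
        except Exception as exc:  # the real client's exception types are unknown
            failures.append((payload, exc))
    return failures


arcade = StubArcade(existing_ids=[544409])
failures = update_many(arcade, "arxiv_sections", [
    {'id': 544409, 'title': 'Updated Section Title'},
    {'id': 1, 'title': 'No such section'},
])
print(len(failures), arcade.updated)
# 1 [('arxiv_sections', 544409)]
```

With a real connection you would pass research_arcade instead of the stub and inspect failures afterwards.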

Updating an ArXiv Paragraphs Node

Example code for updating an existing arxiv paragraphs entry:

updated_paragraph = {
    'id': 27428268,
    'content': 'Updated paragraph content'
}
research_arcade.update_node("arxiv_paragraphs", node_features=updated_paragraph)

Updating an ArXiv Figures Node

Example code for updating an existing arxiv figures entry:

updated_figure = {
    'id': 836693,
    'paper_arxiv_id': '1453.1644',  # arXiv IDs are strings, not floats
    'path': 'path',  # placeholder for the figure's file path
    'caption': 'Updated figure caption'
}
research_arcade.update_node("arxiv_figures", node_features=updated_figure)
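The paper_arxiv_id field should be passed as a string: written as a bare number, Python parses it as a float, which drops trailing zeros and cannot carry a version suffix such as v7. A small pre-update check can catch this. The sketch below (hypothetical helper is_arxiv_id) only checks the shape of new-style identifiers, YYMM.NNNN or YYMM.NNNNN with an optional vN; it does not verify that YYMM is a real month, and old-style IDs such as hep-th/9901001 are deliberately out of scope:

```python
import re

# New-style arXiv identifier shape: 4 digits, a dot, 4 or 5 digits, optional version.
ARXIV_NEW_STYLE = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")


def is_arxiv_id(value) -> bool:
    """True if `value` is a string shaped like a new-style arXiv ID."""
    return isinstance(value, str) and ARXIV_NEW_STYLE.fullmatch(value) is not None


print(is_arxiv_id('1706.03762v7'))  # True
print(is_arxiv_id(1706.03762))      # False: a float, not a string
```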

Updating an ArXiv Tables Node

Example code for updating an existing arxiv tables entry:

updated_table = {
    'id': 299024,
    'paper_arxiv_id': '1706.03762v7',
    'caption': 'Performance comparison',
    'label': 'label',
    'table_text': 'Table content here'
}
research_arcade.update_node("arxiv_tables", node_features=updated_table)
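Each ArXiv example above includes the field that identifies the record to update: arxiv_id for papers, semantic_scholar_id for authors, and id for sections, paragraphs, figures, and tables. A sketch of a guard that refuses to send a payload without that field; the table-to-key mapping is inferred from these examples rather than taken from ResearchArcade's schema, so verify it against your database before relying on it:

```python
# Inferred from the examples in this tutorial; not an official schema.
UPDATE_KEYS = {
    'arxiv_papers': 'arxiv_id',
    'arxiv_authors': 'semantic_scholar_id',
    'arxiv_sections': 'id',
    'arxiv_paragraphs': 'id',
    'arxiv_figures': 'id',
    'arxiv_tables': 'id',
}


def check_update_payload(table: str, node_features: dict) -> None:
    """Raise ValueError if the identifying field for `table` is missing or empty."""
    key = UPDATE_KEYS.get(table)
    if key is None:
        raise ValueError(f"no known key field for table {table!r}")
    if node_features.get(key) in (None, ''):
        raise ValueError(f"{table} update is missing {key!r}")


# Passes silently: the table payload carries its id.
check_update_payload('arxiv_tables', {'id': 299024, 'caption': 'Performance comparison'})
```

Calling the guard right before update_node turns a silently ignored or misapplied update into an immediate error.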

Updating an OpenReview Authors Node

Example code for updating an existing openreview authors entry:

new_author = {'venue': 'ICLR.cc/2025/Conference', 
              'author_openreview_id': '~ishmam_zabir1', 
              'author_full_name': 'ishmam zabir', 
              'email': '****@microsoft.com', 
              'affiliation': 'Microsoft', 
              'homepage': 'https://scholar.google.com/citations?user=X7bjzrUAAAAJ&hl=en&oi=ao', 
              'dblp': ''}

research_arcade.update_node("openreview_authors", node_features=new_author)
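# Key that identifies the updated author, e.g. for reading the record back (see Read Nodes & Edges)
author_id = {"author_openreview_id": "~ishmam_zabir1"}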
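OpenReview profile IDs such as ~ishmam_zabir1 start with a tilde and end with a disambiguating number. A quick shape check before the update can catch IDs copied without the tilde or with stray whitespace. The regex and the helper name looks_like_profile_id are approximations written for this tutorial, not OpenReview's own validation rule:

```python
import re

# Approximate shape: "~", then non-space characters, ending in a digit.
PROFILE_ID = re.compile(r"~[^\s~]+\d")


def looks_like_profile_id(value) -> bool:
    """True if `value` is a string shaped like an OpenReview profile ID."""
    return isinstance(value, str) and PROFILE_ID.fullmatch(value) is not None


print(looks_like_profile_id('~ishmam_zabir1'))  # True
print(looks_like_profile_id('ishmam_zabir1'))   # False: missing the leading tilde
```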

Updating an OpenReview Papers Node

Example code for updating an existing openreview papers entry:

new_paper_features = {'venue': 'ICLR.cc/2025/Conference', 
                  'paper_openreview_id': 'zGej22CBnS', 
                  'title': 'Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles', 
                  'abstract': "Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet remains an important component for long sequence scaling purposes. This work studies  how tokenization impacts  model performance by analyzing and comparing the stochastic behavior of tokenized models with their byte-level, or token-free, counterparts. We discover that, even when the two models are statistically equivalent, their predictive distributions over the next byte can be substantially different, a phenomenon we term as ``tokenization bias''. To fully characterize this phenomenon, we  introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent byte-level distribution.  From this result, we develop a next-byte sampling algorithm  that eliminates tokenization bias without requiring further training or optimization. In other words, this enables zero-shot conversion of tokenized LMs into statistically equivalent token-free ones. We demonstrate its broad applicability with two use cases: fill-in-the-middle (FIM) tasks and model ensembles. In FIM tasks where input prompts may terminate mid-token, leading to out-of-distribution tokenization, our method mitigates performance degradation and achieves 18\\% improvement in FIM coding benchmarks, while consistently outperforming the standard token healing fix. For model ensembles where each model employs a distinct vocabulary, our approach enables seamless integration, resulting in improved performance up to 3.7\\% over individual models across various standard baselines in reasoning, knowledge, and coding. Code is available at:https: //github.com/facebookresearch/Exact-Byte-Level-Probabilities-from-Tokenized-LMs.", 
                  'paper_decision': 'ICLR 2025 Poster', 
                  'paper_pdf_link': '/pdf/cdd2212a20c4034029874cba11a05e081bfdb83e.pdf'}
research_arcade.update_node("openreview_papers", node_features=new_paper_features)
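The paper_pdf_link value above is a path relative to the OpenReview site (/pdf/<hash>.pdf). If you need a full download URL, for example to check the link before updating it, join it with the site root. A minimal sketch using the standard library; the base URL https://openreview.net is assumed to be the host these paths belong to, and full_pdf_url is a hypothetical helper:

```python
from urllib.parse import urljoin

OPENREVIEW_BASE = "https://openreview.net"  # assumed host for relative PDF paths


def full_pdf_url(pdf_link: str) -> str:
    """Turn a relative OpenReview PDF path into an absolute URL; absolute links pass through."""
    return urljoin(OPENREVIEW_BASE, pdf_link)


print(full_pdf_url('/pdf/cdd2212a20c4034029874cba11a05e081bfdb83e.pdf'))
# https://openreview.net/pdf/cdd2212a20c4034029874cba11a05e081bfdb83e.pdf
```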

Updating an OpenReview Reviews Node

Example code for updating an existing openreview reviews entry:

new_review_features = {'venue': 'ICLR.cc/2025/Conference', 
                   'review_openreview_id': 'DHwZxFryth', 
                   'replyto_openreview_id': 'Yqbllggrmw', 
                   'writer': 'Authors', 
                   'title': 'Response by Authors', 
                   'content': {'Title': 'Response to Reviewer 7i95 (1/2)', 'Comment': '''> The method does not improve much in the AlpacaEval 2.0 Score. The author should give a detailed explanation. And why not use metrics like length-controlled win rate?**Response:** Thank you for your careful observation and question. We would like to clarify that we are already using the length-controlled (LC) AlpacaEval 2.0 win-rate metric in our evaluations. We will make this clearer in the table header of Table 3.Regarding the fact that the AlpacaEval 2.0 scores on LLama-3 (8B) do not improve compared to the baselines, we believe this is because our base model, the instruction-finetuned LLama-3 (8B), is already trained to perform exceptionally well in terms of helpfulness, which is the focus of the AlpacaEval benchmark. Additionally, the preference dataset we used, UltraFeedback, may not provide significant further enhancement in the helpfulness aspect. This is supported by the slight decrease observed in the AlpacaEval score for the standard DPO baseline as well (see Table 3, results on LLama-3). Therefore, we think these AlpacaEval 2.0 results on LLama-3 (8B) may not indicate that SAIL is ineffective; it may be simply caused by an ill-suited combination of base model, finetuning dataset, and evaluation benchmark.We also further conducted experiments on the Zephyr (7B) model as the backbone, whose AlpacaEval 2.0 win-rate is lower. We still train on the UltraFeedback preference dataset and the other experiment setups are unchanged. 
In this experiment, we see a larger improvement of the SAIL method compared to the standard DPO baseline (Zephyr-7B-Beta).|             | AlpacaEval 2.0 (LC) Win-Rate ||--------------------|------------------------------|| Base (Zephyr-7B-SFT-Full) | 6.4 %                        || DPO (Zephyr-7B-Beta)   | 13.2 %                       || SAIL-PP  | 15.9 %                       |> Authors should compare more advanced preference optimization algorithms like ORPO and SimPO. And current results are not impressive for the alignment community.**Response:** Thank you for raising this insightful point. We see ORPO and SimPO are two recent work which propose a different objective than the standard RLHF, and achieve remarkable improvements in terms of alignment performance and efficiency.Our work focus more on bringing standard RLHF to a bilevel optimization framework and propose an effective and efficient approximate algorithm on top of it. We can see some new preference optimization methods including ORPO and SimPO have one fundamental difference from our approach: they do not explicitly incorporate the KL regularization term. The absence of the KL regularization term allows these methods to optimize more aggressively for the reward function by deviating significantly from the reference model. In contrast, our approach is specifically grounded in the standard RLHF, where the KL regularization term ensures that the model remains aligned with the reference distribution while optimizing for the reward function. This distinction makes direct comparisons with ORPO or SimPO less meaningful theoretically, as those methods omit the KL regularization and adopt a fundamentally different optimization objective design.However, we think our work, although developed adhering to the standard RLHF setup, can be compatible and combined with some recent advanced preference optimization algorithms, despite their differences in optimization setups and objectives. 
This is because we can reformulate their alignment problem as bilevel optimization, and go through the derivation as done in the paper. Taking SimPO as an example, we can treat their reward model definition (Equation (4) in the SimPO paper) as the solution of the upper level optimization (replacing Equation (4) in our manuscript), and adopt their modified Bradley-Terry objective with reward margin (Equation (5) in the SimPO paper) to replace the standard one (Equation (10) in our manuscript). By applying these changes and rederiving the extra gradient terms, we can formulate an adaptation of our method to the SimPO objective. We will implement this combined algorithm, which adapt our methodology to the SimPO objective, and compare with the SimPO as a baseline.Recently many different alignment objectives and algorithms have emerged; it is an interesting question to discuss the compatibility and combination of our method with each objective. We will add more relevant discussions to the appendices, but due to the fact that the compatibility problem with each design is a non-trivial question, this process may incur considerably more work, and we hope the reviewer understands that this effort cannot be fully reflected by the rebuttal period. But we will continue to expand the discussion as the wide compatibility to other designs also strengthens our contribution to the community. We thank the reviewer for raising this insightful point.'''}, 
                   'time': '2024-11-26 15:27:26'
}
research_arcade.update_node("openreview_reviews", node_features=new_review_features)
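The review's time field uses the YYYY-MM-DD HH:MM:SS format. Parsing it before the update rejects malformed values early and re-emits them zero-padded. The sketch assumes every stored timestamp follows exactly that format; the tutorial does not state a time zone, so none is attached, and normalize_review_time is a hypothetical helper:

```python
from datetime import datetime

REVIEW_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def normalize_review_time(value: str) -> str:
    """Parse and re-emit a review timestamp; raises ValueError if it is malformed."""
    return datetime.strptime(value, REVIEW_TIME_FORMAT).strftime(REVIEW_TIME_FORMAT)


print(normalize_review_time('2024-11-26 15:27:26'))  # 2024-11-26 15:27:26
```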

Updating an OpenReview Revisions Node

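The tutorial's example for this node type does not call update_node. It is crawler code that downloads each submission's PDF from OpenReview and, when a paper has more than one PDF revision, every revision PDF as well. Venues from 2013, 2014, and 2017 through 2023 are queried through OpenReview API v1 (client_v1); all other venues go through API v2 (client_v2), where withdrawn submissions are skipped: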

import os
import time

from tqdm import tqdm

# venue, client_v1 / client_v2, pdf_dir, start_idx, end_idx, log_file and the
# get_paper_pdf / get_revision_pdf helpers are assumed to come from the surrounding crawler.
V1_BLIND_YEARS = ("2023", "2022", "2021", "2020", "2019", "2018")  # API v1, Blind_Submission invitation
V1_OPEN_YEARS = ("2017", "2014", "2013")  # API v1, submission invitation

if any(year in venue for year in V1_BLIND_YEARS + V1_OPEN_YEARS):
    if any(year in venue for year in V1_BLIND_YEARS):
        submissions = client_v1.get_all_notes(invitation=f'{venue}/-/Blind_Submission', details='revisions')
    else:
        submissions = client_v1.get_all_notes(invitation=f'{venue}/-/submission', details='revisions')
        
    if submissions is None:
        print(f"No submissions found for venue: {venue}")
    else:
        for submission in tqdm(submissions[start_idx:end_idx]):
            # get paper openreview id
            paper_id = submission.id
            if "pdf" in submission.content:
                pdf_link = submission.content["pdf"]
                pdf_path = str(pdf_dir)+str(paper_id)+".pdf"
                if os.path.isfile(pdf_path):
                    continue
                else:
                    get_paper_pdf(pdf_link, pdf_path, log_file)
            
            revisions = client_v1.get_references(referent=paper_id, original=True)
            time.sleep(1)
            
            pdf_revisions_ids = []
            for revision in revisions:
                if "pdf" in revision.content:
                    pdf_revisions_ids.append(revision.id)
            
            if len(pdf_revisions_ids) <= 1:
                continue
            else:
                for pdf_revision_id in pdf_revisions_ids:
                    pdf_path = str(pdf_dir)+str(pdf_revision_id)+".pdf"
                    if os.path.isfile(pdf_path):
                        continue
                    else:
                        get_revision_pdf(venue, pdf_revision_id, pdf_path, log_file)
                        time.sleep(1)
else:
    submissions = client_v2.get_all_notes(invitation=f'{venue}/-/Submission', details='revisions')
    if submissions is None:
        print(f"No submissions found for venue: {venue}")
    else:
        for submission in tqdm(submissions[start_idx:end_idx]):
            decision = submission.content["venueid"]["value"].split('/')[-1]
            if decision == "Withdrawn_Submission":
                continue
            else:
                # get paper openreview id
                paper_id = submission.id
                if "pdf" in submission.content:
                    pdf_link = submission.content["pdf"]["value"]
                    pdf_path = str(pdf_dir)+str(paper_id)+".pdf"
                    if os.path.isfile(pdf_path):
                        continue
                    else:
                        get_paper_pdf(pdf_link, pdf_path, log_file)
                        
                revisions = client_v2.get_note_edits(note_id=paper_id)
                if len(revisions) <= 1:
                    continue
                else:
                    for revision in revisions:
                        pdf_revision_id = revision.id
                        pdf_path = str(pdf_dir)+str(pdf_revision_id)+".pdf"
                        if os.path.isfile(pdf_path):
                            continue
                        else:
                            time.sleep(1)
                            get_revision_pdf(venue, pdf_revision_id, pdf_path, log_file)
                            time.sleep(1)
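The branching at the top of the crawler chooses the OpenReview API version and invitation from the year that appears in the venue string. Pulled out into a function, the same rule is easier to test. submission_invitation is a hypothetical helper that returns only the API version and invitation string the crawler would use; it makes no network calls:

```python
V1_BLIND_YEARS = ("2023", "2022", "2021", "2020", "2019", "2018")
V1_OPEN_YEARS = ("2017", "2014", "2013")


def submission_invitation(venue: str) -> tuple:
    """Return (api_version, invitation) following the crawler's year rules."""
    if any(year in venue for year in V1_BLIND_YEARS):
        return 1, f"{venue}/-/Blind_Submission"
    if any(year in venue for year in V1_OPEN_YEARS):
        return 1, f"{venue}/-/submission"
    return 2, f"{venue}/-/Submission"


print(submission_invitation('ICLR.cc/2025/Conference'))
# (2, 'ICLR.cc/2025/Conference/-/Submission')
```

Because the check is a plain substring test, any venue string that happens to contain one of these year strings anywhere is routed to API v1; the crawler above behaves the same way.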

Updating an OpenReview Paragraphs Node

No example code is available for updating openreview paragraphs entries in this tutorial.

Editing Edges

Updating arxiv_paragraph_citations Edges

No example code is available for updating arxiv_paragraph_citations edges in this tutorial.

Next Steps

Now that you can update your data, explore these related tutorials:

Delete Nodes & Edges

Remove entities and relationships safely.

Import from CSV

Load bulk data into your database.