Getting Started with R / Bioconductor in AnVIL
This guide helps R / Bioconductor users:
- Establish and familiarize themselves with essential Terra account and workspace concepts.
- Use RStudio and Jupyter Notebooks for interactive analysis.
- Execute workflows for large-scale analysis, including use of R / Bioconductor within a workflow and management of workflows from within R.
The guide indicates how to discover R / Bioconductor workspaces, and how the R / Bioconductor community can contribute to AnVIL and cloud-based computation.
Getting Started with AnVIL - Provides essential information for setting up a Terra account, billing and cost management, use of Terra workspaces, finding and accessing (public as well as protected) consortium-scale data, and running workflows and interactive analyses.
R / Bioconductor with RStudio or Jupyter
The RStudio Runtime - RStudio provides a familiar environment for using R / Bioconductor, with the advantage that RStudio is running on cloud-based resources that allow fast, secure, authenticated data access and easily scalable compute resources.
Access R / Bioconductor through Jupyter notebooks - Jupyter notebooks running an R 'kernel' provide a good way to step collaborators or trainees through an analysis.
Terra / AnVIL concepts for R / Bioconductor users
Where Is My Computer?
The AnVIL runtime provides the physical machinery for computation (e.g., a 4-core CPU with 16 GB of memory) as well as local 'persistent disk' storage. Unlike a traditional computer, the compute and storage components are separate from one another. For instance, storage created with one runtime can be used with another runtime. A runtime and persistent disk belong to a single user, and can be used across workspaces.
Where Is My Data?
Local persistent disks, DATA, and workspace buckets - A persistent disk contains data, scripts, packages, and output created by the user in the course of an analysis. Workspaces bring additional data.
Tabular summaries of workspace data, e.g., descriptions of participants in the study the workspace encapsulates, are presented under the DATA element, while larger data produced during an analysis may be associated with the workspace 'bucket'.
The AnVIL R / Bioconductor package provides a familiar interface for accessing these resources.
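As a rough illustration of the data access described above, the sketch below lists the workspace's DATA tables, imports one of them, and lists the contents of the workspace bucket. It is not a definitive recipe. It assumes the AnVIL package is installed and that the session runs inside a Terra / AnVIL cloud environment. The `WORKSPACE_NAME` environment variable check is an assumption used to detect that environment, and `"participant"` is a hypothetical table name. Function names may differ between releases, so check the package documentation for your Bioconductor version.

```r
# Sketch only: cloud calls run only when the AnVIL package is installed and
# the session appears to be inside a Terra / AnVIL workspace (assumption:
# the cloud environment sets the WORKSPACE_NAME environment variable).
in_workspace <-
    requireNamespace("AnVIL", quietly = TRUE) &&
    nzchar(Sys.getenv("WORKSPACE_NAME"))

if (in_workspace) {
    # Tables shown under the workspace DATA element
    print(AnVIL::avtables())

    # Import one table as a tibble; "participant" is a hypothetical name,
    # so substitute a table reported by avtables()
    participant <- AnVIL::avtable("participant")
    print(participant)

    # Google Cloud Storage location (gs:// URI) of the workspace bucket
    bucket <- AnVIL::avbucket()

    # List the objects currently stored in the bucket
    print(AnVIL::gsutil_ls(bucket))
} else {
    message("Not in a Terra / AnVIL workspace; skipping cloud calls.")
}
```

Outside a workspace the block only prints a message, so it can be pasted into a local R session without side effects.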
Techniques for effective use of R / Bioconductor
- Fast package installation - Cloud-based R / Bioconductor provides three major advantages during package installation: a pre-configured system supporting most CRAN and Bioconductor packages; fast retrieval of packages from cloud-based repositories; and very fast installation of 'binary' packages that do not require source code compilation. Use the AnVIL::install() function to gain all three benefits.
- Tools for assessing cost - The AnVILBilling package provides R / Bioconductor tools for exploring the cost of AnVIL-based computation. This complements other AnVIL facilities for assessing cost.
- Using best practices for sharing reproducible AnVIL resources - Start by encapsulating your contribution in an R package, with fully documented functions and vignettes describing their use. Manage the source code of your package in Git or another version control system. Then use AnVILPublish to make your package content available as an AnVIL workspace for sharing and cloud-based computation.
- Workflow Inputs, Execution, and Outputs - The AnVIL package provides commands that make working with workflows, especially workflow inputs and outputs, easy for R / Bioconductor users.
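The package installation and workflow items in the list above can be sketched roughly as follows. This is a minimal example under stated assumptions: the AnVIL package is installed, the session runs inside a Terra / AnVIL workspace (detected here through the `WORKSPACE_NAME` environment variable, which is an assumption), and `GenomicRanges` is only an example package. Function names may differ between releases, so check the package documentation for your Bioconductor version.

```r
# Sketch only: cloud calls run only when the AnVIL package is installed and
# the session appears to be inside a Terra / AnVIL workspace (assumption:
# the cloud environment sets the WORKSPACE_NAME environment variable).
in_workspace <-
    requireNamespace("AnVIL", quietly = TRUE) &&
    nzchar(Sys.getenv("WORKSPACE_NAME"))

if (in_workspace) {
    # Install a package, using pre-built binaries where available;
    # GenomicRanges is just an example
    AnVIL::install("GenomicRanges")

    # Workflows configured in the current workspace
    print(AnVIL::avworkflows())

    # Submitted workflow jobs and their status, returned as a tibble
    jobs <- AnVIL::avworkflow_jobs()
    print(jobs)

    # Files (outputs and logs) produced by the most recent submission
    if (nrow(jobs) > 0)
        print(AnVIL::avworkflow_files())
} else {
    message("Not in a Terra / AnVIL workspace; skipping cloud calls.")
}
```

Guarding on `nrow(jobs) > 0` avoids asking for output files in a workspace where no workflow has been submitted yet.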
Terra / AnVIL R / Bioconductor Popup Workshops
The following Terra / AnVIL R / Bioconductor Popup Workshops were held in 2021 from April to June.
- Week 1: Using R / Bioconductor in AnVIL with Martin Morgan
- Week 2: The R / Bioconductor AnVIL Package with Martin Morgan and Nitesh Turaga
- Week 3: Running a Workflow with Martin Morgan and Kayla Interdonato
- Week 4: Single-cell RNASeq with 'Orchestrating Single Cell Analysis' in R / Bioconductor with Vince Carey
- Week 5: Using AnVIL for Teaching R with Levi Waldron
- Week 6: Reproducible Research with AnVILPublish with Martin Morgan
- Week 7: Participant Stories
Introduction to the Terra AnVIL Cloud-based Genomics Platform by Sehyun Oh and Levi Waldron at Bioc2021
Terra in the Classroom documents the experience of running a small course using AnVIL. It includes some setup notes and the positives and negatives learned, dating from February 2020.
- Orchestrating Single-Cell Analysis - use-strides/Bioconductor-Workshop-OSCA-3-12 demonstrates using the OSCA book.
- RNA-seq using DESeq2 - bioconductor-rpci-anvil/Bioconductor-Workflow-DESeq2 shows differential expression analysis of bulk RNA-seq using Bioconductor package DESeq2.
R / Bioconductor resources
- Participate in the R / Bioconductor Community - Ask general questions about using Bioconductor packages on the Bioconductor support forum. Sign up to participate in the Bioconductor community Slack and join our #AnVIL channel for more in-depth conversations. Terra / AnVIL provides extensive support through the support feature of the Terra website.
- Learn more about Bioconductor - Visit bioconductor.org for available packages, learning materials, events, and getting involved with Bioconductor.