Preface

This book introduces two R/Bioconductor packages, Rcwl and RcwlPipelines, which aim to improve how bioinformatics tools and pipelines are built, managed, and run within R.

The Rcwl package is built on top of the Common Workflow Language (CWL), and provides a simple and user-friendly way to wrap command line tools into data analysis pipelines in R. The RcwlPipelines package manages a collection of bioinformatics tools and pipelines based on Rcwl.

0.1 R package installation

The Rcwl and RcwlPipelines packages can be installed from Bioconductor or GitHub:

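# install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# release version from Bioconductor
BiocManager::install(c("Rcwl", "RcwlPipelines"))
# or the development version from GitHub
BiocManager::install(c("rworkflow/Rcwl", "rworkflow/RcwlPipelines"))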

To load the packages into an R session:

library(Rcwl)
library(RcwlPipelines)

0.2 System requirements

In addition to the R packages, the following software is required to run the tools and pipelines. Any component that is not available locally is installed automatically through the basilisk package.

  • python (>= 2.7)
  • cwltool (>= 1.0.2018)
  • nodejs

cwltool is the reference implementation of the Common Workflow Language and is used to run the CWL scripts. nodejs is required when the CWL scripts use JavaScript. More details about these tools can be found here:

  • https://github.com/common-workflow-language/cwltool
  • https://nodejs.org
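To see which of these requirements are already present on a machine, one option is to look them up on the PATH from within R. The following is a minimal sketch using only base R. The executable names `python`, `cwltool` and `node` are assumptions: some systems expose them as `python3` or `nodejs` instead, and the check does not verify version numbers.

```r
# Executable names are assumptions; adjust them for your system
# (e.g. "python3" or "nodejs").
tools <- c("python", "cwltool", "node")

# Sys.which() returns the full path of each executable,
# or an empty string when it is not found on the PATH.
paths <- Sys.which(tools)

status <- data.frame(
    tool  = tools,
    found = unname(nzchar(paths)),
    path  = unname(paths),
    stringsAsFactors = FALSE
)
print(status)
```

A tool reported as not found is not necessarily a problem, since missing components are expected to be installed automatically as described above.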

0.3 Docker

The Docker container simplifies software installation and management, especially for bioinformatics tools/pipelines requiring different runtime environments and library dependencies. A CWL runner can perform this work automatically by pulling the Docker containers and mounting the paths of input files.

Docker is optional: CWL scripts can also be run locally, provided that all of their dependencies are pre-installed.
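Whether to rely on containers can be decided at run time. The sketch below, in base R only, checks for a Docker client on the PATH. It assumes the client executable is named `docker`, and it does not check that the Docker daemon is actually running. The variable `use_docker` is a hypothetical name introduced for illustration.

```r
# Assumption: the Docker client is called "docker".
# Sys.which() returns "" when the executable is not on the PATH.
use_docker <- unname(nzchar(Sys.which("docker")))

if (use_docker) {
    message("Docker client found: tools can run inside containers.")
} else {
    message("Docker client not found: dependencies must be installed locally.")
}
```

A flag like this could then be passed to whichever function executes the CWL script, for example a `docker` argument if the installed version of Rcwl provides one for its runner function.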

0.4 Structure of the book

  • Introduction
  • Get started
  • Wrap command line tools
  • Writing Pipeline
  • Tool/pipeline execution
  • RcwlPipelines
  • DNAseq alignment
  • DNAseq variant calling
  • Bulk RNAseq
  • Single cell RNAseq
  • miRNA

0.5 R session information

The R session information for compiling this manual is shown below:

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS/LAPACK: /Users/qi31566/miniconda3/envs/r-base/lib/libopenblasp-r0.3.12.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] bookdown_0.21               DropletUtils_1.10.3        
##  [3] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
##  [5] Biobase_2.50.0              GenomicRanges_1.42.0       
##  [7] GenomeInfoDb_1.26.2         IRanges_2.24.1             
##  [9] MatrixGenerics_1.2.1        matrixStats_0.58.0         
## [11] BiocStyle_2.18.1            BiocParallel_1.24.1        
## [13] RcwlPipelines_1.7.7         BiocFileCache_1.14.0       
## [15] dbplyr_2.1.0                Rcwl_1.7.12                
## [17] S4Vectors_0.28.1            BiocGenerics_0.36.0        
## [19] yaml_2.2.1                 
## 
## loaded via a namespace (and not attached):
##   [1] ellipsis_0.3.1            rprojroot_2.0.2          
##   [3] scuttle_1.0.4             XVector_0.30.0           
##   [5] fs_1.5.0                  rstudioapi_0.13          
##   [7] remotes_2.2.0             bit64_4.0.5              
##   [9] fansi_0.4.2               sparseMatrixStats_1.2.1  
##  [11] codetools_0.2-18          R.methodsS3_1.8.1        
##  [13] cachem_1.0.4              knitr_1.31               
##  [15] pkgload_1.2.0             jsonlite_1.7.2           
##  [17] R.oo_1.24.0               HDF5Array_1.18.1         
##  [19] shiny_1.6.0               DiagrammeR_1.0.6.1       
##  [21] BiocManager_1.30.10       compiler_4.0.3           
##  [23] httr_1.4.2                dqrng_0.2.1              
##  [25] basilisk_1.2.1            backports_1.2.1          
##  [27] assertthat_0.2.1          Matrix_1.3-2             
##  [29] fastmap_1.1.0             limma_3.46.0             
##  [31] cli_2.3.1                 later_1.1.0.1            
##  [33] visNetwork_2.0.9          htmltools_0.5.1.1        
##  [35] prettyunits_1.1.1         tools_4.0.3              
##  [37] igraph_1.2.6              glue_1.4.2               
##  [39] GenomeInfoDbData_1.2.4    dplyr_1.0.4              
##  [41] batchtools_0.9.15         rappdirs_0.3.3           
##  [43] tinytex_0.29              Rcpp_1.0.6               
##  [45] jquerylib_0.1.3           rhdf5filters_1.2.0       
##  [47] vctrs_0.3.6               DelayedMatrixStats_1.12.3
##  [49] xfun_0.21                 stringr_1.4.0            
##  [51] ps_1.5.0                  beachmat_2.6.4           
##  [53] testthat_3.0.2            mime_0.10                
##  [55] lifecycle_1.0.0           devtools_2.3.2           
##  [57] edgeR_3.32.1              zlibbioc_1.36.0          
##  [59] basilisk.utils_1.2.2      hms_1.0.0                
##  [61] promises_1.2.0.1          rhdf5_2.34.0             
##  [63] RColorBrewer_1.1-2        curl_4.3                 
##  [65] memoise_2.0.0             reticulate_1.18          
##  [67] sass_0.3.1                stringi_1.5.3            
##  [69] RSQLite_2.2.3             desc_1.2.0               
##  [71] checkmate_2.0.0           filelock_1.0.2           
##  [73] pkgbuild_1.2.0            rlang_0.4.10             
##  [75] pkgconfig_2.0.3           bitops_1.0-6             
##  [77] evaluate_0.14             lattice_0.20-41          
##  [79] Rhdf5lib_1.12.1           purrr_0.3.4              
##  [81] htmlwidgets_1.5.3         bit_4.0.4                
##  [83] processx_3.4.5            tidyselect_1.1.0         
##  [85] magrittr_2.0.1            R6_2.5.0                 
##  [87] generics_0.1.0            base64url_1.4            
##  [89] DelayedArray_0.16.1       DBI_1.1.1                
##  [91] pillar_1.5.0              withr_2.4.1              
##  [93] RCurl_1.98-1.2            tibble_3.0.6             
##  [95] crayon_1.4.1              utf8_1.1.4               
##  [97] rmarkdown_2.7             progress_1.2.2           
##  [99] usethis_2.0.1             locfit_1.5-9.4           
## [101] grid_4.0.3                data.table_1.14.0        
## [103] blob_1.2.1                callr_3.5.1              
## [105] git2r_0.28.0              digest_0.6.27            
## [107] xtable_1.8-4              tidyr_1.1.2              
## [109] httpuv_1.5.5              brew_1.0-6               
## [111] R.utils_2.10.1            bslib_0.2.4              
## [113] sessioninfo_1.1.1