CNT: Semi-Automatic Translation from CWL to Nextflow for Genomic Workflows

Abstract

With the rise of advanced workflow languages for scientific computing, Nextflow has gained increasing attention from the bioinformatics community. Nextflow offers native support for advanced parallelism, which can greatly improve resource utilization and throughput. Still, a significant portion of bioinformatics workflows are developed in the Common Workflow Language (CWL). Transitioning from CWL to Nextflow is challenging because of differences in programming models, incompatible embedded scripting languages, and the need for in-depth knowledge of both languages. To address this challenge, we present CNT, a novel semi-automatic translator that converts CWL workflows into Nextflow workflows. At its core, CNT uses an automated translation mechanism that converts the CommandLineTool, the most basic unit of CWL, into a Nextflow process. This component integrates tool-level conversion, graph dependency analysis, and correctness checks to achieve high automated translation coverage, significantly reducing development time while satisfying language-specific requirements such as building a proper dataflow model when composing workflows. Furthermore, CNT includes a module that aids manual translation: it identifies three common JavaScript patterns in CWL workflows and offers developers further guidance during translation. We evaluated CNT on production-grade workflows and found that it can automatically translate up to 81% of the original workflows, substantially reducing development time. Additionally, transitioning from a cwltool-based system to Nextflow with CNT can yield a 72% speedup and an 85% increase in CPU utilization.
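To make the core idea of tool-level conversion concrete (mapping a CWL CommandLineTool onto a Nextflow process), here is a minimal Python sketch. It is not CNT's implementation. The dictionary schema is a simplified, hypothetical stand-in for a parsed CWL document, the type mapping (`File`/`Directory` to `path`, scalars to `val`) is an assumption, and JavaScript expressions, `arguments`, `stdout` capture, secondary files, and requirements are all ignored. The helper `translate_tool` and the FastQC example are illustrative names only.

```python
from typing import Any

# Assumed mapping from CWL input types to Nextflow input qualifiers.
CWL_TO_NF_QUALIFIER = {
    "File": "path",
    "Directory": "path",
    "string": "val",
    "int": "val",
    "float": "val",
    "boolean": "val",
}


def translate_tool(name: str, tool: dict[str, Any]) -> str:
    """Render a simplified, already-parsed CWL CommandLineTool as a Nextflow process."""
    if tool.get("class") != "CommandLineTool":
        raise ValueError("expected a CWL CommandLineTool")

    # input: block, one declaration per CWL input, in declaration order.
    lines = [f"process {name} {{", "    input:"]
    for inp in tool["inputs"]:
        lines.append(f"    {CWL_TO_NF_QUALIFIER[inp['type']]} {inp['id']}")

    # output: block; only File outputs captured by outputBinding.glob are handled.
    lines += ["", "    output:"]
    for out in tool["outputs"]:
        if out["type"] != "File":
            raise NotImplementedError(f"unsupported output type: {out['type']}")
        glob = out["outputBinding"]["glob"]
        lines.append(f'    path "{glob}", emit: {out["id"]}')

    # Command line: baseCommand, then bound inputs ordered by
    # inputBinding.position (CWL default 0), each preceded by its optional prefix.
    base = tool["baseCommand"]
    parts = list(base) if isinstance(base, list) else [base]
    bound = sorted(
        (i for i in tool["inputs"] if "inputBinding" in i),
        key=lambda i: i["inputBinding"].get("position", 0),
    )
    for inp in bound:
        prefix = inp["inputBinding"].get("prefix")
        if prefix:
            parts.append(prefix)
        parts.append("${" + inp["id"] + "}")  # Nextflow variable interpolation

    lines += ["", "    script:", '    """', "    " + " ".join(parts), '    """', "}"]
    return "\n".join(lines)


# Hypothetical example: a FastQC step as it might look after loading the CWL YAML.
FASTQC_TOOL = {
    "class": "CommandLineTool",
    "baseCommand": "fastqc",
    "inputs": [
        {"id": "threads", "type": "int", "inputBinding": {"position": 1, "prefix": "--threads"}},
        {"id": "reads", "type": "File", "inputBinding": {"position": 2}},
    ],
    "outputs": [
        {"id": "report", "type": "File", "outputBinding": {"glob": "*_fastqc.html"}},
    ],
}

if __name__ == "__main__":
    print(translate_tool("FASTQC", FASTQC_TOOL))
```

Running it prints a `FASTQC` process with `val threads` and `path reads` inputs, a `path "*_fastqc.html", emit: report` output, and the script line `fastqc --threads ${threads} ${reads}`. Naming each output with `emit:` gives downstream processes a channel to connect to; in a full translator, that wiring is where workflow-level dependency analysis would take over.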

Publication
The 23rd IEEE International Conference on BioInformatics and BioEngineering (BIBE), 2023