Additional scripts

cnn_updater.py

Update .cnn, .cnr and .cns files generated by earlier versions of CNVkit to add the “depth” column used in CNVkit version 0.8.0 and later. The script reads each input file, calculates absolute-scale depth from the existing “log2” value in each row, and writes a corresponding output file with a modified name; the input files are not modified in place.

Running this script is not necessary for new analyses, but may help ease the transition for analyses that have already begun.
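
As a rough illustration of the conversion described above, the sketch below appends a “depth” column to a tab-separated table by exponentiating each row's “log2” value. It assumes the relationship depth = 2 ** log2 and a file with a header row containing a “log2” column; the function name add_depth_column is hypothetical and this is not the script's actual source.

```python
import csv
import io


def add_depth_column(in_handle, out_handle):
    """Copy a tab-separated table, adding depth = 2 ** log2 to every row.

    Assumption: absolute-scale depth is recovered by exponentiating log2.
    An existing "depth" column, if any, is overwritten.
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    fieldnames = list(reader.fieldnames)
    if "depth" not in fieldnames:
        fieldnames.append("depth")
    writer = csv.DictWriter(
        out_handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["depth"] = f"{2 ** float(row['log2']):.6g}"
        writer.writerow(row)


# Illustrative one-row table; the values are made up.
example = "chromosome\tstart\tend\tgene\tlog2\nchr1\t100\t200\tGENE1\t3.0\n"
out = io.StringIO()
add_depth_column(io.StringIO(example), out)
result = out.getvalue()
```

Writing to a separate output handle mirrors the behaviour described above, where the input files are left untouched.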

coverage_bin_size.py

Quickly estimate coverage depths and recommend average bin sizes given a sample BAM file. Reported coverage depths are relevant to the sequencing protocol used, i.e. WGS, hybrid capture or targeted amplicon sequencing:

coverage_bin_size.py -m wgs Sample.bam
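
One plausible way to turn a depth estimate into a bin size is to aim for a fixed number of sequenced bases per bin, so that deeper samples get smaller bins. The sketch below assumes that rule; the function name recommend_bin_size and the 100,000-base target are illustrative assumptions, not values taken from this page.

```python
def recommend_bin_size(mean_depth, bases_per_bin=100_000):
    """Suggest an average bin size (in bp) for a given mean coverage depth.

    Assumption: each bin should collect roughly `bases_per_bin` sequenced
    bases, so bin size is that target divided by the mean depth.
    """
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    return int(round(bases_per_bin / mean_depth))


# Illustrative depths for a shallow and a deep sample.
shallow = recommend_bin_size(30.0)
deep = recommend_bin_size(200.0)
```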

guess_baits.py

Use the read depths in one or more given BAM files to infer which regions were targeted in a hybrid capture or targeted amplicon sequencing protocol. This script can be used when the original BED file of targeted intervals is unavailable. (However, CNVkit will give much better results if the true targeted intervals can be provided.) It works in two modes, guided and unguided:

  • Guided: Given candidate targets, such as all known exons in the reference genome, test the mean coverage depth in each candidate target and drop those that did not receive sufficient coverage, presumed to be those exons or genes that were not targeted by the sequencing library.

    guess_baits.py Sample1.bam Sample2.bam -t ucsc-exons.bed -o baits.bed
    
  • Unguided: Scan every base in the sample BAM(s), inferring likely boundaries for enriched regions. (This is usually much slower than the guided approach.)

    guess_baits.py -g access.hg19.bed Sample1.bam Sample2.bam -o baits.bed
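
The guided mode amounts to a filter over candidate intervals. The sketch below shows that idea only: it takes precomputed mean depths rather than reading a BAM file, and the function name keep_covered_targets, the min_depth threshold and the example values are all hypothetical.

```python
def keep_covered_targets(candidates, mean_depths, min_depth):
    """Return the candidate targets whose mean coverage reaches min_depth.

    candidates:  list of (chromosome, start, end, name) tuples
    mean_depths: mean coverage depth per candidate, in the same order
    min_depth:   assumed cutoff below which a region counts as untargeted
    """
    if len(candidates) != len(mean_depths):
        raise ValueError("one depth value is required per candidate")
    return [
        target
        for target, depth in zip(candidates, mean_depths)
        if depth >= min_depth
    ]


# Made-up candidates and depths, for illustration only.
candidates = [
    ("chr1", 1000, 1200, "GENE_A"),
    ("chr1", 5000, 5300, "GENE_B"),
    ("chr2", 700, 900, "GENE_C"),
]
depths = [152.4, 0.8, 97.1]
kept = keep_covered_targets(candidates, depths, min_depth=5.0)
```

Here GENE_B is dropped because its mean depth falls below the assumed cutoff, which corresponds to an exon presumed not to be targeted by the sequencing library.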
    
reference2targets.py

Extract target and antitarget BED files from a CNVkit reference file. While the batch command does this step automatically when an existing reference is provided, you may find this standalone script useful to recover the target and antitarget BED files that match the reference if those BED files are missing or you’re not sure which ones are correct.

Alternatively, once you have a stable CNVkit reference for your platform, you can use this script to drop the “bad” bins from your target and antitarget BED files (and subsequently built references) to avoid unnecessarily calculating coverage in those bins during future runs.
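
The extraction step can be pictured as splitting the rows of the reference table by bin type. The sketch below assumes a tab-separated reference with “chromosome”, “start”, “end” and “gene” columns, and assumes antitarget bins are marked with the gene label “Antitarget” (or “Background” in older files); it does not attempt the filtering of “bad” bins, and split_reference is a hypothetical name.

```python
import csv
import io

# Assumed labels that mark antitarget bins in the "gene" column.
ANTITARGET_LABELS = {"Antitarget", "Background"}


def split_reference(ref_handle):
    """Split reference rows into target and antitarget BED-style tuples."""
    targets, antitargets = [], []
    for row in csv.DictReader(ref_handle, delimiter="\t"):
        bed_row = (row["chromosome"], int(row["start"]), int(row["end"]), row["gene"])
        if row["gene"] in ANTITARGET_LABELS:
            antitargets.append(bed_row)
        else:
            targets.append(bed_row)
    return targets, antitargets


# Made-up reference rows, for illustration only.
reference_text = (
    "chromosome\tstart\tend\tgene\tlog2\n"
    "chr1\t100\t300\tGENE1\t0.1\n"
    "chr1\t300\t10300\tAntitarget\t-0.2\n"
    "chr1\t10300\t10500\tGENE2\t0.0\n"
)
targets, antitargets = split_reference(io.StringIO(reference_text))
```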

refFlat2bed.py

Generate a BED file of the genes or exons in the reference genome given in UCSC refFlat.txt format. (Download the input file from UCSC Genome Bioinformatics.)
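
To make the conversion concrete, the sketch below turns one refFlat line into BED-style rows, either one row per gene or one per exon. It relies on the standard refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds); the function name refflat_line_to_bed and the example record are hypothetical, and the real script may differ in details such as merging or naming.

```python
def refflat_line_to_bed(line, exons=False):
    """Convert one refFlat line to (chrom, start, end, gene) tuples.

    With exons=False, return the single transcript interval.
    With exons=True, return one interval per exon.
    """
    fields = line.rstrip("\n").split("\t")
    gene, chrom = fields[0], fields[2]
    if not exons:
        return [(chrom, int(fields[4]), int(fields[5]), gene)]
    # exonStarts and exonEnds are comma-separated lists with a trailing comma.
    starts = [int(s) for s in fields[9].split(",") if s]
    ends = [int(e) for e in fields[10].split(",") if e]
    return [(chrom, start, end, gene) for start, end in zip(starts, ends)]


# A made-up two-exon record, for illustration only.
record = "GENE1\tNM_000001\tchr1\t+\t100\t900\t150\t850\t2\t100,500,\t300,900,"
gene_rows = refflat_line_to_bed(record)
exon_rows = refflat_line_to_bed(record, exons=True)
```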