Calculate the sequence-accessible coordinates of each chromosome in the given reference genome, treating long runs of ‘N’ characters as inaccessible regions.
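To show what "accessible" means here, the following is a minimal sketch, not CNVkit's implementation. It reads FASTA text and reports each run of non-N bases as a BED-style interval (0-based, half-open). The helper names `read_fasta` and `accessible_regions` are hypothetical. For simplicity, the sketch treats every run of N as a gap, whatever its length. Closing short gaps is a separate step, which the -s option described below controls.

```python
import re
from typing import Iterator, Tuple

def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text (hypothetical helper)."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def accessible_regions(text: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) for every run of non-N bases.

    Assumption: any N, upper or lower case, marks an inaccessible base.
    """
    for chrom, seq in read_fasta(text):
        for m in re.finditer(r"[^Nn]+", seq):
            yield chrom, m.start(), m.end()
```

For example, the sequence `NNACGTNNNNACGTNN` on `chr1` yields the intervals (2, 6) and (10, 14).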
CNVkit computes “antitarget” bins only within the accessible genomic regions listed in the “access” file produced by this script. If the genome contains many small excluded (inaccessible) regions, then small, less reliable antitarget bins would be squeezed into the remaining accessible regions. The -s option tells the script to ignore short regions that would otherwise be excluded as inaccessible, allowing larger antitarget bins to overlap them.
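The effect of -s can be pictured as merging neighbouring accessible regions whenever the inaccessible gap between them is shorter than a threshold. The sketch below assumes a strict "shorter than" comparison and uses a hypothetical helper, `join_short_gaps`. CNVkit's own comparison and defaults may differ.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open

def join_short_gaps(regions: List[Region], min_gap: int) -> List[Region]:
    """Merge adjacent regions on the same chromosome when the gap between
    them is shorter than min_gap (hypothetical helper, assumed semantics)."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] < min_gap:
            # The gap is too short to matter: extend the previous region over it.
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

With `min_gap=5`, the regions (2, 6) and (10, 14) are separated by a 4-base gap and merge into (2, 14). A region starting at 100 stays separate.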
Additional regions to exclude can also be given with the -x option. This option can be used more than once to exclude several BED files, each listing a different set of regions. For example, “excludable” regions of poor mappability have been precalculated by others and are available from the UCSC FTP server (see here for hg19).
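Conceptually, excluding regions is an interval subtraction: every interval from the -x BED files is cut out of the accessible regions. This is a minimal, deliberately unoptimized sketch using a hypothetical `subtract` helper. Intervals from several BED files would simply be concatenated into one `exclude` list before the call.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED

def subtract(access: List[Region], exclude: List[Region]) -> List[Region]:
    """Remove every excluded interval from the accessible regions
    (hypothetical helper; quadratic, for illustration only)."""
    result: List[Region] = []
    for chrom, start, end in access:
        pieces = [(start, end)]
        for x_chrom, x_start, x_end in exclude:
            if x_chrom != chrom:
                continue
            next_pieces = []
            for s, e in pieces:
                if x_end <= s or x_start >= e:
                    next_pieces.append((s, e))       # no overlap: keep as-is
                    continue
                if s < x_start:
                    next_pieces.append((s, x_start))  # keep left remainder
                if x_end < e:
                    next_pieces.append((x_end, e))    # keep right remainder
            pieces = next_pieces
        result.extend((chrom, s, e) for s, e in pieces)
    return result
```

For instance, excluding (10, 20) and (50, 60) from `chr1` (0, 100) leaves (0, 10), (20, 50), and (60, 100).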
Generate a BED file of the genes or exons in the reference genome from a gene annotation table in UCSC refFlat.txt format. (Download the input file from UCSC Genome Bioinformatics.)
This script can be used in case the original BED file of targeted intervals is unavailable. Subsequent steps of the pipeline will remove probes that did not receive sufficient coverage, including those exons or genes that were not targeted by the sequencing library. However, CNVkit will give much better results if the true targeted intervals can be provided.
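As a rough illustration of the refFlat-to-BED conversion described above, the sketch below parses tab-separated refFlat rows and emits one BED-style row per gene (transcript span) or per exon. It assumes the standard UCSC refFlat column order: geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds. The function `refflat_to_bed` is hypothetical. It does not merge overlapping transcripts or duplicate exons, which a real conversion would likely need to do.

```python
from typing import Iterable, Iterator, Tuple

BedRow = Tuple[str, int, int, str]  # (chrom, start, end, gene name)

def refflat_to_bed(lines: Iterable[str], exons: bool = False) -> Iterator[BedRow]:
    """Convert refFlat rows to BED-style rows (hypothetical helper).

    refFlat coordinates are 0-based starts with exclusive ends, like BED,
    so no coordinate shift is applied.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        gene, chrom = fields[0], fields[2]
        if exons:
            # exonStarts/exonEnds are comma-separated with a trailing comma.
            starts = [int(x) for x in fields[9].split(",") if x]
            ends = [int(x) for x in fields[10].split(",") if x]
            for s, e in zip(starts, ends):
                yield chrom, s, e, gene
        else:
            yield chrom, int(fields[4]), int(fields[5]), gene
```

Each row could then be written out with `"\t".join(map(str, row))` to produce a BED file.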