Sentieon | Application Tutorial: Somatic SNP/Indel Variant Calling
Processing data without unique molecular identifiers (UMIs)
Step 1: Alignment
# ******************************************
# 1a. Mapping reads with BWA-MEM, sorting for the tumor sample
# ******************************************
( sentieon bwa mem -R "@RG\tID:$tumor\tSM:$tumor\tPL:$platform" \
-t $nt -K 10000000 $fasta $tumor_fastq_1 $tumor_fastq_2 || \
echo -n 'error' ) | \
sentieon util sort -o tumor_sorted.bam -t $nt --sam2bam -i -
# ******************************************
# 1b. Mapping reads with BWA-MEM, sorting for the normal sample
# ******************************************
( sentieon bwa mem -R "@RG\tID:$normal\tSM:$normal\tPL:$platform" \
-t $nt -K 10000000 $fasta $normal_fastq_1 $normal_fastq_2 || \
echo -n 'error' ) | \
sentieon util sort -o normal_sorted.bam -t $nt --sam2bam -i -
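The ( ... || echo -n 'error' ) | sentieon util sort construction in 1a and 1b works around the fact that, by default, a shell pipeline reports only the exit status of its last command. As we read it, if bwa mem fails, the literal text error is sent down the pipe, util sort fails on the malformed input, and the failure is no longer silently lost. The sketch below reproduces the pattern in plain bash with two hypothetical stand-in functions (aligner and consumer) in place of the Sentieon commands, so it runs without Sentieon installed.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a failing `sentieon bwa mem`: writes nothing, exits 1.
aligner() { return 1; }

# Hypothetical stand-in for `sentieon util sort`: rejects the literal "error" input.
consumer() {
  local input
  input=$(cat)
  if [ "$input" = "error" ]; then
    echo "consumer: malformed input" >&2
    return 1
  fi
  echo "consumer: ok"
}

# Without the guard the consumer sees empty input and succeeds: the failure is lost.
status=0
( aligner ) | consumer || status=$?
echo "without guard: exit status $status"

# With the guard the consumer receives "error", fails, and so does the pipeline.
status=0
( aligner || echo -n 'error' ) | consumer || status=$?
echo "with guard: exit status $status"
```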
Step 2: PCR duplicate removal (this step can be skipped for amplicon sequencing)
# ******************************************
# 2a. Remove duplicate reads from the tumor sample
# ******************************************
sentieon driver -t $nt -i tumor_sorted.bam \
--algo LocusCollector \
--fun score_info \
tumor_score.txt
sentieon driver -t $nt -i tumor_sorted.bam \
--algo Dedup \
--score_info tumor_score.txt \
--metrics tumor_dedup_metrics.txt \
tumor_deduped.bam
# ******************************************
# 2b. Remove duplicate reads from the normal sample
# ******************************************
sentieon driver -t $nt -i normal_sorted.bam \
--algo LocusCollector \
--fun score_info \
normal_score.txt
sentieon driver -t $nt -i normal_sorted.bam \
--algo Dedup \
--score_info normal_score.txt \
--metrics normal_dedup_metrics.txt \
normal_deduped.bam
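Steps 2a and 2b run the same two passes on each sample, LocusCollector and then Dedup, with only the file prefix changing. A minimal sketch of driving both samples from one loop, assuming bash and the $nt variable used above; the run helper and the DRY_RUN switch are hypothetical conveniences, not Sentieon features, and with the default DRY_RUN=1 the commands are only printed.

```shell
#!/usr/bin/env bash
nt=${nt:-8}             # thread count; keeps $nt if it is already set
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print, 0 = execute

# Print a command, then execute it unless DRY_RUN is 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

for sample in tumor normal; do
  # Pass 1: LocusCollector writes the score file that Dedup reads.
  run sentieon driver -t "$nt" -i "${sample}_sorted.bam" \
    --algo LocusCollector --fun score_info "${sample}_score.txt"
  # Pass 2: Dedup uses that score file and writes the deduplicated BAM and metrics.
  run sentieon driver -t "$nt" -i "${sample}_sorted.bam" \
    --algo Dedup --score_info "${sample}_score.txt" \
    --metrics "${sample}_dedup_metrics.txt" "${sample}_deduped.bam"
done
```

Keeping DRY_RUN=1 first makes it easy to check the generated file names before spending compute time.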
Step 3: Base quality score recalibration (this step can be skipped for panel sequencing)
# ******************************************
# 3a. Base recalibration for the tumor sample
# ******************************************
sentieon driver -r $fasta -t $nt -i tumor_deduped.bam --interval $BED \
--algo QualCal \
-k $dbsnp \
-k $known_Mills_indels \
-k $known_1000G_indels \
tumor_recal_data.table
# ******************************************
# 3b. Base recalibration for the normal sample
# ******************************************
sentieon driver -r $fasta -t $nt -i normal_deduped.bam --interval $BED \
--algo QualCal \
-k $dbsnp \
-k $known_Mills_indels \
-k $known_1000G_indels \
normal_recal_data.table
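Steps 3a and 3b write recalibration tables, but the Step 4 commands read tumor_deduped.bam and normal_deduped.bam without referencing those tables. In the tumor-normal examples of the Sentieon manual, a table is applied on the fly by passing -q RECAL_TABLE after the -i BAM it belongs to; please confirm this against the manual for your Sentieon version. The sketch below, assuming bash, builds the per-sample driver inputs and adds -q only when the table exists, which also covers panel data where Step 3 was skipped. The input_args helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the driver input arguments for one sample, one per
# line, appending "-q <table>" only if that sample's recalibration table exists.
input_args() {
  local sample=$1
  local bam="${sample}_deduped.bam"
  local table="${sample}_recal_data.table"
  if [ -s "$table" ]; then
    printf '%s\n' -i "$bam" -q "$table"
  else
    printf '%s\n' -i "$bam"
  fi
}

# Collect the arguments for the tumor and normal samples, in that order.
driver_inputs=()
for sample in tumor normal; do
  while IFS= read -r arg; do
    driver_inputs+=("$arg")
  done < <(input_args "$sample")
done

# Show the resulting driver prefix ($fasta and $nt are printed literally).
echo "sentieon driver -r \$fasta -t \$nt ${driver_inputs[*]} --algo ..."
```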
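Step 4: Variant calling and filtering
In the commands below, arguments shown in square brackets are optional: remove the brackets to use one, or delete the line to omit it. $TUMOR_SM and $NORMAL_SM must match the sample names (SM) set in the read groups in Step 1, i.e. $tumor and $normal.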
Sample type 1: whole-genome / whole-exome sequencing
sentieon driver -r $fasta -t $nt -i tumor_deduped.bam -i normal_deduped.bam \
[--interval $BED] \
--algo TNhaplotyper2 \
--tumor_sample $TUMOR_SM \
--normal_sample $NORMAL_SM \
[--pon $PON] \
--germline_vcf $GERMLINE_VCF \
output-tnhap2-tmp.vcf.gz \
--algo OrientationBias \
--tumor_sample $TUMOR_SM \
output-orientation \
--algo ContaminationModel \
--tumor_sample $TUMOR_SM \
--normal_sample $NORMAL_SM \
--vcf $CONTAMINATION_VCF \
--tumor_segments output-contamination-segments \
output-contamination
sentieon driver -r $fasta \
--algo TNfilter \
-v output-tnhap2-tmp.vcf.gz \
--tumor_sample $TUMOR_SM \
--normal_sample $NORMAL_SM \
[--contamination output-contamination] \
[--tumor_segments output-contamination-segments] \
[--orientation_priors output-orientation] \
output-tnhap2.vcf.gz
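The square brackets in commands such as [--contamination output-contamination] only mark optional arguments; they are not shell syntax. One way to pass an optional argument only when its input file was actually produced is to collect the arguments in a bash array. A minimal sketch, assuming bash and the output file names used by the OrientationBias and ContaminationModel commands above; build_tnfilter_opts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array opt_args with the optional TNfilter
# arguments whose input files exist and are non-empty in the current directory.
build_tnfilter_opts() {
  opt_args=()
  if [ -s output-contamination ]; then
    opt_args+=(--contamination output-contamination)
  fi
  if [ -s output-contamination-segments ]; then
    opt_args+=(--tumor_segments output-contamination-segments)
  fi
  if [ -s output-orientation ]; then
    opt_args+=(--orientation_priors output-orientation)
  fi
}

build_tnfilter_opts
# The array is then expanded inside the TNfilter command; an empty array adds nothing:
#   sentieon driver -r "$fasta" --algo TNfilter -v output-tnhap2-tmp.vcf.gz \
#     --tumor_sample "$TUMOR_SM" --normal_sample "$NORMAL_SM" \
#     "${opt_args[@]}" output-tnhap2.vcf.gz
echo "optional TNfilter arguments: ${opt_args[*]:-<none>}"
```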
Sample type 2: targeted panel (hybrid capture)
sentieon driver -r $fasta -t $nt -i tumor_deduped.bam -i normal_deduped.bam \
--interval $BED \
--algo TNscope \
--tumor_sample $TUMOR_SM \
--normal_sample $NORMAL_SM \
--dbsnp $dbsnp \
--min_tumor_allele_frac 0.009 \
--max_fisher_pv_active 0.05 \
--max_normal_alt_cnt 10 \
--max_normal_alt_frac 0.01 \
--max_normal_alt_qsum 250 \
--sv_mask_ext 10 \
--prune_factor 0 \
--assemble_mode 2 \
[--pon panel_of_normal.vcf] \
output_tnscope.pre_filter.vcf.gz
sentieon pyexec tnscope_filter.py \
-v output_tnscope.pre_filter.vcf.gz \
--tumor_sample $TUMOR_SM \
-x tissue_panel --min_tumor_af 0.0095 --min_depth 100 \
output_tnscope.filter.vcf.gz
Sample type 3: targeted panel (amplicon, primer-based enrichment)
sentieon driver -r $fasta -t $nt -i tumor_deduped.bam -i normal_deduped.bam \
--interval $BED --interval_padding 10 \
--algo TNscope \
--tumor_sample $TUMOR_SM \
--normal_sample $NORMAL_SM \
--dbsnp $dbsnp \
--min_tumor_allele_frac 0.009 \
--max_fisher_pv_active 0.05 \
--max_normal_alt_cnt 10 \
--max_normal_alt_frac 0.01 \
--max_normal_alt_qsum 250 \
--sv_mask_ext 10 \
--prune_factor 0 \
--assemble_mode 2 \
[--pon panel_of_normal.vcf] \
output_tnscope.pre_filter.vcf.gz
sentieon pyexec tnscope_filter.py \
-v output_tnscope.pre_filter.vcf.gz \
--tumor_sample $TUMOR_SM \
-x amplicon --min_tumor_af 0.0095 --min_depth 200 \
output_tnscope.filter.vcf.gz
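The amplicon command adds --interval_padding 10, which, as we understand the option, extends every interval in $BED by 10 bases on each side. To inspect what that does to the target coordinates, the sketch below pads a BED file with awk and clips starts at 0; it only mimics the effect for illustration and is not how Sentieon applies the padding internally. The pad_bed function is hypothetical, and overlapping intervals created by the padding are not merged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pad every BED interval read from stdin by $1 bases on
# both sides. BED is 0-based and half-open, so starts are clipped at 0.
# Header lines (#, track, browser) are passed through unchanged.
pad_bed() {
  local pad=$1
  awk -v pad="$pad" 'BEGIN { OFS = "\t" }
    /^(#|track|browser)/ { print; next }
    {
      start = $2 - pad
      if (start < 0) start = 0
      $2 = start
      $3 = $3 + pad
      print
    }'
}

# Example: two amplicon targets, padded by 10 bp as with --interval_padding 10.
printf 'chr1\t5\t150\tampA\nchr1\t1000\t1150\tampB\n' | pad_bed 10
```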
Sample type 4: ctDNA (tumor-only, no UMI)
sentieon driver -r $fasta -t $nt -i tumor_deduped.bam --interval $BED \
--algo TNscope \
--tumor_sample $TUMOR_SM \
--dbsnp $dbsnp \
--disable_detector sv \
--min_tumor_allele_frac 3e-3 \
--min_tumor_lod 3.0 \
--min_init_tumor_lod 1.0 \
--assemble_mode 2 \
[--pon panel_of_normal.vcf] \
output_tnscope.pre_filter.vcf.gz
sentieon pyexec tnscope_filter.py \
-v output_tnscope.pre_filter.vcf.gz \
--tumor_sample $TUMOR_SM \
-x ctdna --min_tumor_af 0.005 --min_depth 400 \
output_tnscope.filter.vcf.gz
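The three tnscope_filter.py calls above differ only in the -x preset and the --min_tumor_af / --min_depth thresholds, and the UMI section below uses -x ctdna_umi with 0.001 and 1000. A small sketch, assuming bash, that keeps those tutorial values in one place; filter_args is a hypothetical helper, TUMOR is a placeholder used only when $TUMOR_SM is unset, and the thresholds should still be tuned to your own data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tnscope_filter.py preset and thresholds this
# tutorial uses for a sample type (values copied from the commands in the text).
filter_args() {
  case "$1" in
    tissue_panel) echo "-x tissue_panel --min_tumor_af 0.0095 --min_depth 100" ;;
    amplicon)     echo "-x amplicon --min_tumor_af 0.0095 --min_depth 200" ;;
    ctdna)        echo "-x ctdna --min_tumor_af 0.005 --min_depth 400" ;;
    ctdna_umi)    echo "-x ctdna_umi --min_tumor_af 0.001 --min_depth 1000" ;;
    *)            echo "unknown sample type: $1" >&2; return 1 ;;
  esac
}

# Split the preset into an array and print the resulting filter command.
read -r -a fargs <<< "$(filter_args amplicon)"
echo sentieon pyexec tnscope_filter.py -v output_tnscope.pre_filter.vcf.gz \
  --tumor_sample "${TUMOR_SM:-TUMOR}" "${fargs[@]}" output_tnscope.filter.vcf.gz
```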
Processing UMI data with Sentieon® UMI and TNscope®
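This section describes TNscope® parameters suitable for ctDNA and other high-depth samples with UMI-tagged reads (depth of 20,000x to 50,000x, variant allele frequency (AF) > 0.1%). Adjust the --min_tumor_af and --min_depth thresholds as needed. More details are available at https://support.sentieon.com/appnotes/umi/.
The Python filtering script tnscope_filter.py is available at https://github.com/Sentieon/sentieon-scripts.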
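As a back-of-the-envelope check on these numbers: the expected number of reads supporting a variant is roughly depth × AF, ignoring sampling noise and the collapsing of reads into UMI families. At the --min_depth of 1000 used below and an AF of 0.1%, a variant is therefore expected on only about one read, whereas the same AF at 20,000x to 50,000x corresponds to about 20 to 50 supporting reads. A minimal awk-based sketch of the calculation; expected_alt_reads is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected ALT-supporting reads = depth * AF
# (no sampling noise, no UMI family collapsing). awk does the float math.
expected_alt_reads() {
  awk -v depth="$1" -v af="$2" 'BEGIN { printf "%.1f\n", depth * af }'
}

for depth in 1000 20000 50000; do
  echo "depth ${depth}x, AF 0.001 -> $(expected_alt_reads "$depth" 0.001) ALT reads"
done
```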
Step 1: Alignment and consensus generation
if [ "$DUPLEX_UMI" = "true" ] ; then
    READ_STRUCTURE="-d $READ_STRUCTURE"
fi
sentieon umi extract $READ_STRUCTURE $fastq_1 $fastq_2 | \
sentieon bwa mem -p -C -R "@RG\tID:$tumor\tSM:$tumor\tPL:$platform" -t $nt \
-K 10000000 $fasta - | \
sentieon umi consensus -o umi_consensus.fastq.gz
sentieon bwa mem -p -C -R "@RG\tID:$tumor\tSM:$tumor\tPL:$platform" -t $nt \
-K 10000000 $fasta umi_consensus.fastq.gz | \
sentieon util sort --umi_post_process --sam2bam -i - -o umi_consensus.bam
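Unlike the alignment commands in the non-UMI section, the pipelines above have no || echo -n 'error' guard, so by default bash reports only the exit status of the final stage of each pipeline. Setting bash's pipefail option is one generic way to surface a failure in any stage. A minimal sketch with hypothetical stand-in functions (extract, align, consensus) in place of the Sentieon commands:

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the three pipeline stages.
extract()   { printf 'read1\nread2\n'; }   # succeeds
align()     { cat >/dev/null; return 3; }  # consumes its input, then fails
consensus() { cat >/dev/null; }            # succeeds even on empty input

# Default: only the last stage counts, so the failure in align is hidden.
status=0
extract | align | consensus || status=$?
echo "default:  exit status $status"

# With pipefail: the rightmost non-zero status (align's 3) is reported.
set -o pipefail
status=0
extract | align | consensus || status=$?
echo "pipefail: exit status $status"
set +o pipefail
```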
Step 2: Variant calling with UMI consensus reads
sentieon driver -r $fasta -t $nt -i umi_consensus.bam --interval $BED \
--algo TNscope \
--tumor_sample $TUMOR_SM \
--dbsnp $dbsnp \
--min_tumor_allele_frac 0.001 \
--min_tumor_lod 3.0 \
--min_init_tumor_lod 3.0 \
--pcr_indel_model NONE \
--min_base_qual 40 \
--resample_depth 100000 \
--assemble_mode 4 \
output_tnscope.pre_filter.vcf.gz
sentieon pyexec tnscope_filter.py \
-v output_tnscope.pre_filter.vcf.gz \
--tumor_sample $TUMOR_SM \
-x ctdna_umi --min_tumor_af 0.001 --min_depth 1000 \
output_tnscope.filtered.vcf.gz
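After filtering, a quick sanity check is to count how many records passed all filters. A minimal sketch, assuming bash, awk and gzip are available; count_pass is a hypothetical helper that counts a record only when its FILTER column (column 7) is exactly PASS, and it reads both bgzipped and plain VCF files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count VCF records whose FILTER column is exactly PASS.
# `gzip -cdf` decompresses .vcf.gz input and passes plain-text VCF through unchanged.
count_pass() {
  gzip -cdf "$1" | awk -F '\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}

# Example on a small synthetic VCF (not real data).
demo=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tT\t.\tPASS\t.\n'
  printf 'chr1\t200\t.\tG\tC\t.\tpanel_of_normal\t.\n'
} > "$demo"
count_pass "$demo"
rm -f "$demo"
```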
Appendix
Description of optional command-line arguments
TNscope:
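- --assemble_mode: assemble_mode=2 offers a good balance between accuracy and speed; assemble_mode=4 can be used to handle more complex regions.
- --sv_mask_ext: prevents SNP/Indel calls in the vicinity of structural variant breakpoints.
- --max_fisher_pv_active: enables filtering on a p-value that measures the statistical difference between the tumor and normal samples at a given position, and sets its threshold.
- --min_tumor_allele_frac: the minimum tumor variant allele frequency (AF) for a site to be considered a potential variant.
- --min_init_tumor_lod: the minimum tumor log odds ratio (LOD) in the initial variant detection step.
- --min_tumor_lod: the minimum tumor LOD used to filter calls in the output VCF, where the score is reported in the TLOD annotation.
- --prune_factor: prune_factor=0 turns on the adaptive graph pruning algorithm.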
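--min_init_tumor_lod and --min_tumor_lod are thresholds on a tumor log odds (LOD) score. To make the idea concrete, the sketch below computes a MuTect-style tumor LOD: log10 of the likelihood of the reads under a variant at AF f = ALT/(REF + ALT) divided by their likelihood under the reference alone, with one per-base error rate e. This is a simplified illustration under those assumptions, not TNscope's internal computation; tumor_lod is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: simplified MuTect-style tumor LOD for one site.
#   $1 = REF-supporting reads, $2 = ALT-supporting reads, $3 = per-base error rate e
# Model: a read shows its true base with probability 1 - e and any specific
# other base with probability e/3.
tumor_lod() {
  awk -v nref="$1" -v nalt="$2" -v e="$3" 'BEGIN {
    f = nalt / (nref + nalt)                   # AF estimated from the reads
    p_alt_var = f * (1 - e) + (1 - f) * e / 3  # P(ALT read | variant at AF f)
    p_ref_var = f * e / 3 + (1 - f) * (1 - e)  # P(REF read | variant at AF f)
    p_alt_ref = e / 3                          # P(ALT read | reference only)
    p_ref_ref = 1 - e                          # P(REF read | reference only)
    lod = nalt * log(p_alt_var / p_alt_ref) + nref * log(p_ref_var / p_ref_ref)
    printf "%.2f\n", lod / log(10)             # natural log -> log10
  }'
}

tumor_lod 90 10 0.001   # 10% AF at 100x: well above a threshold of 3.0
tumor_lod 999 1 0.001   # one ALT read at 1000x: below a threshold of 1.0
```

Under this toy model, a single supporting read at high depth does not clear even the initial threshold, which is one intuition for why very low AF calling relies on deep, clean (for example UMI-collapsed) data.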
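Frequently asked questions

AD and AF are calculated and defined differently, for legacy reasons inherited from GATK:
- AD: computed from the pileup of the input BAM file, before local assembly.
- AF: the allele fraction computed from the reads supporting the REF and ALT alleles after local assembly.
- The reads also pass through different filtering criteria before each value is calculated.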
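Because of this, the fraction ALT/(REF + ALT) derived from AD need not equal the reported AF. A minimal sketch for putting the two side by side, assuming bash and awk, biallelic records, and a FORMAT column that contains the AD and AF keys; compare_ad_af is a hypothetical helper, and the record in the example is synthetic, with values chosen only to show a mismatch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each VCF record on stdin, print CHROM, POS, the
# AD-derived fraction ALT/(REF+ALT) and the reported FORMAT/AF for the sample
# in column $1. Keys are looked up by name in the FORMAT column (column 9).
compare_ad_af() {
  awk -F '\t' -v col="$1" '!/^#/ {
    n = split($9, keys, ":")
    split($col, vals, ":")
    ad = ""; af = ""
    for (i = 1; i <= n; i++) {
      if (keys[i] == "AD") ad = vals[i]
      if (keys[i] == "AF") af = vals[i]
    }
    split(ad, c, ",")                     # c[1] = REF count, c[2] = ALT count
    total = c[1] + c[2]
    frac = (total > 0) ? c[2] / total : 0
    printf "%s\t%s\tAD_frac=%.3f\tAF=%s\n", $1, $2, frac, af
  }'
}

# Synthetic tumor column (column 10): AD gives 12 of 100 reads, AF says 0.150.
printf 'chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT:AD:AF\t0/1:88,12:0.150\n' | compare_ad_af 10
```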
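
A Panel of Normals (PoN) can be used with or without a matched normal sample:
- Variants found in the PoN are marked with the panel_of_normal filter.
- A PoN is especially helpful when no matched normal sample is available.
- For instructions on generating your own PoN VCF with TNscope, see https://support.sentieon.com/manual/TNscope_usage/tnscope/#generating-a-panel-of-normal-vcf-file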
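Because PoN hits are flagged rather than removed, they remain in the VCF with panel_of_normal in the FILTER column, possibly alongside other filter names separated by semicolons. A minimal sketch for listing them, assuming bash, awk and gzip; list_pon_hits is a hypothetical helper and the example VCF is synthetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CHROM, POS, REF, ALT of records whose FILTER
# column (column 7) contains panel_of_normal among its ';'-separated values.
# Works on bgzipped or plain VCF.
list_pon_hits() {
  gzip -cdf "$1" | awk -F '\t' 'BEGIN { OFS = "\t" }
    !/^#/ {
      n = split($7, filters, ";")
      for (i = 1; i <= n; i++)
        if (filters[i] == "panel_of_normal") { print $1, $2, $4, $5; break }
    }'
}

# Example on a synthetic VCF (not real data).
demo=$(mktemp)
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tT\t.\tPASS\t.\n'
  printf 'chr1\t200\t.\tG\tC\t.\tpanel_of_normal;germline_risk\t.\n'
} > "$demo"
list_pon_hits "$demo"
rm -f "$demo"
```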
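
COSMIC and dbSNP:
- They only affect tumor-normal analysis, where they act on filtering.
- COSMIC and dbSNP only determine the NLOD (normal LOD) threshold used for the germline_risk filter.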