Sentieon | Application Tutorial: Somatic SNP/Indel Variant Calling

Posted by INSVAST on 2024/02/24 17:42:27

Processing data without unique molecular identifiers (UMIs)

Step 1: Alignment

# ******************************************
# 1a. Mapping reads with BWA-MEM, sorting for the tumor sample
# ******************************************
( sentieon bwa mem -R "@RG\tID:$tumor\tSM:$tumor\tPL:$platform" \
  -t $nt -K 10000000 $fasta $tumor_fastq_1 $tumor_fastq_2 || \
  echo -n 'error' ) | \
  sentieon util sort -o tumor_sorted.bam -t $nt --sam2bam -i -
# ******************************************
# 1b. Mapping reads with BWA-MEM, sorting for the normal sample
# ******************************************
( sentieon bwa mem -R "@RG\tID:$normal\tSM:$normal\tPL:$platform" \
  -t $nt -K 10000000 $fasta $normal_fastq_1 $normal_fastq_2 || \
  echo -n 'error' ) | \
  sentieon util sort -o normal_sorted.bam -t $nt --sam2bam -i -
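
The alignment commands above rely on shell variables that the tutorial never defines. The following is a minimal setup sketch, not part of the original tutorial: every path and sample name is a placeholder, and make_read_group is a hypothetical helper that only formats the string passed to bwa mem -R.

```shell
# Placeholder values (assumptions): replace with your own data and resources.
nt=16                                    # number of threads
platform="ILLUMINA"                      # sequencing platform for the PL tag
tumor="tumor_sample"                     # tumor read group ID and sample name
normal="normal_sample"                   # normal read group ID and sample name
fasta="/path/to/reference.fasta"         # reference genome (hypothetical path)
tumor_fastq_1="/path/to/tumor_R1.fastq.gz"
tumor_fastq_2="/path/to/tumor_R2.fastq.gz"
normal_fastq_1="/path/to/normal_R1.fastq.gz"
normal_fastq_2="/path/to/normal_R2.fastq.gz"

# Later steps pass sample names through --tumor_sample and --normal_sample;
# they should match the SM values written into the read groups here.
TUMOR_SM="$tumor"
NORMAL_SM="$normal"

# Hypothetical helper: build the read group string for `bwa mem -R`.
# $1 = sample name (used for both ID and SM), $2 = platform.
# The output keeps a literal backslash-t, as in the commands above.
make_read_group() {
  printf '@RG\\tID:%s\\tSM:%s\\tPL:%s' "$1" "$1" "$2"
}
```

With this in place, the -R argument could be written as -R "$(make_read_group "$tumor" "$platform")".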

Step 2: PCR duplicate removal (skip this step for amplicon sequencing)

# ******************************************
# 2a. Remove duplicate reads from the tumor sample
# ******************************************
sentieon driver -t $nt -i tumor_sorted.bam \
  --algo LocusCollector \
  --fun score_info \
  tumor_score.txt
sentieon driver -t $nt -i tumor_sorted.bam \
  --algo Dedup \
  --score_info tumor_score.txt \
  --metrics tumor_dedup_metrics.txt \
  tumor_deduped.bam
# ******************************************
# 2b. Remove duplicate reads from the normal sample
# ******************************************
sentieon driver -t $nt -i normal_sorted.bam \
  --algo LocusCollector \
  --fun score_info \
  normal_score.txt
sentieon driver -t $nt -i normal_sorted.bam \
  --algo Dedup \
  --score_info normal_score.txt \
  --metrics normal_dedup_metrics.txt \
  normal_deduped.bam

Step 3: Base quality score recalibration (skip this step for panel sequencing)

# ******************************************
# 3a. Base recalibration for the tumor sample
# ******************************************
sentieon driver -r $fasta -t $nt -i tumor_deduped.bam --interval $BED \
  --algo QualCal \
  -k $dbsnp \
  -k $known_Mills_indels \
  -k $known_1000G_indels \
  tumor_recal_data.table
# ******************************************
# 3b. Base recalibration for the normal sample
# ******************************************
sentieon driver -r $fasta -t $nt -i normal_deduped.bam --interval $BED \
  --algo QualCal \
  -k $dbsnp \
  -k $known_Mills_indels \
  -k $known_1000G_indels \
  normal_recal_data.table

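Step 4: Variant calling and filtering

The final variant call set is produced in two stages: variant calling and filtering. Use the parameter options and filters that match your sample and library type.

The Python filtering script tnscope_filter.py is available at https://github.com/Sentieon/sentieon-scripts.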
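
Because the filter arguments differ by sample type, it can help to keep them in one place. The sketch below collects the tnscope_filter.py arguments used in this tutorial into a lookup function. The function name and the sample-type labels (panel_hybrid, panel_amplicon, ctdna, ctdna_umi) are hypothetical; the argument values are the tutorial's starting points and should be tuned to your data. WGS/WES is not listed because that workflow filters with TNfilter instead of this script.

```shell
# Hypothetical helper: print the tnscope_filter.py arguments this tutorial
# uses for a given sample type. Labels on the left are made up for this sketch.
tnscope_filter_args() {
  case "$1" in
    panel_hybrid)   echo "-x tissue_panel --min_tumor_af 0.0095 --min_depth 100" ;;
    panel_amplicon) echo "-x amplicon --min_tumor_af 0.0095 --min_depth 200" ;;
    ctdna)          echo "-x ctdna --min_tumor_af 0.005 --min_depth 400" ;;
    ctdna_umi)      echo "-x ctdna_umi --min_tumor_af 0.001 --min_depth 1000" ;;
    *)
      echo "unknown sample type: $1" >&2
      return 1
      ;;
  esac
}

# Example use (left unquoted on purpose so the arguments are word-split):
#   sentieon pyexec tnscope_filter.py \
#     -v output_tnscope.pre_filter.vcf.gz \
#     --tumor_sample "$TUMOR_SM" \
#     $(tnscope_filter_args ctdna) \
#     output_tnscope.filter.vcf.gz
```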

Sample type 1: whole-genome/whole-exome sequencing

This section describes the TNseq® workflow for WGS/WES. More information about TNseq® is available at https://support.sentieon.com/manual/TNseq_usage/tnseq/.

sentieon driver -r $fasta -t $nt -i tumor_deduped.bam -i normal_deduped.bam \
  [--interval $BED] \
  --algo TNhaplotyper2 \
  --tumor_sample $TUMOR_SM \
  --normal_sample $NORMAL_SM \
  [--pon $PON] \
  --germline_vcf $GERMLINE_VCF \
  output-tnhap2-tmp.vcf.gz \
  --algo OrientationBias \
  --tumor_sample $TUMOR_SM \
  output-orientation \
  --algo ContaminationModel \
  --tumor_sample $TUMOR_SM \
  --normal_sample $NORMAL_SM \
  --vcf $CONTAMINATION_VCF \
  --tumor_segments output-contamination-segments \
  output-contamination

sentieon driver -r $fasta \
  --algo TNfilter \
  -v output-tnhap2-tmp.vcf.gz \
  --tumor_sample $TUMOR_SM \
  --normal_sample $NORMAL_SM \
  [--contamination output-contamination] \
  [--tumor_segments output-contamination-segments] \
  [--orientation_priors output-orientation] \
  output-tnhap2.vcf.gz
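
After filtering, a quick sanity check is to count how many records passed. The sketch below relies only on the VCF convention that the seventh column (FILTER) is PASS for records that passed all filters; count_pass is a hypothetical helper name, and it reads uncompressed VCF text from standard input.

```shell
# Hypothetical helper: count VCF records whose FILTER column is PASS.
# Reads uncompressed VCF text on stdin and skips header lines.
count_pass() {
  awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}

# Example use with the filtered output of this step:
#   gzip -dc output-tnhap2.vcf.gz | count_pass
```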

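Sample type 2: targeted panel (hybridization capture)

This section describes TNscope® parameters for somatic variant calling from targeted panel sequencing (hybridization capture, 200-5000x depth, variant allele frequency (AF) >= 1%). Adjust the --min_tumor_af and --min_depth thresholds as needed.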

sentieon driver -r $fasta -t $nt -i tumor_deduped.bam -i normal_deduped.bam \
  --interval $BED \
  --algo TNscope \
  --tumor_sample $TUMOR_SM \
  --normal_sample $NORMAL_SM \
  --dbsnp $dbsnp \
  --min_tumor_allele_frac 0.009 \
  --max_fisher_pv_active 0.05 \
  --max_normal_alt_cnt 10 \
  --max_normal_alt_frac 0.01 \
  --max_normal_alt_qsum 250 \
  --sv_mask_ext 10 \
  --prune_factor 0 \
  --assemble_mode 2 \
  [--pon panel_of_normal.vcf] \
  output_tnscope.pre_filter.vcf.gz

sentieon pyexec tnscope_filter.py \
    -v output_tnscope.pre_filter.vcf.gz \
    --tumor_sample $TUMOR_SM \
    -x tissue_panel --min_tumor_af 0.0095 --min_depth 100 \
    output_tnscope.filter.vcf.gz
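
When adjusting --min_tumor_af and --min_depth, it can be useful to estimate how many variant-supporting reads a given depth and allele frequency imply. The sketch below is plain arithmetic (depth multiplied by AF), ignores sampling noise and sequencing error, and is not part of the Sentieon tooling; expected_alt_reads is a hypothetical helper name.

```shell
# Hypothetical helper: expected number of ALT-supporting reads.
# $1 = sequencing depth at the site, $2 = variant allele frequency.
expected_alt_reads() {
  awk -v depth="$1" -v af="$2" 'BEGIN { printf "%.1f\n", depth * af }'
}

# At the low end of the stated range, a 1% variant is backed by very few reads:
expected_alt_reads 200 0.01     # 2.0
# At the high end, the same variant has far more support:
expected_alt_reads 5000 0.01    # 50.0
```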

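Sample type 3: targeted panel (amplicon)

This section describes TNscope® parameters for somatic variant calling from targeted panel sequencing (primer-based amplification, 200-5000x depth, variant allele frequency (AF) >= 1%). Adjust the --min_tumor_af and --min_depth thresholds as needed.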

sentieon driver -r $fasta -t $nt -i tumor_deduped.bam -i normal_deduped.bam \
  --interval $BED --interval_padding 10 \
  --algo TNscope \
  --tumor_sample $TUMOR_SM \
  --normal_sample $NORMAL_SM \
  --dbsnp $dbsnp \
  --min_tumor_allele_frac 0.009 \
  --max_fisher_pv_active 0.05 \
  --max_normal_alt_cnt 10 \
  --max_normal_alt_frac 0.01 \
  --max_normal_alt_qsum 250 \
  --sv_mask_ext 10 \
  --prune_factor 0 \
  --assemble_mode 2 \
  [--pon panel_of_normal.vcf ] \
  output_tnscope.pre_filter.vcf.gz

sentieon pyexec tnscope_filter.py \
    -v output_tnscope.pre_filter.vcf.gz \
    --tumor_sample $TUMOR_SM \
    -x amplicon --min_tumor_af 0.0095 --min_depth 200 \
    output_tnscope.filter.vcf.gz

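Sample type 4: ctDNA (tumor-only, no UMIs)

This section describes TNscope® parameters for ctDNA and other high-depth samples (6000x-10000x depth, variant allele frequency (AF) > 0.5%). Adjust the --min_tumor_af and --min_depth thresholds as needed.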

sentieon driver -r $fasta -t $nt -i tumor_deduped.bam --interval $BED \
  --algo TNscope \
  --tumor_sample $TUMOR_SM \
  --dbsnp $dbsnp \
  --disable_detector sv \
  --min_tumor_allele_frac 3e-3 \
  --min_tumor_lod 3.0 \
  --min_init_tumor_lod 1.0 \
  --assemble_mode 2 \
  [--pon panel_of_normal.vcf ] \
  output_tnscope.pre_filter.vcf.gz

sentieon pyexec tnscope_filter.py \
    -v output_tnscope.pre_filter.vcf.gz \
    --tumor_sample $TUMOR_SM \
    -x ctdna --min_tumor_af 0.005 --min_depth 400 \
    output_tnscope.filter.vcf.gz

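UMI data processing with Sentieon® UMI and TNscope®

This section describes TNscope® parameters for ctDNA and other high-depth samples with UMI-tagged reads (20000x-50000x depth, variant allele frequency (AF) > 0.1%). Adjust the --min_tumor_af and --min_depth thresholds as needed. More details are available at https://support.sentieon.com/appnotes/umi/.

The Python filtering script tnscope_filter.py is available at https://github.com/Sentieon/sentieon-scripts.

Step 1: Alignment and consensus generation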

if [ "$DUPLEX_UMI" = "true" ] ; then
  READ_STRUCTURE="-d $READ_STRUCTURE"
fi
sentieon umi extract $READ_STRUCTURE $fastq_1 $fastq_2 | \
  sentieon bwa mem -p -C -R "@RG\tID:$tumor\tSM:$tumor\tPL:$platform" -t $nt \
  -K 10000000 $fasta - | \
  sentieon umi consensus -o umi_consensus.fastq.gz
sentieon bwa mem -p -C -R "@RG\tID:$tumor\tSM:$tumor\tPL:$platform" -t $nt \
  -K 10000000 $fasta umi_consensus.fastq.gz | \
  sentieon util sort --umi_post_process --sam2bam -i - -o umi_consensus.bam
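
The duplex switch above rewrites READ_STRUCTURE in place, which is easy to apply twice by accident. One alternative is to build the argument from its two inputs each time. This is only a sketch: umi_extract_args is a hypothetical helper, and the read structure string used in the example is a placeholder, not a recommendation for any particular library kit.

```shell
# Hypothetical helper: print the arguments for `sentieon umi extract`.
# $1 = read structure string, $2 = "true" when the library uses duplex UMIs.
umi_extract_args() {
  if [ "$2" = "true" ]; then
    # Duplex libraries get the -d flag, as in the step above.
    printf '%s %s\n' "-d" "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Example use (left unquoted on purpose so "-d" and the structure are split):
#   sentieon umi extract $(umi_extract_args "$READ_STRUCTURE" "$DUPLEX_UMI") \
#     $fastq_1 $fastq_2 | ...
```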

Step 2: Variant calling with UMI consensus reads

sentieon driver -r $fasta -t $nt -i umi_consensus.bam --interval $BED \
  --algo TNscope \
  --tumor_sample $TUMOR_SM \
  --dbsnp $dbsnp \
  --min_tumor_allele_frac 0.001 \
  --min_tumor_lod 3.0 \
  --min_init_tumor_lod 3.0 \
  --pcr_indel_model NONE \
  --min_base_qual 40 \
  --resample_depth 100000 \
  --assemble_mode 4 \
  output_tnscope.pre_filter.vcf.gz

sentieon pyexec tnscope_filter.py \
    -v output_tnscope.pre_filter.vcf.gz \
    --tumor_sample $TUMOR_SM \
    -x ctdna_umi --min_tumor_af 0.001 --min_depth 1000 \
    output_tnscope.filtered.vcf.gz

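Appendix

Description of optional command-line parameters

TNscope:

--assemble_mode: assemble_mode=2 strikes a good balance between accuracy and speed. assemble_mode=4 can be used for more complex regions.
--sv_mask_ext: prevents SNP/Indel calls near structural variant breakpoints.
--max_fisher_pv_active: enables p-value filtering and sets its threshold; the p-value measures the statistical difference between the tumor and normal samples at a given site.
--min_tumor_allele_frac: sets the minimum tumor variant allele frequency (AF) a site needs to be considered a candidate variant.
--min_init_tumor_lod: minimum tumor log-odds (LOD) in the initial variant-calling pass.
--min_tumor_lod: minimum tumor log-odds (LOD) used for filtering in the output VCF, where it is annotated as TLOD.
--prune_factor: prune_factor=0 turns on the adaptive graph-pruning algorithm.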

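Frequently asked questions

1. The AD ratio does not match the AF reported in the output VCF:

The two values are defined and computed differently; this is a legacy inherited from GATK.
AD: based on the pileup of the input BAM file, before local assembly.
AF: the proportion of reads supporting the REF and ALT alleles after local assembly.
The reads go through different filters before each value is computed.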
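
To see the difference on a single record, the AD-derived ratio can be computed next to the reported AF. The sketch below parses a FORMAT string and a sample column in awk; the helper name and the example values are hypothetical and chosen only to illustrate that the two numbers need not agree. It assumes a biallelic record with both AD and AF present in FORMAT.

```shell
# Hypothetical helper: compare ALT/(REF+ALT) from AD with the reported AF.
# $1 = VCF FORMAT column (e.g. GT:AD:AF), $2 = the sample column for the tumor.
ad_ratio_vs_af() {
  awk -v fmt="$1" -v smp="$2" 'BEGIN {
    # Map each FORMAT key to its value in the sample column.
    nf = split(fmt, keys, ":")
    split(smp, vals, ":")
    for (i = 1; i <= nf; i++) field[keys[i]] = vals[i]

    # AD is "REF_count,ALT_count" for a biallelic site.
    split(field["AD"], ad, ",")
    total = ad[1] + ad[2]
    ratio = (total > 0) ? ad[2] / total : 0

    printf "AD_ratio=%.3f AF=%s\n", ratio, field["AF"]
  }'
}

# Made-up record: 90 REF and 10 ALT reads in the pileup, AF of 0.12 after assembly.
ad_ratio_vs_af "GT:AD:AF" "0/1:90,10:0.12"    # AD_ratio=0.100 AF=0.12
```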

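2. Using a Panel of Normal file built from normal samples:

A Panel of Normal can be used with or without a matched normal sample.
Variants found in the PoN are flagged with the panel_of_normal filter.
A Panel of Normal is especially helpful when no matched normal sample is available.
For instructions on generating your own Panel of Normal VCF with TNscope, see https://support.sentieon.com/manual/TNscope_usage/tnscope/#generating-a-panel-of-normal-vcf-file

3. The role of the COSMIC and dbSNP database files:

They only affect tumor-normal analyses, where they influence filtering.
COSMIC and dbSNP only determine the NLOD threshold used for the germline_risk filter.

4. Complex variants:

Sentieon provides an experimental Python script that merges adjacent variants on the same phase. The script is for experimental use only; see https://github.com/Sentieon/sentieon-scripts/blob/master/merge_mnp/merge_mnp.py.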
