使用blupf90进行ssgwas分析

2025-12-29

字数统计: 1.6k | 阅读时长≈ 7 分钟

使用 blupf90 进行 ssgwas 分析。

从育种值推导 SNP 效应值

简单地说，首先我们计算一次育种值（SNP权重矩阵视为单位矩阵），按照育种值结果中推导出SNP效应值。基于 Wang et al. (2012) ，我们可以基于SNP效应值计算每个SNP的权重（SNP权重矩阵矩阵），这里也是每个SNP解释的方差。

这里还对原始权重进行了处理，使所有权重的均值为1。

然后我们用更新后的矩阵重新计算阵，重新计算育种值（可选），再计算出SNP效应值，这样循环几次（官方文档说是直到得到“合理的结果为止”）。说白了，就是每次循环就是为了更新SNP权重的矩阵。

但是最初的SSGWAS 只是给出了SNP的效应值，没有给出 P 值。

旧的流程

运行ssGWAS可能需要用到下面的模块：

RENUMF90: general preparation of data set.
PREGSF90: quality control and creation of cleaned genotypes.（用不到）
BLUPF90: prediction of a with a weighted G.
POSTGSF90: prediction of u and a new weight D.（计算SNP效应值）
PREDF90 (Optional): computation of GEBV based on the predicted SNP effects and genotypes

其中正常就是重复使用 blupf90 （计算育种值）和 postgsf90 （计算SNP效应值）这两个模块。

示例

官方文档中给出了一个示例数据和脚本。

准备文件如下

• phenotypes.txt = observation data file
• pedigree = pedigree file
• marker.geno.clean = marker file
• chrmap.txt = chromosome map file （三列，分别为 SNP 序号（从1开始编号），染色体，物理位置，无标题，空格分隔）
• w = the default weight file（一列，每一行对应一个SNP的权重，初始全是1，无标题，空格分隔）
• renum.par = instruction file for RENUMF90

两个脚本

• run_renum.sh = for renumbering, and for generating the parameter files
• run_ssgwas.sh = for iterative ssGWAS with BLUPF90 and POSTGSF90

run_renum.sh

首先第一个脚本是运行 renumf90 后，修改 renf90.par ，生成两个参数卡 (param_ssgwas1a.txt 和 param_ssgwas1b.txt，分别用于 blupf90 和 postgsf90 使用)

#!/bin/bash

#
# Prepare and renum data to run 
#   GWAS example from GWAS from Wang et al 2012 Gen Res
#   Simplified by Yutaka Masuda
#

# your binary path (ending with /)
bindir='./'

echo renum.par | ${bindir}renumf90 | tee log.renum

# making parameter files
cp renf90.par param_ssgwas1a.txt
cp renf90.par param_ssgwas1b.txt

# adding extra options
# BLUPF90
echo "OPTION saveGInverse" >> param_ssgwas1a.txt
echo "OPTION weightedG w" >> param_ssgwas1a.txt

# POSTGSF90
echo "OPTION ReadGInverse" >> param_ssgwas1b.txt
echo "OPTION weightedG w" >> param_ssgwas1b.txt
echo "OPTION chrinfo chrmap.txt" >> param_ssgwas1b.txt
echo "#OPTION Manhattan_plot" >> param_ssgwas1b.txt

解释一下这里额外增加的选项的含义

OPTION saveGInverse ：保存结果为 “Gi”
OPTION weightedG w : 读取 SNP 权重文件
OPTION ReadGInverse ：从 “Gi” 文件中读取 G 逆结果
OPTION chrinfo chrmap.txt ：专门用于 postgsf90 模块计算 SNP 效应，三列，分别为 SNP 序号（从1开始编号），染色体，物理位置（postGSF90 需要用这个文件的信息绘制曼哈顿图）

这里 blupf90 会读取 SNP权重文件，输出 G 逆和育种值结果；而 postf90 会读取 G 逆文件和 SNP的物理位置文件，输出 SNP效应结果。

run_ssgwas.sh

内容如下（改了一下 blupf90 等运行方式）

# result directory
resdir=results
mkdir -p ${resdir}

# the number of iteration
niter=2

#
# You don't have to modify the following statements unless needed.
#

# the number of snps
ns=`wc -l chrmap | awk '{print $1}'`

# the default weights
awk -v ns=${ns} 'BEGIN {for(i=1;i<=ns;i++) print 1}' > w
cp w w.default

for i in `seq 1 ${niter}`; do
   # updating the weights using BLUPF90 and POSTGSF90
   blupf90 param_ssgwas1a.txt | tee logblup_$i
   cp solutions solutions_$i
   postGSf90 param_ssgwas1b.txt | tee logpost_$i
   cp snp_sol snp_sol_$i
   cp chrsnp chrsnp_$i
   cp w w_$i
   awk '{print $7}' snp_sol > w

   # save the results in the current iteration
   mv logblup_$i $resdir
   mv logpost_$i $resdir
   mv solutions_$i $resdir
   mv snp_sol_$i $resdir
   mv chrsnp_$i $resdir
   mv w_$i $resdir
done

这里运行 blupf90 后除了育种值结果 solutions ，还会生成 Gi （G逆的二进制文件）

运行 postGSf90 输出文件

snp_sol

SNP 效应值结果，各列内容解释如下，第 6 列应该是 SNP 效应值，第 7 列是SNP权重（感觉结果不太对）。

1: trait
2: effect
3: SNP
4: Chromosome
5: Position
6: SNP solution
7: weight (can be used as the weight to calculate the weighted G matrix)
8: variance explained by n adjacent SNP.（#if OPTION windows_variance is used ，计算 n个相邻SNP的解释方差）
9: variance of the SNP solution (used to compute the p-value if OPTION snp_p_value is used)

这里可以通过选型 OPTION which_weight x 来选择使用哪种方法来计算 SNP 权重（官方文档没说哪种是默认的选项）。这里计算得到的权重应该都经过了标准化（均值为1）

1: w = y^2 * (2(p(1-p)))
2: w = y^2
3: experimental with the degree of brief
4: w = C**(abs(y i )/sqrt(var(y))-2) from VanRaden et al. (2009)
nonlinearA: same as 4
where y is the SNP solution, with scaled weight = w * nSnp/sum(w); and C is 1.125 by default (enable to
change it using the second argument of the option line (OPTION which_weight nonlinearA value), e.g.,
OPTION which_weight nonlinearA 1.2

chrsnp

用于绘制曼哈顿图

1: trait
2: effect
3: values of SNP effects to use in Manhattan plots, i.e., (abs(SNP_i)/var(SNP))
4: SNP
5: Chromosome
6: Position

其他输出文件不重要，这里就不介绍了

计算 P 值

Aguilar 2019年发的文章，更新了一下算法，增加了计算 SNP 的 P 值的计算公式。而且这篇文章中说，不需要使用SNP权重 ，使用并迭代更新SNP权重没有增加SNP效应值和方差。

这里要计算SNP效应估计值的预测误差方差，公式如下

这里是矩阵的第列，对应着第个SNP。

然后，SNP的 P值计算公式如下，这里为累积标准正态分布函数。

在blupf90 的参数卡 postGSf90 参数卡加入下面选项

1	OPTION snp_p_value

此时输出结果 snp_sol 中第9列就有了SNP效应估计值的预测误差方差，但是没有P值结果，还是需要自己手动算一下P值。

因此汇总一下，通过测试，blupf90 参数卡最少只需要加入两条选项如下，此时会额外输出文件 xx_ija （应该系数矩阵的逆矩阵元素）

1 2	OPTION saveGInverse OPTION snp_p_value

postGSf90 最少也只需要加入两条选项如下（可以不提供 OPTION chrinfo chrmap.txt ，此时输出结果中的染色体和物理位置全为1）

1 2	OPTION ReadGInverse OPTION snp_p_value

参考文献

blupf90 官方文档
Wang H, Misztal I, Aguilar I, et al. Genome-wide association mapping including phenotypes from relatives without genotypes[J]. Genetics research, 2012, 94(2): 73-83.
Aguilar I, Legarra A, Cardoso F, et al. Frequentist p-values for large-scale-single step genome-wide association, with an application to birth weight in American Angus cattle[J]. Genetics Selection Evolution, 2019, 51(1): 28.

版权声明： 本博客所有文章除特别声明外，著作权归作者所有。转载请注明出处！