Title: | SNP-Set (Sequence) Kernel Association Test |
---|---|
Description: | Functions for kernel-regression-based association tests including Burden test, SKAT and SKAT-O. These methods aggregate individual SNP score statistics in a SNP set and efficiently compute SNP-set level p-values. |
Authors: | Seunggeun (Shawn) Lee and Zhangchen Zhao, with contributions from Larisa Miropolsky and Michael Wu |
Maintainer: | Seunggeun (Shawn) Lee <[email protected]> |
License: | GPL(>=2) |
Version: | 2.3.1 |
Built: | 2025-02-12 15:33:31 UTC |
Source: | https://github.com/leelabsg/skat |
Close the opened SNP Set data file (SSD). After using the SSD file, it must be closed.
Close_SSD()
Close_SSD()
Seunggeun Lee, Larisa Miropolsky
Generate a SNP set data file (SSD) from binary plink data files using user specified SNP sets.
Generate_SSD_SetID(File.Bed, File.Bim, File.Fam, File.SetID, File.SSD, File.Info, Is.FlipGenotype=TRUE)
Generate_SSD_SetID(File.Bed, File.Bim, File.Fam, File.SetID, File.SSD, File.Info, Is.FlipGenotype=TRUE)
File.Bed |
name of the binary ped file (BED). |
File.Bim |
name of the binary map file (BIM). |
File.Fam |
name of the FAM file (FAM). |
File.SetID |
name of the SNP set ID file that defines SNP sets. The first column must be Set ID, and the second column must be SNP ID. There should be no header!! |
File.SSD |
name of the SSD file generated. |
File.Info |
name of the SSD info file generated. |
Is.FlipGenotype |
internal use only, please do not change |
The SetID file is a white-space (space or tab) delimitered file with 2 columns: SetID and SNP_ID.
Please keep in mind that there should be no header! The SNP_IDs and SetIDs should be less than 50 characters, otherwise, it will return an error message.
The SSD file is a binary formated file with genotypes. The SSD info file is a text file with general information on data and SNP sets (first 6 rows), and information on each set (after 8th row).
Seunggeun Lee, Larisa Miropolsky
Estimate the effective number of tests for Bonferroni correction using the minimum achievable p-values (MAP).
Get_EffectiveNumberTest(MAP, alpha=0.05, Is.MidP=TRUE)
Get_EffectiveNumberTest(MAP, alpha=0.05, Is.MidP=TRUE)
MAP |
a vector of the minimum achievable p-values (MAP). |
alpha |
a significant level. |
Is.MidP |
a logical value indicating whether p-values are mid-p-values. |
Effective number of test for Bonferroni correction at the level alpha. MAP can be obtained from SKATBinary functions.
Seunggeun Lee
Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2015) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics, in press.
Read a SSD file and return a genotype matrix.
Get_Genotypes_SSD(SSD_INFO, Set_Index, is_ID = TRUE) Get_Genotypes_SSD_Sparse(SSD_INFO, Set_Index)
Get_Genotypes_SSD(SSD_INFO, Set_Index, is_ID = TRUE) Get_Genotypes_SSD_Sparse(SSD_INFO, Set_Index)
SSD_INFO |
SSD_INFO object returned from Open_SSD. |
Set_Index |
a numeric value of Set index. The set index of each set can be found from SetInfo object of SSD.INFO. |
is_ID |
a logical value indicating whether to read SNP ID (default=TRUE). If TRUE, it reads SNP IDs and use them as column names. |
A genotype matrix with n rows and m columns, where n is the number of samples and m is the number of SNPs. Get_Genotypes_SSD_Sparse returns a sparse matrix. Get_Genotypes_SSD_Sparse always returns SNP IDs as the column names, so does not have is_ID parameter.
Seunggeun Lee, Larisa Miropolsky
Get logistic weights from either a genotype matrix (Z) or a vector of minor allele frequncies (MAF). Users can apply this weights to SKAT by giving it as the “weights” parameter. The logistic weight gives equal weights to rare variants and nearly zero weight to common variants.
Get_Logistic_Weights(Z, par1=0.07, par2=150) Get_Logistic_Weights_MAF(MAF, par1=0.07, par2=150)
Get_Logistic_Weights(Z, par1=0.07, par2=150) Get_Logistic_Weights_MAF(MAF, par1=0.07, par2=150)
Z |
a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. |
MAF |
a numeric vector of minor allele frequncies. |
par1 |
a numeric value of the first parameter of the logistic weight (default= 0.07). |
par2 |
a numeric value of the second parameter of the logistic weight(default= 150). |
The formula for the weight is
A vector of the logistic weight.
Seunggeun Lee
data(SKAT.example) ############################################################# # Compute the P-value of SKAT with the logistic Weight (par1=0.07, par2=150) # Use logistic weight Z<-SKAT.example$Z obj<-SKAT_Null_Model(y.c ~ X, out_type="C", data=SKAT.example) weights<-Get_Logistic_Weights(Z, par1=0.07, par2=150) SKAT(Z, obj, kernel = "linear.weighted", weights=weights)$p.value # Weights function MAF<-colMeans(Z)/2 plot(MAF,weights)
data(SKAT.example) ############################################################# # Compute the P-value of SKAT with the logistic Weight (par1=0.07, par2=150) # Use logistic weight Z<-SKAT.example$Z obj<-SKAT_Null_Model(y.c ~ X, out_type="C", data=SKAT.example) weights<-Get_Logistic_Weights(Z, par1=0.07, par2=150) SKAT(Z, obj, kernel = "linear.weighted", weights=weights)$p.value # Weights function MAF<-colMeans(Z)/2 plot(MAF,weights)
Get the sample sizes required to achieve the given power.
Get_RequiredSampleSize(obj, Power=0.8)
Get_RequiredSampleSize(obj, Power=0.8)
obj |
an object returned from Power_Continuous or Power_Logistic. |
Power |
a value of the power to be achieved (default= 0.8). |
This function computes required sample sizes using simple interpolation.
A list object for the required sample sizes.
Seunggeun Lee
Compute a resampling p-value using resampled residuals. To use it, SKAT_Null_Model or SKAT_Null_Model_MomentAdjust should have n.Resampling > 0.
Get_Resampling_Pvalue(obj) Get_Resampling_Pvalue_1(p.value, p.value.resampling)
Get_Resampling_Pvalue(obj) Get_Resampling_Pvalue_1(p.value, p.value.resampling)
obj |
SKAT outcome object. |
p.value |
a numeric value of the SKAT p-value. |
p.value.resampling |
a vector of p-values from the resampled residuals. |
See SKAT_Null_Model
p.value |
the resampling p-value. It is computed as (n1 +1)/(n+1), where n is the number of resampling (n.Resampling in SKAT_Null_Model or SKAT_Null_Model_MomentAdjust), and n1 is the number of resampled residual p-values smaller than the original sample p-value. |
is_smaller |
a logical value indicates whether the resampling p-value should be smaller. If n1=0, then it is TRUE, otherwise it is FALSE. |
Seunggeun Lee
Open a SNP Set data file (SSD). After finishing using the SSD file, you must close the file by callinsg Close_SSD function.
Open_SSD(File.SSD, File.Info)
Open_SSD(File.SSD, File.Info)
File.SSD |
name of the SSD file . |
File.Info |
name of the SSD info file. |
a list object of SSD.INFO.
Seunggeun Lee, Larisa Miropolsky
Compute an average power of SKAT and SKAT-O for testing association between a genomic region and continuous phenotypes with a given disease model.
Power_Continuous(Haplotypes=NULL, SNP.Location=NULL, SubRegion.Length=-1 , Causal.Percent=5, Causal.MAF.Cutoff=0.03, alpha =c(0.01,10^(-3),10^(-6)) , N.Sample.ALL = 500 * (1:10), Weight.Param=c(1,25), N.Sim=100 , BetaType = "Log", MaxBeta=1.6, Negative.Percent=0) Power_Continuous_R(Haplotypes=NULL, SNP.Location, SubRegion.Length=-1 , Causal.Percent=5, Causal.MAF.Cutoff=0.03, alpha =c(0.01,10^(-3),10^(-6)) , N.Sample.ALL = 500 * (1:10), Weight.Param=c(1,25), N.Sim=100 , BetaType = "Log", MaxBeta=1.6, Negative.Percent=0, r.corr=0)
Power_Continuous(Haplotypes=NULL, SNP.Location=NULL, SubRegion.Length=-1 , Causal.Percent=5, Causal.MAF.Cutoff=0.03, alpha =c(0.01,10^(-3),10^(-6)) , N.Sample.ALL = 500 * (1:10), Weight.Param=c(1,25), N.Sim=100 , BetaType = "Log", MaxBeta=1.6, Negative.Percent=0) Power_Continuous_R(Haplotypes=NULL, SNP.Location, SubRegion.Length=-1 , Causal.Percent=5, Causal.MAF.Cutoff=0.03, alpha =c(0.01,10^(-3),10^(-6)) , N.Sample.ALL = 500 * (1:10), Weight.Param=c(1,25), N.Sim=100 , BetaType = "Log", MaxBeta=1.6, Negative.Percent=0, r.corr=0)
Haplotypes |
a haplotype matrix with each row as a different individual and each column as a separate SNP (default= NULL). Each element of the matrix should be either 0 (major allel) or 1 (minor allele). If NULL, SKAT.haplotype dataset will be used to compute power. |
SNP.Location |
a numeric vector of SNP locations that should be matched with the SNPs in the Haplotype matrix (default= NULL). It is used to obtain subregions. When Haplotype=NULL, it should be NULL. |
SubRegion.Length |
a value of the length of subregions (default= -1). Each subregion will be randomly selected, and then the average power will be calculated by taking the average over the estimated powers of all subregions. If SubRegion.Length=-1 (default), the length of the subregion will be the same as the length of the whole region, so there will no random selection of subregions. |
Causal.Percent |
a value of the percentage of causal SNPs among rare SNPs (MAF < Causal.MAF.Cutoff)(default= 5). |
Causal.MAF.Cutoff |
a value of MAF cutoff for the causal SNPs. Only SNPs that have MAFs smaller than the cutoff will be considered as causal SNPs (default= 0.03). |
alpha |
a vector of the significance levels (default= c(0.01,10^(-3),10^(-6))). |
N.Sample.ALL |
a vector of the sample sizes (default= 500 * (1:10)). |
Weight.Param |
a vector of parameters of beta weights (default= c(1,25)). |
N.Sim |
a value of number of causal SNP/SubRegion sets to be generated to compute the average power (default= 100). Power will be computed for each causal SNP/SubRegion set, and then the average power will be obtained by taking average over the computed powers. |
BetaType |
a type of effect sizes (default= “Log”). “Log” indicates that effect sizes of causal variants equal to |
MaxBeta |
a numeric value of the maximum effect size (default= 1.6). When BetaType="Log", the maximum effect size is MaxBeta (when MAF=0.0001). When BetaType="Fixed", all causal variants have the same effect size (= MaxBeta). See details |
Negative.Percent |
a numeric value of the percentage of coefficients of causal variants that are negative (default= 0). |
r.corr |
(Power_Continuous_R only) the |
By default it uses the haplotype information in the SKAT.haplotypes dataset. So if you want to use the SKAT.haplotypes dataset, you can left Haplotypes and SNP.Location as NULL.
When BetaType=“Log”, MaxBeta is a coeffecient value () of the causal SNP at MAF
and used to obtain c value of the function
. For example, if MaxBeta=1.6,
.
Then a variant with MAF=0.001 has
and a variant with MAF=0.01 has
.
When SubRegion.Length is small such as 3kb or 5kb,
it is possible that you can have different estimated power for each run with N.Sim = .
Then, please increase the N.Sim to
to obtain stable results.
R.sq is computed under the no linkage disequilibrium assumption.
Power_Continuous_R computes power with new class of kernels with the compound symmetric correlation structure. It uses a slightly different approach, and thus Power_Continuous and Power_Continuous_R can produce slightly different results although r.corr=0.
If you want to computer power of SKAT-O by estimating the optimal r.corr, use r.corr=2.
The estimated optimal r.corr is
,
where
is a proportion of causal variants, and
is a proportion of negatively associated causal variants
among the causal variants.
Power |
A matrix with each row as a different sample size and each column as a different significance level. Each element of the matrix is the estimated power. |
R.sq |
Proportion of phenotype variance explained by genetic variants. |
r.corr |
r.corr value. When r.corr=2 is used, it provides the estimated r.corr value. See details. |
Seunggeun Lee
# # Calculate the average power of randomly selected 3kb regions # with the following conditions. # # Causal percent = 20% # Negative percent = 20% # Max effect size = 2 at MAF = 10^-4 # # When you use this function, please increase N.Sim (more than 100) # out.c<-Power_Continuous(SubRegion.Length=3000, Causal.Percent= 20, N.Sim=5, MaxBeta=2,Negative.Percent=20) out.c # # Calculate the required sample sizes to achieve 80% power Get_RequiredSampleSize(out.c, Power=0.8)
# # Calculate the average power of randomly selected 3kb regions # with the following conditions. # # Causal percent = 20% # Negative percent = 20% # Max effect size = 2 at MAF = 10^-4 # # When you use this function, please increase N.Sim (more than 100) # out.c<-Power_Continuous(SubRegion.Length=3000, Causal.Percent= 20, N.Sim=5, MaxBeta=2,Negative.Percent=20) out.c # # Calculate the required sample sizes to achieve 80% power Get_RequiredSampleSize(out.c, Power=0.8)
Compute an average power of SKAT and SKAT-O for testing association between a genomic region and dichotomous phenotypes from case-control studies with a given disease model.
Power_Logistic(Haplotypes = NULL, SNP.Location = NULL, SubRegion.Length=-1 , Prevalence=0.01, Case.Prop=0.5, Causal.Percent=5, Causal.MAF.Cutoff=0.03 , alpha =c(0.01,10^(-3),10^(-6)), N.Sample.ALL = 500 * (1:10) , Weight.Param=c(1,25), N.Sim=100, OR.Type = "Log" , MaxOR=5, Negative.Percent=0) Power_Logistic_R(Haplotypes = NULL, SNP.Location = NULL, SubRegion.Length=-1 , Prevalence=0.01, Case.Prop=0.5, Causal.Percent=5, Causal.MAF.Cutoff=0.03 , alpha =c(0.01,10^(-3),10^(-6)), N.Sample.ALL = 500 * (1:10) , Weight.Param=c(1,25), N.Sim=100, OR.Type = "Log" , MaxOR=5, Negative.Percent=0, r.corr=0)
Power_Logistic(Haplotypes = NULL, SNP.Location = NULL, SubRegion.Length=-1 , Prevalence=0.01, Case.Prop=0.5, Causal.Percent=5, Causal.MAF.Cutoff=0.03 , alpha =c(0.01,10^(-3),10^(-6)), N.Sample.ALL = 500 * (1:10) , Weight.Param=c(1,25), N.Sim=100, OR.Type = "Log" , MaxOR=5, Negative.Percent=0) Power_Logistic_R(Haplotypes = NULL, SNP.Location = NULL, SubRegion.Length=-1 , Prevalence=0.01, Case.Prop=0.5, Causal.Percent=5, Causal.MAF.Cutoff=0.03 , alpha =c(0.01,10^(-3),10^(-6)), N.Sample.ALL = 500 * (1:10) , Weight.Param=c(1,25), N.Sim=100, OR.Type = "Log" , MaxOR=5, Negative.Percent=0, r.corr=0)
Haplotypes |
a haplotype matrix with each row as a different individual and each column as a separate SNP (default= NULL). Each element of the matrix should be either 0 (major allel) or 1 (minor allele). If NULL, SKAT.haplotype dataset will be used to compute power. |
SNP.Location |
a numeric vector of SNP locations which should be matched with the SNPs in the Haplotype matrix (default= NULL). It is used to obtain subregions. When Haplotype=NULL, it should be NULL. |
SubRegion.Length |
a value of the length of subregions (default= -1). Each subregion will be randomly selected, and then the average power will be calculated by taking the average over the estimated powers of all subregions. If SubRegion.Length=-1 (default), the length of the subregion is the same as the length of the whole region, so there will no random selection of subregions. |
Prevalence |
a value of disease prevalence. |
Case.Prop |
a value of the proportion of cases. For example, Case.Prop=0.5 means 50 % of samples are cases and 50 % of samples are controls. |
Causal.Percent |
a value of the percentage of causal SNPs among rare SNPs (MAF < Causal.MAF.Cutoff)(default= 5). |
Causal.MAF.Cutoff |
a value of MAF cutoff for the causal SNPs. Only SNPs that have MAFs smaller than this are considered as causal SNPs (default= 0.03). |
alpha |
a vector of the significance levels (default= c(0.01,10^(-3),10^(-6))). |
N.Sample.ALL |
a vector of the sample sizes (default= 500 * (1:10)). |
Weight.Param |
a vector of parameters of beta weights (default= c(1,25)). |
N.Sim |
a value of number of causal SNP/SubRegion sets to be generated to compute the average power (default= 100). Power will be computed for each causal SNP/SubRegion set, and then the average power will be obtained by taking mean of the computed powers. |
OR.Type |
a function type of effect sizes (default= “Log”). “Log” indicates that log odds ratio of causal variants equal to |
MaxOR |
a numeric value of the maximum odds ratio (default= 5). When OR.Type=“Log”, the maximum odds ratio is MaxOR (when MAF=0.0001). When OR.Type=“Fixed”, all causal variants have the same odds ratio (= MaxOR). See details |
Negative.Percent |
a numeric value of the percentage of coefficients of causal variants that are negative (default= 0). |
r.corr |
(Power_Logistic_R only) the |
By default it uses the haplotype information in the SKAT.haplotypes dataset. So you can left Haplotypes and SNP.Location as NULL if you want to use the SKAT.haplotypes dataset.
When OR.Type="Log", MaxOR is a odds ratio of the causal SNP at MAF
and used to obtain c value in the function
.
For example, if MaxOR=5,
. Then a variant with MAF=0.001 has log odds ratio = 1.206 and a variant with MAF=0.01 has log odds ratio = 0.804.
When SubRegion.Length is small such as 3kb or 5kb, it is possible that you can have different estimated power for each run with N.Sim = .
Then, please increase N.Sim to
to obtain stable results.
Power_Logistic_R computes the power with new class of kernels with the compound symmetric correlation structure. It uses a slightly different approach, and thus Power_Logistic and Power_Logistic_R can produce slightly different results although r.corr=0.
If you want to computer power of SKAT-O by estimating the optimal r.corr, use r.corr=2.
The estimated optimal r.corr is
,
where
is a proportion of causal variants, and
is a proportion of negatively associated causal variants
among the causal variants.
Power |
A matrix with each row as a different sample size and each column as a different significance level. Each element of the matrix is the estimated power. |
r.corr |
r.corr value. When r.corr=2 is used, it provides the estimated r.corr value. See details. |
Seunggeun Lee
# # Calculate the average power of randomly selected 3kb regions # with the following conditions. # # Causal percent = 20% # Negative percent = 20% # Max OR = 7 at MAF = 10^-4 # # When you use this function, please increase N.Sim (more than 100) # out.b<-Power_Logistic(SubRegion.Length=3000, Causal.Percent= 20, N.Sim=5 ,MaxOR=7,Negative.Percent=20) out.b # # Calculate the required sample sizes to achieve 80% power Get_RequiredSampleSize(out.b, Power=0.8)
# # Calculate the average power of randomly selected 3kb regions # with the following conditions. # # Causal percent = 20% # Negative percent = 20% # Max OR = 7 at MAF = 10^-4 # # When you use this function, please increase N.Sim (more than 100) # out.b<-Power_Logistic(SubRegion.Length=3000, Causal.Percent= 20, N.Sim=5 ,MaxOR=7,Negative.Percent=20) out.b # # Calculate the required sample sizes to achieve 80% power Get_RequiredSampleSize(out.b, Power=0.8)
Draws a MAP-adjusted QQ plot
QQPlot_Adj(Pval, MAP, main="QQ plot", ntry=500, confidence=0.95, Is.unadjsted=TRUE , Is.legend=TRUE, xlab="Expected Quantiles (-log10 P-values)" , ylab="Observed Quantiles (-log10 P-values)")
QQPlot_Adj(Pval, MAP, main="QQ plot", ntry=500, confidence=0.95, Is.unadjsted=TRUE , Is.legend=TRUE, xlab="Expected Quantiles (-log10 P-values)" , ylab="Observed Quantiles (-log10 P-values)")
Pval |
a vector of p-values. |
MAP |
a vector of MAP. |
main |
a main title. |
ntry |
a numeric value of the number for resampling (default= 500). |
confidence |
a value for the confidence band (default= 0.95). |
Is.unadjsted |
a logical value to indicate whether to plot the unadjusted QQ plot (default= TRUE). |
Is.legend |
a logical value to indicate whether to plot a legend (default= TRUE). |
xlab |
a label for the x axis. |
ylab |
a label for the y axis. |
Seunggeun Lee
Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2015) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics, in press.
Read Plink FAM and covariates files.
Read_Plink_FAM(Filename, Is.binary=TRUE, flag1=0) Read_Plink_FAM_Cov(Filename, File_Cov, Is.binary=TRUE, flag1=0, cov_header=TRUE)
Read_Plink_FAM(Filename, Is.binary=TRUE, flag1=0) Read_Plink_FAM_Cov(Filename, File_Cov, Is.binary=TRUE, flag1=0, cov_header=TRUE)
Filename |
input file name of plink FAM file |
Is.binary |
if TRUE, the phenotype is binary. If phenotype is continuous, it should be FALSE |
flag1 |
0 represents the default coding of unaffected/affected (1/2) (default=0), and 1 represents 0/1 coding. flag1=1 is the same as –1 flag in plink. Please see the plink manual. |
File_Cov |
an input file name of plink covariates file. The first two columns of this file should be FID and IID. |
cov_header |
a logical value indicating whether the covariate file contains a header row (default=TRUE) |
A dataframe of Family ID (FID), Individual ID (IID), Paternal ID (PID), Maternal ID(MID), Sex, and Phenotype. If Read_Plink_FAM_Cov is used with a covariate file, the dataframe has covariates from the 7th column.
Seunggeun Lee
Read a file with custom weights
Read_SNP_WeightFile(FileName)
Read_SNP_WeightFile(FileName)
FileName |
input file name of a custom weight. |
The file should be a white-space (space or tab) delimitered file with 2 columns: SNP_ID and weight value.
Please keep in mind that there should be no header!!
Output object has a hash table of SNP IDs and weights.
Seunggeun Lee
Obtain significant SNP sets after controlling for family wise error rate (FWER) using resampled residuals. To use it, SKAT_Null_Model or SKAT_Null_Model_MomentAdjust should have n.Resampling > 0.
Resampling_FWER(obj,FWER=0.05) Resampling_FWER_1(P.value, P.value.Resampling, FWER=0.05)
Resampling_FWER(obj,FWER=0.05) Resampling_FWER_1(P.value, P.value.Resampling, FWER=0.05)
obj |
object returned from SKAT.SSD.All or SKATBinary.SSD.All. |
P.value |
a vector of SKAT p-values. If 100 genes were tested, this vector should have 100 p-values. |
P.value.Resampling |
a matrix of p-values of the resampled residuals. Each row represents each gene/snp set, and each column represents resampling set. For example, if you have 100 genes, and conducted resampling 1000 times ( ex.n.Resampling=1000 in SKAT_Null_Model), then it should be a 100 x 1000 matrix. |
FWER |
a numeric value of FWER rate to control (default=0.05) |
results |
If the returned object from SKAT.SSD.all (or SKATBinary.SSD.All) are used, it is a sub-table of significant snp sets of the result table in the obj. If you use P.value and P.value.Resampling, it is a vector of significant p-values. If there is no significant snp set, it is NULL. |
n |
a numeric value of the number of significant snp sets. |
ID |
a vector of indexes of significant snp sets. |
Seunggeun Lee
Test for association between a set of SNPS/genes and continuous or dichotomous phenotypes using kernel regression framework.
SKAT(Z, obj, kernel = "linear.weighted", method="davies", weights.beta=c(1,25), weights=NULL, impute.method="fixed", r.corr=0, is_check_genotype=TRUE, is_dosage = FALSE, missing_cutoff=0.15 , max_maf=1, estimate_MAF=1) SKAT.SSD.OneSet(SSD.INFO, SetID, obj, ... ,obj.SNPWeight=NULL) SKAT.SSD.OneSet_SetIndex(SSD.INFO, SetIndex, obj, ... ,obj.SNPWeight=NULL)
SKAT(Z, obj, kernel = "linear.weighted", method="davies", weights.beta=c(1,25), weights=NULL, impute.method="fixed", r.corr=0, is_check_genotype=TRUE, is_dosage = FALSE, missing_cutoff=0.15 , max_maf=1, estimate_MAF=1) SKAT.SSD.OneSet(SSD.INFO, SetID, obj, ... ,obj.SNPWeight=NULL) SKAT.SSD.OneSet_SetIndex(SSD.INFO, SetIndex, obj, ... ,obj.SNPWeight=NULL)
Z |
a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Missing genotypes will be imputed by the simple Hardy-Weinberg equilibrium (HWE) based imputation. |
obj |
an output object of the SKAT_Null_Model function. |
kernel |
a type of kernel (default= "linear.weighted"). See detail section. |
method |
a method to compute the p-value (default= "davies"). "davies" represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq, "liu" represents an approximation method that matches the first 3 moments, "liu.mod" represents modified "liu" method that matches kurtosis instead of skewness to improve tail probability approximation,"SKATO" and "optimal.adj" represent a SKAT-O based on an unified approach, and "optimal" is an old version of the implementation of SKAT-O. See details. |
weights.beta |
a numeric vector of parameters for the beta weights for the weighted kernels. If you want to use your own weights, please use the “weights” parameter. It will be ignored if “weights” parameter is not null. |
weights |
a numeric vector of weights for the weighted kernels.
It is |
impute.method |
a method to impute missing genotypes (default= "fixed"). "bestguess" imputes missing genotypes as most likely values (0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p is the MAF), and "fixed" imputes missing genotypes by assigning the mean genotype values (2p). |
r.corr |
the |
is_check_genotype |
a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). If Z has non-SNP data, please set it FALSE, otherwise you will get an error message. If it is FALSE and you use weighted kernels, the weights should be given through the “weights” parameter. |
is_dosage |
a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”. |
missing_cutoff |
a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis. |
max_maf |
a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis. |
estimate_MAF |
a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. If estimate_MAF=1 (default), SKAT uses all samples to estimate MAFs. If estimate_MAF=2, only samples with non-missing phenotypes and covariates are used to estimate MAFs. |
SSD.INFO |
an SSD_INFO object returned from Open_SSD. |
SetID |
a character value of Set ID. A set ID of each set can be found from SetInfo object in SSD.INFO. |
SetIndex |
a numeric value of Set index. A set index of each set can be found from SetInfo object in SSD.INFO. |
... |
further arguments to be passed to “SKAT” |
obj.SNPWeight |
an output object of Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used. |
There are 6 types of pre-specified kernels: "linear", "linear.weighted", "IBS", "IBS.weighted", "quadratic" and "2wayIX". Among them, "2wayIX" is a product kernel consisting of main effects and SNP-SNP interaction terms.
If users want to use dosage values instead of genotypes, set is_dosage=TRUE. Please keep in mind that plink formatted files (so SSD files) cannot used for dosages. Instead, you should make a genotype matrix Z to run SKAT.
The kernel matrix for the weighted linear kernel is
, where G is a genotype matrix and W is a diagonal weight matrix.
Please note that it is different from the notation we used in the original SKAT paper,
which was
.
The Madsen and Browning (2009) weight is
in the current notation.
By the previous notation, it is
.
If you want to use the SSD file, you need to open it first using Open_SSD, and then use either SKAT.SSD.OneSet or SKAT.SSD.OneSet_SetIndex. Set index is a numeric value and automatically assigned to each set (from 1).
The r.corr represents a parameter of the unified test,
, where
is a SKAT test statistic,
and
is a weighted burden test statistic.
Therefore,
results in the original weighted linear kernel SKAT,
and
results in the weighted burden test (default:
).
If r.corr is a vector, SKAT-O will be conducted with adaptively seleting
from given r.corr values.
should be a value between 0 and 1. When method="optimal" or method="optimal.adj", the r.corr parameter will be ignored.
We slightly changed the implementation for SKAT-O to improve the estimation of p-values. You can run it by
using method="optimal.adj" or "SKATO". It uses a grid of eight points
for the search of the optimal
.
If you want to use the original implementation of SKAT-O, use method="optimal" that
carries out SKAT-O with an equal sized grid of 11 points (from 0 to 1).
If the true p.value is very small, you can have p.value=0 due to numerical reasons. In this case,
please see pval.zero.msg that shows how small it is.
For example, if the p.value is smaller than , it has "Pvalue < 1.000000e-60".
By default, SKAT uses impute.method="fixed" that imputes missing genotypes as the mean genotype values (2p). When variates are very rare and missing rates between cases and controls are highly unbalanced, impute.method="fixed" can yield inflated type I error rate. In this case, we recommend to use impute.method="bestguess", which does not suffer the same problem.
p.value |
p-value of SKAT. |
p.value.resampling |
p-values from resampled outcomes. You can get it when you use obj from SKAT_Null_Model function with resampling. See the SKAT_Null_Model. |
p.value.noadj |
p-value of SKAT without the small sample adjustment. It only appears when small sample adjustment is applied. |
p.value.noadj.resampling |
p-values from resampled outcomes without the small sample adjustment. |
pval.zero.msg |
(only when p.value=0) text message that shows how small the p.value is. ex. "Pvalue < 1.000000e-60" when the p.value is smaller than |
Q |
test statistic of SKAT. It has NA when method="optimal.adj" or "optimal". |
param |
estimated parameters of each method. |
param$Is_Converged |
(only with method="davies") an indicator of the convergence (1=convergence, 0=non-convergence). When 0 (not converged), "liu" method will be used to compute p-values. |
param$n.marker |
a number of SNPs in the genotype matrix. |
param$n.marker.test |
a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff. |
test.snp.mac |
a vector of minor allele count (MAC) of the snps tested. The name is SNP-ID. |
Seunggeun Lee, Micheal Wu
Lee, S., Emond, M.J., Bamshad, M.J., Barnes, K.C., Rieder, M.J., Nickerson, D.A., NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani, D.C., Wurfel, M.M. and Lin, X. (2012) Optimal unified approach for rare variant association testing with application to small sample case-control whole-exome sequencing studies. American Journal of Human Genetics, 91, 224-237.
Lee, S., Wu, M. C., and Lin, X. (2012) Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13, 762-775.
Wu, M. C.*, Lee, S.*, Cai, T., Li, Y., Boehnke, M., and Lin, X. (2011) Rare Variant Association Testing for Sequencing Data Using the Sequence Kernel Association Test (SKAT). American Journal of Human Genetics, 89, 82-93. \ * contributed equally.
Wu, M. C., Kraft, P., Epstein, M. P.,Taylor, D., M., Chanock, S. J., Hunter, D., J., and Lin, X. (2010) Powerful SNP Set Analysis for Case-Control Genome-wide Association Studies. American Journal of Human Genetics, 86, 929-942.
Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society. Series C , 29, 323-333.
H. Liu, Y. Tang, H.H. Zhang (2009) A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables, Computational Statistics and Data Analysis, 53, 853-856.
Duchesne, P. and Lafaye De Micheaux, P. (2010) Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods, Computational Statistics and Data Analysis, 54, 858-862.
data(SKAT.example) ############################################################# # SKAT with default Beta(1,25) Weights # - without covariates Z<-SKAT.example$Z # continuous trait obj<-SKAT_Null_Model(y.c ~ 1, out_type="C", data=SKAT.example) SKAT(Z, obj)$p.value # dichotomous trait obj<-SKAT_Null_Model(y.b ~ 1, out_type="D", data=SKAT.example) SKAT(Z, obj)$p.value ################################################## # SKAT with default Beta(1,25) Weights # - with covariates # continuous trait obj<-SKAT_Null_Model(y.c ~ X, out_type="C", data=SKAT.example) SKAT(Z, obj)$p.value obj.b<-SKAT_Null_Model(y.b ~ X, out_type="D", data=SKAT.example) SKAT(Z, obj.b)$p.value ################################################## # SKAT with default Beta(1,25) Weights # - Optimal Test SKAT(Z, obj, method="optimal.adj")$p.value # you can get the same p-value by using method="SKATO" SKAT(Z, obj, method="SKATO")$p.value ############################################################# # SKAT with Beta(1,30) Weights SKAT(Z, obj, weights.beta=c(1,30))$p.value
data(SKAT.example) ############################################################# # SKAT with default Beta(1,25) Weights # - without covariates Z<-SKAT.example$Z # continuous trait obj<-SKAT_Null_Model(y.c ~ 1, out_type="C", data=SKAT.example) SKAT(Z, obj)$p.value # dichotomous trait obj<-SKAT_Null_Model(y.b ~ 1, out_type="D", data=SKAT.example) SKAT(Z, obj)$p.value ################################################## # SKAT with default Beta(1,25) Weights # - with covariates # continuous trait obj<-SKAT_Null_Model(y.c ~ X, out_type="C", data=SKAT.example) SKAT(Z, obj)$p.value obj.b<-SKAT_Null_Model(y.b ~ X, out_type="D", data=SKAT.example) SKAT(Z, obj.b)$p.value ################################################## # SKAT with default Beta(1,25) Weights # - Optimal Test SKAT(Z, obj, method="optimal.adj")$p.value # you can get the same p-value by using method="SKATO" SKAT(Z, obj, method="SKATO")$p.value ############################################################# # SKAT with Beta(1,30) Weights SKAT(Z, obj, weights.beta=c(1,30))$p.value
Test for association between a set of SNPS/genes in the X chromosome and continuous or dichotomous outcomes using the kernel machine.
SKAT_ChrX(Z, obj, is_X.inact =TRUE , kernel = "linear.weighted", method="davies", weights.beta=c(1,25) , weights = NULL, impute.method = "fixed", r.corr=0, is_check_genotype=TRUE , is_dosage = FALSE, missing_cutoff=0.15, max_maf=1, estimate_MAF=1, SetID=NULL) SKAT_ChrY(Z, obj, kernel = "linear.weighted", method="davies", weights.beta=c(1,25) , weights = NULL, impute.method = "fixed", r.corr=0, is_check_genotype=TRUE , is_dosage = FALSE, missing_cutoff=0.15, max_maf=1, estimate_MAF=1, SetID=NULL)
SKAT_ChrX(Z, obj, is_X.inact =TRUE , kernel = "linear.weighted", method="davies", weights.beta=c(1,25) , weights = NULL, impute.method = "fixed", r.corr=0, is_check_genotype=TRUE , is_dosage = FALSE, missing_cutoff=0.15, max_maf=1, estimate_MAF=1, SetID=NULL) SKAT_ChrY(Z, obj, kernel = "linear.weighted", method="davies", weights.beta=c(1,25) , weights = NULL, impute.method = "fixed", r.corr=0, is_check_genotype=TRUE , is_dosage = FALSE, missing_cutoff=0.15, max_maf=1, estimate_MAF=1, SetID=NULL)
Z |
a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Missing genotypes will be imputed by the simple Hardy-Weinberg equilibrium (HWE) based imputation. |
obj |
output object of the SKAT_Null_Model_ChrX function. For SKAT_ChrY, SKAT_Null_Model_ChrX function should be used with Model.Y=TRUE |
is_X.inact |
an indicator variable for the X-inactivation coding (default=TRUE). Male genotypes are coded as g=(0,2) when it is TRUE, and g=(0,1) when it is false. |
kernel |
a type of kernel (default= "linear.weighted"). |
method |
a method to compute the p-value (default= "davies"). See SKAT page for details. |
weights.beta |
a numeric vector of parameters of beta weights. See SKAT page for details. |
weights |
a numeric vector of weights for the weighted kernels. See SKAT page for details. |
impute.method |
a method to impute missing genotypes (default= "fixed"). "fixed" imputes missing genotypes by assigning the mean genotype value (2p), and "bestguess" uses best guess genotype values. |
r.corr |
the |
is_check_genotype |
a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details. |
is_dosage |
a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”. |
missing_cutoff |
a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis. |
max_maf |
a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis. |
estimate_MAF |
a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details. |
SetID |
Internal use only. |
For details of parameters, please see SKAT page.
p.value |
p-value of SKAT. |
p.value.resampling |
p-values from resampled outcome. You can get it when you use obj from SKAT_Null_Model function with resampling. See the SKAT_Null_Model. |
p.value.noadj |
p-value of SKAT without the small sample adjustment. It only appears when small sample adjustment is applied. |
p.value.noadj.resampling |
p-values from resampled outcome without the small sample adjustment. It only appears when small sample adjustment is applied. |
pval.zero.msg |
(only when p.value=0) text message that shows how small the p.value is. ex. "Pvalue < 1.000000e-60" when p.value is smaller than |
Q |
the test statistic of SKAT. It has NA when method="optimal.adj" or "optimal". |
param |
estimated parameters of each method. |
param$Is_Converged |
(only with method="davies") an indicator of the convergence. When 0 (not converged), "liu" method is used to compute p-value. |
param$n.marker |
a number of SNPs in the genotype matrix |
param$n.marker.test |
a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff. |
Clement Ma and Seunggeun Lee
data(SKAT.example.ChrX) Z<-SKAT.example.ChrX$Z ############################################################# # Compute the P-value of SKAT # binary trait obj.x<-SKAT_Null_Model_ChrX(y ~ x1 +x2 + Gender, SexVar="Gender", out_type="D", data=SKAT.example.ChrX) # SKAT SKAT_ChrX(Z, obj.x, kernel = "linear.weighted") # Burden SKAT_ChrX(Z, obj.x, kernel = "linear.weighted", r.corr=1) # SKAT-O SKAT_ChrX(Z, obj.x, kernel = "linear.weighted", method="SKATO") ############################################################# # Fit the Y chromosome function # In this example, since male has only one copy of X (and Y), we reuse X chromosome genotype matrix. # binary trait obj.x<-SKAT_Null_Model_ChrX(y ~ x1 +x2 + Gender, SexVar="Gender", out_type="D", Model.Y=TRUE, data=SKAT.example.ChrX) SKAT_ChrY(Z, obj.x, kernel = "linear.weighted", method="SKATO")
data(SKAT.example.ChrX) Z<-SKAT.example.ChrX$Z ############################################################# # Compute the P-value of SKAT # binary trait obj.x<-SKAT_Null_Model_ChrX(y ~ x1 +x2 + Gender, SexVar="Gender", out_type="D", data=SKAT.example.ChrX) # SKAT SKAT_ChrX(Z, obj.x, kernel = "linear.weighted") # Burden SKAT_ChrX(Z, obj.x, kernel = "linear.weighted", r.corr=1) # SKAT-O SKAT_ChrX(Z, obj.x, kernel = "linear.weighted", method="SKATO") ############################################################# # Fit the Y chromosome function # In this example, since male has only one copy of X (and Y), we reuse X chromosome genotype matrix. # binary trait obj.x<-SKAT_Null_Model_ChrX(y ~ x1 +x2 + Gender, SexVar="Gender", out_type="D", Model.Y=TRUE, data=SKAT.example.ChrX) SKAT_ChrY(Z, obj.x, kernel = "linear.weighted", method="SKATO")
Sequence Kernel association test for the combined effect of common and rare variants.
SKAT_CommonRare(Z, obj, weights.beta.rare=c(1,25) , weights.beta.common=c(0.5,0.5), weights=NULL , method="C", r.corr.rare=0, r.corr.common=0, CommonRare_Cutoff=NULL , test.type="Joint", is_dosage=FALSE, missing_cutoff=0.15 , estimate_MAF=1, SetID1=NULL) SKAT_CommonRare.SSD.OneSet(SSD.INFO , SetID, obj, ..., obj.SNPWeight=NULL) SKAT_CommonRare.SSD.OneSet_SetIndex(SSD.INFO , SetIndex, obj, ..., obj.SNPWeight=NULL )
SKAT_CommonRare(Z, obj, weights.beta.rare=c(1,25) , weights.beta.common=c(0.5,0.5), weights=NULL , method="C", r.corr.rare=0, r.corr.common=0, CommonRare_Cutoff=NULL , test.type="Joint", is_dosage=FALSE, missing_cutoff=0.15 , estimate_MAF=1, SetID1=NULL) SKAT_CommonRare.SSD.OneSet(SSD.INFO , SetID, obj, ..., obj.SNPWeight=NULL) SKAT_CommonRare.SSD.OneSet_SetIndex(SSD.INFO , SetIndex, obj, ..., obj.SNPWeight=NULL )
Z |
a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Missing genotypes will be imputed by the simple Hardy-Weinberg equilibrium (HWE) based imputation. |
obj |
an output object of the SKAT_Null_Model function. |
weights.beta.rare |
a numeric vector of parameters of beta weights for rare variants (default=c(1,25)). |
weights.beta.common |
a numeric vector of parameters of beta weights for common variants (default=c(0.5,0.5)). |
weights |
a numeric vector of weights for both common and rare variants. When it is NULL, the beta weight with the “weights.beta.rare” and “weights.beta.common” parameter are used. When method ="C", the coefficient to combine common and rare variants test statistics will be calculated after applying the weights to variants. |
method |
a method to combine common and rare variant effects (default="C"). "C" represents the combined sum test, and "A" represents the adaptive sum test. "AR" represents a different type of adaptive test in which common variants are projected over rare variants. |
r.corr.rare |
the |
r.corr.common |
the |
CommonRare_Cutoff |
MAF cutoff for common vs rare variants (default=NULL). It should be a numeric value between
0 and 0.5, or NULL. When it is NULL, |
test.type |
a string to indicate test type (default="Joint"). "Joint" indicates the joint test of the combined effects of common and rare variants. "Rare.Only" and "Common.Only" will conduct test only with rare and common variants, respectively. |
is_dosage |
see SKAT |
missing_cutoff |
see SKAT |
estimate_MAF |
see SKAT |
SetID1 |
internal use only |
SSD.INFO |
an SSD_INFO object returned from Open_SSD. |
SetID |
a character value of Set ID. You can find a set ID of each set from SetInfo object of SSD.INFO |
SetIndex |
a numeric value of Set index. You can find a set index of each set from SetInfo object of SSD.INFO |
obj.SNPWeight |
output object from Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used. |
... |
furthuer arguments to be passed to “SKAT_CommonRare” |
The small sample adjustment for binary traits is not implemented for "A" and "AR".
p.value |
p-value. |
p.value.resampling |
p-values from resampled phenotypes. You can get it when you use obj from SKAT_Null_Model function with resampling. See the SKAT_Null_Model. |
n.rare |
the number of rare variants used for the test |
n.common |
the number of common variants used for the test |
Cutoff |
the MAF cut-off to divide common and rare variants |
Q |
the test statistic of SKAT. It has NA when method="A" or "AR". |
param |
estimated parameters of each method. |
param$Is_Converged |
an indicator of the convergence. 1 indicates the method is converged, and 0 indicates the method is not converged. When 0 (not converged), "liu.mod" method is used to compute p-value. |
param$n.marker |
a number of SNPs in the genotype matrix |
param$n.marker.test |
a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff. |
test.snp.mac |
a vector of minor allele count (MAC) of the snps tested. The name is SNP-ID. |
Seunggeun Lee
Ionita-Laza, I.*, Lee, S.*, Makarov, V., Buxbaum, J. Lin, X. (2013). Sequence kernel association tests for the combined effect of rare and common variants. American Journal of Human Genetics, 92, 841-853. * contributed equally.
data(SKAT.example) Z<-SKAT.example$Z # continuous trait obj<-SKAT_Null_Model(y.c ~ X, out_type="C", data=SKAT.example) SKAT_CommonRare(Z, obj)$p.value SKAT_CommonRare(Z, obj, method="A")$p.value SKAT_CommonRare(Z, obj, method="AR")$p.value # dichotomous trait obj<-SKAT_Null_Model(y.b ~ X, out_type="D", data=SKAT.example) # Combined sum test in the manuscript (SKAT-C and Burden-C) SKAT_CommonRare(Z, obj)$p.value SKAT_CommonRare(Z, obj, r.corr.rare=1, r.corr.common=1 )$p.value # Test only with common variant SKAT_CommonRare(Z, obj, test.type="Common.Only")$p.value # Test only with rare variant SKAT_CommonRare(Z, obj, test.type="Rare.Only")$p.value # Use CommonRare_Cutoff=0.01 instead of CommonRare_Cutoff = NULL SKAT_CommonRare(Z, obj, CommonRare_Cutoff=0.01)$p.value # Use custom weights; the first 10 variants have higher weights weights<-rep(1,67); weights[1:10]<-2 SKAT_CommonRare(Z, obj, weights=weights)$p.value
data(SKAT.example) Z<-SKAT.example$Z # continuous trait obj<-SKAT_Null_Model(y.c ~ X, out_type="C", data=SKAT.example) SKAT_CommonRare(Z, obj)$p.value SKAT_CommonRare(Z, obj, method="A")$p.value SKAT_CommonRare(Z, obj, method="AR")$p.value # dichotomous trait obj<-SKAT_Null_Model(y.b ~ X, out_type="D", data=SKAT.example) # Combined sum test in the manuscript (SKAT-C and Burden-C) SKAT_CommonRare(Z, obj)$p.value SKAT_CommonRare(Z, obj, r.corr.rare=1, r.corr.common=1 )$p.value # Test only with common variant SKAT_CommonRare(Z, obj, test.type="Common.Only")$p.value # Test only with rare variant SKAT_CommonRare(Z, obj, test.type="Rare.Only")$p.value # Use CommonRare_Cutoff=0.01 instead of CommonRare_Cutoff = NULL SKAT_CommonRare(Z, obj, CommonRare_Cutoff=0.01)$p.value # Use custom weights; the first 10 variants have higher weights weights<-rep(1,67); weights[1:10]<-2 SKAT_CommonRare(Z, obj, weights=weights)$p.value
This function computes p-values of robust burden test, SKAT, and SKAT-O for binary traits using SPA and ER.
SKAT_CommonRare_Robust(Z, obj, kernel = "linear.weighted" , method="SKAT", r.corr=NULL,weights.beta.rare=c(1,25) , weights.beta.common=c(0.5,0.5), weights = NULL , CommonRare_Cutoff=NULL, impute.method = "bestguess" , is_check_genotype=TRUE,is_dosage = FALSE, missing_cutoff=0.15 , max_maf=1, estimate_MAF=1) SKAT_CommonRare_Robust.SSD.OneSet(SSD.INFO , SetID, obj, ..., obj.SNPWeight=NULL) SKAT_CommonRare_Robust.SSD.OneSet_SetIndex(SSD.INFO , SetIndex, obj, ..., obj.SNPWeight=NULL )
SKAT_CommonRare_Robust(Z, obj, kernel = "linear.weighted" , method="SKAT", r.corr=NULL,weights.beta.rare=c(1,25) , weights.beta.common=c(0.5,0.5), weights = NULL , CommonRare_Cutoff=NULL, impute.method = "bestguess" , is_check_genotype=TRUE,is_dosage = FALSE, missing_cutoff=0.15 , max_maf=1, estimate_MAF=1) SKAT_CommonRare_Robust.SSD.OneSet(SSD.INFO , SetID, obj, ..., obj.SNPWeight=NULL) SKAT_CommonRare_Robust.SSD.OneSet_SetIndex(SSD.INFO , SetIndex, obj, ..., obj.SNPWeight=NULL )
Z |
a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Now we support both matrix and sparse matrix. |
obj |
output object from SKAT_Null_Model. |
kernel |
type of kernel (default= "linear.weighted"). The possible choices are "linear" and "linear.weighted". |
method |
type of gene based test (default= "SKAT"). The possible choices are "SKAT", "Burden" and "SKATO", which represents robust SKAT, Burden and SKAT-O tests, respectively. |
r.corr |
the |
weights.beta.rare |
a numeric vector of parameters of beta weights for rare variants. It is only used for weighted kernels. If you want to use your own weights, please specify the "weights" parameter. |
weights.beta.common |
a numeric vector of parameters of beta weights for common variants. It is only used for weighted kernels. If you want to use your own weights, please specify the "weights" parameter. |
weights |
a numeric vector of weights for the weighted kernels. See SKAT page for details. |
CommonRare_Cutoff |
MAF cutoff for common vs rare variants (default=NULL). It should be a numeric value between
0 and 0.5, or NULL. When it is NULL, |
impute.method |
a method to impute missing genotypes (default= "bestguess"). "bestguess" imputes missing genotypes as most likely values (0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p is the MAF), and "fixed" imputes missing genotypes by assigning the mean genotype value (2p). |
is_check_genotype |
a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details. |
is_dosage |
a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”. |
missing_cutoff |
a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis. |
max_maf |
a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis. |
estimate_MAF |
a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details. |
SSD.INFO |
an SSD_INFO object returned from Open_SSD. |
SetID |
a character value of Set ID. You can find a set ID of each set from SetInfo object of SSD.INFO. In SKATBinary_Robust function, this parameter is for the internal use only. |
SetIndex |
a numeric value of Set index. You can find a set index of each set from SetInfo object of SSD.INFO |
obj.SNPWeight |
output object from Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used. |
... |
further arguments to be passed to “SKATBinary_Robust” |
p.value |
p-value. It will be the p-value based on robust methods. |
p.value_singlevariant |
p-value for each single variant in this region-based test. |
param$n.marker |
a number of SNPs in the genotype matrix. |
param$n.marker.test |
a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff. |
param$rho |
the |
n.common |
A number of common markers used for the test. |
mac.common |
The total minor allele count (MAC) of common markers used for the test. |
n.rare |
A number of rare markers used for the test. |
mac.rare |
The total minor allele count (MAC) of rare markers used for the test. |
Zhangchen Zhao
Zhao, Z., Bi, W., Zhou, W., VandeHaar, P., Fritsche, L. G., & Lee, S. (2019). UK-Biobank Whole Exome Sequence Binary Phenome Analysis with Robust Region-based Rare Variant Test. The American Journal of Human Genetics, in press.
data(SKATBinary.example) Z<-SKATBinary.example$Z obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example) # run SKAT (default method) with Hybrid out = SKAT_CommonRare_Robust(Z, obj) # p-value out$p.value # # Run burden and SKAT SKAT_CommonRare_Robust(Z, obj, method="Burden")$p.value SKAT_CommonRare_Robust(Z, obj, method="SKAT")$p.value
data(SKATBinary.example) Z<-SKATBinary.example$Z obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example) # run SKAT (default method) with Hybrid out = SKAT_CommonRare_Robust(Z, obj) # p-value out$p.value # # Run burden and SKAT SKAT_CommonRare_Robust(Z, obj, method="Burden")$p.value SKAT_CommonRare_Robust(Z, obj, method="SKAT")$p.value
Compute model parameters and residuals for SKAT with incorporating the kinship structure.
SKAT_NULL_emmaX (formula, data=NULL, K=NULL, Kin.File=NULL, ngrids=100, llim=-10, ulim=10, esp=1e-10, Is.GetEigenResult=FALSE)
SKAT_NULL_emmaX (formula, data=NULL, K=NULL, Kin.File=NULL, ngrids=100, llim=-10, ulim=10, esp=1e-10, Is.GetEigenResult=FALSE)
formula |
an object of class “formula”: a symbolic description of the NULL model to be fitted. |
data |
an optional data frame containing the variables in the model (default=NULL). If it is NULL, the variables are taken from 'environment(formula)' |
K |
a kinship matrix. If K=NULL, the function reads Kin.File to get a kinship matrix. |
Kin.File |
an emmax-kin output file name. If K=NULL, the function reads this file. |
ngrids |
Number of grids to search for the optimal variance component |
llim |
Lower bound of log ratio of two variance components |
ulim |
Upper bound of log ratio of two variance components |
esp |
Tolerance of numerical precision error |
Is.GetEigenResult |
Return intermediate eigen-decomposition results |
The Emma package code was used to implement this function.
Resampling is not implemented.
This function returns an object that has model parameters and residuals of the NULL model of no associations. After obtaining it, use SKAT function to carry out the association test.
Seunggeun Lee
data(SKAT.fam.example) K = SKAT.fam.example$K Z = SKAT.fam.example$Z obj<-SKAT_NULL_emmaX(y ~ X, K=K, data=SKAT.fam.example) SKAT(Z, obj)$p.value # SKAT-O SKAT(Z, obj, method="optimal.adj")$p.value
data(SKAT.fam.example) K = SKAT.fam.example$K Z = SKAT.fam.example$Z obj<-SKAT_NULL_emmaX(y ~ X, K=K, data=SKAT.fam.example) SKAT(Z, obj)$p.value # SKAT-O SKAT(Z, obj, method="optimal.adj")$p.value
Compute model parameters and residuals for SKAT. You also can obtain resampled residuals that can be used to compute resampling p-value or to control family-wise error rate.
SKAT_Null_Model(formula, data=NULL, out_type="C", n.Resampling=0 , type.Resampling="bootstrap", Adjustment=TRUE) SKAT_Null_Model_ChrX(formula, SexVar, data=NULL, out_type="C", n.Resampling=0 , type.Resampling="bootstrap", Adjustment=TRUE, Model.Y=FALSE)
SKAT_Null_Model(formula, data=NULL, out_type="C", n.Resampling=0 , type.Resampling="bootstrap", Adjustment=TRUE) SKAT_Null_Model_ChrX(formula, SexVar, data=NULL, out_type="C", n.Resampling=0 , type.Resampling="bootstrap", Adjustment=TRUE, Model.Y=FALSE)
formula |
an object of class “formula”: a symbolic description of the NULL model to be fitted. |
data |
an optional data frame containing the variables in the model (default=NULL). If it is NULL, the variables are taken from 'environment(formula)' |
out_type |
an indicator of the outcome type. "C" for the continuous outcome and "D" for the dichotomous outcome. |
n.Resampling |
a numeric value of the number of resampling (default=0). If you don't want resampling, please set n.Resampling=0. |
type.Resampling |
resampling methods (default="bootstrap"). see details. |
Adjustment |
If TRUE, a small sample adjustment will be applied when the sample size < 2000 and the trait is binary (default=TRUE). See details |
SexVar |
a sex variable name in “formula”. |
Model.Y |
indicator variable whether the model will be also used for ChrY. It should be TRUE if you want to use SKAT_ChrY function |
There are 2 different methods to get resampled residuals. "bootstrap" conducts the parametric bootstrap to resample residuals under the NULL model with considering covariates. "bootstrap.fast" (only for binary traits) is a fast implementation of "bootstrap". If there is no covariate, "bootstrap" is equivalent to the permutation method.
When the trait is binary, the SKAT can produce conservative results when the sample size is small. To address this, we developed a small sample adjustment method, which adjusts asymptotic null distribution by estimating small sample moments. See also SKAT_Null_Model_MomentAdjust.
Since small sample adjustment uses random sampling to estimate the kurtosis of the test statistics, SKAT with the (kurtosis-based) small sample adjustment can yield slightly different p-values for each run. If you want to reproduce p-values, please set a seed number using set.seed function in R.
We recently developed more advanced methods to get p-values for binary traits, and the methods are implemented in SKATBinary. We recommend to use SKATBinary function instead of SKAT when your trait is binary.
This function returns an object that has model parameters and residuals of the NULL model of no association between genetic variables and outcome phenotypes. After obtaining it, please use SKAT function to conduct the association test.
Seunggeun Lee
data(SKAT.example) Z<-SKAT.example$Z ############################################################# # Compute the P-value of SKAT # binary trait obj<-SKAT_Null_Model(y.b ~ X, out_type="D", data=SKAT.example) SKAT(Z, obj, kernel = "linear.weighted")$p.value ############################################################# # When you have no covariate to adjust. # binary trait obj<-SKAT_Null_Model(y.b ~ 1, out_type="D", data=SKAT.example) SKAT(Z, obj, kernel = "linear.weighted")$p.value ######################################################### # Small sample adjustment IDX<-c(1:100,1001:1100) # With-adjustment obj<-SKAT_Null_Model(y.b[IDX] ~ X[IDX,],out_type="D", data=SKAT.example) SKAT(Z[IDX,], obj, kernel = "linear.weighted")$p.value # Without-adjustment obj<-SKAT_Null_Model(y.b[IDX] ~ X[IDX,],out_type="D", Adjustment=FALSE, data=SKAT.example) SKAT(Z[IDX,], obj, kernel = "linear.weighted")$p.value ######################################################### # Use SKATBinary SKATBinary(Z[IDX,], obj, kernel = "linear.weighted")$p.value
data(SKAT.example) Z<-SKAT.example$Z ############################################################# # Compute the P-value of SKAT # binary trait obj<-SKAT_Null_Model(y.b ~ X, out_type="D", data=SKAT.example) SKAT(Z, obj, kernel = "linear.weighted")$p.value ############################################################# # When you have no covariate to adjust. # binary trait obj<-SKAT_Null_Model(y.b ~ 1, out_type="D", data=SKAT.example) SKAT(Z, obj, kernel = "linear.weighted")$p.value ######################################################### # Small sample adjustment IDX<-c(1:100,1001:1100) # With-adjustment obj<-SKAT_Null_Model(y.b[IDX] ~ X[IDX,],out_type="D", data=SKAT.example) SKAT(Z[IDX,], obj, kernel = "linear.weighted")$p.value # Without-adjustment obj<-SKAT_Null_Model(y.b[IDX] ~ X[IDX,],out_type="D", Adjustment=FALSE, data=SKAT.example) SKAT(Z[IDX,], obj, kernel = "linear.weighted")$p.value ######################################################### # Use SKATBinary SKATBinary(Z[IDX,], obj, kernel = "linear.weighted")$p.value
Compute model parameters and residuals for SKAT with adjusting small sample moments when the trait is binary. You can also obtain resampled residuals that can be used to compute resampling p-value or to control family-wise error rate.
SKAT_Null_Model_MomentAdjust(formula, data=NULL, n.Resampling=0, type.Resampling="bootstrap", is_kurtosis_adj=TRUE, n.Resampling.kurtosis=10000)
SKAT_Null_Model_MomentAdjust(formula, data=NULL, n.Resampling=0, type.Resampling="bootstrap", is_kurtosis_adj=TRUE, n.Resampling.kurtosis=10000)
formula |
object of class “formula”: a symbolic description of the NULL model to be fitted. |
data |
optional data frame containing the variables in the model (default=NULL). If it is NULL, the variables are taken from 'environment(formula)' |
n.Resampling |
a numeric value of the number of resampling (default=0). If you don't want resampling, please set n.Resampling=0. |
type.Resampling |
resampling methods (default="bootstrap"). see details. |
is_kurtosis_adj |
If TRUE, the kurtosis adjustment will be applied. The small sample kurtosis will be estimated using the resampled phenotypes. |
n.Resampling.kurtosis |
a numeric value of the number of resampling for kurtosis estimation (default=10000). If is_kurtosis_ad=FALSE, it will be ignored. |
When the trait is binary, the SKAT can produce conservative results when the sample size is small. To address this, we developed a small sample adjustment method, which adjust asymptotic null distribution by estimating small sample variance and kurtosis. The small smaple variance is estimated analytically, and the small sample kurtosis is estimated using the resampling approach.
There are 2 different methods to get resampled residuals. "bootstrap" conducts the parametric bootstrap to resample residuals under the NULL model with considering covariates. "bootstrap.fast" (only for binary traits) is a fast implementation of "bootstrap". If there is no covariate, "bootstrap" is equivalent to the permutation method.
Since the kurtosis is estimated using random samples, SKAT with the kurtosis-based small sample adjustment can yield slightly different p-values for each run. If you want to reproduce p-values, please set a seed number using set.seed function in R.
This function returns an object that has model parameters and residuals of the NULL model of no association between genetic variants and outcome phenotypes. After obtaining it, please use SKAT function to conduct association tests.
Seunggeun Lee
data(SKAT.example) Z<-SKAT.example$Z ############################################################# # Compute the P-value of SKAT IDX<-c(1:100,1001:1100) # binary trait obj<-SKAT_Null_Model_MomentAdjust(y.b[IDX] ~ X[IDX,], data=SKAT.example) SKAT(Z[IDX,], obj, kernel = "linear.weighted")$p.value
data(SKAT.example) Z<-SKAT.example$Z ############################################################# # Compute the P-value of SKAT IDX<-c(1:100,1001:1100) # binary trait obj<-SKAT_Null_Model_MomentAdjust(y.b[IDX] ~ X[IDX,], data=SKAT.example) SKAT(Z[IDX,], obj, kernel = "linear.weighted")$p.value
Example data for SKAT.
SKAT.example contains the following objects:
a numeric genotype matrix of 2000 individuals and 67 SNPs. Each row represents a different individual, and each column represents a different SNP marker.
a numeric matrix of 2 covariates.
a numeric vector of continuous phenotypes.
a numeric vector of binary phenotypes.
Example data for SKAT.
SKAT.example contains the following objects:
a numeric genotype matrix of 2000 individuals and 52 SNPs. Each row represents a different individual, and each column represents a different SNP marker.
a numeric vector of continuous covariates.
a numeric vector of binary covariates.
a numeric vector of gender (male=1, female=2).
a numeric vector of binary phenotypes.
a list object of 10 genotype matrices.
Example data for SKAT_emmaX.
Example contains the following objects:
a numeric genotype matrix Each row represents a different individual, and each column represents a different SNP marker.
a numeric matrix of covariates.
a numeric vector of continuous phenotypes.
a kinship matrix.
Haplotype dataset generated by COSI with mimicking linkage disequilibrium (LD) structure of European ancestry.
This list object contains the following objects:
a numeric matrix of 10,000 haplotypes over 200k BP region. Each row represents a different haplotype, and each column represents a different SNP marker. It is generated by the calibration coalescent model (COSI) with mimicking LD structure of European ancestry.
a dataframe of SNP information.
Schaffner, S.F. and Foo, C. and Gabriel, S. and Reich, D. and Daly, M.J. and Altshuler, D. (2005) Calibrating a coalescent simulation of human genome sequence variation. Genome Research, 15, 1576-1583.
Iteratively carry out association tests with phenotypes and SNP sets in SSD file.
SKAT.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL) SKATBinary.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL) SKATBinary_Robust.SSD.All(SSD.INFO, obj, ...,obj.SNPWeight=NULL) SKAT_CommonRare.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL) SKAT_CommonRare_Robust.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL)
SKAT.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL) SKATBinary.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL) SKATBinary_Robust.SSD.All(SSD.INFO, obj, ...,obj.SNPWeight=NULL) SKAT_CommonRare.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL) SKAT_CommonRare_Robust.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL)
SSD.INFO |
SSD_INFO object returned from Open_SSD. |
obj |
output object from SKAT_Null_Model. |
... |
further arguments to be passed to “SKAT” or “SKATBinary”. |
obj.SNPWeight |
output object from Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used. |
Please see SKAT or SKATBinary for details.
results |
dataframe that contains SetID, p-values (P.value), the number of markers in the SNP sets (N.Marker.All), and the number of markers to test for an association after excluding non-polymorphic or high missing rates markers (N.Marker.Test). The output dataframe from SKATBinary.SSD.All (and others) have more columns. For example, the outcome from SKATBinary.SSD.All have columns for the method to compute p-values and the minimum achievable p-values (MAP). |
P.value.Resampling |
the matrix that contains p-values of resampled phenotypes. |
OUT.snp.mac |
each element in the list is a vector of MAC of SNPs used in the test. The names are SNP-IDs. |
Seunggeun Lee
This function computes p-values of Burden test, SKAT, and SKAT-O for binary traits using asymptotic and efficient resampling methods.
SKATBinary(Z, obj, kernel = "linear.weighted", method="SKAT" , method.bin="Hybrid", weights.beta=c(1,25), weights = NULL , r.corr=0, impute.method = "bestguess", is_check_genotype=TRUE , is_dosage = FALSE, missing_cutoff=0.15, max_maf=1 , estimate_MAF=1, N.Resampling=2 *10^6, seednum=100, epsilon=10^-6 , SetID=NULL) SKATBinary.SSD.OneSet(SSD.INFO, SetID, obj, ... ,obj.SNPWeight=NULL) SKATBinary.SSD.OneSet_SetIndex(SSD.INFO, SetIndex, obj, ... ,obj.SNPWeight=NULL)
SKATBinary(Z, obj, kernel = "linear.weighted", method="SKAT" , method.bin="Hybrid", weights.beta=c(1,25), weights = NULL , r.corr=0, impute.method = "bestguess", is_check_genotype=TRUE , is_dosage = FALSE, missing_cutoff=0.15, max_maf=1 , estimate_MAF=1, N.Resampling=2 *10^6, seednum=100, epsilon=10^-6 , SetID=NULL) SKATBinary.SSD.OneSet(SSD.INFO, SetID, obj, ... ,obj.SNPWeight=NULL) SKATBinary.SSD.OneSet_SetIndex(SSD.INFO, SetIndex, obj, ... ,obj.SNPWeight=NULL)
Z |
a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. |
obj |
output object from SKAT_Null_Model. |
kernel |
type of kernel (default= "linear.weighted"). See SKAT page for details. |
method |
type of gene based test (default= "SKAT"). The possible choices are "SKAT", "Burden" and "SKATO", which represents SKAT, Burden and SKAT-O tests, respectively. This parameter differs from the "method" parameter in SKAT function. "Burden" is equivalent to method="davies" and r.corr=1 and "SKATO" is equivalent to method="optimal.adj" in the SKAT function. When method="Burden" or method="SKATO", r.corr will be ignored. |
method.bin |
type of method to compute a p-value (default="Hybrid"). Possible choices are "Hybrid", "ER", "ER.A", "QA", "MA" and "UA". See details |
weights.beta |
a numeric vector of parameters of beta weights. It is only used for weighted kernels. If you want to use your own weights, please specify the “weights” parameter. |
weights |
a numeric vector of weights for the weighted kernels. See SKAT page for details. |
impute.method |
a method to impute missing genotypes (default= "bestguess"). "bestguess" imputes missing genotypes as most likely values (0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p is the MAF), and "fixed" imputes missing genotypes by assigning the mean genotype value (2p). |
r.corr |
the |
is_check_genotype |
a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details. |
is_dosage |
a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”. |
missing_cutoff |
a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis. |
max_maf |
a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis. |
estimate_MAF |
a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details. |
N.Resampling |
a number of resampling to be conducted to get p-values (default=2 *10^6). |
seednum |
a seed number for random number generation (default=100). If NULL, no seed number will be assigned. |
epsilon |
a precision level (default=10^-6). |
SSD.INFO |
an SSD_INFO object returned from Open_SSD. |
SetID |
a character value of Set ID. You can find a set ID of each set from SetInfo object of SSD.INFO. In SKATBinary function, this parameter is for the internal use only. |
SetIndex |
a numeric value of Set index. You can find a set index of each set from SetInfo object of SSD.INFO |
... |
further arguments to be passed to “SKATBinary” |
obj.SNPWeight |
an output object of Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used. |
This function implements six methods (method.bin) to compute p-values: 1) Efficient resampling (ER); 2) Quantile adjusted moment matching (QA); 3) Moment matching adjustment (MA); 4) No adjustment (UA); 5) Adaptive ER (ER.A); and 6) Hybrid. "Hybrid" selects a method based on the total minor allele count (MAC), the number of individuals with minor alleles (m), and the degree of case-control imbalance. When method.bin="ER" or "ER.A", SKATBinary compute mid-p-values and minimum achievable mid p-values.
If seednum is not NULL, set.seed(seednum) function is used to specify seeds to get the same p-values of ER based methods for different runs. Therefore, please set seednum=NULL, if you do not want to set seeds.
SKATBinary uses impute.method="bestguess" as a default method for the imputation, which is different from SKAT that uses impute.method="fixed" as a default method. We changed it because SKATBinary with impute.method="fixed" can yield false positives when variates are very rare and missing rates between cases and controls are unbalanced. When missing rates between cases and controls are highly unbalanced, SKAT impute.method="fixed" can also yield false positives, but it happens less likely. So we did not change the default imputation method in SKAT.
p.value |
p-value. It will be the mid p-value if ER or ER.A are used to compute the p-value. |
p.value.standard |
(ER and ER.A only) standard p-value. |
p.value.resampling |
p-values from resampled outcome. You can obtain it when n.Resampling (in SKAT_Null_Model) > 0. See SKAT_Null_Model page. |
p.value.standard.resampling |
(ER and ER.A only)standard p-values from resampled outcomes. |
m |
the number of individuals with minor alleles. |
MAP |
minimum possible p-values. It is available when the method.bin="ER" and m is sufficiently small. |
MAC |
total minor allele count (MAC). |
n.total |
(ER only) the number of resampling to be generated to get the p-value. It can be smaller than N.Resampling when the total number of configurations of case-controls among individuals with minor alleles are smaller than N.Resampling. |
is.accurate |
logical value for the accuracy of the p-value. If it is false, more resampling is needed to accurately estimate the p-value. |
param$n.marker |
a number of SNPs in the genotype matrix |
param$n.marker.test |
a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff. |
method.bin |
a type of method to be used to compute the p-value. |
test.snp.mac |
a vector of minor allele count (MAC) of the snps tested. The name is SNP-ID. |
Seunggeun Lee
Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2015) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics, in press.
data(SKATBinary.example) Z<-SKATBinary.example$Z obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example) # run SKAT (default method) with Hybrid out = SKATBinary(Z, obj) # p-value out$p.value # MAP out$MAP # method used to compute p-value (method.bin) out$method.bin # # Run burden and SKAT-O with Hybrid SKATBinary(Z, obj, method="Burden")$p.value SKATBinary(Z, obj, method="SKATO")$p.value # # Run with SKAT-QA, -MA and -UA SKATBinary(Z, obj, method.bin="QA")$p.value SKATBinary(Z, obj, method.bin="MA")$p.value SKATBinary(Z, obj, method.bin="UA")$p.value # UA from SKAT function SKAT(Z, obj)$p.value # # Run with Adaptive ER out =SKATBinary(Z, obj, method.bin="ER.A") out$p.value # the number of total resampling is smaller than 2*10^6 (default value) out$n.total
data(SKATBinary.example) Z<-SKATBinary.example$Z obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example) # run SKAT (default method) with Hybrid out = SKATBinary(Z, obj) # p-value out$p.value # MAP out$MAP # method used to compute p-value (method.bin) out$method.bin # # Run burden and SKAT-O with Hybrid SKATBinary(Z, obj, method="Burden")$p.value SKATBinary(Z, obj, method="SKATO")$p.value # # Run with SKAT-QA, -MA and -UA SKATBinary(Z, obj, method.bin="QA")$p.value SKATBinary(Z, obj, method.bin="MA")$p.value SKATBinary(Z, obj, method.bin="UA")$p.value # UA from SKAT function SKAT(Z, obj)$p.value # # Run with Adaptive ER out =SKATBinary(Z, obj, method.bin="ER.A") out$p.value # the number of total resampling is smaller than 2*10^6 (default value) out$n.total
This function computes p-values of robust burden test, SKAT, and SKAT-O for binary traits using SPA and ER.
SKATBinary_Robust(Z, obj, kernel = "linear.weighted", method="SKAT" , r.corr=NULL, weights.beta=c(1,25), weights = NULL , impute.method = "bestguess", is_check_genotype=TRUE , is_dosage = FALSE, missing_cutoff=0.15, max_maf=1 , estimate_MAF=1) SKATBinary_Robust.SSD.OneSet(SSD.INFO , SetID, obj, ...,obj.SNPWeight=NULL) SKATBinary_Robust.SSD.OneSet_SetIndex(SSD.INFO , SetIndex, obj, ... ,obj.SNPWeight=NULL)
SKATBinary_Robust(Z, obj, kernel = "linear.weighted", method="SKAT" , r.corr=NULL, weights.beta=c(1,25), weights = NULL , impute.method = "bestguess", is_check_genotype=TRUE , is_dosage = FALSE, missing_cutoff=0.15, max_maf=1 , estimate_MAF=1) SKATBinary_Robust.SSD.OneSet(SSD.INFO , SetID, obj, ...,obj.SNPWeight=NULL) SKATBinary_Robust.SSD.OneSet_SetIndex(SSD.INFO , SetIndex, obj, ... ,obj.SNPWeight=NULL)
Z |
a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Now we support both matrix and sparse matrix. |
obj |
output object from SKAT_Null_Model. |
kernel |
type of kernel (default= "linear.weighted"). The possible choices are "linear" and "linear.weighted". |
method |
type of gene based test (default= "SKAT"). The possible choices are "SKAT", "Burden" and "SKATO", which represents robust SKAT, Burden and SKAT-O tests, respectively. |
r.corr |
the |
weights.beta |
a numeric vector of parameters of beta weights. It is only used for weighted kernels. If you want to use your own weights, please specify the “weights” parameter. |
weights |
a numeric vector of weights for the weighted kernels. See SKAT page for details. |
impute.method |
a method to impute missing genotypes (default= "bestguess"). "bestguess" imputes missing genotypes as most likely values (0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p is the MAF), and "fixed" imputes missing genotypes by assigning the mean genotype value (2p). |
is_check_genotype |
a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details. |
is_dosage |
a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”. |
missing_cutoff |
a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis. |
max_maf |
a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis. |
estimate_MAF |
a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details. |
SSD.INFO |
an SSD_INFO object returned from Open_SSD. |
SetID |
a character value of Set ID. You can find a set ID of each set from SetInfo object of SSD.INFO. In SKATBinary_Robust function, this parameter is for the internal use only. |
SetIndex |
a numeric value of Set index. You can find a set index of each set from SetInfo object of SSD.INFO |
obj.SNPWeight |
output object from Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used. |
... |
further arguments to be passed to “SKATBinary_Robust” |
p.value |
p-value. It will be the p-value based on robust methods. |
p.value_singlevariant |
p-value for each single variant in this region-based test. |
mac |
total minor allele count (MAC). |
param$n.marker |
a number of SNPs in the genotype matrix. |
param$n.marker.test |
a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff. |
param$rho |
the |
test.snp.mac |
a vector of minor allele count (MAC) of the snps tested. The name is SNP-ID. |
Zhangchen Zhao
Zhao, Z., Bi, W., Zhou, W., VandeHaar, P., Fritsche, L. G., & Lee, S. (2019). UK-Biobank Whole Exome Sequence Binary Phenome Analysis with Robust Region-based Rare Variant Test. The American Journal of Human Genetics, in press.
data(SKATBinary.example) Z<-SKATBinary.example$Z obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example) # run SKAT (default method) with Hybrid out = SKATBinary_Robust(Z, obj) # p-value out$p.value # # Run burden and SKAT SKATBinary_Robust(Z, obj, method="Burden")$p.value SKATBinary_Robust(Z, obj, method="SKAT")$p.value
data(SKATBinary.example) Z<-SKATBinary.example$Z obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example) # run SKAT (default method) with Hybrid out = SKATBinary_Robust(Z, obj) # p-value out$p.value # # Run burden and SKAT SKATBinary_Robust(Z, obj, method="Burden")$p.value SKATBinary_Robust(Z, obj, method="SKAT")$p.value
This function computes p-values of single variant test using the firth and efficient resampling methods.
SKATBinary_Single(Z, obj, method.bin="Hybrid" , impute.method = "bestguess", is_check_genotype=TRUE, is_dosage = FALSE , missing_cutoff=0.15, max_maf=1, estimate_MAF=1 , N.Resampling=2*10^6, seednum=100, epsilon=10^-6)
SKATBinary_Single(Z, obj, method.bin="Hybrid" , impute.method = "bestguess", is_check_genotype=TRUE, is_dosage = FALSE , missing_cutoff=0.15, max_maf=1, estimate_MAF=1 , N.Resampling=2*10^6, seednum=100, epsilon=10^-6)
Z |
a numeric genotype vector. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. |
obj |
output object from SKAT_Null_Model. |
method.bin |
a type of method to compute a p-value (default="Hybrid"). See details. |
impute.method |
a method to impute missing genotypes (default= "bestguess"). |
is_check_genotype |
a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details. |
is_dosage |
a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”. |
missing_cutoff |
a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis. |
max_maf |
a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis. |
estimate_MAF |
a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details |
N.Resampling |
a number of resampling to be conducted to get p-values (default=2 *10^6). |
seednum |
a seed number for random number generation (default=100). If NULL, no seed number will be assigned. |
epsilon |
a precision level (default=10^-6). |
This function implements three methods (method.bin) to compute p-values: 1) Efficient resampling (ER); 2) Firth biased adjusted likelihood ratio test (Firth); and 3) Hybrid. "Hybrid" selects a method based on the total minor allele count (MAC), the number of individuals with minor alleles (m), and the degree of case-control imbalance.
Adaptive ER (ER.A) is not implemented yet.
If seednum is not NULL, set.seed(seednum) function is used to specify seeds to get the same p-values of ER based methods for different runs. Therefore, please set seednum=NULL, if you do not want to set seeds.
p.value |
p-value. It will be the mid p-value if ER is used to compute the p-value. |
p.value.standard |
(ER only) standard p-value. |
p.value.resampling |
p-values from resampled outcome. You can obtain it when n.Resampling in SKAT_Null_Model was > 0. See the SKAT_Null_Model. |
p.value.standard.resampling |
(ER only) standard p-values from resampled outcome. |
m |
the number of individuals with minor alleles. |
MAP |
the minimum possible p-values. It is available when the method.bin="ER" and m is sufficiently small. |
MAC |
the total minor allele count (MAC). |
n.total |
(ER only) the number of resampling to be generated to get the p-value. It can be smaller than N.Resampling when the total number of configurations of case-controls among individuals with minor alleles are smaller than N.Resampling. |
is.accurate |
logical value for the accuracy of the p-value. If it is false, more resampling is needed to accurately estimate the p-value. |
method.bin |
a type of method to be used to compute the p-value. |
Seunggeun Lee
Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2015) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics, in press.
data(SKATBinary.example) Z<-SKATBinary.example$Z obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example) out = SKATBinary_Single(Z[,1], obj) # p-value out$p.value # MAP out$MAP # method used to compute p-value (method.bin) out$method.bin # # Use firth method to compute p-value SKATBinary_Single(Z[,1], obj, method.bin="Firth")$p.value
data(SKATBinary.example) Z<-SKATBinary.example$Z obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example) out = SKATBinary_Single(Z[,1], obj) # p-value out$p.value # MAP out$MAP # method used to compute p-value (method.bin) out$method.bin # # Use firth method to compute p-value SKATBinary_Single(Z[,1], obj, method.bin="Firth")$p.value
Example data for SKAT.
SKAT.example contains the following objects:
a numeric genotype matrix of 2000 individuals and 11 SNPs. Each row represents a different individual, and each column represents a different SNP marker.
a numeric vector of continuous covariates.
a numeric vector of binary covariates.
a numeric vector of binary phenotypes.
a list object of 30 genotype matrices.
Internal variable for SSD and functions for other packages