Package 'SKAT'

Title: SNP-Set (Sequence) Kernel Association Test
Description: Functions for kernel-regression-based association tests including Burden test, SKAT and SKAT-O. These methods aggregate individual SNP score statistics in a SNP set and efficiently compute SNP-set level p-values.
Authors: Seunggeun (Shawn) Lee and Zhangchen Zhao, with contributions from Larisa Miropolsky and Michael Wu
Maintainer: Seunggeun (Shawn) Lee <[email protected]>
License: GPL(>=2)
Version: 2.3.1
Built: 2025-02-12 15:33:31 UTC
Source: https://github.com/leelabsg/skat

Help Index


Close SNP set data file (SSD)

Description

Close the opened SNP Set data file (SSD). After using the SSD file, it must be closed.

Usage

Close_SSD()

Author(s)

Seunggeun Lee, Larisa Miropolsky


Generate SNP set data file (SSD)

Description

Generate a SNP set data file (SSD) from binary plink data files using user specified SNP sets.

Usage

Generate_SSD_SetID(File.Bed, File.Bim, File.Fam, File.SetID,
	 File.SSD, File.Info, Is.FlipGenotype=TRUE)

Arguments

File.Bed

name of the binary ped file (BED).

File.Bim

name of the binary map file (BIM).

File.Fam

name of the FAM file (FAM).

File.SetID

name of the SNP set ID file that defines SNP sets. The first column must be Set ID, and the second column must be SNP ID. There should be no header!!

File.SSD

name of the SSD file generated.

File.Info

name of the SSD info file generated.

Is.FlipGenotype

internal use only, please do not change

Details

The SetID file is a white-space (space or tab) delimitered file with 2 columns: SetID and SNP_ID.

Please keep in mind that there should be no header! The SNP_IDs and SetIDs should be less than 50 characters, otherwise, it will return an error message.

The SSD file is a binary formated file with genotypes. The SSD info file is a text file with general information on data and SNP sets (first 6 rows), and information on each set (after 8th row).

Author(s)

Seunggeun Lee, Larisa Miropolsky


Estimate the effective number of tests for Bonferroni correction

Description

Estimate the effective number of tests for Bonferroni correction using the minimum achievable p-values (MAP).

Usage

Get_EffectiveNumberTest(MAP, alpha=0.05, Is.MidP=TRUE)

Arguments

MAP

a vector of the minimum achievable p-values (MAP).

alpha

a significant level.

Is.MidP

a logical value indicating whether p-values are mid-p-values.

Value

Effective number of test for Bonferroni correction at the level alpha. MAP can be obtained from SKATBinary functions.

Author(s)

Seunggeun Lee

References

Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2015) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics, in press.


Get Genotype data from SSD file

Description

Read a SSD file and return a genotype matrix.

Usage

Get_Genotypes_SSD(SSD_INFO, Set_Index, is_ID = TRUE)
	
	Get_Genotypes_SSD_Sparse(SSD_INFO, Set_Index)

Arguments

SSD_INFO

SSD_INFO object returned from Open_SSD.

Set_Index

a numeric value of Set index. The set index of each set can be found from SetInfo object of SSD.INFO.

is_ID

a logical value indicating whether to read SNP ID (default=TRUE). If TRUE, it reads SNP IDs and use them as column names.

Value

A genotype matrix with n rows and m columns, where n is the number of samples and m is the number of SNPs. Get_Genotypes_SSD_Sparse returns a sparse matrix. Get_Genotypes_SSD_Sparse always returns SNP IDs as the column names, so does not have is_ID parameter.

Author(s)

Seunggeun Lee, Larisa Miropolsky


Get the logistic weight

Description

Get logistic weights from either a genotype matrix (Z) or a vector of minor allele frequncies (MAF). Users can apply this weights to SKAT by giving it as the “weights” parameter. The logistic weight gives equal weights to rare variants and nearly zero weight to common variants.

Usage

Get_Logistic_Weights(Z, par1=0.07, par2=150)


	Get_Logistic_Weights_MAF(MAF, par1=0.07, par2=150)

Arguments

Z

a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele.

MAF

a numeric vector of minor allele frequncies.

par1

a numeric value of the first parameter of the logistic weight (default= 0.07).

par2

a numeric value of the second parameter of the logistic weight(default= 150).

Details

The formula for the weight is

weights=e(par1MAF)par21+e(par1MAF)par2.weights = \frac{e^{(par1 - MAF) par2}}{1 + e^{(par1 - MAF) par2}}.

Value

A vector of the logistic weight.

Author(s)

Seunggeun Lee

Examples

data(SKAT.example)


#############################################################
#	Compute the P-value of SKAT with the logistic Weight (par1=0.07, par2=150)

# Use logistic weight
Z<-SKAT.example$Z
obj<-SKAT_Null_Model(y.c ~ X, out_type="C", data=SKAT.example)
weights<-Get_Logistic_Weights(Z, par1=0.07, par2=150)
SKAT(Z, obj, kernel = "linear.weighted", weights=weights)$p.value

# Weights function
MAF<-colMeans(Z)/2
plot(MAF,weights)

Get the required sample size to achieve the given power

Description

Get the sample sizes required to achieve the given power.

Usage

Get_RequiredSampleSize(obj, Power=0.8)

Arguments

obj

an object returned from Power_Continuous or Power_Logistic.

Power

a value of the power to be achieved (default= 0.8).

Details

This function computes required sample sizes using simple interpolation.

Value

A list object for the required sample sizes.

Author(s)

Seunggeun Lee


Compute a resampling p-value

Description

Compute a resampling p-value using resampled residuals. To use it, SKAT_Null_Model or SKAT_Null_Model_MomentAdjust should have n.Resampling > 0.

Usage

Get_Resampling_Pvalue(obj)

	Get_Resampling_Pvalue_1(p.value, p.value.resampling)

Arguments

obj

SKAT outcome object.

p.value

a numeric value of the SKAT p-value.

p.value.resampling

a vector of p-values from the resampled residuals.

Details

See SKAT_Null_Model

Value

p.value

the resampling p-value. It is computed as (n1 +1)/(n+1), where n is the number of resampling (n.Resampling in SKAT_Null_Model or SKAT_Null_Model_MomentAdjust), and n1 is the number of resampled residual p-values smaller than the original sample p-value.

is_smaller

a logical value indicates whether the resampling p-value should be smaller. If n1=0, then it is TRUE, otherwise it is FALSE.

Author(s)

Seunggeun Lee


Open SNP set data file (SSD)

Description

Open a SNP Set data file (SSD). After finishing using the SSD file, you must close the file by callinsg Close_SSD function.

Usage

Open_SSD(File.SSD, File.Info)

Arguments

File.SSD

name of the SSD file .

File.Info

name of the SSD info file.

Value

a list object of SSD.INFO.

Author(s)

Seunggeun Lee, Larisa Miropolsky


Power calculation, continuous traits

Description

Compute an average power of SKAT and SKAT-O for testing association between a genomic region and continuous phenotypes with a given disease model.

Usage

Power_Continuous(Haplotypes=NULL, SNP.Location=NULL, SubRegion.Length=-1
, Causal.Percent=5, Causal.MAF.Cutoff=0.03, alpha =c(0.01,10^(-3),10^(-6))
, N.Sample.ALL = 500 * (1:10), Weight.Param=c(1,25), N.Sim=100
, BetaType = "Log", MaxBeta=1.6, Negative.Percent=0)

 
Power_Continuous_R(Haplotypes=NULL, SNP.Location, SubRegion.Length=-1
, Causal.Percent=5, Causal.MAF.Cutoff=0.03, alpha =c(0.01,10^(-3),10^(-6))
, N.Sample.ALL = 500 * (1:10), Weight.Param=c(1,25), N.Sim=100
, BetaType = "Log", MaxBeta=1.6, Negative.Percent=0, r.corr=0)

Arguments

Haplotypes

a haplotype matrix with each row as a different individual and each column as a separate SNP (default= NULL). Each element of the matrix should be either 0 (major allel) or 1 (minor allele). If NULL, SKAT.haplotype dataset will be used to compute power.

SNP.Location

a numeric vector of SNP locations that should be matched with the SNPs in the Haplotype matrix (default= NULL). It is used to obtain subregions. When Haplotype=NULL, it should be NULL.

SubRegion.Length

a value of the length of subregions (default= -1). Each subregion will be randomly selected, and then the average power will be calculated by taking the average over the estimated powers of all subregions. If SubRegion.Length=-1 (default), the length of the subregion will be the same as the length of the whole region, so there will no random selection of subregions.

Causal.Percent

a value of the percentage of causal SNPs among rare SNPs (MAF < Causal.MAF.Cutoff)(default= 5).

Causal.MAF.Cutoff

a value of MAF cutoff for the causal SNPs. Only SNPs that have MAFs smaller than the cutoff will be considered as causal SNPs (default= 0.03).

alpha

a vector of the significance levels (default= c(0.01,10^(-3),10^(-6))).

N.Sample.ALL

a vector of the sample sizes (default= 500 * (1:10)).

Weight.Param

a vector of parameters of beta weights (default= c(1,25)).

N.Sim

a value of number of causal SNP/SubRegion sets to be generated to compute the average power (default= 100). Power will be computed for each causal SNP/SubRegion set, and then the average power will be obtained by taking average over the computed powers.

BetaType

a type of effect sizes (default= “Log”). “Log” indicates that effect sizes of causal variants equal to clog10(MAF)c|log10(MAF)|, and “Fixed” indicates that effect sizes of all causal variants are the same.

MaxBeta

a numeric value of the maximum effect size (default= 1.6). When BetaType="Log", the maximum effect size is MaxBeta (when MAF=0.0001). When BetaType="Fixed", all causal variants have the same effect size (= MaxBeta). See details

Negative.Percent

a numeric value of the percentage of coefficients of causal variants that are negative (default= 0).

r.corr

(Power_Continuous_R only) the ρ\rho parameter for the compound symmetric correlation kernel (default= 0). See details.

Details

By default it uses the haplotype information in the SKAT.haplotypes dataset. So if you want to use the SKAT.haplotypes dataset, you can left Haplotypes and SNP.Location as NULL.

When BetaType=“Log”, MaxBeta is a coeffecient value (β\beta) of the causal SNP at MAF =104= 10^{-4} and used to obtain c value of the function clog10(MAF)c|log10(MAF)|. For example, if MaxBeta=1.6, c=1.6/4=0.4c = 1.6/4 = 0.4. Then a variant with MAF=0.001 has β=1.2\beta = 1.2 and a variant with MAF=0.01 has β=0.8\beta = 0.8.

When SubRegion.Length is small such as 3kb or 5kb, it is possible that you can have different estimated power for each run with N.Sim = 5010050 \sim 100. Then, please increase the N.Sim to 5001000500 \sim 1000 to obtain stable results.

R.sq is computed under the no linkage disequilibrium assumption.

Power_Continuous_R computes power with new class of kernels with the compound symmetric correlation structure. It uses a slightly different approach, and thus Power_Continuous and Power_Continuous_R can produce slightly different results although r.corr=0.

If you want to computer power of SKAT-O by estimating the optimal r.corr, use r.corr=2. The estimated optimal r.corr is r.corr=p12(2p21)2r.corr = p_1^2 ( 2p_2-1)^2, where p1p_1 is a proportion of causal variants, and p2p_2 is a proportion of negatively associated causal variants among the causal variants.

Value

Power

A matrix with each row as a different sample size and each column as a different significance level. Each element of the matrix is the estimated power.

R.sq

Proportion of phenotype variance explained by genetic variants.

r.corr

r.corr value. When r.corr=2 is used, it provides the estimated r.corr value. See details.

Author(s)

Seunggeun Lee

Examples

#
#	Calculate the average power of randomly selected 3kb regions 
#	with the following conditions.
#
#	Causal percent = 20%
#	Negative percent = 20%
#	Max effect size  = 2 at MAF = 10^-4
#
#	When you use this function, please increase N.Sim (more than 100)	
#

out.c<-Power_Continuous(SubRegion.Length=3000, 
Causal.Percent= 20, N.Sim=5, MaxBeta=2,Negative.Percent=20)
out.c

#
#	Calculate the required sample sizes to achieve 80% power

Get_RequiredSampleSize(out.c, Power=0.8)

Power calculation, Dichotomous traits

Description

Compute an average power of SKAT and SKAT-O for testing association between a genomic region and dichotomous phenotypes from case-control studies with a given disease model.

Usage

Power_Logistic(Haplotypes = NULL, SNP.Location = NULL, SubRegion.Length=-1
, Prevalence=0.01, Case.Prop=0.5, Causal.Percent=5, Causal.MAF.Cutoff=0.03
, alpha =c(0.01,10^(-3),10^(-6)), N.Sample.ALL = 500 * (1:10)
, Weight.Param=c(1,25), N.Sim=100, OR.Type = "Log"
, MaxOR=5, Negative.Percent=0)

Power_Logistic_R(Haplotypes = NULL, SNP.Location = NULL, SubRegion.Length=-1
, Prevalence=0.01, Case.Prop=0.5, Causal.Percent=5, Causal.MAF.Cutoff=0.03
, alpha =c(0.01,10^(-3),10^(-6)), N.Sample.ALL = 500 * (1:10)
, Weight.Param=c(1,25), N.Sim=100, OR.Type = "Log"
, MaxOR=5, Negative.Percent=0, r.corr=0)

Arguments

Haplotypes

a haplotype matrix with each row as a different individual and each column as a separate SNP (default= NULL). Each element of the matrix should be either 0 (major allel) or 1 (minor allele). If NULL, SKAT.haplotype dataset will be used to compute power.

SNP.Location

a numeric vector of SNP locations which should be matched with the SNPs in the Haplotype matrix (default= NULL). It is used to obtain subregions. When Haplotype=NULL, it should be NULL.

SubRegion.Length

a value of the length of subregions (default= -1). Each subregion will be randomly selected, and then the average power will be calculated by taking the average over the estimated powers of all subregions. If SubRegion.Length=-1 (default), the length of the subregion is the same as the length of the whole region, so there will no random selection of subregions.

Prevalence

a value of disease prevalence.

Case.Prop

a value of the proportion of cases. For example, Case.Prop=0.5 means 50 % of samples are cases and 50 % of samples are controls.

Causal.Percent

a value of the percentage of causal SNPs among rare SNPs (MAF < Causal.MAF.Cutoff)(default= 5).

Causal.MAF.Cutoff

a value of MAF cutoff for the causal SNPs. Only SNPs that have MAFs smaller than this are considered as causal SNPs (default= 0.03).

alpha

a vector of the significance levels (default= c(0.01,10^(-3),10^(-6))).

N.Sample.ALL

a vector of the sample sizes (default= 500 * (1:10)).

Weight.Param

a vector of parameters of beta weights (default= c(1,25)).

N.Sim

a value of number of causal SNP/SubRegion sets to be generated to compute the average power (default= 100). Power will be computed for each causal SNP/SubRegion set, and then the average power will be obtained by taking mean of the computed powers.

OR.Type

a function type of effect sizes (default= “Log”). “Log” indicates that log odds ratio of causal variants equal to clog10(MAF)c|log10(MAF)|, and “Fixed” indicates that log odds ratio of all causal variants are the same.

MaxOR

a numeric value of the maximum odds ratio (default= 5). When OR.Type=“Log”, the maximum odds ratio is MaxOR (when MAF=0.0001). When OR.Type=“Fixed”, all causal variants have the same odds ratio (= MaxOR). See details

Negative.Percent

a numeric value of the percentage of coefficients of causal variants that are negative (default= 0).

r.corr

(Power_Logistic_R only) the ρ\rho parameter of new class of kernels with compound symmetric correlation structure for genotype effects (default= 0). See details.

Details

By default it uses the haplotype information in the SKAT.haplotypes dataset. So you can left Haplotypes and SNP.Location as NULL if you want to use the SKAT.haplotypes dataset.

When OR.Type="Log", MaxOR is a odds ratio of the causal SNP at MAF =104= 10^{-4} and used to obtain c value in the function logOR=clog10(MAF)\log OR = c|log10(MAF)|. For example, if MaxOR=5, c=log(5)/4=0.402c = log(5)/4 = 0.402. Then a variant with MAF=0.001 has log odds ratio = 1.206 and a variant with MAF=0.01 has log odds ratio = 0.804.

When SubRegion.Length is small such as 3kb or 5kb, it is possible that you can have different estimated power for each run with N.Sim = 5010050 \sim 100. Then, please increase N.Sim to 5001000500 \sim 1000 to obtain stable results.

Power_Logistic_R computes the power with new class of kernels with the compound symmetric correlation structure. It uses a slightly different approach, and thus Power_Logistic and Power_Logistic_R can produce slightly different results although r.corr=0.

If you want to computer power of SKAT-O by estimating the optimal r.corr, use r.corr=2. The estimated optimal r.corr is r.corr=p12(2p21)2r.corr = p_1^2 ( 2p_2-1)^2, where p1p_1 is a proportion of causal variants, and p2p_2 is a proportion of negatively associated causal variants among the causal variants.

Value

Power

A matrix with each row as a different sample size and each column as a different significance level. Each element of the matrix is the estimated power.

r.corr

r.corr value. When r.corr=2 is used, it provides the estimated r.corr value. See details.

Author(s)

Seunggeun Lee

Examples

#
#	Calculate the average power of randomly selected 3kb regions 
#	with the following conditions.
#
#	Causal percent = 20%
#	Negative percent = 20%
#	Max OR = 7 at MAF = 10^-4
#
#	When you use this function, please increase N.Sim (more than 100)	
#
out.b<-Power_Logistic(SubRegion.Length=3000, 
Causal.Percent= 20, N.Sim=5 ,MaxOR=7,Negative.Percent=20)

out.b

#
#	Calculate the required sample sizes to achieve 80% power

Get_RequiredSampleSize(out.b, Power=0.8)

Adjusted QQ plot

Description

Draws a MAP-adjusted QQ plot

Usage

QQPlot_Adj(Pval, MAP, main="QQ plot", ntry=500, confidence=0.95, Is.unadjsted=TRUE
	, Is.legend=TRUE, xlab="Expected Quantiles (-log10 P-values)"
	, ylab="Observed Quantiles (-log10 P-values)")

Arguments

Pval

a vector of p-values.

MAP

a vector of MAP.

main

a main title.

ntry

a numeric value of the number for resampling (default= 500).

confidence

a value for the confidence band (default= 0.95).

Is.unadjsted

a logical value to indicate whether to plot the unadjusted QQ plot (default= TRUE).

Is.legend

a logical value to indicate whether to plot a legend (default= TRUE).

xlab

a label for the x axis.

ylab

a label for the y axis.

Author(s)

Seunggeun Lee

References

Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2015) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics, in press.


Read a file with custom weights

Description

Read a file with custom weights

Usage

Read_SNP_WeightFile(FileName)

Arguments

FileName

input file name of a custom weight.

Details

The file should be a white-space (space or tab) delimitered file with 2 columns: SNP_ID and weight value.

Please keep in mind that there should be no header!!

Value

Output object has a hash table of SNP IDs and weights.

Author(s)

Seunggeun Lee


Obtain significant SNP sets after controlling family wise error rate (FWER)

Description

Obtain significant SNP sets after controlling for family wise error rate (FWER) using resampled residuals. To use it, SKAT_Null_Model or SKAT_Null_Model_MomentAdjust should have n.Resampling > 0.

Usage

Resampling_FWER(obj,FWER=0.05)

	Resampling_FWER_1(P.value, P.value.Resampling, FWER=0.05)

Arguments

obj

object returned from SKAT.SSD.All or SKATBinary.SSD.All.

P.value

a vector of SKAT p-values. If 100 genes were tested, this vector should have 100 p-values.

P.value.Resampling

a matrix of p-values of the resampled residuals. Each row represents each gene/snp set, and each column represents resampling set. For example, if you have 100 genes, and conducted resampling 1000 times ( ex.n.Resampling=1000 in SKAT_Null_Model), then it should be a 100 x 1000 matrix.

FWER

a numeric value of FWER rate to control (default=0.05)

Value

results

If the returned object from SKAT.SSD.all (or SKATBinary.SSD.All) are used, it is a sub-table of significant snp sets of the result table in the obj. If you use P.value and P.value.Resampling, it is a vector of significant p-values. If there is no significant snp set, it is NULL.

n

a numeric value of the number of significant snp sets.

ID

a vector of indexes of significant snp sets.

Author(s)

Seunggeun Lee


SNP-set (Sequence) Kernel Association Test

Description

Test for association between a set of SNPS/genes and continuous or dichotomous phenotypes using kernel regression framework.

Usage

SKAT(Z, obj, kernel = "linear.weighted", 
  method="davies", weights.beta=c(1,25), weights=NULL, 
  impute.method="fixed", r.corr=0, is_check_genotype=TRUE,
  is_dosage = FALSE, missing_cutoff=0.15 , max_maf=1, estimate_MAF=1)

SKAT.SSD.OneSet(SSD.INFO, SetID, obj, ... ,obj.SNPWeight=NULL)

SKAT.SSD.OneSet_SetIndex(SSD.INFO, SetIndex, obj, ... ,obj.SNPWeight=NULL)

Arguments

Z

a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Missing genotypes will be imputed by the simple Hardy-Weinberg equilibrium (HWE) based imputation.

obj

an output object of the SKAT_Null_Model function.

kernel

a type of kernel (default= "linear.weighted"). See detail section.

method

a method to compute the p-value (default= "davies"). "davies" represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq, "liu" represents an approximation method that matches the first 3 moments, "liu.mod" represents modified "liu" method that matches kurtosis instead of skewness to improve tail probability approximation,"SKATO" and "optimal.adj" represent a SKAT-O based on an unified approach, and "optimal" is an old version of the implementation of SKAT-O. See details.

weights.beta

a numeric vector of parameters for the beta weights for the weighted kernels. If you want to use your own weights, please use the “weights” parameter. It will be ignored if “weights” parameter is not null.

weights

a numeric vector of weights for the weighted kernels. It is w\sqrt{w} in the SKAT paper. So if you want to use the Madsen and Browning (2009) weight, you should set each element of weights as 1/p(1p)1/ \sqrt{p(1-p)}, not 1/p(1p)1/ p(1-p). When it is NULL, the beta weight with the “weights.beta” parameter is used.

impute.method

a method to impute missing genotypes (default= "fixed"). "bestguess" imputes missing genotypes as most likely values (0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p is the MAF), and "fixed" imputes missing genotypes by assigning the mean genotype values (2p).

r.corr

the ρ\rho parameter for the compound symmetric correlation structure kernels (default= 0). If you give a vector value, SKAT will conduct the optimal test. It will be ignored if method=“optimal” or method=“optimal.adj”. See details.

is_check_genotype

a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). If Z has non-SNP data, please set it FALSE, otherwise you will get an error message. If it is FALSE and you use weighted kernels, the weights should be given through the “weights” parameter.

is_dosage

a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”.

missing_cutoff

a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis.

max_maf

a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis.

estimate_MAF

a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. If estimate_MAF=1 (default), SKAT uses all samples to estimate MAFs. If estimate_MAF=2, only samples with non-missing phenotypes and covariates are used to estimate MAFs.

SSD.INFO

an SSD_INFO object returned from Open_SSD.

SetID

a character value of Set ID. A set ID of each set can be found from SetInfo object in SSD.INFO.

SetIndex

a numeric value of Set index. A set index of each set can be found from SetInfo object in SSD.INFO.

...

further arguments to be passed to “SKAT”

obj.SNPWeight

an output object of Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used.

Details

There are 6 types of pre-specified kernels: "linear", "linear.weighted", "IBS", "IBS.weighted", "quadratic" and "2wayIX". Among them, "2wayIX" is a product kernel consisting of main effects and SNP-SNP interaction terms.

If users want to use dosage values instead of genotypes, set is_dosage=TRUE. Please keep in mind that plink formatted files (so SSD files) cannot used for dosages. Instead, you should make a genotype matrix Z to run SKAT.

The kernel matrix for the weighted linear kernel is K=GWWGK = G W W G, where G is a genotype matrix and W is a diagonal weight matrix. Please note that it is different from the notation we used in the original SKAT paper, which was K=GWGK = G W G. The Madsen and Browning (2009) weight is w=1/p(1p)w = 1/ \sqrt{p(1-p)} in the current notation. By the previous notation, it is w=1/p(1p)w = 1/ p(1-p).

If you want to use the SSD file, you need to open it first using Open_SSD, and then use either SKAT.SSD.OneSet or SKAT.SSD.OneSet_SetIndex. Set index is a numeric value and automatically assigned to each set (from 1).

The r.corr represents a ρ\rho parameter of the unified test, Qρ=(1ρ)QS+ρQBQ_{\rho} = (1-\rho) Q_S + \rho Q_B, where QSQ_S is a SKAT test statistic, and QBQ_B is a weighted burden test statistic. Therefore, ρ=0\rho=0 results in the original weighted linear kernel SKAT, and ρ=1\rho=1 results in the weighted burden test (default: ρ=0\rho=0). If r.corr is a vector, SKAT-O will be conducted with adaptively seleting ρ\rho from given r.corr values. ρ\rho should be a value between 0 and 1. When method="optimal" or method="optimal.adj", the r.corr parameter will be ignored.

We slightly changed the implementation for SKAT-O to improve the estimation of p-values. You can run it by using method="optimal.adj" or "SKATO". It uses a grid of eight points ρ=(0,0.12,0.22,0.32,0.42,0.52,0.5,1)\rho=(0, 0.1^2, 0.2^2, 0.3^2, 0.4^2, 0.5^2, 0.5, 1) for the search of the optimal ρ\rho. If you want to use the original implementation of SKAT-O, use method="optimal" that carries out SKAT-O with an equal sized grid of 11 points (from 0 to 1).

If the true p.value is very small, you can have p.value=0 due to numerical reasons. In this case, please see pval.zero.msg that shows how small it is. For example, if the p.value is smaller than 106010^{-60}, it has "Pvalue < 1.000000e-60".

By default, SKAT uses impute.method="fixed" that imputes missing genotypes as the mean genotype values (2p). When variates are very rare and missing rates between cases and controls are highly unbalanced, impute.method="fixed" can yield inflated type I error rate. In this case, we recommend to use impute.method="bestguess", which does not suffer the same problem.

Value

p.value

p-value of SKAT.

p.value.resampling

p-values from resampled outcomes. You can get it when you use obj from SKAT_Null_Model function with resampling. See the SKAT_Null_Model.

p.value.noadj

p-value of SKAT without the small sample adjustment. It only appears when small sample adjustment is applied.

p.value.noadj.resampling

p-values from resampled outcomes without the small sample adjustment.

pval.zero.msg

(only when p.value=0) text message that shows how small the p.value is. ex. "Pvalue < 1.000000e-60" when the p.value is smaller than 106010^{-60}

Q

test statistic of SKAT. It has NA when method="optimal.adj" or "optimal".

param

estimated parameters of each method.

param$Is_Converged

(only with method="davies") an indicator of the convergence (1=convergence, 0=non-convergence). When 0 (not converged), "liu" method will be used to compute p-values.

param$n.marker

a number of SNPs in the genotype matrix.

param$n.marker.test

a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff.

test.snp.mac

a vector of minor allele count (MAC) of the snps tested. The name is SNP-ID.

Author(s)

Seunggeun Lee, Micheal Wu

References

Lee, S., Emond, M.J., Bamshad, M.J., Barnes, K.C., Rieder, M.J., Nickerson, D.A., NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani, D.C., Wurfel, M.M. and Lin, X. (2012) Optimal unified approach for rare variant association testing with application to small sample case-control whole-exome sequencing studies. American Journal of Human Genetics, 91, 224-237.

Lee, S., Wu, M. C., and Lin, X. (2012) Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13, 762-775.

Wu, M. C.*, Lee, S.*, Cai, T., Li, Y., Boehnke, M., and Lin, X. (2011) Rare Variant Association Testing for Sequencing Data Using the Sequence Kernel Association Test (SKAT). American Journal of Human Genetics, 89, 82-93. \ * contributed equally.

Wu, M. C., Kraft, P., Epstein, M. P.,Taylor, D., M., Chanock, S. J., Hunter, D., J., and Lin, X. (2010) Powerful SNP Set Analysis for Case-Control Genome-wide Association Studies. American Journal of Human Genetics, 86, 929-942.

Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society. Series C , 29, 323-333.

H. Liu, Y. Tang, H.H. Zhang (2009) A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables, Computational Statistics and Data Analysis, 53, 853-856.

Duchesne, P. and Lafaye De Micheaux, P. (2010) Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods, Computational Statistics and Data Analysis, 54, 858-862.

Examples

data(SKAT.example)


#############################################################
#	SKAT with default Beta(1,25) Weights 
#		- without covariates
Z<-SKAT.example$Z
# continuous trait
obj<-SKAT_Null_Model(y.c ~ 1, out_type="C", data=SKAT.example)
SKAT(Z, obj)$p.value

# dichotomous trait 
obj<-SKAT_Null_Model(y.b ~ 1, out_type="D", data=SKAT.example)
SKAT(Z, obj)$p.value


##################################################
#	SKAT with default Beta(1,25) Weights
#		- with covariates

# continuous trait
obj<-SKAT_Null_Model(y.c ~ X, out_type="C", data=SKAT.example)
SKAT(Z, obj)$p.value

obj.b<-SKAT_Null_Model(y.b ~ X, out_type="D", data=SKAT.example)
SKAT(Z, obj.b)$p.value

##################################################
#	SKAT with default Beta(1,25) Weights
#		- Optimal Test

SKAT(Z, obj, method="optimal.adj")$p.value

# you can get the same p-value by using method="SKATO"
SKAT(Z, obj, method="SKATO")$p.value

#############################################################
#	SKAT with Beta(1,30) Weights

SKAT(Z, obj, weights.beta=c(1,30))$p.value

SNP-set (Sequence) Kernel Association Test for X and Y chromosome variables

Description

Test for association between a set of SNPS/genes in the X chromosome and continuous or dichotomous outcomes using the kernel machine.

Usage

SKAT_ChrX(Z, obj, is_X.inact =TRUE
, kernel = "linear.weighted", method="davies", weights.beta=c(1,25)
, weights = NULL, impute.method = "fixed", r.corr=0, is_check_genotype=TRUE
, is_dosage = FALSE, missing_cutoff=0.15, max_maf=1, estimate_MAF=1, SetID=NULL)

SKAT_ChrY(Z, obj, kernel = "linear.weighted", method="davies", weights.beta=c(1,25)
, weights = NULL, impute.method = "fixed", r.corr=0, is_check_genotype=TRUE
, is_dosage = FALSE, missing_cutoff=0.15, max_maf=1, estimate_MAF=1, SetID=NULL)

Arguments

Z

a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Missing genotypes will be imputed by the simple Hardy-Weinberg equilibrium (HWE) based imputation.

obj

output object of the SKAT_Null_Model_ChrX function. For SKAT_ChrY, SKAT_Null_Model_ChrX function should be used with Model.Y=TRUE

is_X.inact

an indicator variable for the X-inactivation coding (default=TRUE). Male genotypes are coded as g=(0,2) when it is TRUE, and g=(0,1) when it is false.

kernel

a type of kernel (default= "linear.weighted").

method

a method to compute the p-value (default= "davies"). See SKAT page for details.

weights.beta

a numeric vector of parameters of beta weights. See SKAT page for details.

weights

a numeric vector of weights for the weighted kernels. See SKAT page for details.

impute.method

a method to impute missing genotypes (default= "fixed"). "fixed" imputes missing genotypes by assigning the mean genotype value (2p), and "bestguess" uses best guess genotype values.

r.corr

the ρ\rho parameter for the compound symmetric kernel. See SKAT page for details.

is_check_genotype

a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details.

is_dosage

a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”.

missing_cutoff

a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis.

max_maf

a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis.

estimate_MAF

a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details.

SetID

Internal use only.

Details

For details of parameters, please see SKAT page.

Value

p.value

p-value of SKAT.

p.value.resampling

p-values from resampled outcome. You can get it when you use obj from SKAT_Null_Model function with resampling. See the SKAT_Null_Model.

p.value.noadj

p-value of SKAT without the small sample adjustment. It only appears when small sample adjustment is applied.

p.value.noadj.resampling

p-values from resampled outcome without the small sample adjustment. It only appears when small sample adjustment is applied.

pval.zero.msg

(only when p.value=0) text message that shows how small the p.value is. ex. "Pvalue < 1.000000e-60" when p.value is smaller than 106010^{-60}

Q

the test statistic of SKAT. It has NA when method="optimal.adj" or "optimal".

param

estimated parameters of each method.

param$Is_Converged

(only with method="davies") an indicator of the convergence. When 0 (not converged), "liu" method is used to compute p-value.

param$n.marker

a number of SNPs in the genotype matrix

param$n.marker.test

a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff.

Author(s)

Clement Ma and Seunggeun Lee

Examples

data(SKAT.example.ChrX)
Z<-SKAT.example.ChrX$Z

#############################################################
#	Compute the P-value of SKAT 

# binary trait
obj.x<-SKAT_Null_Model_ChrX(y ~ x1 +x2 + Gender, 
  SexVar="Gender", out_type="D", data=SKAT.example.ChrX)

# SKAT
SKAT_ChrX(Z, obj.x, kernel = "linear.weighted")

# Burden
SKAT_ChrX(Z, obj.x, kernel = "linear.weighted", r.corr=1)

# SKAT-O
SKAT_ChrX(Z, obj.x, kernel = "linear.weighted", method="SKATO")

#############################################################
# Fit the Y chromosome function
# In this example, since male has only one copy of X (and Y), we reuse X chromosome genotype matrix.

# binary trait
obj.x<-SKAT_Null_Model_ChrX(y ~ x1 +x2 + Gender, 
  SexVar="Gender", out_type="D", Model.Y=TRUE, data=SKAT.example.ChrX)

SKAT_ChrY(Z, obj.x, kernel = "linear.weighted", method="SKATO")

SKAT for the combined effect of common and rare variants

Description

Sequence Kernel association test for the combined effect of common and rare variants.

Usage

SKAT_CommonRare(Z, obj, weights.beta.rare=c(1,25)
	, weights.beta.common=c(0.5,0.5), weights=NULL
	, method="C", r.corr.rare=0, r.corr.common=0, CommonRare_Cutoff=NULL
	, test.type="Joint", is_dosage=FALSE, missing_cutoff=0.15
	, estimate_MAF=1, SetID1=NULL)


	SKAT_CommonRare.SSD.OneSet(SSD.INFO
	, SetID, obj, ..., obj.SNPWeight=NULL)

	SKAT_CommonRare.SSD.OneSet_SetIndex(SSD.INFO
	, SetIndex, obj, ..., obj.SNPWeight=NULL )

Arguments

Z

a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Missing genotypes will be imputed by the simple Hardy-Weinberg equilibrium (HWE) based imputation.

obj

an output object of the SKAT_Null_Model function.

weights.beta.rare

a numeric vector of parameters of beta weights for rare variants (default=c(1,25)).

weights.beta.common

a numeric vector of parameters of beta weights for common variants (default=c(0.5,0.5)).

weights

a numeric vector of weights for both common and rare variants. When it is NULL, the beta weight with the “weights.beta.rare” and “weights.beta.common” parameter are used. When method ="C", the coefficient to combine common and rare variants test statistics will be calculated after applying the weights to variants.

method

a method to combine common and rare variant effects (default="C"). "C" represents the combined sum test, and "A" represents the adaptive sum test. "AR" represents a different type of adaptive test in which common variants are projected over rare variants.

r.corr.rare

the ρ\rho parameter for rare variants (default= 0). ρ\rho =0 and 1 indicate SKAT and Burden test, respectively

r.corr.common

the ρ\rho parameter for common variants (default= 0). ρ\rho =0 and 1 indicate SKAT and Burden test, respectively

CommonRare_Cutoff

MAF cutoff for common vs rare variants (default=NULL). It should be a numeric value between 0 and 0.5, or NULL. When it is NULL, 1/2SampleSize1/ \sqrt{2 SampleSize } will be used.

test.type

a string to indicate test type (default="Joint"). "Joint" indicates the joint test of the combined effects of common and rare variants. "Rare.Only" and "Common.Only" will conduct test only with rare and common variants, respectively.

is_dosage

see SKAT

missing_cutoff

see SKAT

estimate_MAF

see SKAT

SetID1

internal use only

SSD.INFO

an SSD_INFO object returned from Open_SSD.

SetID

a character value of Set ID. You can find a set ID of each set from SetInfo object of SSD.INFO

SetIndex

a numeric value of Set index. You can find a set index of each set from SetInfo object of SSD.INFO

obj.SNPWeight

output object from Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used.

...

furthuer arguments to be passed to “SKAT_CommonRare”

Details

The small sample adjustment for binary traits is not implemented for "A" and "AR".

Value

p.value

p-value.

p.value.resampling

p-values from resampled phenotypes. You can get it when you use obj from SKAT_Null_Model function with resampling. See the SKAT_Null_Model.

n.rare

the number of rare variants used for the test

n.common

the number of common variants used for the test

Cutoff

the MAF cut-off to divide common and rare variants

Q

the test statistic of SKAT. It has NA when method="A" or "AR".

param

estimated parameters of each method.

param$Is_Converged

an indicator of the convergence. 1 indicates the method is converged, and 0 indicates the method is not converged. When 0 (not converged), "liu.mod" method is used to compute p-value.

param$n.marker

a number of SNPs in the genotype matrix

param$n.marker.test

a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff.

test.snp.mac

a vector of minor allele count (MAC) of the snps tested. The name is SNP-ID.

Author(s)

Seunggeun Lee

References

Ionita-Laza, I.*, Lee, S.*, Makarov, V., Buxbaum, J. Lin, X. (2013). Sequence kernel association tests for the combined effect of rare and common variants. American Journal of Human Genetics, 92, 841-853. * contributed equally.

Examples

data(SKAT.example)
Z<-SKAT.example$Z


# continuous trait
obj<-SKAT_Null_Model(y.c ~ X, out_type="C", data=SKAT.example)
SKAT_CommonRare(Z, obj)$p.value
SKAT_CommonRare(Z, obj, method="A")$p.value
SKAT_CommonRare(Z, obj, method="AR")$p.value


# dichotomous trait 
obj<-SKAT_Null_Model(y.b ~ X, out_type="D", data=SKAT.example)

# Combined sum test in the manuscript (SKAT-C and Burden-C)
SKAT_CommonRare(Z, obj)$p.value
SKAT_CommonRare(Z, obj, r.corr.rare=1, r.corr.common=1 )$p.value


# Test only with common variant
SKAT_CommonRare(Z, obj, test.type="Common.Only")$p.value

# Test only with rare variant
SKAT_CommonRare(Z, obj, test.type="Rare.Only")$p.value


# Use CommonRare_Cutoff=0.01 instead of CommonRare_Cutoff = NULL
SKAT_CommonRare(Z, obj, CommonRare_Cutoff=0.01)$p.value

# Use custom weights; the first 10 variants have higher weights 
weights<-rep(1,67); weights[1:10]<-2
SKAT_CommonRare(Z, obj, weights=weights)$p.value

SNP set test (both common and rare variants) for binary traits with robust region-based methods

Description

This function computes p-values of robust burden test, SKAT, and SKAT-O for binary traits using SPA and ER.

Usage

SKAT_CommonRare_Robust(Z, obj, kernel = "linear.weighted"
	, method="SKAT", r.corr=NULL,weights.beta.rare=c(1,25)
	, weights.beta.common=c(0.5,0.5), weights = NULL
	, CommonRare_Cutoff=NULL, impute.method = "bestguess"
	, is_check_genotype=TRUE,is_dosage = FALSE, missing_cutoff=0.15
	, max_maf=1, estimate_MAF=1)

	SKAT_CommonRare_Robust.SSD.OneSet(SSD.INFO
	, SetID, obj, ..., obj.SNPWeight=NULL)

	SKAT_CommonRare_Robust.SSD.OneSet_SetIndex(SSD.INFO
	, SetIndex, obj, ..., obj.SNPWeight=NULL )

Arguments

Z

a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Now we support both matrix and sparse matrix.

obj

output object from SKAT_Null_Model.

kernel

type of kernel (default= "linear.weighted"). The possible choices are "linear" and "linear.weighted".

method

type of gene based test (default= "SKAT"). The possible choices are "SKAT", "Burden" and "SKATO", which represents robust SKAT, Burden and SKAT-O tests, respectively.

r.corr

the ρ\rho parameter for all variants. ρ\rho =0 and 1 indicate SKAT and Burden test, respectively.

weights.beta.rare

a numeric vector of parameters of beta weights for rare variants. It is only used for weighted kernels. If you want to use your own weights, please specify the "weights" parameter.

weights.beta.common

a numeric vector of parameters of beta weights for common variants. It is only used for weighted kernels. If you want to use your own weights, please specify the "weights" parameter.

weights

a numeric vector of weights for the weighted kernels. See SKAT page for details.

CommonRare_Cutoff

MAF cutoff for common vs rare variants (default=NULL). It should be a numeric value between 0 and 0.5, or NULL. When it is NULL, 1/2SampleSize1/ \sqrt{2 SampleSize } will be used.

impute.method

a method to impute missing genotypes (default= "bestguess"). "bestguess" imputes missing genotypes as most likely values (0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p is the MAF), and "fixed" imputes missing genotypes by assigning the mean genotype value (2p).

is_check_genotype

a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details.

is_dosage

a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”.

missing_cutoff

a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis.

max_maf

a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis.

estimate_MAF

a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details.

SSD.INFO

an SSD_INFO object returned from Open_SSD.

SetID

a character value of Set ID. You can find a set ID of each set from SetInfo object of SSD.INFO. In SKATBinary_Robust function, this parameter is for the internal use only.

SetIndex

a numeric value of Set index. You can find a set index of each set from SetInfo object of SSD.INFO

obj.SNPWeight

output object from Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used.

...

further arguments to be passed to “SKATBinary_Robust”

Value

p.value

p-value. It will be the p-value based on robust methods.

p.value_singlevariant

p-value for each single variant in this region-based test.

param$n.marker

a number of SNPs in the genotype matrix.

param$n.marker.test

a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff.

param$rho

the ρ\rho parameter for all variants.

n.common

A number of common markers used for the test.

mac.common

The total minor allele count (MAC) of common markers used for the test.

n.rare

A number of rare markers used for the test.

mac.rare

The total minor allele count (MAC) of rare markers used for the test.

Author(s)

Zhangchen Zhao

References

Zhao, Z., Bi, W., Zhou, W., VandeHaar, P., Fritsche, L. G., & Lee, S. (2019). UK-Biobank Whole Exome Sequence Binary Phenome Analysis with Robust Region-based Rare Variant Test. The American Journal of Human Genetics, in press.

Examples

data(SKATBinary.example)
Z<-SKATBinary.example$Z

obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example)

# run SKAT (default method) with Hybrid
out = SKAT_CommonRare_Robust(Z, obj)

# p-value
out$p.value

#
#	Run burden and SKAT

SKAT_CommonRare_Robust(Z, obj, method="Burden")$p.value
SKAT_CommonRare_Robust(Z, obj, method="SKAT")$p.value

Get parameters and residuals from the null model with incorporating the kinship structure

Description

Compute model parameters and residuals for SKAT with incorporating the kinship structure.

Usage

SKAT_NULL_emmaX (formula, data=NULL, K=NULL, 
Kin.File=NULL, ngrids=100, llim=-10, ulim=10, esp=1e-10, Is.GetEigenResult=FALSE)

Arguments

formula

an object of class “formula”: a symbolic description of the NULL model to be fitted.

data

an optional data frame containing the variables in the model (default=NULL). If it is NULL, the variables are taken from 'environment(formula)'

K

a kinship matrix. If K=NULL, the function reads Kin.File to get a kinship matrix.

Kin.File

an emmax-kin output file name. If K=NULL, the function reads this file.

ngrids

Number of grids to search for the optimal variance component

llim

Lower bound of log ratio of two variance components

ulim

Upper bound of log ratio of two variance components

esp

Tolerance of numerical precision error

Is.GetEigenResult

Return intermediate eigen-decomposition results

Details

The Emma package code was used to implement this function.

Resampling is not implemented.

Value

This function returns an object that has model parameters and residuals of the NULL model of no associations. After obtaining it, use SKAT function to carry out the association test.

Author(s)

Seunggeun Lee

Examples

data(SKAT.fam.example)

K = SKAT.fam.example$K
Z = SKAT.fam.example$Z
obj<-SKAT_NULL_emmaX(y ~ X, K=K, data=SKAT.fam.example)
SKAT(Z, obj)$p.value

# SKAT-O
SKAT(Z, obj, method="optimal.adj")$p.value

Get parameters and residuals from the NULL model

Description

Compute model parameters and residuals for SKAT. You also can obtain resampled residuals that can be used to compute resampling p-value or to control family-wise error rate.

Usage

SKAT_Null_Model(formula, data=NULL, out_type="C", n.Resampling=0
, type.Resampling="bootstrap", Adjustment=TRUE)

SKAT_Null_Model_ChrX(formula, SexVar, data=NULL, out_type="C", n.Resampling=0
, type.Resampling="bootstrap", Adjustment=TRUE, Model.Y=FALSE)

Arguments

formula

an object of class “formula”: a symbolic description of the NULL model to be fitted.

data

an optional data frame containing the variables in the model (default=NULL). If it is NULL, the variables are taken from 'environment(formula)'

out_type

an indicator of the outcome type. "C" for the continuous outcome and "D" for the dichotomous outcome.

n.Resampling

a numeric value of the number of resampling (default=0). If you don't want resampling, please set n.Resampling=0.

type.Resampling

resampling methods (default="bootstrap"). see details.

Adjustment

If TRUE, a small sample adjustment will be applied when the sample size < 2000 and the trait is binary (default=TRUE). See details

SexVar

a sex variable name in “formula”.

Model.Y

indicator variable whether the model will be also used for ChrY. It should be TRUE if you want to use SKAT_ChrY function

Details

There are 2 different methods to get resampled residuals. "bootstrap" conducts the parametric bootstrap to resample residuals under the NULL model with considering covariates. "bootstrap.fast" (only for binary traits) is a fast implementation of "bootstrap". If there is no covariate, "bootstrap" is equivalent to the permutation method.

When the trait is binary, the SKAT can produce conservative results when the sample size is small. To address this, we developed a small sample adjustment method, which adjusts asymptotic null distribution by estimating small sample moments. See also SKAT_Null_Model_MomentAdjust.

Since small sample adjustment uses random sampling to estimate the kurtosis of the test statistics, SKAT with the (kurtosis-based) small sample adjustment can yield slightly different p-values for each run. If you want to reproduce p-values, please set a seed number using set.seed function in R.

We recently developed more advanced methods to get p-values for binary traits, and the methods are implemented in SKATBinary. We recommend to use SKATBinary function instead of SKAT when your trait is binary.

Value

This function returns an object that has model parameters and residuals of the NULL model of no association between genetic variables and outcome phenotypes. After obtaining it, please use SKAT function to conduct the association test.

Author(s)

Seunggeun Lee

Examples

data(SKAT.example)
Z<-SKAT.example$Z
#############################################################
#	Compute the P-value of SKAT 

# binary trait
obj<-SKAT_Null_Model(y.b ~ X, out_type="D", data=SKAT.example)
SKAT(Z, obj, kernel = "linear.weighted")$p.value


#############################################################
# 	When you have no covariate to adjust.

# binary trait
obj<-SKAT_Null_Model(y.b ~ 1, out_type="D", data=SKAT.example)
SKAT(Z, obj, kernel = "linear.weighted")$p.value



#########################################################
# Small sample adjustment
IDX<-c(1:100,1001:1100)

# With-adjustment
obj<-SKAT_Null_Model(y.b[IDX] ~ X[IDX,],out_type="D", data=SKAT.example)
SKAT(Z[IDX,], obj, kernel = "linear.weighted")$p.value

# Without-adjustment
obj<-SKAT_Null_Model(y.b[IDX] ~ X[IDX,],out_type="D", Adjustment=FALSE, data=SKAT.example)
SKAT(Z[IDX,], obj, kernel = "linear.weighted")$p.value

#########################################################
# 	Use SKATBinary 

SKATBinary(Z[IDX,], obj, kernel = "linear.weighted")$p.value

Get parameters and residuals from the NULL model for small sample adjustment

Description

Compute model parameters and residuals for SKAT with adjusting small sample moments when the trait is binary. You can also obtain resampled residuals that can be used to compute resampling p-value or to control family-wise error rate.

Usage

SKAT_Null_Model_MomentAdjust(formula, data=NULL, n.Resampling=0,
type.Resampling="bootstrap", is_kurtosis_adj=TRUE, n.Resampling.kurtosis=10000)

Arguments

formula

object of class “formula”: a symbolic description of the NULL model to be fitted.

data

optional data frame containing the variables in the model (default=NULL). If it is NULL, the variables are taken from 'environment(formula)'

n.Resampling

a numeric value of the number of resampling (default=0). If you don't want resampling, please set n.Resampling=0.

type.Resampling

resampling methods (default="bootstrap"). see details.

is_kurtosis_adj

If TRUE, the kurtosis adjustment will be applied. The small sample kurtosis will be estimated using the resampled phenotypes.

n.Resampling.kurtosis

a numeric value of the number of resampling for kurtosis estimation (default=10000). If is_kurtosis_ad=FALSE, it will be ignored.

Details

When the trait is binary, the SKAT can produce conservative results when the sample size is small. To address this, we developed a small sample adjustment method, which adjust asymptotic null distribution by estimating small sample variance and kurtosis. The small smaple variance is estimated analytically, and the small sample kurtosis is estimated using the resampling approach.

There are 2 different methods to get resampled residuals. "bootstrap" conducts the parametric bootstrap to resample residuals under the NULL model with considering covariates. "bootstrap.fast" (only for binary traits) is a fast implementation of "bootstrap". If there is no covariate, "bootstrap" is equivalent to the permutation method.

Since the kurtosis is estimated using random samples, SKAT with the kurtosis-based small sample adjustment can yield slightly different p-values for each run. If you want to reproduce p-values, please set a seed number using set.seed function in R.

Value

This function returns an object that has model parameters and residuals of the NULL model of no association between genetic variants and outcome phenotypes. After obtaining it, please use SKAT function to conduct association tests.

Author(s)

Seunggeun Lee

Examples

data(SKAT.example)
Z<-SKAT.example$Z
#############################################################
#	Compute the P-value of SKAT 

IDX<-c(1:100,1001:1100)

# binary trait
obj<-SKAT_Null_Model_MomentAdjust(y.b[IDX] ~ X[IDX,], data=SKAT.example)
SKAT(Z[IDX,], obj, kernel = "linear.weighted")$p.value

Example data for SKAT

Description

Example data for SKAT.

Format

SKAT.example contains the following objects:

Z

a numeric genotype matrix of 2000 individuals and 67 SNPs. Each row represents a different individual, and each column represents a different SNP marker.

X

a numeric matrix of 2 covariates.

y.c

a numeric vector of continuous phenotypes.

y.b

a numeric vector of binary phenotypes.


Example data for SKAT

Description

Example data for SKAT.

Format

SKAT.example contains the following objects:

Z

a numeric genotype matrix of 2000 individuals and 52 SNPs. Each row represents a different individual, and each column represents a different SNP marker.

x1

a numeric vector of continuous covariates.

x2

a numeric vector of binary covariates.

Gender

a numeric vector of gender (male=1, female=2).

y

a numeric vector of binary phenotypes.

Z.A

a list object of 10 genotype matrices.


Example data for SKAT_NULL_emmaX

Description

Example data for SKAT_emmaX.

Format

Example contains the following objects:

Z

a numeric genotype matrix Each row represents a different individual, and each column represents a different SNP marker.

X

a numeric matrix of covariates.

y

a numeric vector of continuous phenotypes.

K

a kinship matrix.


Haplotype dataset for power calculation

Description

Haplotype dataset generated by COSI with mimicking linkage disequilibrium (LD) structure of European ancestry.

Format

This list object contains the following objects:

Haplotype

a numeric matrix of 10,000 haplotypes over 200k BP region. Each row represents a different haplotype, and each column represents a different SNP marker. It is generated by the calibration coalescent model (COSI) with mimicking LD structure of European ancestry.

SNPInfo

a dataframe of SNP information.

References

Schaffner, S.F. and Foo, C. and Gabriel, S. and Reich, D. and Daly, M.J. and Altshuler, D. (2005) Calibrating a coalescent simulation of human genome sequence variation. Genome Research, 15, 1576-1583.


SNP-set Kernel Association Test

Description

Iteratively carry out association tests with phenotypes and SNP sets in SSD file.

Usage

SKAT.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL)
	
	SKATBinary.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL)
	
	SKATBinary_Robust.SSD.All(SSD.INFO, obj, ...,obj.SNPWeight=NULL)
	
	SKAT_CommonRare.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL)

	SKAT_CommonRare_Robust.SSD.All(SSD.INFO, obj, ..., obj.SNPWeight=NULL)

Arguments

SSD.INFO

SSD_INFO object returned from Open_SSD.

obj

output object from SKAT_Null_Model.

...

further arguments to be passed to “SKAT” or “SKATBinary”.

obj.SNPWeight

output object from Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used.

Details

Please see SKAT or SKATBinary for details.

Value

results

dataframe that contains SetID, p-values (P.value), the number of markers in the SNP sets (N.Marker.All), and the number of markers to test for an association after excluding non-polymorphic or high missing rates markers (N.Marker.Test). The output dataframe from SKATBinary.SSD.All (and others) have more columns. For example, the outcome from SKATBinary.SSD.All have columns for the method to compute p-values and the minimum achievable p-values (MAP).

P.value.Resampling

the matrix that contains p-values of resampled phenotypes.

OUT.snp.mac

each element in the list is a vector of MAC of SNPs used in the test. The names are SNP-IDs.

Author(s)

Seunggeun Lee


SNP set test for binary traits with asymptotic and efficient resampling methods

Description

This function computes p-values of Burden test, SKAT, and SKAT-O for binary traits using asymptotic and efficient resampling methods.

Usage

SKATBinary(Z, obj, kernel = "linear.weighted", method="SKAT"
	, method.bin="Hybrid", weights.beta=c(1,25), weights = NULL
	, r.corr=0, impute.method = "bestguess", is_check_genotype=TRUE
	, is_dosage = FALSE, missing_cutoff=0.15, max_maf=1
	, estimate_MAF=1, N.Resampling=2 *10^6, seednum=100, epsilon=10^-6
	, SetID=NULL)
	
	SKATBinary.SSD.OneSet(SSD.INFO, SetID, obj, ... ,obj.SNPWeight=NULL)

	SKATBinary.SSD.OneSet_SetIndex(SSD.INFO, SetIndex, obj, ... ,obj.SNPWeight=NULL)

Arguments

Z

a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele.

obj

output object from SKAT_Null_Model.

kernel

type of kernel (default= "linear.weighted"). See SKAT page for details.

method

type of gene based test (default= "SKAT"). The possible choices are "SKAT", "Burden" and "SKATO", which represents SKAT, Burden and SKAT-O tests, respectively. This parameter differs from the "method" parameter in SKAT function. "Burden" is equivalent to method="davies" and r.corr=1 and "SKATO" is equivalent to method="optimal.adj" in the SKAT function. When method="Burden" or method="SKATO", r.corr will be ignored.

method.bin

type of method to compute a p-value (default="Hybrid"). Possible choices are "Hybrid", "ER", "ER.A", "QA", "MA" and "UA". See details

weights.beta

a numeric vector of parameters of beta weights. It is only used for weighted kernels. If you want to use your own weights, please specify the “weights” parameter.

weights

a numeric vector of weights for the weighted kernels. See SKAT page for details.

impute.method

a method to impute missing genotypes (default= "bestguess"). "bestguess" imputes missing genotypes as most likely values (0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p is the MAF), and "fixed" imputes missing genotypes by assigning the mean genotype value (2p).

r.corr

the ρ\rho parameter for the compound symmetric correlation structure kernel (default= 0). If it is a vector, SKAT will conduct the optimal test. It is ignored when method="Burden" or method="SKATO".

is_check_genotype

a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details.

is_dosage

a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”.

missing_cutoff

a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis.

max_maf

a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis.

estimate_MAF

a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details.

N.Resampling

a number of resampling to be conducted to get p-values (default=2 *10^6).

seednum

a seed number for random number generation (default=100). If NULL, no seed number will be assigned.

epsilon

a precision level (default=10^-6).

SSD.INFO

an SSD_INFO object returned from Open_SSD.

SetID

a character value of Set ID. You can find a set ID of each set from SetInfo object of SSD.INFO. In SKATBinary function, this parameter is for the internal use only.

SetIndex

a numeric value of Set index. You can find a set index of each set from SetInfo object of SSD.INFO

...

further arguments to be passed to “SKATBinary”

obj.SNPWeight

an output object of Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used.

Details

This function implements six methods (method.bin) to compute p-values: 1) Efficient resampling (ER); 2) Quantile adjusted moment matching (QA); 3) Moment matching adjustment (MA); 4) No adjustment (UA); 5) Adaptive ER (ER.A); and 6) Hybrid. "Hybrid" selects a method based on the total minor allele count (MAC), the number of individuals with minor alleles (m), and the degree of case-control imbalance. When method.bin="ER" or "ER.A", SKATBinary compute mid-p-values and minimum achievable mid p-values.

If seednum is not NULL, set.seed(seednum) function is used to specify seeds to get the same p-values of ER based methods for different runs. Therefore, please set seednum=NULL, if you do not want to set seeds.

SKATBinary uses impute.method="bestguess" as a default method for the imputation, which is different from SKAT that uses impute.method="fixed" as a default method. We changed it because SKATBinary with impute.method="fixed" can yield false positives when variates are very rare and missing rates between cases and controls are unbalanced. When missing rates between cases and controls are highly unbalanced, SKAT impute.method="fixed" can also yield false positives, but it happens less likely. So we did not change the default imputation method in SKAT.

Value

p.value

p-value. It will be the mid p-value if ER or ER.A are used to compute the p-value.

p.value.standard

(ER and ER.A only) standard p-value.

p.value.resampling

p-values from resampled outcome. You can obtain it when n.Resampling (in SKAT_Null_Model) > 0. See SKAT_Null_Model page.

p.value.standard.resampling

(ER and ER.A only)standard p-values from resampled outcomes.

m

the number of individuals with minor alleles.

MAP

minimum possible p-values. It is available when the method.bin="ER" and m is sufficiently small.

MAC

total minor allele count (MAC).

n.total

(ER only) the number of resampling to be generated to get the p-value. It can be smaller than N.Resampling when the total number of configurations of case-controls among individuals with minor alleles are smaller than N.Resampling.

is.accurate

logical value for the accuracy of the p-value. If it is false, more resampling is needed to accurately estimate the p-value.

param$n.marker

a number of SNPs in the genotype matrix

param$n.marker.test

a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff.

method.bin

a type of method to be used to compute the p-value.

test.snp.mac

a vector of minor allele count (MAC) of the snps tested. The name is SNP-ID.

Author(s)

Seunggeun Lee

References

Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2015) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics, in press.

Examples

data(SKATBinary.example)
Z<-SKATBinary.example$Z

obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example)

# run SKAT (default method) with Hybrid
out = SKATBinary(Z, obj)

# p-value
out$p.value

# MAP
out$MAP

# method used to compute p-value (method.bin)
out$method.bin


#
#	Run burden and SKAT-O with Hybrid

SKATBinary(Z, obj, method="Burden")$p.value
SKATBinary(Z, obj, method="SKATO")$p.value

#
#	Run with SKAT-QA, -MA and -UA

SKATBinary(Z, obj, method.bin="QA")$p.value

SKATBinary(Z, obj, method.bin="MA")$p.value

SKATBinary(Z, obj, method.bin="UA")$p.value

# UA from SKAT function
SKAT(Z, obj)$p.value


#
#	Run with Adaptive ER

out =SKATBinary(Z, obj, method.bin="ER.A")

out$p.value

# the number of total resampling is smaller than 2*10^6 (default value)
out$n.total

SNP set test for binary traits with robust region-based methods

Description

This function computes p-values of robust burden test, SKAT, and SKAT-O for binary traits using SPA and ER.

Usage

SKATBinary_Robust(Z, obj, kernel = "linear.weighted", method="SKAT"
	, r.corr=NULL, weights.beta=c(1,25), weights = NULL
	, impute.method = "bestguess",  is_check_genotype=TRUE
  , is_dosage = FALSE, missing_cutoff=0.15, max_maf=1
	, estimate_MAF=1)

	SKATBinary_Robust.SSD.OneSet(SSD.INFO
	, SetID, obj, ...,obj.SNPWeight=NULL)

	SKATBinary_Robust.SSD.OneSet_SetIndex(SSD.INFO
	, SetIndex, obj, ... ,obj.SNPWeight=NULL)

Arguments

Z

a numeric genotype matrix with each row as a different individual and each column as a separate gene/snp. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele. Now we support both matrix and sparse matrix.

obj

output object from SKAT_Null_Model.

kernel

type of kernel (default= "linear.weighted"). The possible choices are "linear" and "linear.weighted".

method

type of gene based test (default= "SKAT"). The possible choices are "SKAT", "Burden" and "SKATO", which represents robust SKAT, Burden and SKAT-O tests, respectively.

r.corr

the ρ\rho parameter for all variants. ρ\rho =0 and 1 indicate SKAT and Burden test, respectively.

weights.beta

a numeric vector of parameters of beta weights. It is only used for weighted kernels. If you want to use your own weights, please specify the “weights” parameter.

weights

a numeric vector of weights for the weighted kernels. See SKAT page for details.

impute.method

a method to impute missing genotypes (default= "bestguess"). "bestguess" imputes missing genotypes as most likely values (0,1,2), "random" imputes missing genotypes by generating binomial(2,p) random variables (p is the MAF), and "fixed" imputes missing genotypes by assigning the mean genotype value (2p).

is_check_genotype

a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details.

is_dosage

a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”.

missing_cutoff

a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis.

max_maf

a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis.

estimate_MAF

a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details.

SSD.INFO

an SSD_INFO object returned from Open_SSD.

SetID

a character value of Set ID. You can find a set ID of each set from SetInfo object of SSD.INFO. In SKATBinary_Robust function, this parameter is for the internal use only.

SetIndex

a numeric value of Set index. You can find a set index of each set from SetInfo object of SSD.INFO

obj.SNPWeight

output object from Read_SNP_WeightFile (default=NULL). If NULL, the beta weight with the “weights.beta” parameter will be used.

...

further arguments to be passed to “SKATBinary_Robust”

Value

p.value

p-value. It will be the p-value based on robust methods.

p.value_singlevariant

p-value for each single variant in this region-based test.

mac

total minor allele count (MAC).

param$n.marker

a number of SNPs in the genotype matrix.

param$n.marker.test

a number of SNPs used for the test. It can be different from param$n.marker when some markers are monomorphic or have higher missing rates than the missing_cutoff.

param$rho

the ρ\rho parameter for all variants.

test.snp.mac

a vector of minor allele count (MAC) of the snps tested. The name is SNP-ID.

Author(s)

Zhangchen Zhao

References

Zhao, Z., Bi, W., Zhou, W., VandeHaar, P., Fritsche, L. G., & Lee, S. (2019). UK-Biobank Whole Exome Sequence Binary Phenome Analysis with Robust Region-based Rare Variant Test. The American Journal of Human Genetics, in press.

Examples

data(SKATBinary.example)
Z<-SKATBinary.example$Z

obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example)

# run SKAT (default method) with Hybrid
out = SKATBinary_Robust(Z, obj)

# p-value
out$p.value

#
#	Run burden and SKAT

SKATBinary_Robust(Z, obj, method="Burden")$p.value
SKATBinary_Robust(Z, obj, method="SKAT")$p.value

Single variant tests for binary traits with Firth and efficient resampling methods

Description

This function computes p-values of single variant test using the firth and efficient resampling methods.

Usage

SKATBinary_Single(Z, obj, method.bin="Hybrid"
	, impute.method = "bestguess", is_check_genotype=TRUE, is_dosage = FALSE
	, missing_cutoff=0.15, max_maf=1, estimate_MAF=1
	, N.Resampling=2*10^6, seednum=100, epsilon=10^-6)

Arguments

Z

a numeric genotype vector. Each genotype should be coded as 0, 1, 2, and 9 (or NA) for AA, Aa, aa, and missing, where A is a major allele and a is a minor allele.

obj

output object from SKAT_Null_Model.

method.bin

a type of method to compute a p-value (default="Hybrid"). See details.

impute.method

a method to impute missing genotypes (default= "bestguess").

is_check_genotype

a logical value indicating whether to check the validity of the genotype matrix Z (default= TRUE). See SKAT page for details.

is_dosage

a logical value indicating whether the matrix Z is a dosage matrix. If it is TRUE, SKAT will ignore “is_check_genotype”.

missing_cutoff

a cutoff of the missing rates of SNPs (default=0.15). Any SNPs with missing rates higher than the cutoff will be excluded from the analysis.

max_maf

a cutoff of the maximum minor allele frequencies (MAF) (default=1, no cutoff). Any SNPs with MAF > cutoff will be excluded from the analysis.

estimate_MAF

a numeric value indicating how to estimate MAFs for the weight calculation and the missing genotype imputation. See SKAT page for details

N.Resampling

a number of resampling to be conducted to get p-values (default=2 *10^6).

seednum

a seed number for random number generation (default=100). If NULL, no seed number will be assigned.

epsilon

a precision level (default=10^-6).

Details

This function implements three methods (method.bin) to compute p-values: 1) Efficient resampling (ER); 2) Firth biased adjusted likelihood ratio test (Firth); and 3) Hybrid. "Hybrid" selects a method based on the total minor allele count (MAC), the number of individuals with minor alleles (m), and the degree of case-control imbalance.

Adaptive ER (ER.A) is not implemented yet.

If seednum is not NULL, set.seed(seednum) function is used to specify seeds to get the same p-values of ER based methods for different runs. Therefore, please set seednum=NULL, if you do not want to set seeds.

Value

p.value

p-value. It will be the mid p-value if ER is used to compute the p-value.

p.value.standard

(ER only) standard p-value.

p.value.resampling

p-values from resampled outcome. You can obtain it when n.Resampling in SKAT_Null_Model was > 0. See the SKAT_Null_Model.

p.value.standard.resampling

(ER only) standard p-values from resampled outcome.

m

the number of individuals with minor alleles.

MAP

the minimum possible p-values. It is available when the method.bin="ER" and m is sufficiently small.

MAC

the total minor allele count (MAC).

n.total

(ER only) the number of resampling to be generated to get the p-value. It can be smaller than N.Resampling when the total number of configurations of case-controls among individuals with minor alleles are smaller than N.Resampling.

is.accurate

logical value for the accuracy of the p-value. If it is false, more resampling is needed to accurately estimate the p-value.

method.bin

a type of method to be used to compute the p-value.

Author(s)

Seunggeun Lee

References

Lee, S., Fuchsberger, C., Kim, S., Scott, L. (2015) An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics, in press.

Examples

data(SKATBinary.example)
Z<-SKATBinary.example$Z


obj<-SKAT_Null_Model(y ~ x1 + x2, out_type="D", data=SKATBinary.example)
out = SKATBinary_Single(Z[,1], obj)

# p-value
out$p.value

# MAP
out$MAP

# method used to compute p-value (method.bin)
out$method.bin


#
#	Use firth method to compute p-value
SKATBinary_Single(Z[,1], obj, method.bin="Firth")$p.value

Example data for SKAT

Description

Example data for SKAT.

Format

SKAT.example contains the following objects:

Z

a numeric genotype matrix of 2000 individuals and 11 SNPs. Each row represents a different individual, and each column represents a different SNP marker.

x1

a numeric vector of continuous covariates.

x2

a numeric vector of binary covariates.

y

a numeric vector of binary phenotypes.

Z.A

a list object of 30 genotype matrices.


Internal variables for SSD and functions for other packages

Description

Internal variable for SSD and functions for other packages