Imaging genetics is an emerging field aimed at identifying and characterizing genetic variants that influence measures derived from anatomical or functional brain images, which are in turn related to brain-related illnesses or fundamental cognitive, emotional and behavioral processes, and are affected by environmental factors. Here we review the recent evolution of statistical approaches and outstanding challenges in imaging genetics, with a focus on population-based imaging genetic association studies. We show the trend in imaging genetics from candidate approaches to pure discovery science, and from univariate to multivariate analyses. We also discuss future directions and prospects of imaging genetics for ultimately helping understand the genetic and environmental underpinnings of various neuropsychiatric disorders and turning basic science into clinical strategies.
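With technological innovation and social progress, data acquisition technology has improved markedly, and high-dimensional data streams (HDS) have become common in medicine, machinery, industrial engineering, and other fields. Beyond online monitoring of HDS, accurate and efficient fault diagnosis is becoming increasingly important. In this paper we formulate HDS fault diagnosis as a novel multiple testing problem and diagnose anomalies in HDS by controlling the missed discovery excessive probability (MDX), which overcomes the limitations of traditional diagnostic methods and significantly improves the robustness and accuracy of anomaly diagnosis. We give a Monte Carlo approximation for computing the MDX and, on this basis, propose Oracle and DataDriven diagnostic procedures. A simulation study and a real-data example illustrate the advantages of the proposed method.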
In clinical trials, the primary efficacy endpoint often corresponds to a so-called "composite endpoint". Composite endpoints combine several events of interest within a single outcome variable. The intent is to enlarge the expected effect size and thereby increase the power of the study. However, composite endpoints also come with serious challenges and problems. On the one hand, composite endpoints may lead to difficulties during the planning phase of a trial with respect to the sample size calculation, as the expected clinical effect of an intervention on the composite endpoint depends on the effects on its single components and their correlations. This may lead to wrong assumptions on the sample size needed. Overly optimistic assumptions on the expected effect may lead to an underpowered trial, whereas an overly conservative estimate of the effect results in an unnecessarily large sample size. On the other hand, the interpretation of composite endpoints may be difficult, as the observed effect on the composite does not necessarily reflect the effects on the single components. Therefore, demonstrating the clinical efficacy of a new intervention by exclusively evaluating the composite endpoint may be misleading. The present paper summarizes results and recommendations of the latest research addressing the above-mentioned problems in the planning, analysis and interpretation of clinical trials with composite endpoints, thereby providing practical guidance for users.
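The dependence of the composite effect on the component effects and their correlation, noted in the abstract above, can be illustrated with a small worked example. This is only a sketch under stated assumptions and is not taken from the paper: it considers two binary component events, assumes the same correlation in both trial arms, and uses made-up event probabilities. The function names are hypothetical.

```python
from math import sqrt


def composite_event_prob(p_a, p_b, rho):
    """P(A or B) for two binary events with marginals p_a, p_b and correlation rho."""
    cov = rho * sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    joint = p_a * p_b + cov  # P(A and B)
    # The joint probability must respect the Frechet bounds for the given marginals.
    lower, upper = max(0.0, p_a + p_b - 1.0), min(p_a, p_b)
    if not (lower - 1e-12 <= joint <= upper + 1e-12):
        raise ValueError("rho is not attainable for these marginals")
    return p_a + p_b - joint


def composite_risk_difference(control, treated, rho):
    """Control minus treated composite event probability (same rho assumed in both arms)."""
    return composite_event_prob(*control, rho) - composite_event_prob(*treated, rho)


# Illustrative (made-up) component event probabilities: the intervention halves
# the risk of component A and leaves component B unchanged.
control = (0.20, 0.10)
treated = (0.10, 0.10)

effect_independent = composite_risk_difference(control, treated, rho=0.0)
effect_correlated = composite_risk_difference(control, treated, rho=0.5)
print(round(effect_independent, 3), round(effect_correlated, 3))
```

With these illustrative numbers the risk difference on the composite is 0.09 when the components are independent and 0.075 when their correlation is 0.5, so a sample size planned under independence would be too small if the components are in fact positively correlated.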
Variable selection has played an important role in statistical learning and scientific discoveries during the past ten years, and multiple testing is a fundamental problem in statistical inference and also has wide applications in many scientific fields. Significant advances have been achieved in both areas. This study attempts to find a connection between the adaptive LASSO (least absolute shrinkage and selection operator) and multiple testing procedures in linear regression models. We also propose procedures based on multiple testing methods to select variables and control the selection error rate, i.e., the false discovery rate. Simulation studies demonstrate that the proposed methods show good performance relative to controlling the selection error rate under a wide range of settings.
This study applies a bootstrap method of controlling the false discovery rate (FDR) when performing pairwise comparisons of normal means. Because the test statistics in pairwise comparisons are dependent, many conventional multiple testing procedures cannot be employed directly, and some modified procedures that control the FDR with dependent test statistics are too conservative. In this paper, we use bootstrap and goodness-of-fit methods to produce independent p-values for pairwise comparisons. Many procedures can be applied to these independent p-values, and two typical FDR-controlling procedures are applied here. An example is provided to illustrate the proposed approach. Extensive simulations show the satisfactory FDR control and power performance of our approach. In addition, the proposed approach can be easily extended to cases with more than two normal or non-normal populations, balanced or unbalanced.
The false discovery proportion (FDP) is a useful measure of the abundance of false positives when a large number of hypotheses are being tested simultaneously. Methods for controlling the expected value of the FDP, namely the false discovery rate (FDR), have become widely used. It is highly desirable to have an accurate prediction interval for the FDP in such applications. Some degree of dependence among test statistics exists in almost all applications involving multiple testing. Methods for constructing tight prediction intervals for the FDP that take account of dependence among test statistics are of great practical importance. This paper derives a formula for the variance of the FDP and uses it to obtain an upper prediction interval for the FDP, under some semi-parametric assumptions on dependence among test statistics. Simulation studies indicate that the proposed formula-based prediction interval has good coverage probability under commonly assumed weak dependence. The prediction interval is generally more accurate than those obtained from existing methods. In addition, a permutation-based upper prediction interval for the FDP is provided, which can be useful when dependence is strong and the number of tests is not too large. The proposed prediction intervals are illustrated using a prostate cancer dataset.
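For reference, the quantities named in the abstract above can be written in standard notation. The symbols are assumptions introduced here rather than the paper's own: $V$ is the number of false rejections, $R$ the total number of rejections, and $u_{\gamma}$ the upper end of an upper prediction interval $[0, u_{\gamma}]$ with nominal coverage $1 - \gamma$. The paper's variance formula is not reproduced.

```latex
\[
\mathrm{FDP} = \frac{V}{\max(R, 1)}, \qquad
\mathrm{FDR} = \mathbb{E}[\mathrm{FDP}], \qquad
\Pr\bigl(\mathrm{FDP} \le u_{\gamma}\bigr) \ge 1 - \gamma .
\]
```

The FDR is a single expected value, whereas the FDP is a random quantity that varies from one dataset to the next, which is why a prediction interval for the realized FDP gives information that FDR control alone does not.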
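Many questions in judgment and decision-making research (and indeed in experimental psychology more broadly) concern whether an effect truly exists and what explains it. These questions do not concern whether the effect is significant in some particular population. The significance of an effect can therefore be tested by analyzing individual subjects: if a single subject shows the effect, then the effect exists. On this view, significance can sometimes be established by analysis across cases or rounds rather than across subjects. This view also implies that effects in the opposite direction may exist in some experiments. This paper recommends testing for such effects through statistical analysis at the level of individual subjects, and presents several methods: probability-probability (P-P) plots; tests of the distribution of p-values; and correction for multiple testing with step-down resampling. All of these methods address the problems that inevitably arise when the same hypothesis is tested multiple times. The paper also gives examples, some in which opposite-direction effects are present and some in which they are not.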
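Genome-wide association studies (GWAS) directly examine associations between human behavioral abilities and genotypes, giving psychology researchers a new tool for exploring the genetic basis of human behavioral abilities at the whole-genome level. Because GWAS involve association tests between a large number of loci and behavior, multiple-testing correction is required to control the overall false alarm rate. Although several correction methods are available, their suitability for GWAS has not been studied systematically, so the choice of correction method in GWAS lacks a theoretical and empirical basis. The correction methods commonly used in GWAS are the Bonferroni correction, the Holm step-down adjustment, and the permutation test, all based on the family-wise error rate (FWER) criterion, and the BH procedure, based on the false discovery rate (FDR) criterion. The paper describes the principles and procedures of these four methods in detail, proposes a method for simulating GWAS data, and quantitatively compares the correction methods on the simulated data. The results show that the three FWER-based methods differ very little from one another; they control false alarms most strictly, but detect significantly fewer truly associated loci than the FDR-based BH procedure. On independent data, the SNPs reported by the BH procedure have the highest explanatory power for behavior; that is, relative to the other methods, the BH procedure better balances false alarms and hits. Future studies may consider using the BH procedure to correct their results.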
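Three of the four corrections named in the abstract above can be sketched in a few lines of plain Python. This is a minimal illustration of the textbook Bonferroni, Holm, and Benjamini-Hochberg rules, not the paper's implementation. The permutation test is omitted because it needs the raw genotype and phenotype data, and the p-values below are made up for illustration.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the FWER)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: walk the sorted p-values and stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank = 0, 1, ..., m - 1
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every larger p-value is retained as well
    return reject


def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: find the largest k with p_(k) <= k * alpha / m, reject the k smallest."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * alpha / m:
            k_max = k
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values for eight SNP-behavior association tests.
pvals = [0.001, 0.008, 0.012, 0.020, 0.030, 0.200, 0.500, 0.900]
n_bonf = sum(bonferroni(pvals))
n_holm = sum(holm(pvals))
n_bh = sum(benjamini_hochberg(pvals))
print(n_bonf, n_holm, n_bh)
```

On this toy input the two FWER-based rules each reject one hypothesis while BH rejects five, which mirrors the qualitative pattern the abstract reports: FWER control is stricter, and FDR control recovers more associations.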
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11671268, 11522105, and 11690012)
Funding: Supported by the National Natural Science Foundation of China (Nos. 11471030, 11471035, 71201160)
Funding: Supported by a grant from the U.S.-Israel Bi-national Science Foundation (Ilana Ritov, co-PI)