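Abstract
Quantitative structure-activity relationship (QSAR) models are important tools for filling data gaps in the environmental safety of chemicals. To be used appropriately in chemicals management, QSAR models need clearly defined applicability domains. This paper reviews three concepts of the applicability domain: the descriptor domain, the structural domain and the mechanism domain. Drawing on case studies, it focuses on how the structural domain is computed from molecular fingerprints and similarity metrics, and on the characteristics and advantages of the structural domain. It also discusses the activity cliffs that appear in structure-activity landscapes, and their causes. To better understand the applicability of descriptors, interpret QSAR mechanisms and choose a suitable method for characterizing the applicability domain, it is necessary to understand the system that the prediction endpoint essentially describes, the complexity and spatial heterogeneity of that system, and whether the endpoint examines emergent behavior of the system.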
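The abstract does not give the authors' exact procedure for the structural domain, so the following is only a minimal sketch of the general idea: a query structure is treated as inside the structural domain when it is sufficiently similar to enough training structures. Fingerprints are represented here as sets of "on" bit indices, similarity is the Tanimoto coefficient, and the names `tanimoto`, `in_structural_domain`, `threshold` and `min_neighbors`, as well as the default values, are illustrative assumptions, not values taken from the paper.

```python
from typing import FrozenSet, Iterable

# A Boolean fingerprint, stored as the set of bit positions that are "on".
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_structural_domain(
    query: Fingerprint,
    training: Iterable[Fingerprint],
    threshold: float = 0.5,   # hypothetical similarity cutoff
    min_neighbors: int = 1,   # hypothetical number of similar training structures
) -> bool:
    """Return True if enough training fingerprints are similar to the query."""
    neighbors = sum(1 for fp in training if tanimoto(query, fp) >= threshold)
    return neighbors >= min_neighbors


# Toy fingerprints (made-up bit indices, not derived from real molecules).
training_set = [
    frozenset({1, 2, 3, 4}),
    frozenset({2, 3, 4, 5}),
    frozenset({10, 11, 12}),
]
print(in_structural_domain(frozenset({1, 2, 3, 5}), training_set))
print(in_structural_domain(frozenset({20, 21}), training_set))
```

In practice the fingerprints would come from a cheminformatics toolkit, and both the cutoff and the required number of neighbors would have to be justified for the endpoint at hand.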
In environmental science and engineering, a quantitative structure-activity relationship (QSAR) is a quantitative relationship between the structure of molecules (or their aggregates, e.g., nanoparticles) and certain endpoints. Here, endpoints generally refer to physicochemical properties, biological effects or environmental behavior parameters that can be measured or modeled. Starting from a data set of chemical structures with known endpoint values (i.e., the training set), a QSAR model uses a specific algorithm to establish a mathematical relationship between the numerical features that characterize molecular structure (i.e., descriptors) and the endpoint values. The established relationship can then be used to predict endpoint values for given chemical structures. QSAR models are important tools for filling data gaps in the environmental safety of chemicals and for addressing so-called “emerging pollutants”, a problem closely related to the improper management of chemicals. Notably, QSAR models are intrinsically data-driven. The relationships present in the training set are not necessarily applicable to arbitrary chemical structures, so the reliability of a QSAR model is always limited to a certain applicability domain. Acceptance of QSAR models in the sound management of chemicals therefore requires clearly defined applicability domains. This study reviews three concepts of the applicability domain: the descriptor domain, the structural domain and the mechanism domain. For characterizing the descriptor domain, methods based on the hyper-rectangle, the convex hull, joint probability density estimation and various types of distances are described. Notably, when Boolean fingerprints are used as descriptors, these methods become meaningless. Thus, the implementation, characteristics and advantages of the structural domain based on fingerprints and similarity are introduced in particular. Moreover, structure-activity landscapes (SALs), as exemplified by a network-like similarity graph (NSG) and a 3D topography of the endpoint
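Of the descriptor-domain methods listed in the abstract, the hyper-rectangle is the simplest to illustrate. The sketch below assumes the usual reading: the domain is the per-descriptor [min, max] range of the training set, and a query is inside the domain when every descriptor falls within its range. The function names and toy values are hypothetical. The last lines show one plausible way (not necessarily the authors' argument) to see why such a check carries little information for Boolean fingerprints: once each bit takes both values in the training set, every range is [0, 1] and any bit pattern passes.

```python
from typing import List, Sequence, Tuple

Bounds = List[Tuple[float, float]]


def fit_hyper_rectangle(training: Sequence[Sequence[float]]) -> Bounds:
    """Per-descriptor (min, max) ranges over the training set."""
    columns = zip(*training)  # transpose rows (compounds) into descriptor columns
    return [(min(col), max(col)) for col in columns]


def in_hyper_rectangle(query: Sequence[float], bounds: Bounds) -> bool:
    """True if every descriptor of the query lies within its training range."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(query, bounds))


# Continuous descriptors (made-up values for two descriptors).
continuous_bounds = fit_hyper_rectangle([[1.0, 200.0], [3.5, 350.0], [2.2, 280.0]])
print(in_hyper_rectangle([2.0, 300.0], continuous_bounds))  # inside both ranges
print(in_hyper_rectangle([5.0, 300.0], continuous_bounds))  # first descriptor too large

# Boolean fingerprints: each bit is both 0 and 1 somewhere in the training set,
# so every range becomes (0, 1) and no Boolean query can fall outside.
boolean_bounds = fit_hyper_rectangle([[0, 1, 0], [1, 0, 1]])
print(in_hyper_rectangle([1, 1, 1], boolean_bounds))
```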
Authors
Zhongyu Wang (王中钰); Jingwen Chen (陈景文); Zhiqiang Fu (傅志强); Xuehua Li (李雪花)
Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
Source
Chinese Science Bulletin (《科学通报》)
Indexed in: EI; CAS; CSCD; Peking University Core Journals (北大核心)
2022, Issue 3, pp. 255-266 (12 pages)
Funding
National Key Research and Development Program of China (2018YFC1801604, 2018YFE0110700)
National Natural Science Foundation of China (21661142001)
Keywords
quantitative structure-activity relationship (QSAR)
applicability domain
descriptor
activity cliffs
structure-activity landscape