Abstract
Building information modeling (BIM) has become an effective approach to applying information technology in the construction industry. As the volume of BIM data keeps growing, many studies have introduced natural language processing (NLP) into BIM applications so that the data can be used efficiently. In Chinese, however, word segmentation, a foundational NLP step, adapts poorly to BIM applications in the building domain because segmentation models lack the domain's terminology features. This study analyzes Industry Foundation Classes (IFC) files, a widely used BIM data format, extracts BIM model features from them, and adds these features together with building-domain terminology features to a statistical word segmentation model in order to improve Chinese word segmentation in the building domain. Experimental results show that, compared with the original conditional random field (CRF) segmentation model, the F-measure on the building-domain test set increased by 1.26%; with BIM model features added alone, the F-measure still increased by 0.10%, indicating that adding BIM model features to the segmentation model is effective in improving Chinese word segmentation in the building domain. On the BIM model test set, compared with adding building-domain terminology features alone, adding BIM model features raised precision from 46.97% to 87.74%, recall from 67.60% to 94.77%, and the F-measure from 55.43% to 91.12% (an improvement of 35.69 percentage points), effectively improving the BIM model adaptability of Chinese word segmentation in the building domain.
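The F-measure figures reported for the BIM model test set can be checked against the stated precision and recall. The sketch below assumes the F-measure is the balanced harmonic mean of precision and recall (F1), which the abstract does not state explicitly; the function name `f_measure` is illustrative and does not come from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean); inputs and output in percent.

    Assumption: the paper uses the balanced F1 form.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the BIM model test set.
terminology_only = round(f_measure(46.97, 67.60), 2)
with_bim_features = round(f_measure(87.74, 94.77), 2)

print(terminology_only)                                # 55.43
print(with_bim_features)                               # 91.12
print(round(with_bim_features - terminology_only, 2))  # 35.69
```

Under that assumption the recomputed values match the reported 55.43%, 91.12%, and the 35.69-point gain, so the gain is an absolute difference in percentage points rather than a relative increase.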
Authors
ZHANG Xin (张鑫)
ZHOU Xiao-ping (周小平)
WANG Jia (王佳)
Affiliations: School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 102616, China
Source
Journal of Graphics (《图学学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2021, Issue 2, pp. 316-324 (9 pages)
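Funding
National Natural Science Foundation of China (71601013)
Beijing Natural Science Foundation (4202017)
Beijing Young Top-Notch Talent Cultivation Program (CIT&TCD201904050)
Beijing University of Civil Engineering and Architecture Young Talents Program
Fundamental Research Funds for Beijing Municipal Universities, Beijing University of Civil Engineering and Architecture (X20039)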
Keywords
building information model
industry foundation classes
Chinese word segmentation
model adaptation
building information extraction