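The Square Kilometre Array (SKA) radio telescope is expected to deliver revolutionary breakthroughs in multiple scientific fields, and the SKA software system is one of the key factors affecting its science products. SKA Regional Centres are the platforms on which astronomers carry out SKA data analysis, scientific research, and academic exchange. The software environment for processing SKA science data must be general-purpose, flexible, and highly adaptable. Chinese scientists have built a China SKA Regional Centre prototype, deployed a job scheduling system widely used on large supercomputers, installed astronomical software capable of processing observational data from current mainstream radio telescopes, and deployed several science data processing pipelines to facilitate automated parallel processing of observations across different science areas. This paper describes the software platform of the China SKA Regional Centre prototype and its pipelines for processing data from SKA pathfinder telescopes, including a low-frequency continuum imaging pipeline, a spectral line imaging pipeline, and a very long baseline interferometry (VLBI) data processing pipeline. Users in China and abroad have already carried out SKA-related scientific research on this platform. Its construction and operation provide valuable practical experience for the future full-scale construction of the China SKA Regional Centre.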
In the field of additive manufacturing, current research on data processing focuses mainly on slicing large STL files or complicated CAD models. Parallel algorithms offer a clear advantage for improving efficiency and reducing slicing time, but traditional algorithms cannot make full use of multi-core CPU hardware resources. This paper presents a fast parallel algorithm to speed up data processing. The algorithm is designed around a pipeline model, and its complexity is analyzed theoretically. To evaluate its performance, the effects of the number of threads and the number of layers are investigated in a series of experiments. The results show that both are significant factors in the speedup ratio. Speedup increases with the number of threads in close agreement with Amdahl's law, and it also increases with the number of layers, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours in parallel. A second parallel algorithm based on data parallelism is included in the experiments to show that the pipeline-parallel model is more efficient. A final case study demonstrates the performance of the new parallel algorithm. Compared with the serial slicing algorithm, the pipeline-parallel algorithm makes full use of multi-core CPU hardware and accelerates the slicing process; compared with the data-parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
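The two scaling laws cited in this abstract can be illustrated with a minimal sketch. The function names and the parallel fraction of 0.9 are illustrative assumptions, not values taken from the paper. Amdahl's law models a fixed workload spread over more threads, while Gustafson's law models a workload that grows with the available parallelism (here, loosely analogous to slicing more layers).

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Speedup for a fixed-size workload: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def gustafson_speedup(parallel_fraction: float, threads: int) -> float:
    """Scaled speedup when the workload grows with n: (1 - p) + p * n."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * threads


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, chosen only for illustration
    for n in (1, 2, 4, 8, 16):
        print(f"threads={n:2d}  Amdahl={amdahl_speedup(p, n):6.3f}  "
              f"Gustafson={gustafson_speedup(p, n):6.3f}")
```

Under Amdahl's law the speedup saturates at 1 / (1 - p) as the thread count grows, whereas the Gustafson estimate keeps rising linearly, which is why the two laws suit the thread-count and layer-count trends respectively.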
Current methods for predicting missing values in datasets often rely on simplistic approaches, such as taking the median value of an attribute, which limits their applicability. Real-world observations can be diverse. Stock prices, for example, range from prices shortly after an IPO to values before a company's collapse, and some data points are missing because trading in the stock was suspended. In this paper, we propose an approach that uses Nonlinear Inductive Matrix Completion (NIMC) and Deep Inductive Matrix Completion (DIMC) to predict associations, and we conduct experiments on financial data relating dates and stocks. Our method leverages various types of stock observations to capture latent factors that explain the observed date-stock associations. Notably, the approach is nonlinear, which makes it suitable for datasets with nonlinear structure, such as the Russell 3000. Unlike traditional methods, which may suffer from information loss, NIMC and DIMC retain nearly complete information, especially with high-dimensional parameters. We compare three methods: the state-of-the-art linear method Inductive Matrix Completion, NIMC, and DIMC. Our findings show that nonlinear matrix completion is particularly effective for data with nonlinear structure, as exemplified by the Russell 3000. Additionally, we evaluate the information loss of the three methods across different dimensionalities.
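To make the contrast between median imputation and matrix completion concrete, here is a minimal sketch. It is not the NIMC or DIMC method of the paper: it swaps in a plain linear rank-1 factorization fitted by alternating least squares, and the function names and toy matrix are hypothetical. Missing entries are marked with `None`.

```python
from statistics import median


def median_impute(matrix):
    """Fill each missing entry with the median of its column's observed values."""
    rows, cols = len(matrix), len(matrix[0])
    filled = [row[:] for row in matrix]
    for j in range(cols):
        observed = [matrix[i][j] for i in range(rows) if matrix[i][j] is not None]
        for i in range(rows):
            if filled[i][j] is None:
                filled[i][j] = median(observed)
    return filled


def rank1_complete(matrix, iterations=200):
    """Fit matrix[i][j] ~ u[i] * v[j] on observed entries by alternating least squares."""
    rows, cols = len(matrix), len(matrix[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iterations):
        # Update each row factor with the column factors held fixed.
        for i in range(rows):
            obs = [j for j in range(cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / den
        # Update each column factor with the row factors held fixed.
        for j in range(cols):
            obs = [i for i in range(rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / den
    # Keep observed values; predict only the missing ones.
    return [[matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
             for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    toy = [[1.0, 2.0],
           [2.0, None]]  # hypothetical rank-1 data; the hidden value is 4
    print("median imputation:", median_impute(toy)[1][1])
    print("rank-1 completion:", round(rank1_complete(toy)[1][1], 4))
```

On this toy matrix the column median fills the gap with 2, while the factorization recovers a value close to 4 because it exploits the relationship between rows and columns. The nonlinear and deep variants discussed in the abstract extend this idea beyond the linear model used here.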