Funding: Supported by the Fund of the Key Laboratory of Rich-Media Knowledge Organization and Service of Digital Publishing Content (ZD2018-07/05)
Abstract: A lower-than-character feature embedding, called radical embedding, is proposed and applied to a long short-term memory (LSTM) model for sentence segmentation of pre-modern Chinese texts. The dataset includes over 150 classical Chinese books from 3 different dynasties and covers different literary styles. The LSTM-conditional random fields (LSTM-CRF) model is a state-of-the-art method for the sequence labeling problem. The proposed model adds a radical embedding component, which leads to improved performance. Experimental results on the aforementioned Chinese books demonstrate better accuracy than earlier methods on sentence segmentation, especially on Tang dynasty epitaph texts (achieving an F1-score of 81.34%).
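The idea of adding a radical embedding to each character's features can be illustrated with a minimal sketch. It assumes the radical vector is simply concatenated to the character vector before the sequence is passed to the LSTM; the abstract does not say how the two are combined. The lookup table `RADICAL_OF`, the dimensions, and the random toy vectors are all hypothetical stand-ins for trained parameters.

```python
import random

# Hypothetical toy character-to-radical table; a real system would load a full dictionary.
RADICAL_OF = {"江": "氵", "河": "氵", "湖": "氵", "松": "木", "柏": "木", "林": "木"}
UNK = "<unk>"  # fallback symbol for characters or radicals not in the tables

CHAR_DIM, RADICAL_DIM = 4, 2  # assumed sizes, chosen only for illustration


def make_table(symbols, dim, seed):
    """Build a toy embedding table of random vectors (stands in for trained weights)."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-1, 1) for _ in range(dim)] for s in symbols}


char_table = make_table(list(RADICAL_OF) + [UNK], CHAR_DIM, seed=0)
radical_table = make_table(sorted(set(RADICAL_OF.values())) + [UNK], RADICAL_DIM, seed=1)


def embed(text):
    """Return one feature vector per character: character vector + radical vector."""
    features = []
    for ch in text:
        char_vec = char_table.get(ch, char_table[UNK])
        radical = RADICAL_OF.get(ch, UNK)
        features.append(char_vec + radical_table[radical])  # list concatenation
    return features


features = embed("江河松")
print(len(features), len(features[0]))
```

In this sketch 江 and 河 share the radical 氵, so the last `RADICAL_DIM` entries of their feature vectors are identical, while 松 (radical 木) differs. That shared signal is the kind of sub-character information a sequence labeler could exploit.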
Funding: The National Natural Science Foundation of China (No. 61303094), the Program of Science and Technology Commission of Shanghai Municipality (Nos. 16511102400 and 16111107801), and the Innovation Program of Shanghai Municipal Education Commission (No. 14YZ024)
Abstract: Question answering systems offer a friendly interface for people to interact with massive amounts of online information. Retrieving useful medical information from the many online websites with search engines is time-consuming for users. An effort is made to build a Chinese Question Answering System in Medical Domain (CQASMD) to provide useful medical information for users. A large medical knowledge base with more than 300 thousand medical terms and their descriptions is first constructed to store the structured medical knowledge data, and is classified with the FastText model. Furthermore, a Word2Vec model is adopted to capture the semantic meanings of words, and the questions and answers are processed with sentence embedding to capture semantic context information. A user's question is first classified and converted into a sentence vector, and a matching algorithm is then used to find the most similar stored question. After the constructed medical knowledge base is queried, the answer corresponding to the matched question is returned to the user. The architecture and flowchart of CQASMD are presented; the system is expected to play an important role in self-diagnosis and treatment of disease.
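The matching step described above can be sketched as follows. This is an illustration under stated assumptions, not the CQASMD implementation: it assumes the sentence vector is the average of the word vectors and that similarity is measured with cosine similarity, neither of which the abstract specifies. The FastText classification step is omitted. `WORD_VECTORS` and `KNOWLEDGE_BASE` are hypothetical toy data standing in for a trained Word2Vec model and the medical knowledge base.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for a trained Word2Vec model.
WORD_VECTORS = {
    "headache": [0.9, 0.1, 0.0],
    "treatment": [0.1, 0.9, 0.0],
    "cure": [0.2, 0.8, 0.1],
    "diabetes": [0.0, 0.1, 0.9],
    "symptoms": [0.5, 0.0, 0.5],
}
DIM = 3

# Hypothetical stored (tokenized question, answer) pairs; answers are placeholders.
KNOWLEDGE_BASE = [
    (["headache", "treatment"], "answer A (placeholder)"),
    (["diabetes", "symptoms"], "answer B (placeholder)"),
]


def sentence_vector(tokens):
    """Average the vectors of known tokens (assumed sentence-embedding scheme)."""
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def answer(tokens):
    """Return the stored answer whose question is most similar to the query."""
    query = sentence_vector(tokens)
    best = max(KNOWLEDGE_BASE, key=lambda qa: cosine(query, sentence_vector(qa[0])))
    return best[1]


print(answer(["headache", "cure"]))
```

With these toy vectors, the query "headache cure" lies closest to the stored question "headache treatment", so the first placeholder answer is returned.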
Abstract: Online short-term rental platforms, such as Airbnb, have become popular, and a better pricing strategy is imperative for hosts of new listings. In this paper, we analyzed the relationship between the description of each listing and its price, and proposed a text-based price recommendation system called TAPE to recommend a reasonable price for newly added listings. We used deep learning techniques (e.g., feedforward network, long short-term memory, and mean shift) to design and implement TAPE. Using two chronologically extracted datasets of the same four cities, we revealed important factors (e.g., indoor equipment and high-density area) that positively or negatively affect each property's price, and evaluated our preliminary and enhanced models. Our models achieved a Root-Mean-Square Error (RMSE) of 33.73 in Boston, 20.50 in London, 34.68 in Los Angeles, and 26.31 in New York City, which is comparable to an existing model that uses more features.
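For readers unfamiliar with the metric reported above, RMSE is the square root of the mean squared difference between actual and predicted prices, so it is expressed in the same unit as the prices themselves. The short sketch below computes it on hypothetical prices; the numbers are made up for illustration and are not taken from the paper's datasets.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical nightly prices, for illustration only.
actual_prices = [100.0, 150.0, 80.0]
predicted_prices = [110.0, 140.0, 90.0]

# Each prediction is off by exactly 10, so the RMSE is 10.0.
print(rmse(actual_prices, predicted_prices))
```

Because errors are squared before averaging, a few large mispredictions raise RMSE more than many small ones.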