Funding: YX and RG thank the Wellcome Trust for funding MetaboFlow (Grant 202952/Z/16/Z).
Abstract: Model validation is the most important part of building a supervised model. To build a model with good generalization performance, one must have a sensible data splitting strategy, which is crucial for model validation. In this study, we conducted a comparative study of various reported data splitting methods. The MixSim model was employed to generate nine simulated datasets with different probabilities of misclassification and variable sample sizes. Partial least squares for discriminant analysis and support vector machines for classification were then applied to these datasets. The data splitting methods tested included variants of cross-validation, bootstrapping, bootstrapped Latin partition, the Kennard-Stone algorithm (K-S) and the sample set partitioning based on joint X-Y distances algorithm (SPXY). These methods were employed to split the data into training and validation sets. The generalization performances estimated from the validation sets were then compared with those obtained from blind test sets, which were generated from the same distribution but were unseen by the training/validation procedure used in model construction. The results showed that the size of the data is the deciding factor for the quality of the generalization performance estimated from the validation set. We found a significant gap between the performance estimated from the validation set and that from the test set for all the data splitting methods employed on small datasets. This disparity decreased when more samples were available for training/validation, because the models were then moving towards approximations of the central limit theory for the simulated datasets used. We also found that having too many or too few samples in the training set had a negative effect on the estimated model performance, suggesting that a good balance between the sizes of the training set and validation set is necessary for a reliable estimation of model performance. We also found that systematic sampling method such a
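As an illustration of one of the splitting methods named in the abstract above, the following is a minimal sketch of the Kennard-Stone selection rule, assuming Euclidean distances and a dense distance matrix. The function name `kennard_stone`, the parameter `n_train` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return indices of n_train samples chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    # min_dist[k] = distance from sample k to its nearest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < n_train:
        min_dist[selected] = -1.0  # never re-pick an already selected sample
        nxt = int(np.argmax(min_dist))  # farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected

# Toy one-dimensional data (illustrative only).
X = np.array([[0.0], [1.0], [2.0], [10.0]])
train_idx = kennard_stone(X, 3)
val_idx = [k for k in range(len(X)) if k not in train_idx]
```

The samples left unselected form the validation set, so this kind of systematic selection tends to place the most extreme samples in the training set.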
Funding: This study was supported partially by the Ford Foundation (No 0 976 0 92 4)
Abstract: OBJECTIVE: To provide insight into the psychosocial factors underlying the utilisation of health services by women with reproductive tract infection (RTI) symptoms. METHODS: A cross-sectional study, adopting Aday and Andersen's Social Behaviour Model, was conducted between 1998 and 1999 in Hebei province and Beijing, China. A total of 864 eligible married women (aged 21 to 60 years) were interviewed face to face. RESULTS: The percentage of self-reported symptoms of RTIs in urban and rural women was 35.6% and 46.8%, respectively; the proportion of women with RTIs who utilised health services was 27.5% and 26.7%, respectively. Compared to urban women, rural women had less knowledge of RTIs and more traditional beliefs, and were more satisfied with local health services. Logistic regression analysis showed that the common factor influencing health service utilisation in women with RTIs was current experience of RTIs. Knowledge about self-medication, perceived social stigma attached to RTIs, prior experience of RTIs, family income and perceived severity of RTIs were also predictors of utilisation of health services in rural women with RTIs. Satisfaction with health providers, information received from health providers, prior experience of RTIs, occupation and medical care coverage were predictors of utilisation of health services in urban women with RTIs. CONCLUSION: The prevalence of RTIs is high, but the rate of seeking health services is low. There is a great need to emphasize culturally acceptable reproductive health education in different places to improve women's ability for self-care. Regular medical check-ups for women are also important. It is necessary to improve the quality of health services, complete the reform of health insurance and alleviate the social stigma related to RTIs, giving women social and moral support.
Funding: Supported by the Chinese Police Office, Sichuan Police Office and Yunnan Police Office.
Abstract: Previous studies have suggested that the incidence of post-traumatic stress disorder in earthquake rescue workers is relatively high. Risk factors for this disorder include demographic characteristics, earthquake-related high-risk factors, risk factors in the rescue process, personality, social support and coping style. This study examined the current status of a unit of 1,040 rescue workers who participated in earthquake relief for the Wenchuan earthquake that occurred on May 12th, 2008. Post-traumatic stress disorder was diagnosed primarily using the Clinician-Administered Post-traumatic Stress Disorder Scale during structured interviews. Univariate and multivariate statistical analyses were used to examine major risk factors that contributed to the incidence of post-traumatic stress disorder. Results revealed that the incidence of this disorder in the rescue group was 5.96%. The impact factors in univariate analysis included death of family members, contact with corpses or witnessing of the deceased or seriously injured, near-death experience, severe injury or mental trauma in the rescue process and working at the epicenter of the earthquake. Correlation analysis suggested that post-traumatic stress disorder was positively correlated with psychotic and neurotic personalities, negative coping and low social support. Impact factors in multivariate logistic regression analysis included near-death experience, severe injury or mental trauma, working in the epicenter of the rescue, neurotic personality, negative coping and low social support, among which low social support had the largest odds ratio of 20.42. Findings showed that the occurrence of post-traumatic stress disorder was the result of the interaction of multiple factors.
Funding: Supported by the National Natural Science Foundation of China (No. 52004029), the Joint Doctoral Program of China Scholarship Council (CSC) (202006460073) and the Liuzhou Science and Technology Plan Project, China (2021AAD0102).
Abstract: The cross-section profile is a key signal for evaluating hot-rolled strip quality, and ignoring its defects can easily lead to a final failure. The complex curve, significant irregular fluctuation and imperfect sample data make recognizing cross-section defects a challenge, and current industrial judgment methods rely excessively on human decision making. A novel stacked denoising autoencoders (SDAE) model optimized with support vector machine (SVM) theory was proposed for the recognition of cross-section defects. Firstly, interpolation filtering and principal component analysis were employed to linearly reduce the data dimensionality of the profile curve. Secondly, the deep learning algorithm SDAE was used layer by layer for greedy unsupervised feature learning, its final layer of back-propagation neural network was replaced by SVM for supervised learning of the final features, and the final model SDAE_SVM was obtained by further optimizing the entire network parameters via error back-propagation. Finally, the curve mirroring and combination stitching methods were used as data augmentation for the training set, which dealt with the problem of sample imbalance in the original data set and further improved the accuracy of cross-section defect prediction. The approach was applied in a 1780-mm hot rolling line of a steel mill to achieve the automatic diagnosis and classification of defects in the cross-section profile of hot-rolled strip, which helps to reduce flatness quality concerns in downstream processes.
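The abstract above does not specify how curve mirroring and combination stitching were carried out, so the following is only a rough sketch under stated assumptions: mirroring is taken to mean reversing a profile across the strip width, and stitching is taken to mean joining the left half of one profile to the right half of another profile of the same defect class. The function names `mirror` and `stitch` and the sample values are hypothetical.

```python
import numpy as np

def mirror(profile):
    """Reverse a 1-D profile across the strip width (assumed meaning of mirroring)."""
    return np.asarray(profile, dtype=float)[::-1].copy()

def stitch(left_src, right_src):
    """Join the left half of one profile to the right half of another.

    This is one plausible reading of "combination stitching", not the
    procedure confirmed by the abstract.
    """
    left_src = np.asarray(left_src, dtype=float)
    right_src = np.asarray(right_src, dtype=float)
    if left_src.shape != right_src.shape:
        raise ValueError("profiles must have the same length")
    mid = left_src.shape[0] // 2
    return np.concatenate([left_src[:mid], right_src[mid:]])

# Two toy profiles assumed to belong to the same defect class.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
mirrored = mirror(a)
stitched = stitch(a, b)
```

Applying such transforms only to minority-class profiles in the training set is one way to reduce class imbalance without touching the validation or test data.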