Abstract: A recommender system (RS) is a crucial part of many businesses, particularly those involved in e-commerce. In a conventional RS, a user can give only a single rating for an item, which is insufficient to capture consumer preferences. Businesses in industries such as e-learning and tourism now let customers rate a product on a variety of criteria so that customers’ preferences can be understood. Meanwhile, the collaborative filtering (CF) algorithm based on an autoencoder (AE) has been found effective at identifying items of interest to a user. However, the cost of these computations increases nonlinearly as the number of items and users grows. To overcome these issues, a novel expanded stacked autoencoder (ESAE) with Kernel Fuzzy C-Means Clustering (KFCM) is proposed, operating in two phases. In the first, offline phase, the sparse multicriteria rating matrix is smoothed into a complete matrix by predicting the users’ intact ratings with the ESAE approach, and users are clustered with the KFCM approach. In the second, online phase, the top-N recommendation is predicted by the ESAE approach using only the most similar user from multiple clusters. The ESAE_KFCM model thereby reaches a prediction accuracy of 98.2% in top-N recommendation with reduced recommendation generation time. Experiments on the Yahoo! Movies (YM) movie dataset and the TripAdvisor (TA) travel dataset confirmed that the ESAE_KFCM model consistently outperforms conventional RS algorithms on a variety of assessment measures.
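The user-clustering step of the offline phase can be illustrated with a minimal sketch of kernel fuzzy c-means. The abstract does not state the kernel, the fuzzifier, or the update rules, so the Gaussian kernel, the values of `M` and `SIGMA`, the initial centers, and the toy ratings below are all assumptions chosen for illustration, not the authors' configuration.

```python
import math

def gaussian_kernel(x, v, sigma):
    """K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / sigma ** 2)

def memberships(points, centers, m, sigma, eps=1e-12):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for x in points:
        # 1 - K(x, v) acts as the kernel-induced distance; eps avoids dividing by zero.
        weights = [(1.0 - gaussian_kernel(x, v, sigma) + eps) ** (-1.0 / (m - 1.0))
                   for v in centers]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def update_centers(points, centers, u, m, sigma):
    """Move each center to the kernel- and membership-weighted mean of the points."""
    new_centers = []
    for i, v in enumerate(centers):
        coeffs = [(u[k][i] ** m) * gaussian_kernel(x, v, sigma)
                  for k, x in enumerate(points)]
        total = sum(coeffs)
        new_centers.append([sum(c * x[d] for c, x in zip(coeffs, points)) / total
                            for d in range(len(v))])
    return new_centers

# Made-up users, each described by three criterion ratings on a 1-5 scale.
users = [[5, 4, 5], [4, 5, 4], [1, 2, 1], [2, 1, 2]]
centers = [[5.0, 5.0, 5.0], [1.0, 1.0, 1.0]]  # hypothetical initial centers
M, SIGMA = 2.0, 3.0                            # assumed fuzzifier and kernel width

# Alternate membership and center updates for a fixed number of rounds.
for _ in range(10):
    u = memberships(users, centers, M, SIGMA)
    centers = update_centers(users, centers, u, M, SIGMA)

u = memberships(users, centers, M, SIGMA)
# Hard assignment: the cluster with the largest membership for each user.
hard_labels = [row.index(max(row)) for row in u]
print(hard_labels)
```

Because memberships are fuzzy, a user can belong partly to several clusters, which is one plausible reading of how the online phase can draw similar users "from multiple clusters".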
Abstract: This research examines the user ratings of apps on Apple’s App Store. Its purpose is to better understand the characteristics of well-rated apps so that app makers can focus on these traits to maximize their profit. The data were obtained from kaggle.com and, according to the dataset’s description, were originally collected through the iTunes Search API. Four attributes contribute directly to an app’s user rating: rating_count_tot, rating_count_ver, user_rating and user_rating_ver. The relationship between apps that receive higher ratings and apps that receive lower ratings is analyzed by applying exploratory data analysis and the data science technique of clustering to their numerical attributes. Each app is represented as a data point; apps with similar rating characteristics are assigned to the same cluster, and the characteristics common to all apps in a cluster are the determining traits of that cluster. Both techniques are carried out in Google Colab with libraries including pandas, numpy, seaborn, and matplotlib. The data reveal a direct correlation between user rating and both the number of supported devices and the number of supported languages, and an inverse correlation between user rating and both the size and the price of the app. In conclusion, according to the data, small free apps that many different types of users can use are generally well rated by most users.
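The direct and inverse correlations reported above can be checked with a Pearson coefficient. The following is a minimal sketch in plain Python; the five rows are made-up illustrative values, not taken from the Kaggle dataset, and the keys `price` and `lang_num` are hypothetical names (only `user_rating` appears in the abstract).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up rows for illustration only; real values would come from the dataset.
apps = [
    {"price": 0.00, "lang_num": 20, "user_rating": 4.5},
    {"price": 0.00, "lang_num": 15, "user_rating": 4.0},
    {"price": 0.99, "lang_num": 10, "user_rating": 4.0},
    {"price": 2.99, "lang_num": 5, "user_rating": 3.5},
    {"price": 4.99, "lang_num": 1, "user_rating": 3.0},
]

ratings = [app["user_rating"] for app in apps]
r_price = pearson([app["price"] for app in apps], ratings)
r_lang = pearson([app["lang_num"] for app in apps], ratings)

# A negative coefficient indicates an inverse relationship, a positive one a direct relationship.
print(f"price vs rating: {r_price:.2f}, languages vs rating: {r_lang:.2f}")
```

With the real data loaded into pandas, `DataFrame.corr()` computes the same coefficient for every pair of numeric columns at once.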
Funding: The Ministry of Education (MOE) of the Republic of Singapore (R296000181133) and the National University of Singapore (R296000158646) provided support for the development and implementation of this research. This research was also supported by the Republic of Singapore’s National Research Foundation (NRF) through a grant to the Berkeley Education Alliance for Research in Singapore (BEARS) for the Singapore-Berkeley Building Efficiency and Sustainability in the Tropics 2 (SinBerBEST2) Program.
Abstract: Many energy performance analysis methodologies assign buildings a descriptive label that represents their main activity, often known as the primary space usage (PSU). This attribute reflects the intent of the design team, based on assumptions about how the majority of the spaces in the building will be used. In reality, the way a building’s occupants use the spaces can differ from what was intended. With the recent growth of hourly electricity meter data from the built environment, there is an opportunity to create unsupervised methods that analyze electricity consumption behavior to determine whether the assigned PSU is accurate. When these labels are applied to simulation inputs or benchmarking processes, they can misclassify or oversimplify the use of the building. To work towards accurate characterization of a building’s utilization, we propose a modular methodology for identifying potentially mislabeled buildings using distance-based clustering analysis of hourly electricity consumption data. This method seeks to segment buildings according to their daily behavior and predict which ones are misfits with respect to their assigned PSU label. This process finds potentially uncharacteristic behavior that could indicate mixed use or a misclassified PSU. Our results on two public data sets, from the Building Data Genome (BDG) Project and Washington DC (DGS), with 507 and 322 buildings respectively, show that 26% and 33% of these buildings are potentially mislabeled based on their load shape behavior. Such information provides a more realistic insight into their true consumption characteristics, enabling more accurate simulation scenarios. Applications of this process and a discussion of limitations and reproducibility are included.
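The idea of flagging misfits from load shapes can be sketched as follows. The abstract says only that the method is distance-based clustering of daily behavior, so the choice of k-means, the peak normalization, the majority-vote rule, and the helper names are assumptions made here. The buildings and their four-block daily profiles (night, morning, afternoon, evening) are made up for illustration and do not come from the BDG or DGS data.

```python
import math
from collections import Counter

def normalize(profile):
    """Scale a daily load profile by its peak so that only the shape matters."""
    peak = max(profile)
    return [value / peak for value in profile]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, initial_centers, iterations=20):
    """Plain k-means; returns the cluster index assigned to each point."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
                  for p in points]
        for i in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def flag_misfits(psu_labels, cluster_ids):
    """Flag buildings whose PSU label differs from the majority label of their cluster."""
    majority = {}
    for cluster in set(cluster_ids):
        votes = Counter(l for l, c in zip(psu_labels, cluster_ids) if c == cluster)
        majority[cluster] = votes.most_common(1)[0][0]
    return [label != majority[c] for label, c in zip(psu_labels, cluster_ids)]

# Made-up buildings: (name, assigned PSU, load per block: night, morning, afternoon, evening).
buildings = [
    ("A", "Office", [10, 40, 50, 15]),
    ("B", "Office", [20, 90, 100, 30]),
    ("C", "Office", [5, 22, 25, 8]),
    ("D", "Dormitory", [30, 15, 20, 60]),
    ("E", "Dormitory", [12, 8, 9, 25]),
    ("F", "Office", [25, 12, 15, 50]),  # labeled Office but peaks in the evening
    ("G", "Dormitory", [40, 22, 25, 80]),
]

shapes = [normalize(profile) for _, _, profile in buildings]
psu = [label for _, label, _ in buildings]
clusters = kmeans(shapes, initial_centers=[shapes[0], shapes[3]])
flagged = [name for (name, _, _), is_misfit in zip(buildings, flag_misfits(psu, clusters))
           if is_misfit]
print(flagged)
```

A flagged building is only a candidate: as the abstract notes, uncharacteristic behavior may reflect mixed use rather than a wrong label.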