Machine learning has been extensively applied in behavioural and social computing, encompassing a spectrum of applications such as social network analysis, click stream analysis, recommendation of points of interest, and sentiment analysis. The datasets pertinent to these applications are inherently linked to human behaviour and societal dynamics, posing a risk of disclosing personal or sensitive information if mishandled or subjected to attacks.
The rise of online-to-offline (O2O) e-commerce business has brought tremendous opportunities to the logistics industry. In the online-to-offline logistics business, it is essential to detect anomaly merchants with fraudulent shipping behaviours, such as sending other merchants' packages for profit with their low discounts. This can help reduce the financial losses of platforms and ensure a healthy environment. Existing anomaly detection studies have mainly focused on online fraud behaviour detection, such as fraudulent purchase and comment behaviours in e-commerce. However, these methods are not suitable for anomaly merchant detection in logistics due to the more complex online and offline operation of package-sending behaviours and the interpretability requirements of offline deployment in logistics. MultiDet, a semi-supervised multi-view fusion-based anomaly detection framework in online-to-offline logistics, is proposed, which consists of a basic version, SemiDet, and an attention-enhanced multi-view fusion model. In SemiDet, pair-wise data augmentation is first conducted to promote model robustness and address the challenge of limited labelled anomaly instances. Then, SemiDet calculates the anomaly score of each merchant with an auto-encoder framework. Considering the multiple relationships among logistics merchants, a multi-view attention fusion-based anomaly detection network is further designed to capture merchants' mutual influences and improve the anomaly merchant detection performance. A post-hoc perturbation-based interpretation model is designed to output the importance of different views and ensure the trustworthiness of end-to-end anomaly detection. The framework is evaluated on an eight-month real-world dataset collected from one of the largest logistics platforms in China, involving 6128 merchants and 16 million historical order consignor records in Beijing. Experimental results show that the proposed model outperforms other baselines in both AUC-ROC and AUC-PR metrics.
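The auto-encoder anomaly-scoring step can be sketched as follows. The linear encoder, the feature dimensionality, and the merchant vectors below are illustrative assumptions, not the paper's actual architecture: the idea is only that a merchant whose features reconstruct poorly from the learnt "normal behaviour" subspace receives a high anomaly score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear auto-encoder: merchant feature vectors are projected
# onto a low-dimensional "normal behaviour" subspace and reconstructed; the
# reconstruction error serves as the anomaly score.
d, k = 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal encoder basis

def anomaly_score(x):
    recon = Q @ (Q.T @ x)                    # encode then decode (projection)
    return float(np.mean((x - recon) ** 2))  # per-merchant reconstruction MSE

normal_merchant = Q @ rng.standard_normal(k)   # lies in the learnt subspace
odd_merchant = rng.standard_normal(d) * 5.0    # off-subspace behaviour
scores = {"normal": anomaly_score(normal_merchant),
          "odd": anomaly_score(odd_merchant)}
```

A merchant in the span of the learnt basis reconstructs almost exactly, while an off-subspace merchant scores far higher and would be flagged.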
Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning. This task is very complex, as the behaviour of road agents depends on many factors and the number of possible future trajectories can be considerable (multi-modal). Most prior approaches proposed to address multi-modal motion prediction are based on complex machine learning systems that have limited interpretability. Moreover, the metrics used in current benchmarks do not evaluate all aspects of the problem, such as the diversity and admissibility of the output. The authors aim to advance towards the design of trustworthy motion prediction systems, based on some of the requirements for the design of Trustworthy Artificial Intelligence. The focus is on evaluation criteria, robustness, and interpretability of outputs. First, the evaluation metrics are comprehensively analysed, the main gaps of current benchmarks are identified, and a new holistic evaluation framework is proposed. Then, a method for the assessment of spatial and temporal robustness is introduced by simulating noise in the perception system. To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework, an intent prediction layer that can be attached to multi-modal motion prediction models is proposed. The effectiveness of this approach is assessed through a survey that explores different elements in the visualisation of the multi-modal trajectories and intentions. The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autonomous vehicles, advancing the field towards greater safety and reliability.
As the scale of federated learning expands, solving the non-IID data problem of federated learning has become a key challenge of interest. Most existing solutions aim to improve the overall performance of all clients; however, this overall improvement often sacrifices the performance of certain clients, such as clients with less data. Ignoring fairness may greatly reduce the willingness of some clients to participate in federated learning. To solve this problem, the authors propose Ada-FFL, an adaptive fairness federated aggregation learning algorithm, which can dynamically adjust the fairness coefficient according to the updates of the local models, ensuring both the convergence performance of the global model and the fairness between federated learning clients. By integrating coarse-grained and fine-grained fairness solutions, the authors evaluate the deviation of local models by considering both global fairness and individual fairness; the weight ratio is then dynamically allocated to each client based on the evaluated deviation value, which ensures that the update differences of local models are fully considered in each round of training. Finally, by combining a regularisation term that constrains the local model update to be closer to the global model, the sensitivity of the model to input perturbations can be reduced and the generalisation ability of the global model can be improved. Through numerous experiments on several federated datasets, the authors show that their method has more advantages in convergence and fairness than the existing baselines.
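The deviation-based aggregation idea can be sketched as below. The weighting rule (weights proportional to a power of each client's deviation, controlled by a fairness coefficient `q`) is a hedged illustration of the general scheme, not Ada-FFL's exact update:

```python
import numpy as np

def adaptive_fair_weights(deviations, q=1.0):
    # Hypothetical fairness re-weighting: clients whose local models deviate
    # more from the global model receive a larger aggregation weight,
    # controlled by the fairness coefficient q (q=0 recovers uniform weights).
    d = np.asarray(deviations, dtype=float)
    w = d ** q
    return w / w.sum()

def aggregate(models, weights):
    # Weighted average of client model parameter vectors.
    return sum(wi * m for wi, m in zip(weights, np.asarray(models, dtype=float)))

w = adaptive_fair_weights([0.1, 0.4, 0.5], q=1.0)
global_update = aggregate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], w)
```

With these toy deviations, the most-deviating client contributes half of the aggregated update.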
With the introduction of more recent deep learning models such as the encoder-decoder, text generation frameworks have gained a lot of popularity. In Natural Language Generation (NLG), controlling the information and style of the output produced is a crucial and challenging task. The purpose of this paper is to develop informative and controllable text using social media language by incorporating topic knowledge into a keyword-to-text framework. A novel Topic-Controllable Key-to-Text (TC-K2T) generator is presented, which addresses the issues of ignoring unordered keywords and utilising subject-controlled information in previous research. TC-K2T is built on the framework of conditional language encoders. To guide the model to produce informative and controllable language, the generator first inputs unordered keywords and uses subjects to simulate prior human knowledge. Using an additional probability term, the model increases the likelihood of topic words appearing in the generated text to bias the overall distribution. Empirical research on automatic evaluation metrics and human annotations shows that the proposed TC-K2T can produce more informative and controllable sentences, outperforming state-of-the-art models.
The epidemic characters of Omicron (e.g. large-scale transmission) are significantly different from those of the initial variants of COVID-19. The data generated by large-scale transmission are important for predicting the trend of epidemic characters. However, the results of current prediction models are inaccurate since they are not closely combined with the actual situation of Omicron transmission. In consequence, these inaccurate results have negative impacts on manufacturing and the service industry, for example, the production of masks and the recovery of the tourism industry. The authors have studied the epidemic characters in two ways, that is, investigation and prediction. First, a large amount of data is collected by utilising the Baidu index, and a questionnaire survey concerning epidemic characters is conducted. Second, the β-SEIDR model is established, where the population is classified as Susceptible, Exposed, Infected, Dead and β-Recovered persons, to intelligently predict the epidemic characters of COVID-19. Note that β-Recovered denotes that Recovered persons may become Susceptible persons again with probability β. The simulation results show that the model can accurately predict the epidemic characters.
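A minimal discrete-time sketch of the β-SEIDR dynamics described above follows. All rates (infection, incubation, death, recovery) and β itself are illustrative assumptions, not the paper's fitted parameters; the distinguishing feature is the relapse term moving a fraction β of the Recovered back to Susceptible:

```python
def seidr_step(S, E, I, D, R, infect=0.3, incubate=0.2, die=0.01,
               recover=0.1, beta=0.05):
    # One discrete time step of a toy β-SEIDR compartment model.
    N = S + E + I + D + R
    new_E = infect * S * I / N        # susceptible exposed by the infected
    new_I = incubate * E              # exposed become infectious
    new_D = die * I                   # infected die
    new_R = recover * I               # infected recover
    relapse = beta * R                # β-Recovered become susceptible again
    return (S - new_E + relapse, E + new_E - new_I,
            I + new_I - new_D - new_R, D + new_D, R + new_R - relapse)

state = (990.0, 0.0, 10.0, 0.0, 0.0)   # initial (S, E, I, D, R)
for _ in range(50):
    state = seidr_step(*state)
```

Every flow appears once as an outflow and once as an inflow, so the total population is conserved at each step.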
As some recent information security legislation endowed users with unconditional rights to be forgotten by any trained machine learning model, personalised IoT service providers have to take unlearning functionality into consideration. The most straightforward method to unlearn users' contribution is to retrain the model from the initial state, which is not realistic in high-throughput applications with frequent unlearning requests. Though some machine unlearning frameworks have been proposed to speed up the retraining process, they fail to match decentralised learning scenarios. A decentralised unlearning framework called heterogeneous decentralised unlearning framework with seed (HDUS) is designed, which uses distilled seed models to construct erasable ensembles for all clients. Moreover, the framework is compatible with heterogeneous on-device models, representing stronger scalability in real-world applications. Extensive experiments on three real-world datasets show that HDUS achieves state-of-the-art performance.
Adversarial attacks have been posing significant security concerns to intelligent systems, such as speaker recognition systems (SRSs). Most attacks assume the neural networks in the systems are known beforehand, while black-box attacks are proposed without such information to meet practical situations. Existing black-box attacks improve transferability by integrating multiple models or training on multiple datasets, but these methods are costly. Motivated by the optimisation strategy with spatial information on the perturbed paths and samples, we propose a Dual Spatial Momentum Iterative Fast Gradient Sign Method (DS-MI-FGSM) to improve the transferability of black-box attacks against SRSs. Specifically, DS-MI-FGSM only needs a single data sample and one model as the input; by extending to the data and model neighbouring spaces, it generates adversarial examples against the integrated models. To reduce the risk of overfitting, DS-MI-FGSM also introduces gradient masking to improve transferability. The authors conduct extensive experiments on the speaker recognition task, and the results demonstrate the effectiveness of their method, which can achieve up to a 92% attack success rate on the victim model in black-box scenarios with only one known model.
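The momentum-iterative core with neighbourhood-averaged gradients can be sketched as below. The surrogate gradient (a fixed linear score instead of a real SRS loss), the neighbourhood-sampling radius, and the step schedule are all assumptions for illustration; only the MI-FGSM update shape follows the published method family:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(x, w):
    # Surrogate gradient of a toy linear score; in the real attack this would
    # be the back-propagated gradient of the speaker-recognition loss at x.
    return w

def dsmi_fgsm(x, w, eps=0.1, steps=10, mu=1.0, n_neighbors=4, radius=0.05):
    # Momentum-iterative FGSM with gradients averaged over samples drawn
    # from the input's neighbourhood -- a hedged sketch of the "spatial" idea.
    alpha = eps / steps
    g_mom = np.zeros_like(x)
    adv = x.copy()
    for _ in range(steps):
        grads = [loss_grad(adv + rng.uniform(-radius, radius, x.shape), w)
                 for _ in range(n_neighbors)]
        g = np.mean(grads, axis=0)
        g_mom = mu * g_mom + g / (np.abs(g).sum() + 1e-12)  # momentum update
        adv = adv + alpha * np.sign(g_mom)                  # signed step
        adv = np.clip(adv, x - eps, x + eps)                # stay in eps-ball
    return adv

x = rng.standard_normal(16)
adv = dsmi_fgsm(x, w=rng.standard_normal(16))
```

The clipping step keeps the adversarial example inside the L∞ ball of radius `eps` around the original sample.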
Underwater images often have biased colours and reduced contrast because of the absorption and scattering effects that occur when light propagates in water. Such degraded images cannot meet the needs of underwater operations. The main problem with classic underwater image restoration or enhancement methods is that they consume long calculation time, and often the colour or contrast of the resulting images is still unsatisfactory. Instead of using the complicated physical model of underwater imaging degradation, we propose a new method that deals with underwater images by imitating the colour constancy mechanism of human vision using double-opponency. Firstly, the original image is converted to the LMS space. Then the signals are linearly combined, and Gaussian convolutions are performed to imitate the function of receptive fields (RFs). Next, two RFs with different sizes work together to constitute the double-opponency response. Finally, the underwater light is estimated to correct the colours in the image. Further contrast stretching on the luminance is optional. Experiments show that the proposed method can obtain clarified underwater images with higher quality than before, and it incurs significantly less time cost than other previously published typical methods.
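The LMS conversion and two-RF opponency pipeline can be sketched as below. The RGB-to-LMS matrix (Hunt-Pointer-Estevez-style values), the L-M opponent combination, and the two RF sizes are illustrative assumptions; the paper's exact coefficients may differ:

```python
import numpy as np

# Illustrative RGB -> LMS matrix (assumed values, not the authors' exact ones).
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])

def gaussian_blur(img, sigma):
    # Separable Gaussian convolution imitating a receptive field (RF).
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2)); k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, out)

def double_opponency(rgb, sigma_small=1.0, sigma_large=3.0):
    lms = rgb @ RGB2LMS.T                    # per-pixel colour-space change
    opp = lms[..., 0] - lms[..., 1]          # L-M single-opponent signal
    # Two RFs of different sizes combined into a double-opponent response.
    return gaussian_blur(opp, sigma_small) - gaussian_blur(opp, sigma_large)

resp = double_opponency(np.random.default_rng(0).random((32, 32, 3)))
```

Because the two RFs are both normalised, a spatially uniform colour produces a near-zero double-opponent response away from the image border, which is the behaviour a colour-constancy mechanism needs.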
In the intricate network environment, the secure transmission of medical images faces challenges such as information leakage and malicious tampering, significantly impacting the accuracy of disease diagnoses by medical professionals. To address this problem, the authors propose a robust feature watermarking algorithm for encrypted medical images based on multi-stage discrete wavelet transform (DWT), the Daisy descriptor, and discrete cosine transform (DCT). The algorithm initially encrypts the original medical image through DWT-DCT and Logistic mapping. Subsequently, a 3-stage DWT transformation is applied to the encrypted medical image, with the centre point of the LL3 sub-band within its low-frequency component serving as the sampling point. The Daisy descriptor matrix for this point is then computed. Finally, a DCT transformation is performed on the Daisy descriptor matrix, and the low-frequency portion is processed using the perceptual hashing algorithm to generate a 32-bit binary feature vector for the medical image. This scheme utilises cryptographic knowledge and the zero-watermarking technique to embed watermarks without modifying medical images and can extract the watermark from test images without the original image, which meets the basic requirements of medical image watermarking. The embedding and extraction of watermarks are accomplished in a mere 0.160 s and 0.411 s, respectively, with minimal computational overhead. Simulation results demonstrate the robustness of the algorithm against both conventional and geometric attacks, with notable performance in resisting rotation attacks.
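The final perceptual-hash step (low-frequency DCT coefficients thresholded into a 32-bit binary feature vector) can be sketched as below. The input block, the choice of which 32 coefficients to keep, and the median threshold are assumptions about the authors' exact variant:

```python
import numpy as np

def dct2(a):
    # Orthonormal 2-D DCT-II built from an explicit basis matrix (no SciPy).
    n = a.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C @ a @ C.T

def feature_vector(block):
    # Hedged sketch of the perceptual-hash step: keep low-frequency DCT
    # coefficients and threshold against their median to get 32 binary bits.
    coeffs = dct2(block)[:8, :8].ravel()[:32]
    return (coeffs > np.median(coeffs)).astype(np.uint8)

img = np.random.default_rng(1).random((16, 16))
bits = feature_vector(img)
```

In a zero-watermarking scheme such a feature vector is not embedded into the image; it is combined (e.g. XORed) with the watermark and registered externally, so extraction needs only the test image's recomputed feature vector.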
How should a human face pattern be represented? While it is presented in a continuous way in the human visual system, computers often store and process it in a discrete manner with 2D arrays of pixels. The authors attempt to learn a continuous surface representation for face images with an explicit function. First, an explicit model (EmFace) for human face representation is proposed in the form of a finite sum of mathematical terms, where each term is an analytic function element. Further, to estimate the unknown parameters of EmFace, a novel neural network, EmNet, is designed with an encoder-decoder structure and trained on massive face images, where the encoder is a deep convolutional neural network and the decoder is an explicit mathematical expression of EmFace. The authors demonstrate that EmFace represents face images more accurately than the comparison method, with average mean square errors of 0.000888, 0.000936 and 0.000953 on the LFW, IARPA Janus Benchmark-B, and IJB-C datasets, respectively. Visualisation results show that EmFace has a higher representation performance on faces with various expressions, postures, and other factors. Furthermore, EmFace achieves reasonable performance on several face image processing tasks, including face image restoration, denoising, and transformation.
A novel method based on the cross-modality intersecting features of the palm-vein and the palmprint is proposed for identity verification. Capitalising on the unique geometrical relationship between the two biometric modalities, the cross-modality intersecting points provide a stable set of features for identity verification. To facilitate flexibility in template changes, a template transformation is proposed. While maintaining non-invertibility, the template transformation allows transformation sizes beyond those offered by conventional means. Extensive experiments using three public palm databases are conducted to verify the effectiveness of the proposed system for identity recognition.
Since the fully convolutional network has achieved great success in semantic segmentation, many works have been proposed to extract discriminative pixel representations. However, the authors observe that existing methods still suffer from two typical challenges: (i) the intra-class feature variation between different scenes may be large, leading to difficulty in maintaining consistency between same-class pixels from different scenes; (ii) the inter-class feature distinction in the same scene could be small, resulting in limited performance in distinguishing different classes in each scene. The authors first rethink semantic segmentation from the perspective of similarity between pixels and class centers. Each weight vector of the segmentation head represents its corresponding semantic class in the whole dataset, which can be regarded as the embedding of the class center. Thus, pixel-wise classification amounts to computing similarity in the final feature space between pixels and the class centers. Under this novel view, the authors propose a Class Center Similarity (CCS) layer to address the above-mentioned challenges by generating adaptive class centers conditioned on each scene and supervising the similarities between class centers. The CCS layer utilises the Adaptive Class Center Module to generate class centers conditioned on each scene, which adapts to the large intra-class variation between different scenes. A specially designed Class Distance Loss (CD Loss) is introduced to control both inter-class and intra-class distances based on the predicted center-to-center and pixel-to-center similarities. Finally, the CCS layer outputs the processed pixel-to-center similarity as the segmentation prediction. Extensive experiments demonstrate that the model performs favourably against the state-of-the-art methods.
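The "classification as pixel-to-center similarity" view can be sketched as below. Cosine similarity, the toy class centers, and the feature shapes are assumptions for illustration; the CCS layer additionally makes the centers scene-adaptive, which this sketch omits:

```python
import numpy as np

def ccs_predict(features, centers):
    # Pixel-wise classification as similarity between pixel embeddings and
    # class-center embeddings: cosine similarity, then argmax over classes.
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=-1, keepdims=True)
    sim = f @ c.T                        # (H, W, num_classes) similarity map
    return sim.argmax(axis=-1), sim

rng = np.random.default_rng(0)
centers = np.eye(3)                      # 3 toy class centers in R^3
feats = rng.random((4, 4, 3)) + 0.1      # toy pixel embeddings
feats[0, 0] = [5.0, 0.0, 0.0]            # a pixel aligned with class 0
pred, sim = ccs_predict(feats, centers)
```

A pixel embedding that points along a class center's direction is assigned to that class regardless of its magnitude, which is exactly the similarity-based reading of the segmentation head.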
In recent times, an image enhancement approach that learns a global transformation function using deep neural networks has gained attention. However, many existing methods based on this approach have a limitation: their transformation functions are too simple to imitate the complex colour transformations between low-quality images and manually retouched high-quality images. To address this limitation, a simple yet effective approach for image enhancement is proposed. The proposed algorithm is based on a channel-wise intensity transformation; however, this transformation is applied to the learnt embedding space instead of a specific colour space, and the enhanced features are then returned to colours. To this end, the authors define the continuous intensity transformation (CIT) to describe the mapping between input and output intensities on the embedding space. Then, an enhancement network is developed, which produces multi-scale feature maps from input images, derives the set of transformation functions, and performs the CIT to obtain enhanced images. Extensive experiments on the MIT-Adobe 5K dataset demonstrate that the authors' approach improves the performance of conventional intensity transforms on colour space metrics. Specifically, the authors achieved a 3.8% improvement in peak signal-to-noise ratio, a 1.8% improvement in structural similarity index measure, and a 27.5% improvement in learned perceptual image patch similarity. Also, the authors' algorithm outperforms state-of-the-art alternatives on three image enhancement datasets: MIT-Adobe 5K, Low-Light, and Google HDR+.
Mapping in a dynamic environment is an important task for autonomous mobile robots due to the unavoidable changes in the workspace. In this paper, we propose a framework for RGBD SLAM in low dynamic environments, which can maintain a map that keeps track of the latest environment. The main model describing the environment is a multi-session pose graph, which evolves over the multiple visits of the robot. Poses in the graph are pruned when the 3D point scans corresponding to those poses are out of date. When the robot explores new areas, its poses are added to the graph. Thus the scans kept in the current graph always give a map of the latest environment. Changes in the environment are detected by the out-of-date scan identification module through analyzing scans collected at different sessions. Besides, a redundant scan identification module is employed to further reduce the poses with redundant scans, in order to keep the total number of poses in the graph proportional to the size of the environment. In the experiments, the framework is first tuned and tested on data acquired by a Kinect in a laboratory environment. Then the framework is applied to an external dataset acquired by a Kinect II from the workspace of an industrial robot in another country, which is blind to the development phase, for further validation of the performance. After this two-step evaluation, the proposed framework is considered able to keep the map up to date in dynamic or static environments with non-cumulative complexity and an acceptable error level.
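The map-maintenance policy (prune poses whose scans are out of date, append poses from newly explored areas) can be sketched as a toy data-structure update. The dictionary record format and field names are assumptions for illustration:

```python
# Toy sketch of multi-session pose-graph maintenance: poses flagged as
# out-of-date are pruned; poses from newly explored areas are appended,
# so the surviving scans always describe the latest environment.
def update_pose_graph(graph, outdated_ids, new_poses):
    pruned = [p for p in graph if p["id"] not in outdated_ids]
    return pruned + list(new_poses)

graph = [{"id": 0, "session": 1},
         {"id": 1, "session": 1},   # scan at this pose is now out of date
         {"id": 2, "session": 2}]
graph = update_pose_graph(graph, outdated_ids={1},
                          new_poses=[{"id": 3, "session": 3}])
```

Pruning and appending in the same pass keeps the graph size tied to the size of the environment rather than to the number of visits.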
Sparse representation is an effective data classification algorithm that depends on known training samples to categorise a test sample. It has been widely used in various image classification tasks. Sparseness in sparse representation means that only a few instances selected from all training samples can effectively convey the essential class-specific information of the test sample, which is very important for classification. For deformable images such as human faces, pixels at the same location in different images of the same subject usually have different intensities. Therefore, extracting features from and correctly classifying such deformable objects is very hard. Moreover, lighting, attitude and occlusion cause more difficulty. Considering the problems and challenges listed above, a novel image representation and classification algorithm is proposed. First, the authors' algorithm generates virtual samples by a non-linear variation method. This method can effectively extract the low-frequency information of the space-domain features of the original image, which is very useful for representing deformable objects. The combination of the original and virtual samples is beneficial for improving the classification performance and robustness of the algorithm. The algorithm then calculates the expression coefficients of the original and virtual samples separately using the sparse representation principle and obtains the final score by a designed efficient score fusion scheme. The weighting coefficients in the score fusion scheme are set entirely automatically. Finally, the algorithm classifies the samples based on the final scores. The experimental results show that the method performs better classification than conventional sparse representation algorithms.
In some complicated tabletop object manipulation tasks for robotic systems, demonstration-based control is an efficient way to enhance the stability of execution. In this paper, we use a new optical hand tracking sensor, LeapMotion, to perform non-contact demonstration for robotic systems. A Multi-LeapMotion hand tracking system is developed. The setup of the two sensors is analyzed to obtain an optimal configuration for efficient use of the information from the two sensors. Meanwhile, the coordinate systems of the Multi-LeapMotion hand tracking device and the robotic demonstration system are developed. With recognition of the element actions and delay calibration, fusion principles are developed to obtain improved and corrected gesture recognition. Gesture recognition and scenario experiments are carried out, and they indicate the improvement of the proposed Multi-LeapMotion hand tracking system in tabletop object manipulation tasks for robotic demonstration.
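Relating the two sensor frames to the robot frame reduces to standard homogeneous coordinate transforms, which can be sketched as below. The particular rotation and translation of the second sensor are hypothetical values, not the paper's calibrated setup:

```python
import numpy as np

def make_transform(R, t):
    # Build a 4x4 homogeneous transform from rotation matrix R and translation t.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def to_robot_frame(p_sensor, T_sensor_to_robot):
    # Map a 3-D point measured in a sensor frame into the robot frame.
    return (T_sensor_to_robot @ np.append(p_sensor, 1.0))[:3]

# Hypothetical setup: second sensor rotated 90 degrees about z and shifted.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
T2 = make_transform(Rz, t=np.array([0.1, 0.0, 0.2]))
p_robot = to_robot_frame(np.array([1.0, 0.0, 0.0]), T2)
```

Once both sensors' measurements live in the common robot frame, fusing them (after delay calibration) is a matter of combining two estimates of the same hand pose.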
By automatically learning the priors embedded in images with powerful modelling capabilities, deep learning-based algorithms have recently made considerable progress in reconstructing high-resolution hyperspectral (HR-HS) images. With previously collected large amounts of external data, these methods are intuitively realised under the full supervision of the ground-truth data. Thus, database construction in the research paradigm of merging the low-resolution (LR) hyperspectral (LR-HS) and HR multispectral (MS) or RGB image, commonly named HSI SR, requires collecting the corresponding training triplets, HR-MS (RGB), LR-HS and HR-HS images, simultaneously, and often faces difficulties in reality. Models learned from training datasets collected under controlled conditions may significantly degrade the HSI super-resolution performance on real images captured under diverse environments. To handle the above-mentioned limitations, the authors propose to leverage deep internal and self-supervised learning to solve the HSI SR problem. The authors advocate that it is possible to train a specific CNN model at test time, called deep internal learning (DIL), by online preparing the training triplet samples from the observed LR-HS/HR-MS (or RGB) images and the down-sampled LR-HS version. However, the number of training triplets extracted solely from the transformed data of the observation itself is extremely small, particularly for HSI SR tasks with large spatial upscale factors, which would result in limited reconstruction performance. To solve this problem, the authors further exploit deep self-supervised learning (DSL) by considering the observations as unlabelled training samples. Specifically, the degradation modules inside the network are elaborated to realise the spatial and spectral down-sampling procedures for transforming the generated HR-HS estimation to the high-resolution RGB/LR-HS approximation, and the reconstruction errors of the observations are then formulated for measuring the network modelling performance.
In content-based image retrieval (CBIR), primitive image signatures are critical because they represent the visual characteristics. Image signatures, which are algorithmically descriptive and accurately recognized visual components, are used to appropriately index and retrieve comparable results. To differentiate an image among qualifying contenders, feature vectors must carry image information such as colour, objects, shape, and spatial viewpoints. Previous methods such as sketch-based image retrieval by salient contour (SBIR) and greedy learning of deep Boltzmann machine (GDBM) used spatial information to distinguish between image categories. These methods require interest points, and their feature analysis gives rise to image detection problems. Thus, a model that overcomes these issues and predicts repeating patterns, as well as the series of pixels that determine similarity, has become necessary. In this study, a technique called CBIR similarity measure via artificial neural network interpolation (CBIR-SMANN) is presented. After collecting datasets, the images are resized and subjected to Gaussian filtering in the pre-processing stage; interest points are then gathered by passing the images through a Hessian detector. Features based on skewness, mean, kurtosis and standard deviation are extracted and given to an ANN for interpolation. Interpolated results are stored in a database for retrieval. In the testing stage, the query image is inputted, subjected to pre-processing and feature extraction, and then fed to the similarity measurement function. Thus, the ANN helps to retrieve similar images from the database. CBIR-SMANN was implemented in Python and evaluated for its performance. Results show that CBIR-SMANN exhibited a high recall value of 78% with a minimum retrieval time of 980 ms, showing that the supremacy of the proposed model is comparatively greater than that of the previous ones.
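The statistical feature extraction and similarity-ranking stages can be sketched as below. The standard moment formulas for skewness and kurtosis and the Euclidean distance ranking are assumptions about the authors' exact variants, and the ANN interpolation stage is omitted:

```python
import numpy as np

def stat_features(img):
    # Mean, standard deviation, skewness and (excess) kurtosis of pixel
    # intensities, using the standard moment definitions (an assumption
    # about the authors' exact feature formulas).
    x = np.asarray(img, dtype=float).ravel()
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd
    return np.array([mu, sd, (z**3).mean(), (z**4).mean() - 3.0])

def retrieve(query, database, k=2):
    # Rank database images by Euclidean distance in the 4-D feature space.
    q = stat_features(query)
    dists = [np.linalg.norm(q - stat_features(img)) for img in database]
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
db = [rng.random((8, 8)) for _ in range(5)]
top = retrieve(db[3], db, k=2)
```

A query identical to a stored image sits at distance zero in feature space and is therefore ranked first.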
Current Chinese event detection methods commonly use word embedding to capture semantic representation, but these methods find it difficult to capture the dependence relationship between the trigger words and other words in the same sentence. Based on a simple evaluation, it is known that a dependency parser can effectively capture dependency relationships and improve the accuracy of event categorisation. This study proposes a novel architecture that models a hybrid representation to summarise semantic and structural information from both characters and words. This model can capture rich semantic features for the event detection task by incorporating the semantic representation generated by the dependency parser. The authors evaluate different models on the KBP 2017 corpus. The experimental results show that the proposed method can significantly improve performance in Chinese event detection.
Abstract: Machine learning has been extensively applied in behavioural and social computing, encompassing a spectrum of applications such as social network analysis, click stream analysis, recommendation of points of interest, and sentiment analysis. The datasets pertinent to these applications are inherently linked to human behaviour and societal dynamics, posing a risk of disclosing personal or sensitive information if mishandled or subjected to attacks.
Funding: Major Project of Fundamental Research on Frontier Leading Technology of Jiangsu Province, Grant/Award Number: BK20222006; Fundamental Research Funds for the Central Universities, Grant/Award Number: CUPL 20ZFG79001.
Abstract: The rise of the online-to-offline (O2O) e-commerce business has brought tremendous opportunities to the logistics industry. In the online-to-offline logistics business, it is essential to detect anomalous merchants with fraudulent shipping behaviours, such as sending other merchants' packages for profit using their low discounts. This can help reduce the financial losses of platforms and ensure a healthy environment. Existing anomaly detection studies have mainly focused on online fraud behaviour detection, such as fraudulent purchase and comment behaviours in e-commerce. However, these methods are not suitable for anomalous merchant detection in logistics because of the more complex online and offline operation of package-sending behaviours and the interpretability requirements of offline deployment in logistics. MultiDet, a semi-supervised multi-view fusion-based anomaly detection framework for online-to-offline logistics, is proposed; it consists of a basic version, SemiDet, and an attention-enhanced multi-view fusion model. In SemiDet, pair-wise data augmentation is first conducted to promote model robustness and address the challenge of limited labelled anomaly instances. Then, SemiDet calculates an anomaly score for each merchant with an auto-encoder framework. Considering the multiple relationships among logistics merchants, a multi-view attention fusion-based anomaly detection network is further designed to capture merchants' mutual influences and improve anomalous merchant detection performance. A post-hoc perturbation-based interpretation model is designed to output the importance of different views and ensure the trustworthiness of end-to-end anomaly detection. The framework is evaluated on an eight-month real-world dataset collected from one of the largest logistics platforms in China, involving 6128 merchants and 16 million historical order consignor records in Beijing. Experimental results show that the proposed model outperforms other baselines in both AUC-ROC and AUC-PR metrics.
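The core of SemiDet's scoring step is reconstruction error from an auto-encoder: merchants whose behavioural features the model reconstructs poorly are flagged. As a minimal sketch (a linear, PCA-based stand-in for the paper's auto-encoder; all feature dimensions and data are synthetic), the idea looks like this:

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a k-dimensional linear auto-encoder (PCA) on mostly-normal data."""
    mu = X.mean(axis=0)
    # principal axes = top-k right singular vectors of the centred data
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k].T                      # (d, k) encoder/decoder weights
    return mu, W

def anomaly_score(X, mu, W):
    """Reconstruction error: a large error suggests an anomalous merchant."""
    Z = (X - mu) @ W                  # encode
    X_hat = Z @ W.T + mu              # decode
    return np.linalg.norm(X - X_hat, axis=1)

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 8))    # ordinary merchants (synthetic)
outlier = rng.normal(6, 1, size=(5, 8))     # fraudulent-looking merchants
mu, W = fit_linear_autoencoder(normal, k=3)
scores = anomaly_score(np.vstack([normal, outlier]), mu, W)
top5 = np.argsort(scores)[-5:]              # should be the injected outliers
```

The real SemiDet learns a non-linear encoder and adds pair-wise augmentation; this only illustrates why reconstruction error separates in-distribution merchants from anomalous ones.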
Funding: European Commission, Joint Research Center, Grant/Award Number: HUMAINT; Ministerio de Ciencia e Innovación, Grant/Award Number: PID2020-114924RB-I00; Comunidad de Madrid, Grant/Award Number: S2018/EMT-4362 SEGVAUTO 4.0-CM.
Abstract: Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning. This task is very complex, as the behaviour of road agents depends on many factors and the number of possible future trajectories can be considerable (multi-modal). Most prior approaches proposed to address multi-modal motion prediction are based on complex machine learning systems that have limited interpretability. Moreover, the metrics used in current benchmarks do not evaluate all aspects of the problem, such as the diversity and admissibility of the output. The authors aim to advance towards the design of trustworthy motion prediction systems, based on some of the requirements for the design of Trustworthy Artificial Intelligence. The focus is on evaluation criteria, robustness, and interpretability of outputs. First, the evaluation metrics are comprehensively analysed, the main gaps of current benchmarks are identified, and a new holistic evaluation framework is proposed. Then, a method for the assessment of spatial and temporal robustness is introduced by simulating noise in the perception system. To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework, an intent prediction layer that can be attached to multi-modal motion prediction models is proposed. The effectiveness of this approach is assessed through a survey that explores different elements in the visualisation of the multi-modal trajectories and intentions. The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autonomous vehicles, advancing the field towards greater safety and reliability.
Funding: National Natural Science Foundation of China, Grant/Award Number: 62272114; Joint Research Fund of Guangzhou and University, Grant/Award Number: 202201020380; Guangdong Higher Education Innovation Group, Grant/Award Number: 2020KCXTD007; Pearl River Scholars Funding Program of Guangdong Universities (2019); National Key R&D Program of China, Grant/Award Number: 2022ZD0119602; Major Key Project of PCL, Grant/Award Number: PCL2022A03.
Abstract: As the scale of federated learning expands, solving the non-IID data problem of federated learning has become a key challenge of interest. Most existing solutions generally aim at the overall performance improvement of all clients; however, this overall improvement often sacrifices the performance of certain clients, such as clients with less data. Ignoring fairness may greatly reduce the willingness of some clients to participate in federated learning. To solve the above problem, the authors propose Ada-FFL, an adaptive fairness federated aggregation learning algorithm, which can dynamically adjust the fairness coefficient according to the updates of the local models, ensuring the convergence performance of the global model and the fairness between federated learning clients. By integrating coarse-grained and fine-grained fairness solutions, the authors evaluate the deviation of local models by considering both global fairness and individual fairness; the weight ratio is then dynamically allocated to each client based on the evaluated deviation value, which ensures that the update differences of the local models are fully considered in each round of training. Finally, by combining a regularisation term that limits the local model update to be closer to the global model, the sensitivity of the model to input perturbations can be reduced, and the generalisation ability of the global model can be improved. Through numerous experiments on several federated datasets, the authors show that their method has more advantages in convergence effect and fairness than the existing baselines.
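The deviation-based weighting can be sketched in a few lines. This is one plausible reading, not the paper's exact coefficient schedule: clients whose local loss deviates above the mean receive proportionally larger aggregation weight, so under-served clients pull the global model towards themselves. The `alpha` knob and the toy models/losses are illustrative assumptions.

```python
import numpy as np

def fair_aggregate(client_models, client_losses, alpha=1.0):
    """
    Fairness-aware weighted aggregation (illustrative sketch).
    Clients with above-average local loss get a weight bonus proportional
    to their deviation, then weights are normalised to sum to one.
    """
    losses = np.asarray(client_losses, dtype=float)
    deviation = np.maximum(losses - losses.mean(), 0.0)
    weights = 1.0 + alpha * deviation       # base weight + fairness bonus
    weights /= weights.sum()
    stacked = np.stack(client_models)       # (n_clients, n_params)
    return weights @ stacked                # weighted average of parameters

models = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([5.0, 5.0])]
losses = [0.2, 0.3, 2.5]                    # third client is struggling
global_model = fair_aggregate(models, losses)
```

With these toy numbers the struggling client gets weight 2.5/4.5, so the aggregate lands closer to its parameters than a plain mean would.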
Funding: Australian Research Council, Grant/Award Numbers: DP22010371, LE220100078.
Abstract: With the introduction of more recent deep learning models such as the encoder-decoder, text generation frameworks have gained a lot of popularity. In Natural Language Generation (NLG), controlling the information and style of the output produced is a crucial and challenging task. The purpose of this paper is to develop informative and controllable text using social media language by incorporating topic knowledge into a keyword-to-text framework. A novel Topic-Controllable Key-to-Text (TC-K2T) generator is presented that addresses the issues, identified in previous research, of ignoring unordered keywords and of utilising subject-controlled information. TC-K2T is built on the framework of conditional language encoders. In order to guide the model to produce informative and controllable language, the generator first inputs unordered keywords and uses topics to simulate prior human knowledge. Using an additional probability term, the model increases the likelihood of topic words appearing in the generated text to bias the overall distribution. According to empirical research on automatic evaluation metrics and human annotations, the proposed TC-K2T can produce more informative and controllable sentences, outperforming state-of-the-art models.
Funding: Key discipline construction project for traditional Chinese Medicine in Guangdong province, Grant/Award Number: 20220104; The construction project of inheritance studio of national famous and old traditional Chinese Medicine experts, Grant/Award Number: 140000020132.
Abstract: The epidemic characters of Omicron (e.g. large-scale transmission) are significantly different from those of the initial variants of COVID-19. The data generated by large-scale transmission are important for predicting the trend of epidemic characters. However, the results of current prediction models are inaccurate since they are not closely combined with the actual situation of Omicron transmission. In consequence, these inaccurate results have negative impacts on the manufacturing and service industries, for example, the production of masks and the recovery of the tourism industry. The authors have studied the epidemic characters in two ways, that is, investigation and prediction. First, a large amount of data is collected by utilising the Baidu index and conducting a questionnaire survey concerning epidemic characters. Second, the β-SEIDR model is established, where the population is classified as Susceptible, Exposed, Infected, Dead and β-Recovered persons, to intelligently predict the epidemic characters of COVID-19. Note that β-Recovered denotes that Recovered persons may become Susceptible persons with probability β. The simulation results show that the model can accurately predict the epidemic characters.
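A compartmental model like β-SEIDR can be simulated with simple discrete-time updates. The sketch below uses hypothetical daily rates (the paper fits its own parameters); the distinctive β-Recovered flow is the `relapsed` term, returning Recovered persons to Susceptible with probability β each day.

```python
def beta_seidr(S, E, I, D, R, days, infect=0.30, incubate=0.20,
               recover=0.10, die=0.002, beta=0.01):
    """
    Discrete-time beta-SEIDR sketch (all rates per day, hypothetical values).
    Recovered individuals lose immunity and return to Susceptible with
    probability `beta` each day, as in the beta-Recovered compartment.
    """
    N = S + E + I + D + R
    history = []
    for _ in range(days):
        new_exposed = infect * S * I / N    # S -> E via contact
        new_infected = incubate * E         # E -> I after incubation
        new_recovered = recover * I         # I -> R
        new_dead = die * I                  # I -> D
        relapsed = beta * R                 # R -> S (waning immunity)
        S += relapsed - new_exposed
        E += new_exposed - new_infected
        I += new_infected - new_recovered - new_dead
        D += new_dead
        R += new_recovered - relapsed
        history.append((S, E, I, D, R))
    return history

hist = beta_seidr(S=9_990.0, E=0.0, I=10.0, D=0.0, R=0.0, days=120)
```

Because every term moves mass between compartments, the total population is conserved throughout the run, which is a useful sanity check on any implementation.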
Funding: Australian Research Council, Grant/Award Numbers: FT210100624, DP190101985, DE230101033.
Abstract: As recent information security legislation has endowed users with the unconditional right to be forgotten by any trained machine learning model, personalised IoT service providers have to take unlearning functionality into consideration. The most straightforward way to unlearn a user's contribution is to retrain the model from its initial state, which is not realistic in high-throughput applications with frequent unlearning requests. Though some machine unlearning frameworks have been proposed to speed up the retraining process, they fail to match decentralised learning scenarios. A decentralised unlearning framework called the heterogeneous decentralised unlearning framework with seed (HDUS) is designed, which uses distilled seed models to construct erasable ensembles for all clients. Moreover, the framework is compatible with heterogeneous on-device models, representing stronger scalability in real-world applications. Extensive experiments on three real-world datasets show that HDUS achieves state-of-the-art performance.
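The "erasable ensemble" idea reduces unlearning to removing one member instead of retraining. A minimal sketch, assuming toy linear seed models and an API invented here for illustration (the paper's distillation and heterogeneity handling are omitted):

```python
import numpy as np

class ErasableEnsemble:
    """
    Minimal sketch of an erasable ensemble: each client contributes a
    distilled 'seed' model, and unlearning a client simply removes its
    seed instead of retraining from scratch. Model form and method names
    are illustrative, not the paper's.
    """
    def __init__(self):
        self.seeds = {}                     # client_id -> weight vector

    def add_client(self, client_id, weights):
        self.seeds[client_id] = np.asarray(weights, dtype=float)

    def unlearn(self, client_id):
        self.seeds.pop(client_id, None)     # O(1) erasure, no retraining

    def predict(self, x):
        # ensemble output = mean of the remaining seed models' linear scores
        return float(np.mean([w @ x for w in self.seeds.values()]))

ens = ErasableEnsemble()
ens.add_client("a", [1.0, 0.0])
ens.add_client("b", [0.0, 1.0])
ens.add_client("c", [1.0, 1.0])
x = np.array([2.0, 4.0])
before = ens.predict(x)                     # mean of scores 2, 4, 6
ens.unlearn("c")
after = ens.predict(x)                      # mean of scores 2, 4
```

After `unlearn("c")`, client c's contribution is gone from every future prediction, which is the behavioural guarantee unlearning requires.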
Funding: The Major Key Project of PCL, Grant/Award Number: PCL2022A03; National Natural Science Foundation of China, Grant/Award Numbers: 61976064, 62372137; Zhejiang Provincial Natural Science Foundation of China, Grant/Award Number: LZ22F020007.
Abstract: Adversarial attacks have been posing significant security concerns to intelligent systems, such as speaker recognition systems (SRSs). Most attacks assume the neural networks in the systems are known beforehand, while black-box attacks are proposed without such information to meet practical situations. Existing black-box attacks improve transferability by integrating multiple models or training on multiple datasets, but these methods are costly. Motivated by the optimisation strategy with spatial information on the perturbed paths and samples, the authors propose a Dual Spatial Momentum Iterative Fast Gradient Sign Method (DS-MI-FGSM) to improve the transferability of black-box attacks against SRSs. Specifically, DS-MI-FGSM needs only a single data sample and one model as input; by extending to the neighbouring spaces of the data and the model, it generates adversarial examples against the integrated models. To reduce the risk of overfitting, DS-MI-FGSM also introduces gradient masking to improve transferability. The authors conduct extensive experiments on the speaker recognition task, and the results demonstrate the effectiveness of their method, which can achieve up to a 92% attack success rate on the victim model in black-box scenarios with only one known model.
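The MI-FGSM core that DS-MI-FGSM builds on is short enough to sketch. The dual-spatial extension (averaging gradients over neighbourhoods of the input and of the model) and the gradient masking are omitted; the victim below is a toy linear scorer with an analytic gradient, used purely so the block is self-contained.

```python
import numpy as np

def mi_fgsm(x, grad_fn, eps=0.3, steps=10, mu=1.0):
    """
    Momentum Iterative FGSM core. grad_fn(x) returns the gradient of the
    attack loss w.r.t. x; the accumulated momentum g stabilises the
    update direction across iterations.
    """
    alpha = eps / steps                        # per-step budget
    g = np.zeros_like(x)                       # accumulated momentum
    x_adv = x.copy()
    for _ in range(steps):
        grad = grad_fn(x_adv)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # L1-normalised
        x_adv = x_adv + alpha * np.sign(g)     # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)          # stay in eps-ball
    return x_adv

# toy victim: linear score w.x; the attack pushes the score upward
w = np.array([1.0, -2.0, 0.5])
grad_fn = lambda x: w                          # d(w.x)/dx is constant
x0 = np.zeros(3)
x_adv = mi_fgsm(x0, grad_fn, eps=0.3)
```

The perturbation stays inside the eps-ball while monotonically increasing the victim's score, which is exactly the trade-off the sign-based update enforces.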
Abstract: Underwater images often exhibit biased colours and reduced contrast because of the absorption and scattering effects that occur when light propagates in water. Such degraded images cannot meet the needs of underwater operations. The main problem with classic underwater image restoration or enhancement methods is that they consume long calculation time, and often the colour or contrast of the resulting images is still unsatisfactory. Instead of using the complicated physical model of underwater imaging degradation, we propose a new method to deal with underwater images by imitating the colour constancy mechanism of human vision using double-opponency. Firstly, the original image is converted to the LMS space. Then the signals are linearly combined, and Gaussian convolutions are performed to imitate the function of receptive fields (RFs). Next, two RFs with different sizes work together to constitute the double-opponency response. Finally, the underwater light is estimated to correct the colours in the image. Further contrast stretching on the luminance is optional. Experiments show that the proposed method can obtain clarified underwater images of higher quality, and it incurs significantly less time cost than other previously published typical methods.
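The pipeline's skeleton, two Gaussian receptive-field sizes per channel feeding a light estimate that rescales the channel, can be sketched as follows. This is a heavily simplified stand-in: it stays in RGB rather than LMS, and the per-channel light estimate and rescaling rule are assumptions for illustration, not the paper's opponent wiring.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correct_underwater(img, sigma_small=2.0, sigma_large=8.0):
    """
    Sketch of receptive-field-based colour correction: two Gaussian blurs
    per channel approximate centre/surround responses, their combination
    estimates the veiling underwater light, and each channel is rescaled
    so that all channels approach the global mean intensity.
    """
    img = img.astype(float)
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        centre = gaussian_filter(img[..., c], sigma_small)
        surround = gaussian_filter(img[..., c], sigma_large)
        light = np.mean(0.5 * (centre + surround))   # per-channel light estimate
        out[..., c] = img[..., c] * (img.mean() / (light + 1e-6))
    return np.clip(out, 0.0, 1.0)

# bluish synthetic frame: blue channel dominates before correction
rng = np.random.default_rng(1)
frame = np.stack([rng.uniform(0.0, 0.3, (32, 32)),          # attenuated R
                  rng.uniform(0.0, 0.5, (32, 32)),          # G
                  rng.uniform(0.4, 0.9, (32, 32))], axis=-1)  # strong B
balanced = correct_underwater(frame)
```

On the synthetic bluish frame, the spread between per-channel means shrinks after correction, which is the colour-constancy effect the method targets.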
Funding: National Natural Science Foundation of China, Grant/Award Numbers: 62063004, 62350410483; Key Research and Development Project of Hainan Province, Grant/Award Number: ZDYF2021SHFZ093; Zhejiang Provincial Postdoctoral Science Foundation, Grant/Award Number: ZJ2021028.
Abstract: In the intricate network environment, the secure transmission of medical images faces challenges such as information leakage and malicious tampering, significantly impacting the accuracy of disease diagnoses by medical professionals. To address this problem, the authors propose a robust feature watermarking algorithm for encrypted medical images based on a multi-stage discrete wavelet transform (DWT), the Daisy descriptor, and the discrete cosine transform (DCT). The algorithm initially encrypts the original medical image through DWT-DCT and Logistic mapping. Subsequently, a 3-stage DWT is applied to the encrypted medical image, with the centre point of the LL3 sub-band within its low-frequency component serving as the sampling point. The Daisy descriptor matrix for this point is then computed. Finally, a DCT is performed on the Daisy descriptor matrix, and the low-frequency portion is processed using a perceptual hashing algorithm to generate a 32-bit binary feature vector for the medical image. This scheme utilises cryptographic knowledge and a zero-watermarking technique to embed watermarks without modifying medical images, and it can extract the watermark from test images without the original image, which meets the basic requirements of medical image watermarking. The embedding and extraction of watermarks are accomplished in a mere 0.160 s and 0.411 s, respectively, with minimal computational overhead. Simulation results demonstrate the robustness of the algorithm against both conventional and geometric attacks, with notable performance in resisting rotation attacks.
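The final step, DCT plus perceptual hashing into a 32-bit feature vector, can be sketched generically. The block below hashes any real-valued feature block (the paper hashes a Daisy descriptor matrix); the 4×8 low-frequency corner and median thresholding are common perceptual-hash conventions assumed here, not taken from the paper.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    M[0] /= np.sqrt(2.0)
    return M

def perceptual_hash_32(block):
    """
    32-bit perceptual hash sketch: 2D DCT of the feature block, keep a
    low-frequency 4x8 corner, threshold each coefficient at the median.
    """
    M = dct_matrix(block.shape[0])
    N = dct_matrix(block.shape[1])
    coeffs = M @ block @ N.T               # separable 2D DCT-II
    low = coeffs[:4, :8].ravel()           # 32 low-frequency coefficients
    return (low > np.median(low)).astype(np.uint8)

rng = np.random.default_rng(2)
feat = rng.normal(size=(16, 16))           # stand-in for a descriptor matrix
bits = perceptual_hash_32(feat)
noisy_bits = perceptual_hash_32(feat + rng.normal(scale=0.01, size=(16, 16)))
```

Because only low-frequency DCT coefficients relative to their median are kept, small perturbations of the input flip few or no bits, which is the robustness property zero-watermarking relies on.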
Funding: National Natural Science Foundation of China, Grant/Award Number: 92370117.
Abstract: How should a human face pattern be represented? While it is presented in a continuous way in the human visual system, computers often store and process it in a discrete manner with 2D arrays of pixels. The authors attempt to learn a continuous surface representation for face images with an explicit function. First, an explicit model (EmFace) for human face representation is proposed in the form of a finite sum of mathematical terms, where each term is an analytic function element. Further, to estimate the unknown parameters of EmFace, a novel neural network, EmNet, is designed with an encoder-decoder structure and trained on massive face images, where the encoder is defined by a deep convolutional neural network and the decoder is an explicit mathematical expression of EmFace. The authors demonstrate that EmFace represents face images more accurately than the comparison methods, with average mean square errors of 0.000888, 0.000936, and 0.000953 on the LFW, IARPA Janus Benchmark-B, and IJB-C datasets. Visualisation results show that EmFace has a higher representation performance on faces with various expressions, postures, and other factors. Furthermore, EmFace achieves reasonable performance on several face image processing tasks, including face image restoration, denoising, and transformation.
Funding: National Research Foundation of Korea, funded by the Ministry of Education, Science and Technology, Grant/Award Number: NRF-2021R1A2C1093425.
Abstract: A novel method based on the cross-modality intersecting features of the palm vein and the palmprint is proposed for identity verification. Capitalising on the unique geometrical relationship between the two biometric modalities, the cross-modality intersecting points provide a stable set of features for identity verification. To facilitate flexibility in template changes, a template transformation is proposed. While maintaining non-invertibility, the template transformation allows transformation sizes beyond those offered by conventional means. Extensive experiments using three public palm databases are conducted to verify the effectiveness of the proposed system for identity recognition.
Funding: Hubei Provincial Natural Science Foundation of China, Grant/Award Number: 2022CFA055; National Natural Science Foundation of China, Grant/Award Number: 62176097.
Abstract: Since the fully convolutional network has achieved great success in semantic segmentation, many works have been proposed to extract discriminative pixel representations. However, the authors observe that existing methods still suffer from two typical challenges: (i) the intra-class feature variation between different scenes may be large, leading to difficulty in maintaining consistency between same-class pixels from different scenes; (ii) the inter-class feature distinction in the same scene could be small, resulting in limited performance in distinguishing different classes within each scene. The authors first rethink semantic segmentation from the perspective of similarity between pixels and class centers. Each weight vector of the segmentation head represents its corresponding semantic class in the whole dataset, so it can be regarded as the embedding of the class center. Thus, pixel-wise classification amounts to computing the similarity in the final feature space between pixels and the class centers. Under this novel view, the authors propose a Class Center Similarity (CCS) layer to address the above-mentioned challenges by generating adaptive class centers conditioned on each scene and supervising the similarities between class centers. The CCS layer utilises the Adaptive Class Center Module to generate class centers conditioned on each scene, which adapts to the large intra-class variation between different scenes. A specially designed Class Distance Loss (CD Loss) is introduced to control both inter-class and intra-class distances based on the predicted center-to-center and pixel-to-center similarities. Finally, the CCS layer outputs the processed pixel-to-center similarity as the segmentation prediction. Extensive experiments demonstrate that the model performs favourably against the state-of-the-art methods.
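The "segmentation as pixel-to-center similarity" view reduces to an argmax over cosine similarities. A minimal sketch, assuming fixed class centers and toy 2-D embeddings (the real CCS layer generates the centers adaptively per scene):

```python
import numpy as np

def ccs_predict(features, centers):
    """
    Segmentation as pixel-to-class-center similarity.
    features: (H, W, C) pixel embeddings; centers: (K, C) class embeddings.
    Returns an (H, W) map of predicted class indices by cosine similarity.
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-12)
    c = centers / (np.linalg.norm(centers, axis=-1, keepdims=True) + 1e-12)
    sim = f @ c.T                              # (H, W, K) cosine similarities
    return sim.argmax(axis=-1)

centers = np.array([[1.0, 0.0], [0.0, 1.0]])   # two class centers
feats = np.zeros((2, 2, 2))
feats[0] = [0.9, 0.1]                          # top row near class 0
feats[1] = [0.2, 0.8]                          # bottom row near class 1
pred = ccs_predict(feats, centers)
```

The CD Loss then pushes pixel-to-center similarities up for the true class and center-to-center similarities down, directly shaping the quantity this function thresholds.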
Funding: National Research Foundation of Korea, Grant/Award Numbers: 2022R1I1A3069113, RS-2023-00221365; Electronics and Telecommunications Research Institute, Grant/Award Number: 2014-3-00123.
Abstract: In recent times, an image enhancement approach that learns the global transformation function using deep neural networks has gained attention. However, many existing methods based on this approach have a limitation: their transformation functions are too simple to imitate the complex colour transformations between low-quality images and manually retouched high-quality images. To address this limitation, a simple yet effective approach for image enhancement is proposed. The proposed algorithm is based on a channel-wise intensity transformation; however, this transformation is applied to the learnt embedding space instead of specific colour spaces, and the enhanced features are then returned to colours. To this end, the authors define the continuous intensity transformation (CIT) to describe the mapping between input and output intensities on the embedding space. Then, an enhancement network is developed, which produces multi-scale feature maps from input images, derives the set of transformation functions, and performs the CIT to obtain enhanced images. Extensive experiments on the MIT-Adobe 5K dataset demonstrate that the authors' approach improves the performance of conventional intensity transforms on colour space metrics. Specifically, the authors achieved a 3.8% improvement in peak signal-to-noise ratio, a 1.8% improvement in structural similarity index measure, and a 27.5% improvement in learned perceptual image patch similarity. Also, the authors' algorithm outperforms state-of-the-art alternatives on three image enhancement datasets: MIT-Adobe 5K, Low-Light, and Google HDR+.
Funding: This work is supported by the National Natural Science Foundation of China (Grant Nos. 61473258, U1509210), and the Joint Centre for Robotics Research (JCRR) between Zhejiang University and the University of Technology Sydney.
Abstract: Mapping in a dynamic environment is an important task for autonomous mobile robots due to the unavoidable changes in the workspace. In this paper, we propose a framework for RGB-D SLAM in low-dynamic environments, which can maintain a map that keeps track of the latest state of the environment. The main model describing the environment is a multi-session pose graph, which evolves over the robot's multiple visits. Poses in the graph are pruned when the 3D point scans corresponding to those poses are out of date. When the robot explores new areas, its poses are added to the graph. Thus, the scans kept in the current graph always give a map of the latest environment. Changes in the environment are detected by the out-of-date scan identification module through analysing scans collected in different sessions. Besides, a redundant scan identification module is employed to further reduce the poses with redundant scans, in order to keep the total number of poses in the graph proportional to the size of the environment. In the experiments, the framework is first tuned and tested on data acquired by a Kinect in a laboratory environment. Then the framework is applied to an external dataset, acquired by a Kinect II in the workspace of an industrial robot in another country and blind to the development phase, for further validation of the performance. After this two-step evaluation, the proposed framework is considered able to keep the map up to date in dynamic or static environments with non-cumulative complexity and an acceptable error level.
Abstract: Sparse representation is an effective data classification algorithm that depends on known training samples to categorise a test sample. It has been widely used in various image classification tasks. Sparseness in sparse representation means that only a few instances selected from all training samples can effectively convey the essential class-specific information of the test sample, which is very important for classification. For deformable images such as human faces, pixels at the same location in different images of the same subject usually have different intensities. Therefore, extracting features from, and correctly classifying, such deformable objects is very hard. Moreover, lighting, attitude and occlusion cause further difficulty. Considering the problems and challenges listed above, a novel image representation and classification algorithm is proposed. First, the authors' algorithm generates virtual samples by a non-linear variation method. This method can effectively extract the low-frequency information of the space-domain features of the original image, which is very useful for representing deformable objects. The combination of the original and virtual samples is more beneficial for improving the classification performance and robustness of the algorithm. The authors' algorithm thereby calculates the expression coefficients of the original and virtual samples separately using the sparse representation principle and obtains the final score through a designed efficient score fusion scheme. The weighting coefficients in the score fusion scheme are set entirely automatically. Finally, the algorithm classifies the samples based on the final scores. The experimental results show that the method performs better than conventional sparse representation algorithms.
Funding: This research is funded by the National Natural Science Foundation of China under Project No. 61210013, and the Science and Technology Planning Project of Guangdong Province under No. 2014A020215027.
Abstract: In some complicated tabletop object manipulation tasks for robotic systems, demonstration-based control is an efficient way to enhance the stability of execution. In this paper, we use a new optical hand tracking sensor, LeapMotion, to perform non-contact demonstration for robotic systems. A Multi-LeapMotion hand tracking system is developed. The setup of the two sensors is analysed to obtain an optimal way to efficiently use the information from both sensors. Meanwhile, the coordinate systems of the Multi-LeapMotion hand tracking device and the robotic demonstration system are developed. With the recognition of element actions and delay calibration, fusion principles are developed to obtain improved and corrected gesture recognition. Gesture recognition and scenario experiments are carried out, and they indicate the improvement brought by the proposed Multi-LeapMotion hand tracking system in tabletop object manipulation tasks for robotic demonstration.
Funding: Ministry of Education, Culture, Sports, Science and Technology, Grant/Award Number: 20K11867.
Abstract: By automatically learning the priors embedded in images with powerful modelling capabilities, deep learning-based algorithms have recently made considerable progress in reconstructing high-resolution hyperspectral (HR-HS) images. With previously collected large amounts of external data, these methods are intuitively realised under the full supervision of the ground-truth data. Thus, database construction in the research paradigm of merging the low-resolution hyperspectral (LR-HS) and HR multispectral (MS) or RGB image, commonly named HSI SR, requires collecting the corresponding training triplets (HR-MS (RGB), LR-HS and HR-HS images) simultaneously, and often faces difficulties in reality. Models learned from training datasets collected under controlled conditions may significantly degrade HSI super-resolution performance on real images captured in diverse environments. To handle the above-mentioned limitations, the authors propose to leverage deep internal and self-supervised learning to solve the HSI SR problem. The authors advocate that it is possible to train a specific CNN model at test time, called deep internal learning (DIL), by preparing online the training triplet samples from the observed LR-HS/HR-MS (or RGB) images and the down-sampled LR-HS version. However, the number of training triplets extracted solely from the transformed data of the observation itself is extremely small, particularly for HSI SR tasks with large spatial upscale factors, which would result in limited reconstruction performance. To solve this problem, the authors further exploit deep self-supervised learning (DSL) by considering the observations as unlabelled training samples. Specifically, the degradation modules inside the network were elaborated to realise the spatial and spectral down-sampling procedures for transforming the generated HR-HS estimation to the high-resolution RGB/LR-HS approximation, and then the reconstruction errors of the observations were formulated for measuring the network modelling performance. By cons
Abstract: In content-based image retrieval (CBIR), primitive image signatures are critical because they represent the visual characteristics. Image signatures, which are algorithmically descriptive and accurately recognised visual components, are used to appropriately index and retrieve comparable results. To differentiate an image among qualifying contenders, feature vectors must contain image information such as colour, objects, shape, and spatial viewpoints. Previous methods, such as sketch-based image retrieval by salient contour (SBIR) and greedy learning of deep Boltzmann machines (GDBM), used spatial information to distinguish between image categories. These require interest points, and their feature analysis also gave rise to image detection problems. Thus, a model that overcomes this issue and predicts the repeating pattern as well as the series of pixels that establish similarity has been necessary. In this study, a technique called CBIR-similarity measure via artificial neural network interpolation (CBIR-SMANN) is presented. After collecting the datasets, the images are resized and then subjected to Gaussian filtering in the pre-processing stage; by passing them to the Hessian detector, the interest points are gathered. Skewness, mean, kurtosis and standard deviation features are extracted and then given to the ANN for interpolation. Interpolated results are stored in a database for retrieval. In the testing stage, the query image is inputted and subjected to pre-processing, and the extracted features are then fed to the similarity measurement function. Thus, the ANN helps to retrieve similar images from the database. CBIR-SMANN has been implemented in Python and evaluated for its performance. Results show that CBIR-SMANN exhibited a high recall value of 78% with a minimum retrieval time of 980 ms, showing that the proposed model was comparatively superior to the previous ones.
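The four per-keypoint statistics the pipeline feeds to the ANN (mean, standard deviation, skewness and kurtosis) are easy to compute directly. A small sketch, with a toy intensity patch standing in for the neighbourhood of a Hessian interest point; the Fisher (excess) definition of kurtosis is an assumption, as the abstract does not specify it:

```python
import numpy as np

def statistical_features(values):
    """
    Mean, standard deviation, skewness and excess kurtosis of a set of
    intensity values (a Gaussian sample scores ~0 on the last two).
    """
    v = np.asarray(values, dtype=float)
    mean = v.mean()
    std = v.std()
    z = (v - mean) / (std + 1e-12)            # standardised values
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4) - 3.0          # Fisher / excess kurtosis
    return np.array([mean, std, skewness, kurtosis])

patch = np.array([0.1, 0.2, 0.2, 0.3, 0.9])   # toy intensity patch
feats = statistical_features(patch)           # 4-D feature vector for the ANN
```

The single bright outlier (0.9) gives the patch a long right tail, so the skewness component comes out positive, which is the kind of shape information these features add beyond the mean and spread.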
Funding: 973 Program, Grant/Award Number: 2014CB340504; The State Key Program of National Natural Science of China, Grant/Award Number: 61533018; National Natural Science Foundation of China, Grant/Award Number: 61402220; The Philosophy and Social Science Foundation of Hunan Province, Grant/Award Number: 16YBA323; Natural Science Foundation of Hunan Province, Grant/Award Number: 2020JJ4525; Scientific Research Fund of Hunan Provincial Education Department, Grant/Award Numbers: 18B279, 19A439.
Abstract: Current Chinese event detection methods commonly use word embeddings to capture semantic representation, but these methods find it difficult to capture the dependency relationship between trigger words and other words in the same sentence. Based on a simple evaluation, it is known that a dependency parser can effectively capture dependency relationships and improve the accuracy of event categorisation. This study proposes a novel architecture that models a hybrid representation to summarise semantic and structural information from both characters and words. This model can capture rich semantic features for the event detection task by incorporating the semantic representation generated from the dependency parser. The authors evaluate different models on the KBP 2017 corpus. The experimental results show that the proposed method can significantly improve performance in Chinese event detection.