Grüss-type bounds for the covariance of transformed random variables.
Classifiers can be combined to reduce classification errors. We conducted experiments on a data set consisting of different sets of features of handwritten digits. Different types of classifiers were trained on these feature sets, and the performance of the individual classifiers and of the combination rules was tested. The best results were obtained with the mean, median and product combination rules: the product rule was best for combining linear classifiers, the median for k-NN classifiers. Training a classifier on all features...
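As a minimal illustration of the fixed combination rules mentioned above (mean, median, product), the sketch below fuses the class posteriors of several classifiers; the function name and the random posteriors are purely illustrative and not taken from the paper.

```python
import numpy as np

def combine_posteriors(posteriors, rule="mean"):
    """Fuse per-classifier class posteriors with a fixed combination rule.

    posteriors: array of shape (n_classifiers, n_samples, n_classes)
    returns:    predicted class labels, shape (n_samples,)
    """
    if rule == "mean":
        fused = posteriors.mean(axis=0)
    elif rule == "median":
        fused = np.median(posteriors, axis=0)
    elif rule == "product":
        fused = posteriors.prod(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return fused.argmax(axis=1)

# Three hypothetical classifiers, four samples, two classes.
rng = np.random.default_rng(0)
p = rng.dirichlet(alpha=[1.0, 1.0], size=(3, 4))
print(combine_posteriors(p, rule="product"))
```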
The approximation of a high-level quantile or of the expectation beyond a high quantile (the Value at Risk (VaR) or the Tail Value at Risk (TVaR) in risk management) is crucial for the insurance industry. We propose a new method to estimate high-level quantiles of sums of risks. It is based on estimating the ratio between the VaR (or TVaR) of the sum and the VaR (or TVaR) of the maximum of the risks. We show that using the distribution of the maximum to approximate the VaR is much better than using...
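In the usual notation (the symbols below are ours, not necessarily those of the paper), with $\operatorname{VaR}_\alpha(X)=\inf\{x\in\mathbb{R}: F_X(x)\ge\alpha\}$ and $\operatorname{TVaR}_\alpha(X)=\frac{1}{1-\alpha}\int_\alpha^1 \operatorname{VaR}_u(X)\,du$, the approximation rests on the ratio

$$
r_\alpha \;=\; \frac{\operatorname{VaR}_\alpha\!\big(X_1+\cdots+X_d\big)}{\operatorname{VaR}_\alpha\!\big(\max(X_1,\ldots,X_d)\big)},
\qquad
\operatorname{VaR}_\alpha\!\Big(\sum_{i=1}^d X_i\Big) \;\approx\; \hat r_\alpha \,\operatorname{VaR}_\alpha\!\Big(\max_{1\le i\le d} X_i\Big),
$$

and analogously with TVaR in place of VaR.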
A two-place function measuring the degree of non-symmetry for (quasi-)copulas is considered. We construct copulas which are maximally non-symmetric on certain subsets of the unit square. It is shown that there is no copula (and no quasi-copula) which is maximally non-symmetric on the whole unit square.
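A commonly used way to quantify non-symmetry (the specific two-place function considered in the paper may differ) is the pointwise difference and its supremum,

$$
\delta_C(u,v) \;=\; \big|\,C(u,v)-C(v,u)\,\big|,
\qquad
\mu_\infty(C) \;=\; \sup_{(u,v)\in[0,1]^2} \delta_C(u,v),
$$

and for copulas it is known that $\mu_\infty(C)\le 1/3$.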
We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm (also known as Lloyd's algorithm). In particular, we exploit the property that this algorithm can get stuck in local optima of the k-means objective function. We are interested in the actual clustering, not only in the cost of the solution. We analyze when different initializations lead to the same local optimum, and when they...
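A minimal sketch of the kind of experiment described above, assuming scikit-learn's KMeans as an implementation of Lloyd's algorithm: each run uses a single random initialization, and the resulting clusterings (not just their costs) are compared.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Hypothetical data: two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)),
               rng.normal(loc=+2.0, size=(100, 2))])

# Run Lloyd's algorithm from several random initializations (n_init=1,
# so each run may converge to a different local optimum of the objective).
labelings = [
    KMeans(n_clusters=3, init="random", n_init=1, random_state=s).fit_predict(X)
    for s in range(5)
]

# Compare the resulting clusterings pairwise, not only their costs.
for i in range(len(labelings)):
    for j in range(i + 1, len(labelings)):
        print(i, j, adjusted_rand_score(labelings[i], labelings[j]))
```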
We study the setting of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is whether and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the final clustering result. To this end we study the convergence of cluster quality measures...
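The following sketch illustrates this dependence on graph construction, assuming scikit-learn's kneighbors_graph and SpectralClustering; the data and the choice of varying only the number of neighbors k of an unweighted k-NN graph are illustrative.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

# Hypothetical data: two point clouds in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(scale=0.3, size=(100, 2)),
               3.0 + rng.normal(scale=0.3, size=(100, 2))])

# Vary one aspect of the graph construction: the number of neighbors k.
for k in (5, 15, 50):
    W = kneighbors_graph(X, n_neighbors=k, mode="connectivity")
    A = 0.5 * (W + W.T).toarray()          # symmetrize the adjacency matrix
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(A)
    print(k, np.bincount(labels))          # cluster sizes for this graph
```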
In previous papers, the evolution of dependence and of ageing for vectors of non-negative random variables has been considered separately. Some analogies between the two evolutions nevertheless emerge in those studies. In the present paper, we propose a unified approach, based on semigroup arguments, that explains the origin of such analogies and the relations between properties of stochastic dependence and ageing.
Magnetic Resonance Diffusion Tensor Imaging (MR–DTI) is a noninvasive in vivo method capable of examining the structure of the human brain, providing information about the position and orientation of the neural tracts. After a short introduction to the principles of MR–DTI, this paper describes the steps of the proposed neural tract visualization technique based on the DTI data. The cornerstone of the algorithm is a texture diffusion procedure modeled mathematically by the problem for the Allen–Cahn...
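For reference, a standard Allen–Cahn-type equation with anisotropic diffusion driven by the measured diffusion tensor $D$ reads (the exact form and scaling used in the paper may differ):

$$
\frac{\partial u}{\partial t} \;=\; \nabla\cdot\big(D\,\nabla u\big) \;+\; \frac{1}{\xi^{2}}\,u\,(1-u)\Big(u-\tfrac{1}{2}\Big),
$$

where the reaction term derives from a double-well potential with stable states $u=0$ and $u=1$, and $\xi>0$ controls the width of the diffuse interface.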
In this article we propose small area estimators for both the small and large area parameters. When the objective is to estimate parameters at both levels, optimality is achieved by a sample design that combines fixed and proportional allocation. In such a design, one fraction of the sample is distributed proportionally among the small areas and the rest is evenly distributed. Simulation is used to assess the performance of the direct estimator and two composite small area estimators, for a range...
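One way to write the combined allocation described above (the symbols are ours, not necessarily those of the article): with $D$ small areas, total sample size $n$, area population sizes $N_d$ with $N=\sum_d N_d$, and a fraction $\lambda\in[0,1]$ allocated proportionally,

$$
n_d \;=\; (1-\lambda)\,\frac{n}{D} \;+\; \lambda\, n\,\frac{N_d}{N}, \qquad d=1,\ldots,D,
$$

so that $\sum_d n_d = n$, with $\lambda=0$ giving equal allocation across areas and $\lambda=1$ purely proportional allocation.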
The purpose of feature selection in machine learning is at least twofold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of models and the classification rate of classifiers on previously unknown data. Yet it has been shown recently that the process of feature selection itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit...
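One common safeguard against this kind of over-fitting is to perform the selection inside the cross-validation loop rather than on the full data set; the sketch below assumes scikit-learn and uses purely synthetic data, and is not the procedure evaluated in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical high-dimensional data: many noise features, few informative ones.
X, y = make_classification(n_samples=80, n_features=500, n_informative=5,
                           random_state=0)

# Putting the selection step in a Pipeline keeps it from seeing the held-out
# fold, so the reported score is not inflated by the selection itself.
model = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(model, X, y, cv=5).mean())
```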
A national survey designed to estimate a specific population quantity is sometimes also used to estimate this quantity for a small area, such as a province. Budget constraints do not allow a greater sample size for the small area, so other means of improving the estimation have to be devised. We investigate such methods and assess them by a Monte Carlo study. We explore how a complementary survey can be exploited in small area estimation. We use the context of the Spanish Labour Force Survey...
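A typical means of improving a small-area estimate without increasing its sample size is a composite estimator of the form (notation ours, not necessarily that of the paper):

$$
\hat\theta_d^{\mathrm{comp}} \;=\; \phi_d\,\hat\theta_d^{\mathrm{dir}} \;+\; (1-\phi_d)\,\hat\theta_d^{\mathrm{syn}}, \qquad 0\le\phi_d\le 1,
$$

where $\hat\theta_d^{\mathrm{dir}}$ is the direct estimator based on the area's own sample, $\hat\theta_d^{\mathrm{syn}}$ is a synthetic estimator that borrows strength from the whole survey (here, possibly from the complementary survey as well), and the weight $\phi_d$ is commonly chosen to balance the variance of the direct estimator against the bias of the synthetic one.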