Histoire et préhistoire de l'analyse des données
The paper suggests a generalization of the widely used Holt-Winters smoothing and forecasting method for seasonal time series. The general concept of seasonality modeling is introduced for both the additive and the multiplicative case. Several special cases are discussed, including linear interpolation of seasonal indices and the use of trigonometric functions. Both methods are fully applicable to time series with irregularly observed data (just the special case of missing observations was covered up...
Given an n-sample from some unknown density f on [0,1], it is easy to construct a histogram of the data based on some given partition of [0,1], but little is known about an optimal choice of the partition, especially when the data set is not large, even if one restricts attention to partitions into intervals of equal length. Existing methods are either rules of thumb or based on asymptotic considerations, and often involve some smoothness properties of f. Our purpose in this paper is to give an automatic,...
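The "easy" step mentioned above, building a histogram on a given partition of [0,1] into intervals of equal length, can be sketched as follows. This is only an illustration of the standard histogram density estimate, not the paper's selection procedure; the function name `regular_histogram` and the parameter `num_bins` are assumptions introduced here, and the choice of `num_bins` is exactly the open question the abstract addresses.

```python
from typing import List


def regular_histogram(sample: List[float], num_bins: int) -> List[float]:
    """Histogram density estimate on [0, 1] with num_bins equal-length bins.

    Returns the height of the estimate on each bin, so that the heights
    times the bin width (1 / num_bins) sum to 1.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")

    counts = [0] * num_bins
    for x in sample:
        if not 0.0 <= x <= 1.0:
            raise ValueError("observations must lie in [0, 1]")
        # The point x = 1.0 is assigned to the last bin.
        j = min(int(x * num_bins), num_bins - 1)
        counts[j] += 1

    # Height = (relative frequency) / (bin width) = num_bins * count / n.
    return [num_bins * c / n for c in counts]


heights = regular_histogram([0.1, 0.2, 0.3, 0.9], num_bins=2)
print(heights)
```

With two bins, three of the four observations fall in [0, 0.5) and one in [0.5, 1], so the estimate is higher on the first bin.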
A two-place function measuring the degree of non-symmetry for (quasi-)copulas is considered. We construct copulas which are maximally non-symmetric on certain subsets of the unit square. It is shown that there is no copula (and no quasi-copula) which is maximally non-symmetric on the whole unit square.
We construct a new class of data-driven tests for uniformity, which have greater average power than existing ones for finite samples. Using a simulation study, we show that these tests as well as some "optimal maximum test" attain an average power close to the optimal Bayes test. Finally, we prove that, in the middle range of the power function, the loss in average power of the "optimal maximum test" with respect to the Neyman-Pearson tests, constructed separately for each alternative, in the Gaussian...
Nonsensitiveness regions for estimators of linear functions, for confidence ellipsoids, for the level of a test of a linear hypothesis on parameters and for the value of the power function are investigated in a linear model with variance components. The influence of the experimental design on these nonsensitiveness regions is demonstrated numerically and discussed using an example.
We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm (also known as Lloyd's algorithm). In particular, we leverage the property that this algorithm can get stuck in local optima of the k-means objective function. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they...
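The dependence on initialization described above can be seen in a very small example. The sketch below is a plain implementation of Lloyd's iteration (assign each point to its nearest center, then recompute centers as cluster means); the function names, the four-point data set, and the two initializations are assumptions chosen for illustration and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points in the plane."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def lloyd(points: Sequence[Point], init_centers: Sequence[Point],
          max_iter: int = 100) -> Tuple[List[int], List[Point], float]:
    """Run Lloyd's iteration from the given initial centers.

    Returns the final labels, the final centers, and the k-means cost
    (sum of squared distances of the points to their assigned centers).
    """
    centers = list(init_centers)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center becomes the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, cost


# Corners of a 4-by-1 rectangle: the natural clusters are left and right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Initialization A splits the rectangle into bottom and top halves.
labels_a, _, cost_a = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])
# Initialization B splits it into left and right halves.
labels_b, _, cost_b = lloyd(points, [(0.0, 0.0), (4.0, 0.0)])

print(labels_a, cost_a)
print(labels_b, cost_b)
```

Both runs stop at a fixed point of the iteration, but initialization A ends in a clustering (bottom versus top) with a much higher cost than the one reached from initialization B (left versus right), so the two runs differ in the clustering itself and not only in the cost.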
We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures...
If a nonlinear regression model is linearized in an insufficiently small neighbourhood of the actual parameter, then all statistical inferences may be impaired. Some criteria for recognizing this have already been developed. The aim of the paper is to demonstrate the behaviour of a program that applies these criteria.
The asymptotic behavior of global errors of functional estimates plays a key role in hypothesis testing and confidence interval construction. Whereas for pointwise errors asymptotic normality often follows easily from standard Central Limit Theorems, global error asymptotics involve some additional techniques such as strong approximation, martingale theory and Poissonization. We review these techniques in the framework of density estimation from independent identically distributed random variables,...
Using the Bahadur representation of a sample quantile for m-dependent and strong mixing random variables, we establish the asymptotic distribution of the Hurwicz estimator for the coefficient of autoregression in a linear process with innovations belonging to the domain of attraction of an α-stable law (1 < α < 2). The present paper extends Hurwicz's result to the autoregressive model.