Automatic error localisation for categorical, continuous and integer data.

Ton de Waal

SORT (2005)

  • Volume: 29, Issue: 1, pages 57-100
  • ISSN: 1696-2281

Abstract

Data collected by statistical offices generally contain errors, which have to be corrected before reliable data can be published. This correction process is referred to as statistical data editing. At statistical offices, certain rules, so-called edits, are often used during the editing process to determine whether a record is consistent or not. Inconsistent records are considered to contain errors, while consistent records are considered error-free. In this article we focus on automatic error localisation based on the Fellegi-Holt paradigm, which says that the data should be made to satisfy all edits by changing the fewest possible number of fields. Adoption of this paradigm leads to a mathematical optimisation problem. We propose an algorithm to solve this optimisation problem for a mix of categorical, continuous and integer-valued data. We also propose a heuristic procedure based on the exact algorithm. For five realistic data sets involving only integer-valued variables we evaluate the performance of this heuristic procedure.
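
The Fellegi-Holt paradigm described in the abstract can be illustrated with a small brute-force sketch. This is not the algorithm proposed in the article (its keywords point to branch-and-bound and Fourier-Motzkin elimination); it simply tries every subset of fields in order of increasing size and stops at the first subset whose values can be changed so that all edits hold. The record, the two edits, the finite domains and the function name `localise_errors` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations, product


def localise_errors(record, edits, domains):
    """Brute-force Fellegi-Holt sketch (illustrative, not the article's method).

    Returns (fields_to_change, repaired_record) for a smallest set of fields
    whose values can be changed so that every edit is satisfied, or None if
    no assignment over the given finite domains satisfies all edits.
    """
    fields = list(record)
    # Try subsets of size 0, 1, 2, ... so the first hit is a minimum one.
    for k in range(len(fields) + 1):
        for subset in combinations(fields, k):
            # Enumerate every replacement value for the chosen fields.
            for values in product(*(domains[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if all(rule(candidate) for rule in edits.values()):
                    return set(subset), candidate
    return None


# Hypothetical edits: each rule returns True when the record is consistent.
edits = {
    "married_implies_adult": lambda r: r["marital"] != "married" or r["age"] >= 16,
    "working_implies_age": lambda r: r["hours"] == 0 or r["age"] >= 15,
}

# Hypothetical finite domains for a categorical and two integer-valued fields.
domains = {
    "age": range(0, 100),
    "marital": ["single", "married"],
    "hours": range(0, 81),
}

record = {"age": 12, "marital": "married", "hours": 40}
changed, repaired = localise_errors(record, edits, domains)
print(changed, repaired)
```

In this example both edits fail, and changing `age` alone is enough to satisfy them, whereas changing only `marital` or only `hours` leaves one edit violated. The enumeration grows exponentially with the number of fields and requires finite domains, which is why an exact optimisation algorithm or a heuristic, as the abstract proposes, is needed for realistic data.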

How to cite

Waal, Ton de. "Automatic error localisation for categorical, continuous and integer data." SORT 29.1 (2005): 57-100. <http://eudml.org/doc/40467>.

@article{Waal2005,
abstract = {Data collected by statistical offices generally contain errors, which have to be corrected before reliable data can be published. This correction process is referred to as statistical data editing. At statistical offices, certain rules, so-called edits, are often used during the editing process to determine whether a record is consistent or not. Inconsistent records are considered to contain errors, while consistent records are considered error-free. In this article we focus on automatic error localisation based on the Fellegi-Holt paradigm, which says that the data should be made to satisfy all edits by changing the fewest possible number of fields. Adoption of this paradigm leads to a mathematical optimisation problem. We propose an algorithm to solve this optimisation problem for a mix of categorical, continuous and integer-valued data. We also propose a heuristic procedure based on the exact algorithm. For five realistic data sets involving only integer-valued variables we evaluate the performance of this heuristic procedure.},
author = {Waal, Ton de},
journal = {SORT},
keywords = {Datos estadísticos; Corrección de errores; Programación matemática; Programación entera; Optimización; Heurística; Datos categóricos; branch-and-bound; categorical data; continuous data; error localisation; Fourier-Motzkin elimination; integer-valued data; statistical data editing},
language = {eng},
number = {1},
pages = {57-100},
title = {Automatic error localisation for categorical, continuous and integer data.},
url = {http://eudml.org/doc/40467},
volume = {29},
year = {2005},
}

TY - JOUR
AU - Waal, Ton de
TI - Automatic error localisation for categorical, continuous and integer data.
JO - SORT
PY - 2005
VL - 29
IS - 1
SP - 57
EP - 100
AB - Data collected by statistical offices generally contain errors, which have to be corrected before reliable data can be published. This correction process is referred to as statistical data editing. At statistical offices, certain rules, so-called edits, are often used during the editing process to determine whether a record is consistent or not. Inconsistent records are considered to contain errors, while consistent records are considered error-free. In this article we focus on automatic error localisation based on the Fellegi-Holt paradigm, which says that the data should be made to satisfy all edits by changing the fewest possible number of fields. Adoption of this paradigm leads to a mathematical optimisation problem. We propose an algorithm to solve this optimisation problem for a mix of categorical, continuous and integer-valued data. We also propose a heuristic procedure based on the exact algorithm. For five realistic data sets involving only integer-valued variables we evaluate the performance of this heuristic procedure.
LA - eng
KW - Datos estadísticos; Corrección de errores; Programación matemática; Programación entera; Optimización; Heurística; Datos categóricos; branch-and-bound; categorical data; continuous data; error localisation; Fourier-Motzkin elimination; integer-valued data; statistical data editing
UR - http://eudml.org/doc/40467
ER -
