A backward selection procedure for approximating a discrete probability distribution by decomposable models

Francesco M. Malvestuto

Kybernetika (2012)

  • Volume: 48, Issue: 5, pages 825-844
  • ISSN: 0023-5954

Abstract

Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p : p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph $\mathcal{H}$ of tree-width $k$ such that $D(p : p_{\mathcal{H}})$ is minimum. It is well known that this problem is NP-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim of finding a ‘good’ solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k = n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.
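
To fix notation for readers outside the area: if the acyclic hypergraph $\mathcal{H}$ admits a junction tree with clique set $\mathcal{C}$ and separator multiset $\mathcal{S}$ (the symbols $\mathcal{C}$, $\mathcal{S}$ are ours, not the paper's), then the estimate of $p$ under the decomposable model generated by $\mathcal{H}$ is the classical junction-tree factorization, and the information divergence is the Kullback-Leibler divergence; both formulas are standard (see e.g. Lauritzen [26]):

$$ p_{\mathcal{H}}(x) \;=\; \frac{\prod_{C \in \mathcal{C}} p(x_C)}{\prod_{S \in \mathcal{S}} p(x_S)}, \qquad D(p : p_{\mathcal{H}}) \;=\; \sum_{x} p(x)\,\log \frac{p(x)}{p_{\mathcal{H}}(x)}. $$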
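For the $k = 1$ case mentioned in the abstract, the Chow-Liu algorithm [11] reduces the selection problem to a maximum-weight spanning tree computation, with pairwise mutual informations as edge weights. The sketch below illustrates the idea in Python on a toy distribution; the input format (a dict mapping joint outcomes to probabilities) and all function names are illustrative choices, not taken from the paper.

import math
from itertools import combinations

def marginal(p, idxs):
    """Marginal of p over the variables in idxs.
    p: dict mapping tuples (joint outcomes) to probabilities."""
    m = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idxs)
        m[key] = m.get(key, 0.0) + px
    return m

def mutual_information(p, i, j):
    """I(X_i; X_j) computed from the joint distribution p."""
    pij = marginal(p, (i, j))
    pi = marginal(p, (i,))
    pj = marginal(p, (j,))
    return sum(pxy * math.log(pxy / (pi[(x,)] * pj[(y,)]))
               for (x, y), pxy in pij.items() if pxy > 0)

def chow_liu_tree(p, n):
    """Edges of a maximum-weight spanning tree on {0, ..., n-1},
    weighted by pairwise mutual information. This tree generates
    the optimal decomposable model of tree-width 1 (Chow and Liu)."""
    weights = {(i, j): mutual_information(p, i, j)
               for i, j in combinations(range(n), 2)}
    # Prim's algorithm, maximum-weight variant: grow the tree by
    # repeatedly adding the heaviest edge crossing the cut.
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        best = max(((i, j) for (i, j) in weights
                    if (i in in_tree) != (j in in_tree)),
                   key=lambda e: weights[e])
        edges.append(best)
        in_tree.update(best)
    return edges

# Toy example: three binary variables where X0 and X1 are strongly
# dependent and X2 is independent of both.
p = {(0, 0, 0): 0.22, (0, 0, 1): 0.22,
     (1, 1, 0): 0.22, (1, 1, 1): 0.22,
     (0, 1, 0): 0.03, (0, 1, 1): 0.03,
     (1, 0, 0): 0.03, (1, 0, 1): 0.03}
print(chow_liu_tree(p, 3))   # the edge (0, 1) is always selected

For $k > 1$ the analogous object is a $k$-hypertree, and no comparably efficient exact algorithm is known; this is where the forward-selection heuristics cited above, and the backward-selection procedure proposed in the paper, come in.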

How to cite

Malvestuto, Francesco M. "A backward selection procedure for approximating a discrete probability distribution by decomposable models." Kybernetika 48.5 (2012): 825-844. <http://eudml.org/doc/251359>.

@article{Malvestuto2012,
abstract = {Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p: p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph $\mathcal{H}$ of tree-width $k$ such that $D(p: p_{\mathcal{H}})$ is minimum. It is well known that this problem is $NP$-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim of finding a ‘good’ solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.},
author = {Malvestuto, Francesco M.},
journal = {Kybernetika},
keywords = {backward selection; information divergence; decomposable model; acyclic hypergraph; $k$-hypertree},
language = {eng},
number = {5},
pages = {825-844},
publisher = {Institute of Information Theory and Automation AS CR},
title = {A backward selection procedure for approximating a discrete probability distribution by decomposable models},
url = {http://eudml.org/doc/251359},
volume = {48},
year = {2012},
}

TY - JOUR
AU - Malvestuto, Francesco M.
TI - A backward selection procedure for approximating a discrete probability distribution by decomposable models
JO - Kybernetika
PY - 2012
PB - Institute of Information Theory and Automation AS CR
VL - 48
IS - 5
SP - 825
EP - 844
AB - Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p: p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph $\mathcal{H}$ of tree-width $k$ such that $D(p: p_{\mathcal{H}})$ is minimum. It is well known that this problem is $NP$-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim of finding a ‘good’ solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.
LA - eng
KW - backward selection; information divergence; decomposable model; acyclic hypergraph; $k$-hypertree
UR - http://eudml.org/doc/251359
ER -

References

  1. Almond, R., Kong, A., Optimality Issues in Constructing a Markov Tree from Graphical Models., Research Report A-3, Dept. Statistics, Harvard University, 1991. 
  2. Altmüller, A., Haralick, R. M., Approximating high dimensional probability distributions., In: Proc. XVII Int. Conf. on Pattern Recognition, 2004. 
  3. Altmüller, A., Haralick, R. M., Practical aspects of efficient forward selection in decomposable graphical models., In: Proc. XVI IEEE Int. Conf. on Tools with Artificial Intelligence, 2004, pp. 710-715. 
  4. Bach, F. R., Jordan, M. I., Thin junction trees., Adv. in Neural Inform. Process. Systems 14 (2002), 569-572. 
  5. Badsberg, J.-H., Malvestuto, F. M., An implementation of the iterative proportional fitting procedure by propagation trees., Comput. Statist. Data Anal. 37 (2001), 297-322. Zbl 1061.65500, MR 1856676, DOI 10.1016/S0167-9473(01)00013-5
  6. Beeri, C., Fagin, R., Maier, D., Yannakakis, M., On the desirability of acyclic database schemes., J. ACM 30 (1983), 479-513. Zbl 0624.68087, MR 0709830, DOI 10.1145/2402.322389
  7. Beineke, L. W., Pippert, R. E., The enumeration of labelled 2-trees., J. Combin. Theory 6 (1969), 200-205. MR 0234868
  8. Bishop, Y., Fienberg, S. E., Holland, P. W., Discrete Multivariate Analysis: Theory and Practice., MIT Press, Cambridge 1975. Zbl 1131.62043, MR 0381130
  9. Brown, D. T., A note on approximations to discrete probability distributions., Inform. and Control 2 (1959), 386-392. Zbl 0117.14804, MR 0110598, DOI 10.1016/S0019-9958(59)80016-4
  10. Chickering, D., Learning Bayesian networks is NP-complete., In: Learning from Data, Lecture Notes in Statist. 112 (1996), 121-130. MR 1473013
  11. Chow, C. K., Liu, C. N., Approximating discrete probability distributions with dependence trees., IEEE Trans. Inform. Theory 14 (1968), 462-467. Zbl 0165.22305, DOI 10.1109/TIT.1968.1054142
  12. Cover, T. M., Thomas, J. A., Elements of Information Theory., John Wiley and Sons, 1991. Zbl 1140.94001, MR 1122806
  13. Csiszár, I., Körner, J., Information Theory., Academic Press, 1981. MR 0666545
  14. Dagum, P., Luby, M., Approximating probabilistic inference in Bayesian belief networks is NP-hard., Artificial Intelligence 60 (1993), 141-153. MR 1216898, DOI 10.1016/0004-3702(93)90036-B
  15. Dasgupta, S., Learning polytrees., In: Proc. XV Conference on Uncertainty in Artificial Intelligence, 1999, pp. 134-141. 
  16. Deshpande, A., Garofalakis, M., Jordan, M. I., Efficient stepwise selection in decomposable models., In: Proc. XVII Conf. on Uncertainty in Artificial Intelligence, 2001, pp. 128-135. 
  17. Ding, G., Lax, R. F., Chen, J., Chen, P. P., Marx, B. D., Comparison of greedy strategies for learning Markov networks of treewidth k., In: Proc. Int. Conf. on Machine Learning: Models, Technologies and Applications, 2007, pp. 294-301. 
  18. Havránek, T., On model search methods., In: Proc. IX Symp. on Computational Statistics, 1990, pp. 101-108. 
  19. Havránek, T., Simple formal systems in data analysis., In: Proc. Conf. on Symbolic-Numeric Data Analysis and Learning, 1991, pp. 373-381. 
  20. Jensen, F. V., Jensen, F., Optimal junction trees., In: Proc. X Conf. on Uncertainty in Artificial Intelligence (R. L. de Mantaras and D. Poole, eds.), 1994, pp. 360-366. 
  21. Karger, D., Srebro, N., Learning Markov networks: maximum bounded tree-width graphs., In: Proc. XII ACM-SIAM Symp. on Discrete Algorithms, 2001, pp. 392-401. Zbl 0987.68067, MR 1958431
  22. Kloks, T., Treewidth., LNCS 842, Springer-Verlag, Berlin 1994. Zbl 0925.05052
  23. Kocka, T., New algorithm for learning decomposable models., Unpublished manuscript, 2000. 
  24. Kovács, E., Szántai, T., Vine copulas as a mean for the construction of high dimensional probability distribution associated to a Markov network., arXiv:1105.1697v1, 2011. 
  25. Ku, H. H., Kullback, S., Approximating discrete probability distributions., IEEE Trans. Inform. Theory 15 (1969), 444-447. Zbl 0174.23202, MR 0243669, DOI 10.1109/TIT.1969.1054336
  26. Lauritzen, S. L., Graphical Models., Clarendon Press, Oxford 1996. MR 1419991
  27. Lewis, P. M., II, Approximating probability distributions to reduce storage requirements., Inform. and Control 2 (1959), 214-225. MR 0110597, DOI 10.1016/S0019-9958(59)90207-4
  28. Malvestuto, F. M., Operations research in the design of statistical databases (in Italian)., In: Proc. AIRO Meeting on Operations Research and Computer Science, 1986, pp. 117-130. 
  29. Malvestuto, F. M., Approximating discrete probability distributions with decomposable models., IEEE Trans. Systems, Man and Cybernetics 21 (1991), 1287-1294. DOI 10.1109/21.120082
  30. Malvestuto, F. M., An axiomatization of loglinear models with an application to the model search., In: Learning from Data, Lecture Notes in Statist. 112 (1996), pp. 175-184. 
  31. Malvestuto, F. M., Designing a probabilistic database from a given set of full conditional independences., In: Proc. Workshop on Conditional Independence Structures and Graphical Models, 1999. 
  32. Malvestuto, F. M., Statist. Comput. 11 (2001), 155-169. MR 1837135, DOI 10.1023/A:1008979300007
  33. Malvestuto, F. M., Ann. Math. and Artificial Intelligence 35 (2002), 253-285. Zbl 1001.68033, MR 1899954, DOI 10.1023/A:1014551721406
  34. Malvestuto, F. M., Tree and local computations in a cross-entropy minimization problem with marginal constraints., Kybernetika 46 (2010), 621-654. Zbl 1204.93113, MR 2722092
  35. Meek, C., Finding a path is harder than finding a tree., J. Artificial Intelligence Res. 15 (2001), 383-389. Zbl 0994.68120, MR 1884083
  36. Mezzini, M., Moscarini, M., Simple algorithms for minimal triangulation of a graph and backward selection of a decomposable Markov network., Theoret. Comput. Sci. 411 (2010), 958-966. Zbl 1213.05251, MR 2606034, DOI 10.1016/j.tcs.2009.10.004
  37. Nunez, K., Chen, J., Chen, P., Ding, G., Lax, R. F., Marx, B., Empirical comparison of greedy strategies for learning Markov networks of treewidth k ., In: Proc. VII Int. Conf. on Machine Learning and Applications, 2008, pp. 106-113. 
  38. Rose, D. J., On simple characterizations of k-trees., Discrete Math. 7 (1974), 317-322. Zbl 0285.05128, MR 0335319, DOI 10.1016/0012-365X(74)90042-9
  39. Szántai, T., Kovács, E., Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution., In: Proc. Conf. on Applied Mathematical Programming and Modelling, 2008; also in Ann. Oper. Res. 193 (2012), 71-90. MR 2874757
  40. Szántai, T., Kovács, E., Discovering a junction tree behind a Markov network by a greedy algorithm., arXiv:1104.2762v3, 2011. 
  41. Tarjan, R. E., Yannakakis, M., Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs., SIAM J. Comput. 13 (1984), 566-579. MR 0749707, DOI 10.1137/0213035
  42. Wermuth, N., Analogies between multiplicative models in contingency tables and covariance selection., Biometrics 32 (1976), 95-108. Zbl 0357.62049, MR 0403088, DOI 10.2307/2529341
  43. Wermuth, N., Model search among multiplicative models., Biometrics 32 (1976), 253-256. Zbl 0339.62079, DOI 10.2307/2529496
  44. Xiang, Y., Wong, S. K. M., Cercone, N., A 'microscopic' study of minimum entropy search in learning decomposable Markov networks., Mach. Learning 26 (1997), 65-72. Zbl 0866.68088, DOI 10.1023/A:1007324100110
