# A backward selection procedure for approximating a discrete probability distribution by decomposable models

Kybernetika (2012)

• Volume: 48, Issue: 5, page 825-844
• ISSN: 0023-5954


## Abstract

Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $ℋ$ is the generating hypergraph of a decomposable model and ${p}_{ℋ}$ is the estimate of $p$ under the model, we can measure the closeness of ${p}_{ℋ}$ to $p$ by the information divergence $D\left(p:{p}_{ℋ}\right)$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph $ℋ$ of tree-width $k$ such that $D\left(p:{p}_{ℋ}\right)$ is minimum. It is well-known that this problem is $NP$-hard. However, for $k=1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim of finding a ‘good’ solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.
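The $k=1$ case solved by Chow and Liu reduces to a maximum-weight spanning tree over the variables, with pairwise mutual information as edge weights. The following is a minimal sketch of that reduction, not the paper's backward procedure; the function names and the toy three-variable distribution are illustrative assumptions.

```python
from itertools import product, combinations
from math import log

def mutual_information(p, i, j):
    """I(X_i; X_j) in nats, from an explicit joint distribution p
    mapping full configurations (tuples) to probabilities."""
    pij, pi, pj = {}, {}, {}
    for x, px in p.items():
        pij[(x[i], x[j])] = pij.get((x[i], x[j]), 0.0) + px
        pi[x[i]] = pi.get(x[i], 0.0) + px
        pj[x[j]] = pj.get(x[j], 0.0) + px
    return sum(q * log(q / (pi[a] * pj[b]))
               for (a, b), q in pij.items() if q > 0.0)

def chow_liu_tree(p, n):
    """Maximum-weight spanning tree on variables 0..n-1 under mutual
    information, via a Kruskal-style greedy pass with union-find."""
    edges = sorted(((mutual_information(p, i, j), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy joint distribution over three binary variables: X0 and X1 are
# strongly coupled, X2 is only weakly related to both.
p = {x: 0.0 for x in product((0, 1), repeat=3)}
p[(0, 0, 0)] = p[(0, 0, 1)] = 0.2
p[(1, 1, 0)] = p[(1, 1, 1)] = 0.2
p[(0, 1, 0)] = p[(1, 0, 1)] = 0.1

# The strongly coupled pair (0, 1) is always selected first.
print(chow_liu_tree(p, 3))
```

For $k>1$ no such exact greedy shortcut is available (the problem is $NP$-hard), which is what motivates the forward- and backward-selection heuristics the abstract discusses.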

## How to cite


Malvestuto, Francesco M. "A backward selection procedure for approximating a discrete probability distribution by decomposable models." Kybernetika 48.5 (2012): 825-844. <http://eudml.org/doc/251359>.

@article{Malvestuto2012,
abstract = {Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal \{H\}$ is the generating hypergraph of a decomposable model and $p_\{\mathcal \{H\}\}$ is the estimate of $p$ under the model, we can measure the closeness of $p_\{\mathcal \{H\}\}$ to $p$ by the information divergence $D(p: p_\{\mathcal \{H\}\})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph $\{\mathcal \{H\}\}$ of tree-width $k$ such that $D(p: p_\{\mathcal \{H\}\})$ is minimum. It is well-known that this problem is $NP$-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim of finding a ‘good’ solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.},
author = {Malvestuto, Francesco M.},
journal = {Kybernetika},
keywords = {backward selection; information divergence; decomposable model; acyclic hypergraph; $k$-hypertree},
language = {eng},
number = {5},
pages = {825-844},
publisher = {Institute of Information Theory and Automation AS CR},
title = {A backward selection procedure for approximating a discrete probability distribution by decomposable models},
url = {http://eudml.org/doc/251359},
volume = {48},
year = {2012},
}

TY - JOUR
AU - Malvestuto, Francesco M.
TI - A backward selection procedure for approximating a discrete probability distribution by decomposable models
JO - Kybernetika
PY - 2012
PB - Institute of Information Theory and Automation AS CR
VL - 48
IS - 5
SP - 825
EP - 844
AB - Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal {H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal {H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal {H}}$ to $p$ by the information divergence $D(p: p_{\mathcal {H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph ${\mathcal {H}}$ of tree-width $k$ such that $D(p: p_{\mathcal {H}})$ is minimum. It is well-known that this problem is $NP$-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim of finding a ‘good’ solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.
LA - eng
KW - backward selection; information divergence; decomposable model; acyclic hypergraph; $k$-hypertree
UR - http://eudml.org/doc/251359
ER -

## References

1. Almond, R., Kong, A., Optimality Issues in Constructing a Markov Tree from Graphical Models. Research Report A-3, Dept. Statistics, Harvard University, 1991.
2. Altmüller, A., Haralick, R. M., Approximating high dimensional probability distributions. In: Proc. XVII Int. Conf. on Pattern Recognition, 2004.
3. Altmüller, A., Haralick, R. M., Practical aspects of efficient forward selection in decomposable graphical models. In: Proc. XVI IEEE Int. Conf. on Tools with Artificial Intelligence, 2004, pp. 710-715.
4. Bach, F. R., Jordan, M. I., Thin junction trees. Adv. in Neural Inform. Process. Systems 14 (2002), 569-572.
5. Badsberg, J.-H., Malvestuto, F. M., Comput. Statist. Data Anal. 37 (2001), 297-322. DOI 10.1016/S0167-9473(01)00013-5.
6. Beeri, C., Fagin, R., Maier, D., Yannakakis, M., J. ACM 30 (1983), 479-513. DOI 10.1145/2402.322389.
7. Beineke, L. W., Pippert, R. E., The enumeration of labelled 2-trees. J. Combin. Theory 6 (1969), 200-205.
8. Bishop, Y., Fienberg, S. E., Holland, P. W., Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge 1975.
9. Brown, D. T., Inform. and Control 2 (1959), 386-392. DOI 10.1016/S0019-9958(59)80016-4.
10. Chickering, D., Learning Bayesian networks is NP-complete. In: Learning from Data, Lecture Notes in Statist. 112 (1996), 121-130.
11. Chow, C. K., Liu, C. N., IEEE Trans. Inform. Theory 14 (1968), 462-467. DOI 10.1109/TIT.1968.1054142.
12. Cover, T. M., Elements of Information Theory. John Wiley and Sons, 1991.
13. Csiszár, I., Körner, J., Information Theory. Academic Press, 1981.
14. Dagum, P., Luby, M., Artificial Intelligence 60 (1993), 141-153. DOI 10.1016/0004-3702(93)90036-B.
15. Dasgupta, S., Learning polytrees. In: Proc. XV Conf. on Uncertainty in Artificial Intelligence, 1999, pp. 134-141.
16. Deshpande, A., Garofalakis, M., Jordan, M. I., Efficient stepwise selection in decomposable models. In: Proc. XVII Conf. on Uncertainty in Artificial Intelligence, 2001, pp. 128-135.
17. Ding, G., Lax, R. F., Chen, J., Chen, P. P., Marx, B. D., Comparison of greedy strategies for learning Markov networks of treewidth $k$. In: Proc. Int. Conf. on Machine Learning: Models, Technologies and Applications, 2007, pp. 294-301.
18. Havránek, T., On model search methods. In: Proc. IX Symp. on Computational Statistics, 1990, pp. 101-108.
19. Havránek, T., Simple formal systems in data analysis. In: Proc. Conf. on Symbolic-Numeric Data Analysis and Learning, 1991, pp. 373-381.
20. Jensen, F. V., Jensen, F., Optimal junction trees. In: Proc. X Conf. on Uncertainty in Artificial Intelligence (R. L. de Mantaras and D. Poole, eds.), 1994, pp. 360-366.
21. Karger, D., Srebro, N., Learning Markov networks: maximum bounded tree-width graphs. In: Proc. XII ACM-SIAM Symp. on Discrete Algorithms, 2001, pp. 392-401.
22. Kloks, T., Tree-width. LNCS 842, Springer-Verlag, Berlin 1994.
23. Kocka, T., New algorithm for learning decomposable models. Unpublished manuscript, 2000.
24. Kovács, E., Szántai, T., Vine copulas as a mean for the construction of high dimensional probability distribution associated to a Markov network. arXiv:1105.1697v1, 2011.
25. Ku, H. H., Kullback, S., IEEE Trans. Inform. Theory 15 (1969), 444-447. DOI 10.1109/TIT.1969.1054336.
26. Lauritzen, S. L., Graphical Models. Clarendon Press, Oxford 1996.
27. Lewis, P. M., II, Inform. and Control 2 (1959), 214-225. DOI 10.1016/S0019-9958(59)90207-4.
28. Malvestuto, F. M., Operations research in the design of statistical databases (in Italian). In: Proc. AIRO Meeting on Operations Research and Computer Science, 1986, pp. 117-130.
29. Malvestuto, F. M., IEEE Trans. Systems, Man and Cybernetics 21 (1991), 1287-1294. DOI 10.1109/21.120082.
30. Malvestuto, F. M., An axiomatization of loglinear models with an application to the model search. In: Learning from Data, LNS 112 (1996), pp. 175-184.
31. Malvestuto, F. M., Designing a probabilistic database from a given set of full conditional independences. In: Proc. Workshop on Conditional Independence Structures and Graphical Models, 1999.
32. Malvestuto, F. M., Statist. Comput. 11 (2001), 155-169. DOI 10.1023/A:1008979300007.
33. Malvestuto, F. M., Ann. Math. and Artificial Intelligence 35 (2002), 253-285. DOI 10.1023/A:1014551721406.
34. Malvestuto, F. M., Tree and local computations in a cross-entropy minimization problem with marginal constraints. Kybernetika 46 (2010), 621-654.
35. Meek, C., Finding a path is harder than finding a tree. J. Artificial Intelligence Res. 15 (2001), 383-389.
36. Mezzini, M., Moscarini, M., Theoret. Comput. Sci. 411 (2010), 958-966. DOI 10.1016/j.tcs.2009.10.004.
37. Nunez, K., Chen, J., Chen, P., Ding, G., Lax, R. F., Marx, B., Empirical comparison of greedy strategies for learning Markov networks of treewidth $k$. In: Proc. VII Int. Conf. on Machine Learning and Applications, 2008, pp. 106-113.
38. Rose, J. D., Discrete Math. 7 (1974), 317-322. DOI 10.1016/0012-365X(74)90042-9.
39. Szántai, T., Kovács, E., Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution. In: Proc. Conf. on Applied Mathematical Programming and Modelling, 2008; also in Ann. Oper. Res. 193 (2012), 71-90.
40. Szántai, T., Kovács, E., Discovering a junction tree behind a Markov network by a greedy algorithm. arXiv:1104.2762v3, 2011.
41. Tarjan, R. E., Yannakakis, M., SIAM J. Comput. 13 (1984), 566-579. DOI 10.1137/0213035.
42. Wermuth, N., Biometrics 32 (1976), 95-108. DOI 10.2307/2529341.
43. Wermuth, N., Biometrics 32 (1976), 253-256. DOI 10.2307/2529496.
44. Xiang, Y., Wong, S. K. M., Cercone, N., Mach. Learning 26 (1997), 65-72. DOI 10.1023/A:1007324100110.
