A backward selection procedure for approximating a discrete probability distribution by decomposable models
Kybernetika (2012)
- Volume: 48, Issue: 5, page 825-844
- ISSN: 0023-5954
Abstract
Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p : p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph $\mathcal{H}$ of tree-width $k$ such that $D(p : p_{\mathcal{H}})$ is minimum. It is well known that this problem is NP-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow–Liu solution, a few forward-selection procedures have been proposed with the aim of finding a ‘good’ solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k = n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.
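As background for the divergence criterion used in the abstract, it may help to recall the standard closed form of $D(p : p_{\mathcal{H}})$ for a decomposable model and its specialization at tree-width $k = 1$; the identities below are textbook facts about decomposable models, written in the paper's notation, and are not quoted from the paper itself.

\[
  D(p : p_{\mathcal{H}})
    = \sum_{E \in \mathcal{H}} H(p_E)
    \;-\; \sum_{S} \nu(S)\, H(p_S)
    \;-\; H(p),
\]

where the first sum ranges over the hyperedges of $\mathcal{H}$, the second over the separators of a junction tree of $\mathcal{H}$ with multiplicities $\nu(S)$, and $H(\cdot)$ denotes Shannon entropy. For $k = 1$ the hypergraph is a spanning tree $T$ on $V$ and the formula reduces to

\[
  D(p : p_T) = \sum_{v \in V} H(p_v) \;-\; H(p) \;-\; \sum_{\{u,v\} \in T} I(X_u ; X_v),
\]

so minimizing $D(p : p_T)$ amounts to maximizing the total pairwise mutual information over spanning trees, which is why the Chow–Liu solution is a maximum-weight spanning tree.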
How to cite
Malvestuto, Francesco M. "A backward selection procedure for approximating a discrete probability distribution by decomposable models." Kybernetika 48.5 (2012): 825-844. <http://eudml.org/doc/251359>.
@article{Malvestuto2012,
abstract = {Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p : p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph $\mathcal{H}$ of tree-width $k$ such that $D(p : p_{\mathcal{H}})$ is minimum. It is well known that this problem is NP-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow–Liu solution, a few forward-selection procedures have been proposed with the aim of finding a ‘good’ solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k = n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.},
author = {Malvestuto, Francesco M.},
journal = {Kybernetika},
keywords = {backward selection; information divergence; decomposable model; acyclic hypergraph; $k$-hypertree},
language = {eng},
number = {5},
pages = {825-844},
publisher = {Institute of Information Theory and Automation AS CR},
title = {A backward selection procedure for approximating a discrete probability distribution by decomposable models},
url = {http://eudml.org/doc/251359},
volume = {48},
year = {2012},
}
TY - JOUR
AU - Malvestuto, Francesco M.
TI - A backward selection procedure for approximating a discrete probability distribution by decomposable models
JO - Kybernetika
PY - 2012
PB - Institute of Information Theory and Automation AS CR
VL - 48
IS - 5
SP - 825
EP - 844
AB - Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p : p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph $\mathcal{H}$ of tree-width $k$ such that $D(p : p_{\mathcal{H}})$ is minimum. It is well known that this problem is NP-hard. However, for $k = 1$ it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow–Liu solution, a few forward-selection procedures have been proposed with the aim of finding a ‘good’ solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k = n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.
LA - eng
KW - backward selection; information divergence; decomposable model; acyclic hypergraph; $k$-hypertree
UR - http://eudml.org/doc/251359
ER -
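Since the abstract takes the Chow–Liu solution for $k = 1$ as the baseline from which forward-selection procedures start, a minimal sketch of that classical step may be useful. The Python code below is an illustrative implementation (maximum-weight spanning tree over pairwise mutual informations), not the backward-selection procedure proposed in the paper; all function and variable names are chosen here for exposition.

import numpy as np
from itertools import combinations

def mutual_information(p, i, j):
    # I(X_i; X_j) computed from the full joint table p (an n-dimensional
    # array summing to 1); natural logarithm, zero cells skipped.
    other_axes = tuple(k for k in range(p.ndim) if k not in (i, j))
    pij = p.sum(axis=other_axes)              # 2-D marginal of (X_i, X_j)
    pi = pij.sum(axis=1, keepdims=True)       # marginal of X_i
    pj = pij.sum(axis=0, keepdims=True)       # marginal of X_j
    mask = pij > 0
    return float((pij[mask] * np.log(pij[mask] / (pi @ pj)[mask])).sum())

def chow_liu_tree(p):
    # Kruskal's algorithm on edge weights I(X_i; X_j): repeatedly add the
    # heaviest edge that does not close a cycle (union-find on components).
    n = p.ndim
    edges = sorted(((mutual_information(p, i, j), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                          # edge joins two components
            parent[ri] = rj
            tree.append((i, j, w))
    return tree                               # n - 1 edges, maximum total weight

# Toy usage: a random joint distribution over four binary variables.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2, 2))
p /= p.sum()
print(chow_liu_tree(p))

By the identity recalled after the abstract, the tree returned above minimizes $D(p : p_T)$ over all spanning trees; the paper works in the complementary direction, starting from the trivial optimum at $k = n-1$ and selecting backwards towards smaller tree-widths.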
References
- Almond, R., Kong, A., Optimality Issues in Constructing a Markov Tree from Graphical Models., Research Report A-3, Dept. Statistics, Harvard University, 1991.
- Altmüller, A., Haralick, R. M., Approximating high dimensional probability distributions., In: Proc. XVII Int. Conf. on Pattern Recognition, 2004.
- Altmüller, A., Haralick, R. M., Practical aspects of efficient forward selection in decomposable graphical models., In: Proc. XVI IEEE Int. Conf. on Tools with Artificial Intelligence, 2004, pp. 710-715.
- Bach, F. R., Jordan, M. I., Thin junction trees., Adv. in Neural Inform. Process. Systems 14 (2002), 569-572.
- Badsberg, J.-H., Malvestuto, F. M., 10.1016/S0167-9473(01)00013-5, Comput. Statist. Data Anal. 37 (2001), 297-322. Zbl1061.65500MR1856676DOI10.1016/S0167-9473(01)00013-5
- Beeri, C., Fagin, R., Maier, D., Yannakakis, M., On the desirability of acyclic database schemes., J. ACM 30 (1983), 479-513. Zbl0624.68087MR0709830DOI10.1145/2402.322389
- Beineke, L. W., Pippert, R. E., The enumeration of labelled 2-trees., J. Combin. Theory 6 (1969), 200-205. MR0234868
- Bishop, Y., Fienberg, S. E., Holland, P. W., Discrete Multivariate Analysis: Theory and Practice., MIT Press, Cambridge 1975. Zbl1131.62043MR0381130
- Brown, D. T., A note on approximations to discrete probability distributions., Inform. and Control 2 (1959), 386-392. Zbl0117.14804MR0110598DOI10.1016/S0019-9958(59)80016-4
- Chickering, D., Learning Bayesian networks is NP-complete., In: Learning from Data, Lecture Notes in Statist. 112 (1996), 121-130. MR1473013
- Chow, C. K., Liu, C. N., Approximating discrete probability distributions with dependence trees., IEEE Trans. Inform. Theory 14 (1968), 462-467. Zbl0165.22305DOI10.1109/TIT.1968.1054142
- Cover, T. M., Thomas, J. A., Elements of Information Theory., John Wiley and Sons, 1991. Zbl1140.94001MR1122806
- Csiszár, I., Körner, J., Information Theory., Academic Press, 1981. MR0666545
- Dagum, P., Luby, M., Approximating probabilistic inference in Bayesian belief networks is NP-hard., Artificial Intelligence 60 (1993), 141-153. MR1216898DOI10.1016/0004-3702(93)90036-B
- Dasgupta, S., Learning polytrees., In: Proc. XV Conference on Uncertainty in Artificial Intelligence, 1999, pp. 134-141.
- Deshpande, A., Garofalakis, M., Jordan, M. I., Efficient stepwise selection in decomposable models., In: Proc. XVII Conf. on Uncertainty in Artificial Intelligence, 2001, pp. 128-135.
- Ding, G., Lax, R. F., Chen, J., Chen, P. P., Marx, B. D., Comparison of greedy strategies for learning Markov networks of treewidth $k$., In: Proc. Int. Conf. on Machine Learning: Models, Technologies and Applications, 2007, pp. 294-301.
- Havránek, T., On model search methods., In: Proc. IX Symp. on Computational Statistics, 1990, pp. 101-108.
- Havránek, T., Simple formal systems in data analysis., In: Proc. Conf. on Symbolic-Numeric Data Analysis and Learning, 1991, pp. 373-381.
- Jensen, F. V., Jensen, F., Optimal junction trees., In: Proc. X Conf. on Uncertainty in Artificial Intelligence (R. L. de Mantaras and D. Poole, eds.), 1994, pp. 360-366.
- Karger, D., Srebro, N., Learning Markov networks: maximum bounded tree-width graphs., In: Proc. XII ACM-SIAM Symp. on Discrete Mathematics, 2001, pp. 392-401. Zbl0987.68067MR1958431
- Kloks, T., Tree-width., LNCS 842, Springer Verlag, Berlin 1994. Zbl0925.05052
- Kocka, T., New algorithm for learning decomposable models., Unpublished manuscript, 2000.
- Kovács, E., Szántai, T., Vine copulas as a mean for the construction of high dimensional probability distribution associated to a Markov network., arXiv:1105.1697v1, 2011.
- Ku, H. H., Kullback, S., Approximations to discrete probability distributions., IEEE Trans. Inform. Theory 15 (1969), 444-447. Zbl0174.23202MR0243669DOI10.1109/TIT.1969.1054336
- Lauritzen, S. L., Graphical Models., Clarendon Press, Oxford 1996. MR1419991
- Lewis, P. M., II, Approximating probability distributions to reduce storage requirements., Inform. and Control 2 (1959), 214-225. MR0110597DOI10.1016/S0019-9958(59)90207-4
- Malvestuto, F. M., Operations research in the design of statistical databases (in Italian)., In: Proc. AIRO Meeting on Operations Research and Computer Science, 1986, pp. 117-130.
- Malvestuto, F. M., Approximating discrete probability distributions with decomposable models., IEEE Trans. Systems, Man and Cybernetics 21 (1991), 1287-1294. DOI10.1109/21.120082
- Malvestuto, F. M., An axiomatization of loglinear models with an application to the model search., In: Learning from Data, LNS 112 (1996), pp. 175-184.
- Malvestuto, F. M., Designing a probabilistic database from a given set of full conditional independences., In: Proc. Workshop on Conditional Independence Structures and Graphical Models, 1999.
- Malvestuto, F. M., 10.1023/A:1008979300007, Statist. Comput. 11 (2001), 155-169. MR1837135DOI10.1023/A:1008979300007
- Malvestuto, F. M., 10.1023/A:1014551721406, Ann. Math. and Artificial Intelligence 35 (2002), 253-285. Zbl1001.68033MR1899954DOI10.1023/A:1014551721406
- Malvestuto, F. M., Tree and local computations in a cross-entropy minimization problem with marginal constraints., Kybernetika 46 (2010), 621-654. Zbl1204.93113MR2722092
- Meek, C., Finding a path is harder than finding a tree., J. Artificial Intelligence Res. 15 (2001), 383-389. Zbl0994.68120MR1884083
- Mezzini, M., Moscarini, M., 10.1016/j.tcs.2009.10.004, Theoret. Comput. Sci. 411 (2010), 958-966. Zbl1213.05251MR2606034DOI10.1016/j.tcs.2009.10.004
- Nunez, K., Chen, J., Chen, P., Ding, G., Lax, R. F., Marx, B., Empirical comparison of greedy strategies for learning Markov networks of treewidth $k$., In: Proc. VII Int. Conf. on Machine Learning and Applications, 2008, pp. 106-113.
- Rose, D. J., On simple characterizations of $k$-trees., Discrete Math. 7 (1974), 317-322. Zbl0285.05128MR0335319DOI10.1016/0012-365X(74)90042-9
- Szántai, T., Kovács, E., Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution., In: Proc. Conf. on Applied Mathematical Programming and Modelling, 2008; also in Ann. Oper. Res. 193 (2012), 71-90. MR2874757
- Szántai, T., Kovács, E., Discovering a junction tree behind a Markov network by a greedy algorithm., arXiv:1104.2762v3, 2011.
- Tarjan, R. E., Yannakakis, M., Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs., SIAM J. Comput. 13 (1984), 566-579. MR0749707DOI10.1137/0213035
- Wermuth, N., 10.2307/2529341, Biometrics 32 (1976), 95-108. Zbl0357.62049MR0403088DOI10.2307/2529341
- Wermuth, N., 10.2307/2529496, Biometrics 32 (1976), 253-256. Zbl0339.62079DOI10.2307/2529496
- Xiang, Y., Wong, S. K. M., Cercone, N., 10.1023/A:1007324100110, Mach. Learning 26 (1997), 65-72. Zbl0866.68088DOI10.1023/A:1007324100110