Tree and local computations in a cross–entropy minimization problem with marginal constraints

Francesco M. Malvestuto

Kybernetika (2010)

  • Volume: 46, Issue: 4, page 621-654
  • ISSN: 0023-5954

Abstract

top
In probability theory, Bayesian statistics, artificial intelligence and database theory the minimum cross-entropy principle is often used to estimate a distribution with a given set P of marginal distributions under the proportionality assumption with respect to a given “prior” distribution q . Such an estimation problem admits a solution if and only if there exists an extension of P that is dominated by q . In this paper we consider the case that q is not given explicitly, but is specified as the maximum-entropy extension of an auxiliary set Q of distributions. There are three problems that naturally arise: (1) the existence of an extension of a distribution set (such as P and Q ), (2) the existence of an extension of P that is dominated by the maximum entropy extension of Q , (3) the numeric computation of the minimum cross-entropy extension of P with respect to the maximum entropy extension of Q . In the spirit of a divide-and-conquer approach, we prove that, for each of the three above-mentioned problems, the global solution can be easily obtained by combining the solutions to subproblems defined at node level of a suitable tree.

How to cite

top

Malvestuto, Francesco M.. "Tree and local computations in a cross–entropy minimization problem with marginal constraints." Kybernetika 46.4 (2010): 621-654. <http://eudml.org/doc/196453>.

@article{Malvestuto2010,
abstract = {In probability theory, Bayesian statistics, artificial intelligence and database theory the minimum cross-entropy principle is often used to estimate a distribution with a given set $P$ of marginal distributions under the proportionality assumption with respect to a given “prior” distribution $q$. Such an estimation problem admits a solution if and only if there exists an extension of $P$ that is dominated by $q$. In this paper we consider the case that $q$ is not given explicitly, but is specified as the maximum-entropy extension of an auxiliary set $Q$ of distributions. There are three problems that naturally arise: (1) the existence of an extension of a distribution set (such as $P$ and $Q$), (2) the existence of an extension of $P$ that is dominated by the maximum entropy extension of $Q$, (3) the numeric computation of the minimum cross-entropy extension of $P$ with respect to the maximum entropy extension of $Q$. In the spirit of a divide-and-conquer approach, we prove that, for each of the three above-mentioned problems, the global solution can be easily obtained by combining the solutions to subproblems defined at node level of a suitable tree.},
author = {Malvestuto, Francesco M.},
journal = {Kybernetika},
keywords = {cross-entropy; acyclic hypergraph; connection tree; junction tree; probabilistic database; relational database; cross-entropy; acyclic hypergraph; connection tree; junction tree; probabilistic database; relational database},
language = {eng},
number = {4},
pages = {621-654},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Tree and local computations in a cross–entropy minimization problem with marginal constraints},
url = {http://eudml.org/doc/196453},
volume = {46},
year = {2010},
}

TY - JOUR
AU - Malvestuto, Francesco M.
TI - Tree and local computations in a cross–entropy minimization problem with marginal constraints
JO - Kybernetika
PY - 2010
PB - Institute of Information Theory and Automation AS CR
VL - 46
IS - 4
SP - 621
EP - 654
AB - In probability theory, Bayesian statistics, artificial intelligence and database theory the minimum cross-entropy principle is often used to estimate a distribution with a given set $P$ of marginal distributions under the proportionality assumption with respect to a given “prior” distribution $q$. Such an estimation problem admits a solution if and only if there exists an extension of $P$ that is dominated by $q$. In this paper we consider the case that $q$ is not given explicitly, but is specified as the maximum-entropy extension of an auxiliary set $Q$ of distributions. There are three problems that naturally arise: (1) the existence of an extension of a distribution set (such as $P$ and $Q$), (2) the existence of an extension of $P$ that is dominated by the maximum entropy extension of $Q$, (3) the numeric computation of the minimum cross-entropy extension of $P$ with respect to the maximum entropy extension of $Q$. In the spirit of a divide-and-conquer approach, we prove that, for each of the three above-mentioned problems, the global solution can be easily obtained by combining the solutions to subproblems defined at node level of a suitable tree.
LA - eng
KW - cross-entropy; acyclic hypergraph; connection tree; junction tree; probabilistic database; relational database; cross-entropy; acyclic hypergraph; connection tree; junction tree; probabilistic database; relational database
UR - http://eudml.org/doc/196453
ER -

References

top
  1. Asmussen, S., Edwards, D., 10.1093/biomet/70.3.567, Biometrika 70 (1983), 367–378. (1983) Zbl0549.62041MR0725370DOI10.1093/biomet/70.3.567
  2. Bacharach, M., Biproportional Matrices and Input-Output Change, Cambridge University Press, Cambridge 1970. (1970) Zbl0195.49705MR0263409
  3. Badsberg, J.-H., Malvestuto, F. M., 10.1016/S0167-9473(01)00013-5, Comput. Statist. Data Analysis 37 (2001), 297–322. (2001) Zbl1061.65500MR1856676DOI10.1016/S0167-9473(01)00013-5
  4. Beeri, C., Vardi, M., On the Properties of Full Join Dependencies, Adv. Database Theory I, Plenum Press, New York 1981. (1981) 
  5. Beeri, C., Fagin, R., Maier, D., Yannakakis, M., 10.1145/2402.322389, J. Assoc. Comput. Mach. 30 (1983), 479–513. (1983) Zbl0624.68087MR0709830DOI10.1145/2402.322389
  6. Berge, C., Hypergraphs, North-Holland, Amsterdam 1989. (1989) Zbl0674.05001MR1013569
  7. Berge, C., Discrete Multivariate Analysis, MIT Press, Cambridge 1975. (1975) 
  8. Csiszár, I., 10.1214/aop/1176996454, Ann. Probab. 3 (1975), 146–158. (1975) MR0365798DOI10.1214/aop/1176996454
  9. Csiszár, I., Maxent, mathematics, and information theory, In: Proc. Internat. Workshop on “Maximum entropy and Bayesian methods", 1995, pp. 35–50. (1995) MR1446714
  10. Dall’Aglio, G., Kotz, K., Salinetti, G., Advances in Probability Distributions with Given Marginals, Kluwer Academic Pub., Dordrecht, Boston, London 1991. (1991) MR1215942
  11. Darroch, J. N., Ratcliff, D., 10.1214/aoms/1177692379, Ann. Math. Statist. 43 (1972), 1470–1480. (1972) Zbl0251.62020MR0345337DOI10.1214/aoms/1177692379
  12. Deming, W. E., Statistical Adjustment of Data, Dover Pub., New York 1943. (1943) Zbl0060.31504MR0009819
  13. Deming, W. E., Stephan, F. F., 10.1214/aoms/1177731829, Ann. Math. Statist. 11 (1940), 427–444. (1940) Zbl0024.05502MR0003527DOI10.1214/aoms/1177731829
  14. Endo, Y., Takemura, A. I., 10.1016/j.csda.2008.11.013, Comput. Statist. Data Analysis 53 (2009), 966–978. (2009) MR2657062DOI10.1016/j.csda.2008.11.013
  15. Fienberg, S. E., 10.1214/aoms/1177696968, Ann. Math. Statist. 41 (1970), 907–917. (1970) Zbl0198.23401MR0266394DOI10.1214/aoms/1177696968
  16. Fienberg, S. E., Meyer, M. M., Iterative proportional fitting, In: Encyclopedia of Statistical Sciences (S. Kotz, N. L. Johnson, and C. B. Read, eds.), 4, John Wiley and Sons, New York, pp.  275–279. 
  17. Haberman, S. J., Log-linear Models for Contingency Tables, University of Chicago Press, Chicago 1974. (1974) 
  18. Ireland, C. T., Kullback, S., 10.1093/biomet/55.1.179, Biometrika 55 (1968), 179–188. (1968) Zbl0155.26701MR0229329DOI10.1093/biomet/55.1.179
  19. Jiroušek, R., Composition of probability measures on finite spaces, In: Proc. XIII Internat. Conf. Uncertainty in Artificial Intelligence 1997, pp. 274–281. (1997) 
  20. Jiroušek, R., Přeučil, S., 10.1016/0167-9473(93)E0055-9, Comput. Statist. Data Analysis 19 (1995), 177–189. (1995) DOI10.1016/0167-9473(93)E0055-9
  21. Johnson, R. W., 10.1109/TIT.1979.1056113, IEEE Trans. Inform. Theory 25 (1979), 709–716. (1979) Zbl0422.60016MR0551270DOI10.1109/TIT.1979.1056113
  22. Kellerer, H. G., 10.1007/BF00534912, Zeitschrift Wahrscheinlichkeitstheorie und Verw. Gebiete 3 (1964), 247–270. (1964) Zbl0126.34003MR0175158DOI10.1007/BF00534912
  23. Kellerer, H. G., 10.1007/BF01360315, Math. Annalen 153 (1964), 168–198. (1964) Zbl0118.05003MR0161956DOI10.1007/BF01360315
  24. Kern-Isberner, G., 10.1016/S0004-3702(97)00068-4, Artificial Intelligence 98 (1998), 169–208. (1998) Zbl0903.68181MR1614388DOI10.1016/S0004-3702(97)00068-4
  25. Ku, H. H., Kullback, S., Interaction in multidimensional contingency tables: an information-theoretic approach, J. Res. Nat. Bur. Standards - Math. Sci. 72 B (1968), 159–199. (1968) Zbl0274.62036MR0258223
  26. Lauritzen, S. L., Graphical Models, Oxford Science Pub., Clarendom Press, Oxford 1996. (1996) MR1419991
  27. Lauritzen, S. L., Speed, M. P., Vijayan, K., 10.1017/S1446788700027300, J. Austral. Math. Soc. Ser. A 36 (1984), 12–29. (1984) Zbl0533.05046MR0719998DOI10.1017/S1446788700027300
  28. Lauritzen, S. L., Spiegelhalter, D. J., Local computations with probabilities on graphical structures and their application to expert systems, J. Roy. Stat. Soc. Ser. B 50 (1988), 157–224. (1988) Zbl0684.68106MR0964177
  29. Leimer, G., 10.1016/0012-365X(93)90510-Z, Discrete Math. 113 (1993), 99–123. (1993) Zbl0793.05128MR1212872DOI10.1016/0012-365X(93)90510-Z
  30. Leontief, W. W., The Structure of American Economy 1919–1929, Oxford University Press, New York 1941. (1941) 
  31. Leontief, W. W., Strout, A., Multiregional input-output analysis, In: Structural Interdependence and Economic Development, 1963, pp. 119–169. (1963) 
  32. Madigan, D., Mosurski, K., 10.1093/biomet/77.2.315, Biometrika 77 (1990), 315–319. (1990) Zbl0731.62113MR1064803DOI10.1093/biomet/77.2.315
  33. Madigan, D., Mosurski, K., Errata: An extension of the results of Asmussen and Edwards on collapsibility in contingency tables, Biometrika 86 (1999) 973. (1999) MR1741994
  34. Maier, D., The Theory of Relational Databases, Computer Science Press, 1983. (http://web.cecs.pdx.edu/ maier/TheoryBook/TRD.html) (1983) Zbl0519.68082MR0691493
  35. Maier, D., Ullman, J. D., 10.1016/0304-3975(84)90030-6, Theoret. Comput. Sci. 32 (1984), 185–199. (1984) Zbl0557.05054MR0761167DOI10.1016/0304-3975(84)90030-6
  36. Malvestuto, F. M., Answering queries in categorical data bases, In: Proc. VI ACM Symp. Principles of Database Systems 1987, pp. 87–96. (1987) 
  37. Malvestuto, F. M., 10.1016/0012-365X(88)90178-1, Discrete Math. 69 (1988), 61–77. (1988) Zbl0637.60021MR0935028DOI10.1016/0012-365X(88)90178-1
  38. Malvestuto, F. M., 10.1016/0167-9473(89)90046-7, Computat. Statist. Data Anal. 8 (1989), 299–311. (1989) Zbl0726.62012MR1028405DOI10.1016/0167-9473(89)90046-7
  39. Malvestuto, F. M., 10.1007/BF00162528, Statist. Computing 6 (1996), 169–176. (1996) DOI10.1007/BF00162528
  40. Malvestuto, F. M., 10.1023/A:1008979300007, Statist. Computing 11 (2001), 155–169. (2001) MR1837135DOI10.1023/A:1008979300007
  41. Malvestuto, F. M., 10.1023/A:1014551721406, Ann. Math. Artificial Intelligence 35 (2002), 253–285. (2002) Zbl1001.68033MR1899954DOI10.1023/A:1014551721406
  42. Malvestuto, F. M., 10.1016/j.disc.2009.01.003, Discrete Math. 309 (2009), 4287–4298. (2009) Zbl1211.05093MR2519164DOI10.1016/j.disc.2009.01.003
  43. Malvestuto, F. M., Moscarini, M., 10.1006/jcss.1998.1570, J. Comput. System Sci. 56 (1998), 299–309. (1998) Zbl0913.68060MR1633981DOI10.1006/jcss.1998.1570
  44. Malvestuto, F. M., Moscarini, M., 10.1016/S0304-3975(98)00128-5, Theoret. Comput. Sci. 237 (2000), 57–79. (2000) Zbl0939.68089MR1756201DOI10.1016/S0304-3975(98)00128-5
  45. Malvestuto, F. M., Pourabbas, E., Customized answers to summary queries via aggregate views, In: Proc. XVI Intl. Conf. Scientific & Statistical Database Management 2004, pp. 193–202. (2004) 
  46. Malvestuto, F. M., Pourabbas, E., Local computation of answers to table queries on summary databases, In: Proc. XVII Intl. Conf. Scientific & Statistical Database Management 2005, pp. 263–272. (2005) 
  47. Matúš, F., Discrete marginal problem for complex measures, Kybernetika 24 (1988), 36–46. (1988) MR0936552
  48. Matúš, F., On the maximum-entropy extensions of probability measures over undirected graphs, In: Proc. III Workshop Uncertainty Processing in Expert Systems 1994, pp. 181–198. (1994) 
  49. Matúš, F., Flusser, J., 10.1109/34.254058, IEEE Trans. Pattern Analysis and Machine Intelligence 15 (1993), 996–1006. (1993) DOI10.1109/34.254058
  50. Purcell, N. J., Kish, L., 10.2307/2530340, Biometrics 35 (1979), 365–384. (1979) Zbl0419.62092MR0535774DOI10.2307/2530340
  51. Purcell, N. J., Kish, L., 10.2307/1402400, Internat. Statist. Rev. 48 (1980), 3–18. (1980) Zbl0433.62080DOI10.2307/1402400
  52. Rüschendorf, L., 10.1214/aos/1176324703, Ann. Statist. 23 (1995), 1160–1174. (1995) MR1353500DOI10.1214/aos/1176324703
  53. Shore, J. E., Johnson, R. W., 10.1109/TIT.1981.1056373, IEEE Trans. Inform. Theory 27 (1981), 472–482. (1981) Zbl0459.94008MR0635526DOI10.1109/TIT.1981.1056373
  54. Stephan, F. F., 10.1214/aoms/1177731604, Ann. Math. Statist. 13 (1942), 166–178. (1942) MR0006674DOI10.1214/aoms/1177731604
  55. Stone, R., Brown, A., A Computable Model for Economic Growth: A Programme for Growth, No. 1, Chapman Hall, London 1962. (1962) 
  56. Tarjan, R. E., Yannakakis, M., 10.1137/0213035, SIAM J. Comput. 13 (1984), 566–579. (1984) MR0749707DOI10.1137/0213035
  57. Vomlel, J., 10.3166/jancl.14.367-386, J. Appl. Non-Classical Logics 14 (2004), 365–386. (2004) Zbl1185.68699DOI10.3166/jancl.14.367-386
  58. Vorob’ev, N. N., 10.1137/1107014, Theor. Prob. Appl. 7 (1962), 147–163. (1962) DOI10.1137/1107014
  59. Vorob’ev, N. N., 10.1137/1108047, Theor. Prob. Appl. 8 (1963), 420–429. (1963) MR0169295DOI10.1137/1108047
  60. Yannakakis, M., 10.1137/0602010, SIAM J. Algebraic Discrete Mathematics 2 (1981), 77–79. (1981) Zbl0496.68033MR0604513DOI10.1137/0602010
  61. Yannakakis, M., Algorithms for acyclic database schemes, In: Proc. VII Internat. Conf. Very Large Data Bases 1981, pp. 82–94. (1981) 

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.