Learning tree languages from text

Henning Fernau

RAIRO - Theoretical Informatics and Applications (2007)

  • Volume: 41, Issue: 4, page 351-374
  • ISSN: 0988-3754

Abstract

top
We study the problem of learning regular tree languages from text. We show that the framework of function distinguishability, as introduced by the author in [Theoret. Comput. Sci.290 (2003) 1679–1711], can be generalized from the case of string languages towards tree languages. This provides a large source of identifiable classes of regular tree languages. Each of these classes can be characterized in various ways. Moreover, we present a generic inference algorithm with polynomial update time and prove its correctness. In this way, we generalize previous works of Angluin, Sakakibara and ourselves. Moreover, we show that this way all regular tree languages can be approximately identified.

How to cite

top

Fernau, Henning. "Learning tree languages from text." RAIRO - Theoretical Informatics and Applications 41.4 (2007): 351-374. <http://eudml.org/doc/250053>.

@article{Fernau2007,
abstract = { We study the problem of learning regular tree languages from text. We show that the framework of function distinguishability, as introduced by the author in [Theoret. Comput. Sci.290 (2003) 1679–1711], can be generalized from the case of string languages towards tree languages. This provides a large source of identifiable classes of regular tree languages. Each of these classes can be characterized in various ways. Moreover, we present a generic inference algorithm with polynomial update time and prove its correctness. In this way, we generalize previous works of Angluin, Sakakibara and ourselves. Moreover, we show that this way all regular tree languages can be approximately identified. },
author = {Fernau, Henning},
journal = {RAIRO - Theoretical Informatics and Applications},
language = {eng},
month = {9},
number = {4},
pages = {351-374},
publisher = {EDP Sciences},
title = {Learning tree languages from text},
url = {http://eudml.org/doc/250053},
volume = {41},
year = {2007},
}

TY - JOUR
AU - Fernau, Henning
TI - Learning tree languages from text
JO - RAIRO - Theoretical Informatics and Applications
DA - 2007/9//
PB - EDP Sciences
VL - 41
IS - 4
SP - 351
EP - 374
AB - We study the problem of learning regular tree languages from text. We show that the framework of function distinguishability, as introduced by the author in [Theoret. Comput. Sci.290 (2003) 1679–1711], can be generalized from the case of string languages towards tree languages. This provides a large source of identifiable classes of regular tree languages. Each of these classes can be characterized in various ways. Moreover, we present a generic inference algorithm with polynomial update time and prove its correctness. In this way, we generalize previous works of Angluin, Sakakibara and ourselves. Moreover, we show that this way all regular tree languages can be approximately identified.
LA - eng
UR - http://eudml.org/doc/250053
ER -

References

top
  1. H. Ahonen, H. Mannila and E. Nikunen, Forming grammars for structured documents: an application of grammatical inference, in Proceedings of the Second International Colloquium on Grammatical Inference (ICGI-94): Grammatical Inference and Applications, edited by R.C. Carrasco and J. Oncina. Lect. Notes Artif. Int.862 (1994) 153–167.  
  2. D. Angluin, Inductive inference of formal languages from positive data. Inform. Comput.45 (1980) 117–135.  Zbl0459.68051
  3. D. Angluin, Inference of reversible languages. J. ACM29 (1982) 741–765.  Zbl0485.68066
  4. D. Angluin, Learning regular sets from queries and counterexamples. Inform. Comput.75 (1987) 87–106.  Zbl0636.68112
  5. M. Bernard and C. de la Higuera, GIFT: grammatical inference for terms, in Conférence d'Apprentissage, Palaiseau, May 1999. English version: Late breaking paper of International Conference on Inductive Logic Programming; French journal version: Apprentissage de programmes logiques par inférence grammaticale. Revue d'Intelligence Artificielle14 (2001) 375–396.  
  6. J. Besombes and J.-Y. Marion, Identification of reversible dependency tree languages, in Proc. 3rd Workshop on Learning Languages in Logic LLL'01, edited by L. Popelínský and M. Nepil, 11–22. Technical report FIMU-RS-2001-08, FI MU Brno, Czech Republic, see , September 2001.  URIhttp://www.fi.muni.cz/ilpnet2/LLL2001/#proceedings
  7. J. Besombes and J.-Y. Marion, Learning tree languages from positive examples and membership queries, in Algorithmic Learning Theory, 15th International Conference, ALT, edited by S. Ben-David, J. Case and A. Maruoka. Lect. Notes Comput. Sci.3244 (2004) 440–453.  Zbl1110.68390
  8. H. Boström, Theory-guided induction of logic programs by inference of regular languages. in Proc. of the 13th International Conference on Machine Learning, Morgan Kaufmann, (1996) 46–53.  
  9. R.C. Carrasco, J. Oncina and J. Calera-Rubio, Stochastic inference of regular tree languages. Mach. Learn.44 (2001) 185–197.  Zbl0983.68106
  10. R. C. Carrasco and J. R. Rico-Juan, A similarity between probabilistic tree languages: Application to XML document families. Pattern Recogn.36 (2003) 2197–2199.  Zbl1047.68550
  11. M. Ceresna and G. Gottlob, Query based learning of XPath fragments, in Proceedings of Dagstuhl Seminar on Machine Learning for the Semantic Web (05071), Dagstuhl, Germany, (2005).  
  12. S. Crespi-Reghizzi, M.A. Melkanoff and L. Lichten, The use of grammatical inference for designing programming languages. Comm. ACM16 (1972) 83–90.  Zbl0248.68037
  13. E. Cypher, D.C. Halbert, D. Kurlander, H. Liebermanand D. Maulsby, B.A. Myers and A. Turransky, Eds. Watch What I Do: Programming by Demonstration. MIT Press, 1993. Available online at  URIhttp://www.acypher.com/wwid/WWIDToC.html
  14. A. Dix and A. Patrick, Query by browsing. in Proceedings of IDS'94: The 2nd International Workshop on User Interfaces to Databases, edited by P. Sawyer, Springer (1994) 236–248. HTML version:  URIhttp://www.comp.lancs.ac.uk/computing/users/dixa/papers/QbB-IDS94/
  15. F. Drewes, Grammatical Picture Generation – A Tree-Based Approach. Texts in Theoretical Computer Science. An EATCS Series. Springer, (2006). More information can be found on the web page  URIhttp://www.cs.umu.se/~drewes/picgen/index.html
  16. F. Drewes and J. Högberg, Learning a regular tree language from a teacher. In Proc. Developments in Language Theory DLT 2003, edited by Z. Ésik and Z. Fülöp. Lect. Notes Comput. Sci.2710 (2003) 279–291.  Zbl1037.68082
  17. F. Drewes and J. Högberg, Query learning of regular tree languages: How to avoid dead states. Theor. Comput. Syst. (2006). To appear.  Zbl1107.68049
  18. L.F. Fass, Learning context-free languages from their structured sentences. SIGACT News, 15 (1983) 24–35.  Zbl0528.68051
  19. H. Fernau, k-gram extensions of terminal distinguishable languages. In International Conference on Pattern Recognition (ICPR 2000). IEEE/IAPR2 (2000) 125–128.  
  20. H. Fernau, Learning XML grammars. in Machine Learning and Data Mining in Pattern Recognition MLDM'01, edited by P. Perner. Lect. Notes Artif. Int.2123 (2001) 73–87.  Zbl0997.68607
  21. H. Fernau, Learning tree languages from text. in Computational Learning Theory COLT 2002, edited by J. Kivinen and R.H. Sloan. Lect. Notes Artif. Int.2375 (2002) 153–168.  Zbl1050.68060
  22. H. Fernau, Identification of function distinguishable languages. Theoret. Comput. Sci.290 (2003) 1679–1711.  Zbl1051.68092
  23. H. Fernau, Identifying terminal distinguishable languages. Ann. Math. Artif. Intell.40 (2004) 263–281.  Zbl1081.68035
  24. H. Fernau and C. de la Higuera, Grammar induction: An invitation to formal language theorists. GRAMMARS7 (2004) 45–55.  
  25. H. Fernau and A. Radl, Algorithms for learning function distinguishable regular languages. in Structural, Syntactic, and Statistical Pattern Recognition SSPR and SPR 2002, edited by T. Caelli, A. Amin, R.P.W. Duin, M. Kamel and D. de Ridder. Lect. Notes Comput. Sci.2396 (2002) 64–72.  
  26. H. Fernau and J.M. Sempere, Permutations and control sets for learning non-regular language families. in Grammatical Inference: Algorithms and Applications, 5th International Colloquium (ICGI 2000), edited by A.L. Oliveira. Lect. Notes Artif. Int., 1891 (2000) 75–88.  Zbl0974.68089
  27. C.C. Florêncio, Consistent identification in the limit of any of the classes of k-valued is NP-hard. in Logical Aspects of Computational Linguistics LACL'01, edited by P. de Groote, G. Morrill and C. Retoré. Lect. Notes Artif. Int.2099 (2001) 125–138.  
  28. H. Fukuda and K. Kamata, Inference of tree automata from sample set of trees. Int. J. Comput. Inform. Sci.13 (1984) 177–196.  Zbl0566.68064
  29. P. García, Learning k-testable tree sets from positive data. Technical Report DSIC/II/46/1993, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, (1993).  URIhttp://www.dsic.upv.es/users/tlcc/tlcc.html
  30. P. García and J. Oncina, Inference of recognizable tree sets. Technical Report DSIC-II/47/93, Departamento de Sistemas Informáticos y Computación (1993).  
  31. M. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri and K. Shim, XTRACT: learning document type descriptors from XML document collections. Data Min. Knowl. Discov.7 (2003) 23–56.  
  32. E.M. Gold, Language identification in the limit. Inform. Comput.10 (1967) 447–474.  Zbl0259.68032
  33. R.C. Gonzalez and M.G. Thomason, Syntactic Pattern Recognition; An Introduction. Addison-Wesley (1978).  Zbl0383.68065
  34. J. Gregor, Data-driven inductive inference of finite-state automata. Int. J. Pattern Recogn.8 (1994) 305–322.  
  35. C. de la Higuera, A bibliographical study of grammatical inference. Pattern Recogn.38 (2005) 1332–1348.  
  36. C. de la Higuera, Current trends in grammatical inference. in Advances in Pattern Recognition, Joint IAPR International Workshops SSPR+SPR'2000, edited by F.J. Ferri et al.Lect. Notes Comput. Sci.1876 (2000) 28–31.  
  37. E. Jeong and C.-H. Hsu, Induction of integrated view for XML data with heterogenous DTDs. In Proceedings of the 10th International Conference on Information and Knowledge Management CIKM, ACM, (2001) 151–158.  
  38. M. Kanazawa, Learnable Classes of Categorial Grammars. Ph.D., CSLI (1998).  Zbl0937.68130
  39. B.B. Kimia, A.R. Tannenbaum and S.W. Zucker, Shapes, shocks, and deformations, I. Int. J. Comput. Vision15 (1995) 189–224.  
  40. B. Knobe and K. Knobe, A method for inferring context-free grammars. Inform. Comput.31 (1976) 129–146.  Zbl0333.68065
  41. T. Knuutila, Inference of k-testable tree languages, in Proc. IAPR Workshop on Structural and Syntactical Pattern Recognition. World Scientific (1992) 109–120.  
  42. T. Knuutila, How to invent characterizable methods for regular languages. in 4th Workshop on Algorithmic Learning Theory ALT'93, edited by K.P. Jantke et al.Lect. Notes Artif. Int.744 (1993) 209–222.  Zbl0925.68365
  43. T. Knuutila and M. Steinby, The inference of tree languages from finite samples: an algebraic approach. Theoret. Comput. Sci.129 (1994) 337–367.  Zbl0822.68055
  44. S. Kobayashi and T. Yokomori, Learning approximately regular languages with reversible languages. Theoret. Comput. Sci.174 (1997) 251–257.  Zbl0908.68146
  45. D. Kozen, On the Myhill-Nerode theorem for trees. EATCS Bull.47 (1992) 170–173.  Zbl0757.68083
  46. H. Lieberman, editor. Your Wish is My Command: Programming by Example. Morgan Kaufmann (2001).  
  47. D. López and S. España, Error correcting tree language inference. Pattern Recogn. Lett.23 (2002) 1–12.  Zbl0993.68057
  48. D. López and I. Piñaga, Syntactic pattern recognition by error correcting analysis on tree automata, in Advances in Pattern Recognition, Joint IAPR International Workshops SSPR+SPR'2000, edited by F.J. Ferri et al.Lect. Notes Comput. Sci.1876 (2000) 133–142.  Zbl0996.68532
  49. D. López, J.M. Sempere and P. García, Error correcting analysis for tree languages. Int. J. Pattern Recogn.14 (2000) 357–368.  
  50. D. López, J.M. Sempere and P. García, Inference of reversible tree languages. IEEE T. Syst. Man Cy.34 (2004) 1658–1665.  
  51. H.R. Lu and K.S. Fu, Error-correcting tree automata for syntactic pattern recognition. IEEE T. Comput.27 (1978) 1040–1053.  Zbl0392.68076
  52. S. Matsumoto and T. Shoudai, Learning of ordered tree languages with height-bounded variables using queries. in Algorithmic Learning Theory, 15th International Conference, ALT, edited by S. Ben-David, J. Case and A. Maruoka. Lect. Notes Comput. Sci.3244 (2004) 425–439.  Zbl1110.68402
  53. F. Neven, Automata, logic, and XML. In Computer Science Logic; 16th International Workshop, CSL 2002, edited by J. Bradfield. Lect. Notes Comput. Sci.2471 (2002) 2–26.  
  54. V. Radhakrishnan and G. Nagaraja, Inference of regular grammars via skeletons. IEEE T. Syst. Man Cy.17 (1987) 982–992.  
  55. V. Radhakrishnan and G. Nagaraja, Inference of even linear grammars and its application to picture description languages. Pattern Recogn.21 (1988) 55–62.  Zbl0653.68077
  56. J.R. Rico-Juan, J. Calera-Rubio and R.C. Carrasco, Probabilistic k-testable tree languages. in Grammatical Inference: Algorithms and Applications, 5th International Colloquium (ICGI 2000), edited by A.L. Oliveira. Lect. Notes Artif. Int.1891 (2000) 221–228.  Zbl0974.68514
  57. J.R. Rico-Juan, J. Calera-Rubio and R.C. Carrasco, Stochastic k-testable tree languages and applications, in Grammatical Inference: Algorithms and Applications, 6th International Colloquium, ICGI 2002, edited by P. Adriaans, H. Fernau and M. van Zaanen. Lect. Notes Artif. Int.2484 (2002) 199–212.  Zbl1028.68608
  58. G. Rozenberg and A. Salomaa, eds. Handbook of Formal Languages, Volume III. Springer, Berlin (1997).  
  59. Y. Sakakibara, Learning context-free grammars from structural data in polynomial time. Theoret. Comput. Sci.76 (1990) 223–242.  Zbl0704.68067
  60. Y. Sakakibara, Efficient learning of context-free grammars from positive structural examples. Inform. Comput.97 (1992) 23–60.  Zbl0764.68146
  61. Y. Sakakibara and H. Muramatsu, Learning context-free grammars from partially structured examples, in Grammatical Inference: Algorithms and Applications, 5th International Colloquium (ICGI 2000), edited by A.L. Oliveira. Lect. Notes Artif. Int.1891 (2000) 229–240.  Zbl0974.68530
  62. J.M. Sempere and G. Nagaraja, Learning a subclass of linear languages from positive structural information. in Proceedings of the Fourth International Colloquium on Grammatical Inference (ICGI-98), edited by V. Honavar and G. Slutski. Lect. Notes Artif. Int.1433 (1998) 162–174.  
  63. K. Siddiqi, A. Shakoufandeh, S. Dickinson and S. Zucker, Shock graphs and shape matching. Int. J. Comput. Vision30 (1999) 1–24.  
  64. B. Starkie, Developing Spoken Dialog Systems using Grammatical Inference. Ph.D. Thesis, The University of Newcastle (AUS) (2005).  
  65. B. Starkie and H. Fernau, The Boisdale algorithm — an induction method for a subclass of unification grammar from positive data, in Grammatical Inference: Algorithms and Applications; 7th International Colloquium ICGI, edited by G. Paliouras and Y. Sakakibara. Lect. Notes Artif. Int.3264 (2004) 235–247.  Zbl1111.68481
  66. Y. Takada and T.Y. Nishida, A note on grammatical inference of slender context-free languages, in Proceedings of the Third International Colloquium on Grammatical Inference (ICGI-96): Learning Syntax from Sentences, edited by L. Miclet and C. de la Higuera. Lect. Notes Artif. Int.1147 (1996) 117–125.  
  67. E. Tanaka and K.S. Fu, Error-correcting parsers for formal languages. IEEE T. Comput.27 (1978) 605–616.  Zbl0379.68050
  68. J.L. Verdú-Mas, M.L. Forcada, R.C. Carrasco and J. Calera-Rubio, Tree k-grammar models for natural language modelling and parsing, in Structural, Syntactic, and Statistical Pattern Recognition SSPR and SPR 2002, edited by T. Caelli, A. Amin, R.P.W. Duin, M. Kamel and D. de Ridder. Lect. Notes Comput. Sci.2396 (2002) 56–63.  Zbl1073.68842
  69. J.L. Verdú-Mas, R.C. Carrasco and J. Calera-Rubio, Parsing with probabilistic strictly locally testable tree languages. IEEE T. Pattern Anal., 27 (2005) 1040–1050.  
  70. H. Volger, Grammars with generalized contextfree rules and their tree automata. in Proceedings of CLIN '99; Selected Papers (1999) 223–233, see  URIhttp://www-uilots.let.uu.nl/publications/clin1999/papers.html
  71. T. Yokomori, Inductive inference of context-free languages based on context-free expressions. Int. J. Comput. Math.24 (1988) 115–140.  Zbl0658.68095
  72. T. Yokomori, On learning systolic languages. in Proceedings of the 3rd Workshop on Algorithmic Learning Theory (ALT '92), edited by K.P. Jantke, S. Doshita, K. Furukawa and T. Nishida. Lect. Notes Artif. Int.743 (1992) 41–52.  Zbl0925.68368
  73. T. Yokomori, Polynomial-time identification of very simple grammars from positive data. Theoret. Comput. Sci.298 (2003) 179–206.  Zbl1038.68074

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.