Learning deterministic regular grammars from stochastic samples in polynomial time

Rafael C. Carrasco; Jose Oncina

RAIRO - Theoretical Informatics and Applications (2010)

  • Volume: 33, Issue: 1, page 1-19
  • ISSN: 0988-3754

Abstract

top
In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms which allow for the identification of the structure of the minimal stochastic automaton generating the language. It is shown that the time needed grows only linearly with the size of the sample set and a measure of the complexity of the task is provided. Experimentally, our implementation proves very fast for application purposes.

How to cite

top

Carrasco, Rafael C., and Oncina, Jose. "Learning deterministic regular grammars from stochastic samples in polynomial time." RAIRO - Theoretical Informatics and Applications 33.1 (2010): 1-19. <http://eudml.org/doc/221999>.

@article{Carrasco2010,
abstract = { In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms which allow for the identification of the structure of the minimal stochastic automaton generating the language. It is shown that the time needed grows only linearly with the size of the sample set and a measure of the complexity of the task is provided. Experimentally, our implementation proves very fast for application purposes. },
author = {Carrasco, Rafael C., Oncina, Jose},
journal = {RAIRO - Theoretical Informatics and Applications},
keywords = {stochastic regular languages},
language = {eng},
month = {3},
number = {1},
pages = {1-19},
publisher = {EDP Sciences},
title = {Learning deterministic regular grammars from stochastic samples in polynomial time},
url = {http://eudml.org/doc/221999},
volume = {33},
year = {2010},
}

TY - JOUR
AU - Carrasco, Rafael C.
AU - Oncina, Jose
TI - Learning deterministic regular grammars from stochastic samples in polynomial time
JO - RAIRO - Theoretical Informatics and Applications
DA - 2010/3//
PB - EDP Sciences
VL - 33
IS - 1
SP - 1
EP - 19
AB - In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms which allow for the identification of the structure of the minimal stochastic automaton generating the language. It is shown that the time needed grows only linearly with the size of the sample set and a measure of the complexity of the task is provided. Experimentally, our implementation proves very fast for application purposes.
LA - eng
KW - stochastic regular languages
UR - http://eudml.org/doc/221999
ER -

References

top
  1. D. Angluin, Identifying languages from stochastic examples. Internal Report YALEU/DCS/RR-614 (1988).  
  2. M. Anthony and N. Biggs, Computational learning theory. Cambridge University Press, Cambridge (1992).  
  3. R.C. Carrasco and J. Oncina, Learning stochastic regular grammars by means of a state merging method, in Grammatical Inference and Applications, R.C. Carrasco and J. Oncina Eds., Springer-Verlag, Berlin, Lecture Notes in Artificial Intelligence862 (1994).  
  4. M.A. Casta no, F. Casacuberta and E. Vidal, Simulation of stochastic regular grammars through simple recurrent networks, in New Trends in Neural Computation, J. Mira, J. Cabestany and A. Prieto Eds., Springer Verlag, Lecture Notes in Computer Science686 (1993) 210-215.  
  5. T.M. Cover and J.A. Thomas, Elements of information theory. John Wiley and Sons, New York (1991).  
  6. W. Feller, An introduction to probability theory and its applications, John Wiley and Sons, New York (1950).  
  7. K.S. Fu, Syntactic pattern recognition and applications, Prentice Hall, Englewood Cliffs, N.J. (1982).  
  8. C.L. Giles, C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun and Y.C. Lee, Learning and extracting finite state automata with second order recurrent neural networks. Neural Computation 4 (1992) 393-405.  
  9. E.M. Gold, Language identification in the limit. Inform. and Control10 (1967) 447-474.  
  10. W. Hoeffding, Probability inequalities for sums of bounded random variables. Amer. Statist. Association J. 58 (1963) 13-30.  
  11. J.E. Hopcroft and J.D. Ullman, Introduction to automata theory, languages and computation, Addison Wesley, Reading, Massachusetts (1979).  
  12. K. Lang, Random DFA's can be approximately learned from sparse uniform examples, in Proc. of the 5th Annual ACM Workshop on Computational Learning Theory (1992).  
  13. F.J. Maryanski and T.L. Booth, Inference of finite-state probabilistic grammars. IEEE Trans. Comput.C26 (1997) 521-536.  
  14. J. Oncina and P. García, Inferring regular languages in polynomial time, in Pattern Recognition and Image Analysis, N. Pérez de la Blanca, A. Sanfeliu and E. Vidal Eds., World Scientific (1992).  
  15. J.B. Pollack, The induction of dynamical recognizers. Machine Learning 7 (1991) 227-252.  
  16. A.S. Reber, Implicit learning of artificial grammars. J. Verbal Learning and Verbal Behaviour 6 (1967) 855-863.  
  17. D. Ron, Y. Singer and N. Tishby, On the learnability and usage of acyclic probabilistic finite automata, in Proc. of the 8th Annual Conference on Computational Learning Theory (COLT'95), ACM Press, New York (1995) 31-40.  
  18. A.W. Smith and D. Zipser, Learning sequential structure with the real-time recurrent learning algorithm. Internat. J. Neural Systems 1 (1989) 125-131.  
  19. A. Stolcke and S. Omohundro, Hidden Markov model induction by Bayesian model merging, in Advances in Neural Information Processing Systems 5, C.L. Giles, S.J. Hanson and J.D. Cowan Eds., Morgan Kaufman, Menlo Park, California (1993).  
  20. A. van der Mude and A. Walker, On the inference of stochastic regular grammars. Inform. and Control38 (1978) 310-329.  
  21. R.L. Watrous and G.M. Kuhn, Induction of finite-state languages using second-order recurrent networks. Neural Computation 4 (1992) 406-414.  

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.