Towards spike-based speech processing: A biologically plausible approach to simple acoustic classification

Ismail Uysal; Harsha Sathyendra; John G. Harris

International Journal of Applied Mathematics and Computer Science (2008)

  • Volume: 18, Issue: 2, page 129-137
  • ISSN: 1641-876X

Abstract

Shortcomings of automatic speech recognition (ASR) applications are becoming more evident as they are more widely used in real life. The inherent non-stationarity associated with the timing of speech signals as well as the dynamical changes in the environment make the ensuing analysis and recognition extremely difficult. Researchers often turn to biology seeking clues to make better engineered systems, and ASR is no exception with the usage of feature sets such as Mel frequency cepstral coefficients, which employ filter banks similar to cochlear filter banks in frequency distribution and bandwidth. In this paper, we delve deeper into the mechanics of the human auditory system to take this biological inspiration to the next level. The main goal of this research is to investigate the computation potential of spike trains produced at the early stages of the auditory system for a simple acoustic classification task. First, various spike coding schemes from temporal to rate coding are explored, together with various spike-based encoders with various simplicity levels such as rank order coding and liquid state machine. Based on these findings, a biologically plausible system architecture is proposed for the recognition of phonetically simple acoustic signals which makes exclusive use of spikes for computation. The performance tests show superior performance on a noisy vowel data set when compared with a conventional ASR system.

How to cite

Ismail Uysal, Harsha Sathyendra, and John G. Harris. "Towards spike-based speech processing: A biologically plausible approach to simple acoustic classification." International Journal of Applied Mathematics and Computer Science 18.2 (2008): 129-137. <http://eudml.org/doc/207871>.

@article{IsmailUysal2008,
abstract = {Shortcomings of automatic speech recognition (ASR) applications are becoming more evident as they are more widely used in real life. The inherent non-stationarity associated with the timing of speech signals as well as the dynamical changes in the environment make the ensuing analysis and recognition extremely difficult. Researchers often turn to biology seeking clues to make better engineered systems, and ASR is no exception with the usage of feature sets such as Mel frequency cepstral coefficients, which employ filter banks similar to cochlear filter banks in frequency distribution and bandwidth. In this paper, we delve deeper into the mechanics of the human auditory system to take this biological inspiration to the next level. The main goal of this research is to investigate the computation potential of spike trains produced at the early stages of the auditory system for a simple acoustic classification task. First, various spike coding schemes from temporal to rate coding are explored, together with various spike-based encoders with various simplicity levels such as rank order coding and liquid state machine. Based on these findings, a biologically plausible system architecture is proposed for the recognition of phonetically simple acoustic signals which makes exclusive use of spikes for computation. The performance tests show superior performance on a noisy vowel data set when compared with a conventional ASR system.},
author = {Uysal, Ismail and Sathyendra, Harsha and Harris, John G.},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {spike coding; synchrony coding; phase locking; speech perception; psychoacoustics; speech recognition},
language = {eng},
number = {2},
pages = {129--137},
title = {Towards spike-based speech processing: A biologically plausible approach to simple acoustic classification},
url = {http://eudml.org/doc/207871},
volume = {18},
year = {2008},
}

TY - JOUR
AU - Ismail Uysal
AU - Harsha Sathyendra
AU - John G. Harris
TI - Towards spike-based speech processing: A biologically plausible approach to simple acoustic classification
JO - International Journal of Applied Mathematics and Computer Science
PY - 2008
VL - 18
IS - 2
SP - 129
EP - 137
AB - Shortcomings of automatic speech recognition (ASR) applications are becoming more evident as they are more widely used in real life. The inherent non-stationarity associated with the timing of speech signals as well as the dynamical changes in the environment make the ensuing analysis and recognition extremely difficult. Researchers often turn to biology seeking clues to make better engineered systems, and ASR is no exception with the usage of feature sets such as Mel frequency cepstral coefficients, which employ filter banks similar to cochlear filter banks in frequency distribution and bandwidth. In this paper, we delve deeper into the mechanics of the human auditory system to take this biological inspiration to the next level. The main goal of this research is to investigate the computation potential of spike trains produced at the early stages of the auditory system for a simple acoustic classification task. First, various spike coding schemes from temporal to rate coding are explored, together with various spike-based encoders with various simplicity levels such as rank order coding and liquid state machine. Based on these findings, a biologically plausible system architecture is proposed for the recognition of phonetically simple acoustic signals which makes exclusive use of spikes for computation. The performance tests show superior performance on a noisy vowel data set when compared with a conventional ASR system.
LA - eng
KW - spike coding; synchrony coding; phase locking; speech perception; psychoacoustics; speech recognition
UR - http://eudml.org/doc/207871
ER -

References

  1. Atal B. S. and Hanauer S. L. (1971). Speech analysis and synthesis by linear prediction, Journal of the Acoustical Society of America 50(2B): 637-655. 
  2. Davis S. B. and Mermelstein P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing 28(4): 357-366.
  3. Dayan P. and Abbott L. F. (2001). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, MIT Press, Cambridge, MA. Zbl1051.92010
  4. Delorme A. and Thorpe S. J. (2001). Face identification using one spike per neuron: resistance to image degradations, Neural Networks 14(7): 795-803. 
  5. Hopfield J. J. and Brody C. D. (2001). What is a moment? Transient synchrony as a collective mechanism for spatiotemporal integration, Proceedings of the National Academy of Sciences USA 98(3): 1282-1287. 
  6. Jaeger H. (2001). The “echo state” approach to analysing and training recurrent neural networks, Technical Report GMD Report 148, German National Research Center for Information Technology. 
  7. Maass W., Natschläger T. and Markram H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation 14(11): 2531-2560. Zbl1057.68618
  8. Markram H., Lübke J., Frotscher M. and Sakmann B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs, Science 275(5297): 213-215.
  9. Meddis R. (1986). Simulation of mechanical to neural transduction in the auditory receptor, Journal of the Acoustical Society of America 79(3): 702-711. 
  10. Moissl U. and Meyer-Bäse U. (2000). A comparison of different methods to assess phase-locking in auditory neurons, International Conference of IEEE-EMBS on Information Technology Applications in Biomedicine, Vol. 2, Arlington, USA, pp. 840-843.
  11. Rieke F., Warland D., de Ruyter van Steveninck R. and Bialek W. (1999). Spikes - Exploring the Neural Code, MIT Press, Cambridge, MA. Zbl0912.92004
  12. Rullen R. V., Gautrais J., Delorme A. and Thorpe S. J. (1998). Face processing using one spike per neuron, Biosystems 48(1-3): 229-239. 
  13. Rullen R. V., Guyonneau R. and Thorpe S. J. (2005). Spike times make sense, Trends in Neurosciences 28(1): 1-4. 
  14. Sachs M. B. (1984). Neural coding of complex sounds: Speech, Annual Review of Physiology 46: 261-273. 
  15. Skowronski M. D. and Harris J. G. (2004). Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition, Journal of the Acoustical Society of America 116(3): 1774-1780. 
  16. Sumner C. J. and Lopez-Poveda E. A. (2002). A revised model of the inner-hair cell and auditory-nerve complex, Journal of the Acoustical Society of America 111(5): 2178-2188. 
  17. Sumner C. J., Lopez-Poveda E. A., O'Mard L. P. and Meddis R. (2003). Adaptation in a revised inner-hair cell model, Journal of the Acoustical Society of America 113(2): 893-901. 
  18. Terman D. and Wang D. (1995). Global competition and local cooperation in a network of neural oscillators, Physica D 81(1-2): 148-176. Zbl0882.68153
  19. Thorpe S. J. and Gautrais J. (1998). Rank order coding, in J. Bower (ed.), Computational Neuroscience: Trends in Research, New York: Plenum Press, pp. 113-119. 
  20. Uysal I., Sathyendra H. and Harris J. G. (2006). A biologically plausible system approach for noise robust vowel recognition, Proceedings of the IEEE Midwest Symposium on Circuits and Systems, Vol. 1, San Juan, Puerto Rico, pp. 245-249. 
  21. Uysal I., Sathyendra H. and Harris J. G. (2007a). A duplex theory of spike coding in the early stages of the auditory system, Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing, Vol. 4, Honolulu, USA, pp. 733-736. 
  22. Uysal I., Sathyendra H. and Harris J. G. (2007b). Spike-based feature extraction for noise robust speech recognition using phase synchrony coding, Proceedings of the IEEE International Symposium on Circuits and Systems, New Orleans, USA, pp. 1529-1532.
  23. VanRullen R., Guyonneau R. and Thorpe S. J. (2005). Spike times make sense, Trends in Neurosciences 28(1): 1-4. 
  24. Verstraeten D., Schrauwen B., Stroobandt D. and Campenhout J. V. (2005). Isolated word recognition with the liquid state machine: A case study, Information Processing Letters 95(6): 521-528. Zbl1184.68257
