Please note that all dates are tentative. My primary concern is that you master the material, and the schedule may be adjusted in either direction to optimize comprehension and coverage of material.
This is a presidential election year. If you are a US citizen and have not registered to vote, please consider doing so now. California voters can do so online: https://covr.sos.ca.gov/. Democracies only work when the public participates.
Readings are given in parentheses. Chapter numbers refer to your text (Goodfellow et al., 2016), other materials have links or are provided in Canvas. Remember that the text is available online in addition to print.
- Aug 25 – Introduction and elementary statistics, framing, basic acoustics.
Readings: statistics 3-3.9; framing (ch. 9 – 9.3.2, Jurafsky and Martin, 2009); pressure, intensity, and spectrograms from Univ. of Rhode Island Graduate School of Oceanography and Marine Acoustics (2017). Online video: Brief introduction to Python.
- Sep 1 – Machine learning concepts (Sep 4 7:59 PM is add/drop deadline)
- Sep 8 – Deep feedforward networks (6-6.5.3). Note: Labor Day is Mon Sep 7
- Sep 15 – Feedforward continued
- Sep 22 – Regularization (7-7.5), Dropout (Srivastava et al., 2014)
- Sep 29 – Optimization (8-8.3.2), Speech perception (2.4, Rabiner and Juang, 1993)
- Oct 6 – Speech perception contd, Convolutional networks (9-9.3)
- Oct 13 – Convolutional networks contd., Sequence modeling (10-10.2.2, 10.10)
- Oct 20 – Sequence modeling contd.
- Oct 27 – Papers: Deformable convolutions (Dai et al., 2017) and time-domain audio processing (Ravanelli and Bengio, 2018)
- Nov 3 ⭐ Election Day ⭐ (VOTE if you are eligible) – Harmonic convolutional networks (Zhang et al., 2020), final project discussion
- Nov 10 – Use of deep nets in automatic speech recognition (12.3) (Veterans Day is Wed Nov 11)
- Nov 17 – Language models (Jurafsky and Martin 12.4-12.4.4, Gale and Sampson, 1995)
- Nov 24 – Language models contd. (Thanksgiving Th Nov 26, no class)
- Dec 1 – Catch up
- Dec 7 – Project / poster presentations
Python quick intro: https://www.youtube.com/watch?v=N4mEzFDjqtA
Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017). “Deformable Convolutional Networks,” in 2017 IEEE International Conference on Computer Vision (ICCV), pp. 764-773.
Gale, W. A., and Sampson, G. (1995). “Good–Turing frequency estimation without tears,” Journal of Quantitative Linguistics 2(3), 217-237.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning (The MIT Press, Cambridge, Massachusetts), xxii + 775 pp.
Jurafsky, D., and Martin, J. H. (2009). Speech and Language Processing (Pearson Prentice Hall, Upper Saddle River, NJ)
Rabiner, L. R., and Juang, B.-H. (1993). Fundamentals of speech recognition (Prentice-Hall, Englewood Cliffs, NJ 07632)
Ravanelli, M., and Bengio, Y. (2018). “Speaker Recognition from Raw Waveform with SincNet,” in 2018 IEEE Spoken Language Technology Workshop (SLT), pp. 1021-1028.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” J Mach Learn Res 15. 1929-1958.
Univ. of Rhode Island Graduate School of Oceanography and Marine Acoustics, Inc. (2017). “Discovery of Sound in the Sea,” Accessed August 1, 2017. http://dosits.org.
Zhang, Z., Wang, Y., Gan, C., Wu, J., Tenenbaum, J. B., Torralba, A., and Freeman, W. T. (2020). “Deep Audio Priors Emerge from Harmonic Convolutional Networks,” in Intl. Conf. Learn. Repr. (ICLR) (virtual), p. 12.
Since the introduction of deep learning into speech processing by Dahl et al. (2010) and Deng et al. (2010) the field has changed rapidly and deep learning is now the dominant method used in speech processing applications. As textbooks on deep learning are just starting to come out, we will be using a deep learning book supplemented with readings on speech.
- Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning (The MIT Press, Cambridge, Massachusetts), xxii + 775 pp. Available in print at the bookstore or freely available online.
Programming exercises will be implemented using Python 3.6 and TensorFlow. While we will briefly introduce Python in class, we will not devote much time to how to program in Python, as computer science graduate students (and advanced undergraduates) should be able to pick up new languages fairly easily. Optional books to help you with this are freely available from SDSU Library through Safari Technical Books:
- Martelli, A., Ravenscroft, A., and Holden, S. (2017). Python in a Nutshell, 3rd Edition (O’Reilly Media, Inc, Sebastopol, CA) or
- Reitz, K. (2016). The Hitchhiker’s Guide to Python: Best Practices for Development (O’Reilly Media, Sebastopol)
The python.org tutorial is also quite good.
Python is rapidly becoming one of the most popular languages for machine learning. This is primarily due to a large number of scientific libraries, such as NumPy and SciPy, coupled with popular machine learning libraries such as Theano, TensorFlow, and PyTorch, as well as higher-level interfaces such as Keras. In this class, we will use NumPy, SciPy, Keras, and TensorFlow. Anaconda, Inc. (Austin, TX) offers a distribution that makes installing this large collection of libraries easier. Anaconda and the appropriate libraries have been installed on the Windows machines in the Department lab (GMCS 425). If you wish to install Anaconda on your own, please follow these instructions. The instructions also show you how to run a program using Eclipse (preferred) or the Spyder IDE once Anaconda is installed.
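To give a feel for the style of computation these libraries perform, here is a toy NumPy sketch of a single dense layer with a ReLU activation — the basic building block of the feedforward networks covered in chapter 6. The function name and shapes are illustrative, not part of any course assignment; in practice Keras and TensorFlow provide this as a built-in layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, W, b):
    """One dense layer: affine transform followed by ReLU, max(0, xW + b)."""
    return np.maximum(0.0, x @ W + b)

x = rng.standard_normal((4, 3))   # batch of 4 examples, 3 features each
W = rng.standard_normal((3, 2))   # weight matrix: 3 inputs -> 2 units
b = np.zeros(2)                   # bias vector, one entry per unit
h = dense_relu(x, W, b)
print(h.shape)  # (4, 2)
```

In Keras the equivalent would be a `Dense(2, activation='relu')` layer; the sketch above just makes the underlying matrix arithmetic explicit.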
About the course:
You will master machine learning and signal processing skills. We will apply these to recognizing speech and speaker identity, but many of the skills you acquire transfer to other domains such as finance, bioinformatics, and control systems.
Upon successful completion of this class, students should be able to:
- Understand feature extraction, including automatic discovery of features.
- Understand human speech production and perception.
- Apply machine learning techniques to a variety of problems, including those that require recognizing sequences.
- Write a scientific paper.
- Read the speech technologies literature with understanding.
The prerequisites for this course are Computer Science 310, Mathematics 254, and Statistics 551A. As many CS students will not have taken Statistics 551A or Mathematics 254 (linear algebra), these will be waived for any student willing to spend a bit of time learning the statistics, the basics of which will be covered briefly in class.
Please see syllabus for detailed course policies.