Please note that all dates are tentative. My primary concern is that you master the material; the schedule may be adjusted in either direction to balance comprehension and scope of material.
There is a special election this fall. If you are a US citizen and have not registered to vote, please consider doing so now. California voters can register online: https://covr.sos.ca.gov/. Democracies only work when the public participates.
Readings are given in parentheses. Chapter numbers refer to your text (Goodfellow et al., 2016); other materials have links or are provided on Canvas. Remember that the text is available online in addition to print.
- Aug 24 – Introduction and elementary statistics, framing, basic acoustics.
Readings: statistics (3-3.9), framing (ch. 9-9.3.2 of Jurafsky and Martin, 2009), pressure, intensity, and spectrograms (Univ. of Rhode Island Graduate School of Oceanography and Marine Acoustics, 2017). Online video: Brief introduction to Python.
August 30th is the last day to register to vote in the California recall election.
- Aug 31 – Brief introduction to neural networks (high-level introduction to ch. 6 and TensorFlow; we will cover these topics in detail later)
- Sep 7 – Machine learning concepts (5)
- Sep 14 – Deep feed-forward networks (6)
⭐Election Day Tuesday September 14th (California recall election). Vote if you are eligible.
- Sep 21 – Regularization (7-7.5) Paper: Dropout (Srivastava et al., 2014)
- Sep 28 – Optimization (8-8.3.2)
- Oct 5 – Speech perception (2.4 of Rabiner and Juang, 1993)
- Oct 12 – Sequence modeling (10-10.2.2, 10.10)
- Oct 19 – Practical sequence modeling
- Oct 26 – Convolutional networks (9-9.3), Manifold learning (5.11)
- Nov 2 – Manifolds continued.
- Nov 9 – Exam Tue Nov 9. Final project introduction and selection. Paper: Time-domain audio processing (Ravanelli and Bengio, 2018). No class on Veterans Day: Thu Nov 11
- Nov 16 – Professor Roch at conference (work on project)
- Nov 23 – Papers: Deformable convolutions (Dai et al., 2017), Harmonic convolutional networks (Zhang et al., 2020); Multi-target learning. No class on Thanksgiving: Thu Nov 25
- Nov 30 – In-progress presentations for project groups
- Dec 7 – Language models
Final projects are due Monday December 13th by 10:30 PM.
Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017). “Deformable Convolutional Networks,” in 2017 IEEE International Conference on Computer Vision (ICCV), pp. 764-773.
Gale, W. A., and Sampson, G. (1995). “Good-Turing frequency estimation without tears,” Journal of Quantitative Linguistics 2(3), 217-237.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning (The MIT Press, Cambridge, Massachusetts), xxii + 775 pp.
Jurafsky, D., and Martin, J. H. (2009). Speech and Language Processing (Pearson Prentice Hall, Upper Saddle River, NJ).
Rabiner, L. R., and Juang, B.-H. (1993). Fundamentals of Speech Recognition (Prentice-Hall, Englewood Cliffs, NJ).
Ravanelli, M., and Bengio, Y. (2018). “Speaker Recognition from Raw Waveform with SincNet,” in 2018 IEEE Spoken Language Technology Workshop (SLT), pp. 1021-1028.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” J Mach Learn Res 15. 1929-1958.
Univ. of Rhode Island Graduate School of Oceanography and Marine Acoustics, Inc. (2017). “Discovery of Sound in the Sea,” accessed August 1, 2017, http://dosits.org.
Zhang, Z., Wang, Y., Gan, C., Wu, J., Tenenbaum, J. B., Torralba, A., and Freeman, W. T. (2020). “Deep Audio Priors Emerge from Harmonic Convolutional Networks,” in Intl. Conf. Learn. Repr. (ICLR) (virtual), p. 12.
Since the introduction of deep learning into speech processing by Dahl et al. (2010) and Deng et al. (2010), the field has changed rapidly, and deep learning is now the dominant method used in speech processing applications. As textbooks on deep learning are just starting to come out, we will be using a deep learning book supplemented with readings on speech.
- Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning (The MIT Press, Cambridge, Massachusetts), xxii + 775 pp. Available in print at the bookstore or freely available online.
Programming exercises will be implemented using Python 3.9 and TensorFlow 2.6. While we will briefly introduce Python in class, we will not be devoting much time to how to program in Python, as computer science graduate students (and advanced undergraduates) should be able to pick up new languages fairly easily. Optional books to help you with this are freely available from SDSU Library through Safari Technical Books:
- Martelli, A., Ravenscroft, A., and Holden, S. (2017). Python in a Nutshell, 3rd Edition (O’Reilly Media, Inc., Sebastopol, CA), or
- Reitz, K. (2016). The Hitchhiker’s Guide to Python: Best Practices for Development (O’Reilly Media, Sebastopol)
In addition, the tutorial at python.org is quite good, and I have posted a short video for you on Canvas.
Python is rapidly becoming one of the most popular languages for machine learning. This is primarily due to a large number of scientific libraries such as NumPy and SciPy, coupled with popular machine learning libraries such as Theano, TensorFlow, and PyTorch, as well as higher-level interfaces such as Keras. In this class, we will use NumPy, SciPy, Keras, and TensorFlow. Anaconda (Austin, TX) is a company that offers a distribution that makes installing this large collection of libraries easier. Anaconda and the appropriate libraries have been installed on the Windows machines in the Department lab (GMCS 425). If you wish to install Anaconda on your own, please follow these instructions. The instructions also show you how to run a program using Eclipse (preferred) or the Spyder IDE once Anaconda is installed.
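To give you a small taste of what the programming exercises will feel like, here is a minimal sketch, in plain Python with no libraries, of splitting a signal into overlapping frames, one of the first topics on the schedule. The function name and parameters are illustrative only, not the ones we will use in class:

```python
def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of length frame_len,
    advancing hop samples between successive frame starts."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames

# Example: 10 samples, 4-sample frames, 50% overlap (hop of 2)
x = list(range(10))
print(frame_signal(x, frame_len=4, hop=2))
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

In the course exercises, loops like this would typically be replaced by vectorized NumPy operations, which we will cover when we introduce the scientific Python stack.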
About the course:
You will master machine learning and signal processing skills. We will apply these to recognizing speech and speaker identity, but the skills you acquire are useful in many other contexts, such as finance, bioinformatics, and control systems.
Upon successful completion of this class, students should be able to:
- Understand feature extraction, including the automatic discovery of features.
- Describe human speech production and perception.
- Apply machine learning techniques to a variety of problems, including those that require recognizing sequences.
- Write a scientific paper.
- Understand readings in the speech technologies literature.
The prerequisites for this course are Computer Science 310, Mathematics 254, and Statistics 551A. As many CS students will not have taken Statistics 551A or Mathematics 254 (linear algebra), these prerequisites will be waived for any student willing to spend a bit of time learning the statistics, the basics of which will be covered briefly in class.
Please see syllabus for detailed course policies.