Assignments

Graded assignments

Assignments are submitted online and must include an affidavit.

Assignments

A1
A2
A3, guidelines on lab reports
A4
A5
A6

Due dates for assignments that require work to be turned in are posted on the calendar below. Solution keys to most problems are available on Canvas.

Use the ICAL address to add this to a personal calendar if you wish. Interested in AI? Check out the AI Club seminar blog.

Slides

Quick & Dirty Python (lecture available on Canvas, some extra slides here: iterators, dataclasses)
Introduction, statistics, and acoustics
Brief overview of neural networks
Machine learning concepts
Deep networks
Regularization
Optimization
Speech production, table of phoneme names and sample pronunciation
Convolutional neural networks
Sequence modeling, RNNs in Keras
Manifolds, the discrete Fourier transform
Language models
Transformers

Schedule

CS 682 Fall 2023
Please note that all dates are tentative. My primary concern is that you master the material and the schedule may be adjusted in either direction to optimize comprehension and scope of material.
Readings are given in parentheses. Chapter numbers refer to your text (Goodfellow et al., 2016), other materials have links or are provided in Canvas. Remember that the text is available online in addition to print.

Week – Topic

Aug 22 – Introduction and elementary statistics, framing, basic acoustics. Readings: statistics 3-3.9, framing (ch. 9 – 9.3.2, Jurafsky and Martin, 2009), pressure, intensity, spectrograms (Univ. of Rhode Island Graduate School of Oceanography and Marine Acoustics, 2017). Online video: Brief introduction to Python.
Aug 29 – Brief introduction to machine learning concepts (5) with a focus on neural networks (high-level introduction (6) and Keras; we will cover these topics in detail later)
Sep 5 – Brief introduction continued
Sep 12 – Deep feed-forward networks (6)
Sep 19 – Regularization (7-7.5) Paper: Dropout (Srivastava et al., 2014)
Sep 26 – Optimization (8-8.3.2)
Oct 3 – Professor Roch at conference, work on assignment. Speech perception (2.4 of Rabiner and Juang, 1993)
Oct 10 – Convolutional networks (9-9.3)
Oct 17 – Sequence modeling (10-10.2.2, 10.10)
Oct 24 – Midterm on Tue, Oct 24. Practical sequence modeling, manifold learning (5.11)
Oct 31 – Manifolds continued.
Nov 7 – Papers: Time-domain audio processing (Ravanelli and Bengio, 2018), transformers (either Vaswani et al., 2017 or a substitute paper/tutorial)
Nov 14 – Professor Roch at conference (work on project)
Nov 21 – Papers: Deformable convolutions (Dai et al., 2017), harmonic convolutional networks (Zhang et al., 2020). Multi-target learning. No class on Thanksgiving (Thu, Nov 23).
Nov 28 – Paper: data2vec (Baevski 202), Language models, traditional and modern techniques (lecture slides and Irie et al. 2019)
Dec 5 – Language models continued.

Final exam is Thursday, December 14th from 3:30 – 5:30 PM. No early exams will be given.

Materials

Textbooks

Since the introduction of deep learning into speech processing by Dahl et al. (2010) and Deng et al. (2010), the field has changed rapidly, and deep learning is now the dominant method used in speech processing applications. As the best textbooks on deep learning do not cover speech recognition, we will be using a deep learning book supplemented with readings on speech.

Required:

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning (The MIT Press, Cambridge, Massachusetts), pp. xxii, 775 pages. Available in print at the bookstore or freely available online.

Programming exercises will be implemented using Python 3.9 and a recent version of TensorFlow. We will briefly introduce Python in class, but we will not devote much time to how to program in Python, as computer science graduate students (and advanced undergraduates) should be able to pick up new languages fairly easily. Optional books to help you with this are freely available from SDSU Library through Safari Technical Books:

Martelli, A., Ravenscroft, A., and Holden, S. (2017). Python in a Nutshell, 3rd Edition (O’Reilly Media, Inc, Sebastopol, CA) or
Reitz, K. (2016). The Hitchhiker’s Guide to Python: Best Practices for Development (O’Reilly Media, Sebastopol)

In addition, python.org’s tutorial is quite good, and I have put a short video for you on Canvas.

Programming environment

Python has become one of the most popular languages for machine learning. This is primarily due to a large number of scientific libraries such as NumPy and SciPy, coupled with popular machine learning libraries such as TensorFlow and PyTorch and higher-level interfaces such as Keras. In this class, we will use NumPy, SciPy, Keras, and TensorFlow. Anaconda (Austin, TX) is a company that offers a distribution that makes installing this large collection of libraries easier. In past semesters, students have preferred using their own machines over departmental resources. To install Anaconda on your machine, please follow these instructions. The instructions also show you how to run a program using Eclipse (preferred) or the Spyder IDE once Anaconda is installed.
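Once the environment is installed, a quick sanity check can confirm that the interpreter and libraries are visible to Python. The following is a minimal sketch and not part of the official installation instructions. The distribution names in COURSE_PACKAGES and the helper names installed_versions and python_at_least are assumptions made for illustration; the names under which your installation registers these libraries may differ.

```python
import sys
from importlib import metadata

# Assumed distribution names for the libraries used in this class.
# Adjust these if your installation registers them differently.
COURSE_PACKAGES = ("numpy", "scipy", "tensorflow", "keras")


def installed_versions(packages=COURSE_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_at_least(major, minor):
    """Return True if the running interpreter is at least major.minor."""
    return sys.version_info >= (major, minor)


if __name__ == "__main__":
    print("Python", sys.version.split()[0],
          "(3.9 or newer)" if python_at_least(3, 9) else "(older than 3.9)")
    for name, version in installed_versions().items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Running the script from the Anaconda environment you created prints one line per library. Any library reported as not installed suggests that the environment is not active or that the installation step was skipped.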

About

About the course:

You will master machine learning and signal processing skills. We will apply these skills to recognizing audio signals, but many of them are useful in other contexts such as finance, bioinformatics, and control systems. Upon successful completion of this class, students should be able to:

Understand feature extraction including automated discovery of features.
Have an understanding of human speech production and perception.
Solve problems related to the classification of signals using a variety of machine learning and signal processing skills, including problems with temporal dependencies.
Organize and write a scientific paper.
Read, understand, and critique current research literature.

The prerequisites for this course are Computer Science 310, Mathematics 254, and Statistics 551A. As many CS students will not have taken Statistics 551A or Mathematics 254 (linear algebra), these prerequisites will be waived for any student who is willing to spend a bit of time learning the statistics, the basics of which will be covered briefly in class.

Please see syllabus for detailed course policies.

CS 682 Speech Processing
