{"id":63,"date":"2017-07-19T13:51:41","date_gmt":"2017-07-19T13:51:41","guid":{"rendered":"http:\/\/roch.sdsu.edu\/?page_id=63"},"modified":"2023-11-28T13:20:44","modified_gmt":"2023-11-28T21:20:44","slug":"cs-682-speech-processing","status":"publish","type":"page","link":"https:\/\/roch.sdsu.edu\/index.php\/cs-682-speech-processing\/","title":{"rendered":"CS 682 Speech Processing"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-one-full fusion-column-first fusion-column-last\" style=\"--awb-bg-blend:overlay;--awb-bg-size:cover;\"><div class=\"fusion-column-wrapper fusion-flex-column-wrapper-legacy\"><div class=\"fusion-tabs fusion-tabs-1 classic nav-is-justified horizontal-tabs icon-position-left mobile-mode-accordion\" style=\"--awb-title-border-radius-top-left:0px;--awb-title-border-radius-top-right:0px;--awb-title-border-radius-bottom-right:0px;--awb-title-border-radius-bottom-left:0px;--awb-alignment:start;--awb-inactive-color:#ebeaea;--awb-background-color:#ffffff;--awb-border-color:#ebeaea;--awb-active-border-color:#a0ce4e;\"><div class=\"nav\"><ul class=\"nav-tabs nav-justified\" role=\"tablist\" aria-orientation=\"horizontal\"><li class=\"active\" role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-76b56fee2f76e7c46d3\" aria-selected=\"true\" tabindex=\"0\" id=\"fusion-tab-76b56fee2f76e7c46d3\" href=\"#tab-76b56fee2f76e7c46d3\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-handshake far\" aria-hidden=\"true\" 
style=\"font-size:13px;\"><\/i>Assignments<\/h4><\/a><\/li><li  role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-1be485de87309fac353\" aria-selected=\"false\" tabindex=\"-1\" id=\"fusion-tab-1be485de87309fac353\" href=\"#tab-1be485de87309fac353\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-tv fas\" aria-hidden=\"true\" style=\"font-size:13px;\"><\/i>Slides<\/h4><\/a><\/li><li  role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-2f5c58aefb10609c910\" aria-selected=\"false\" tabindex=\"-1\" id=\"fusion-tab-2f5c58aefb10609c910\" href=\"#tab-2f5c58aefb10609c910\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-calendar-alt fas\" aria-hidden=\"true\" style=\"font-size:13px;\"><\/i>Schedule<\/h4><\/a><\/li><li  role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-a6bb6a32350da60783f\" aria-selected=\"false\" tabindex=\"-1\" id=\"fusion-tab-a6bb6a32350da60783f\" href=\"#tab-a6bb6a32350da60783f\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-book fas\" aria-hidden=\"true\" style=\"font-size:13px;\"><\/i>Materials<\/h4><\/a><\/li><li  role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-73d88e952c80a0278a3\" aria-selected=\"false\" tabindex=\"-1\" id=\"fusion-tab-73d88e952c80a0278a3\" href=\"#tab-73d88e952c80a0278a3\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-podcast fas\" aria-hidden=\"true\" style=\"font-size:13px;\"><\/i>About<\/h4><\/a><\/li><\/ul><\/div><div class=\"tab-content\"><div class=\"nav fusion-mobile-tab-nav\"><ul class=\"nav-tabs nav-justified\" role=\"tablist\" aria-orientation=\"horizontal\"><li class=\"active\" role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-76b56fee2f76e7c46d3\" aria-selected=\"true\" tabindex=\"0\" 
id=\"mobile-fusion-tab-76b56fee2f76e7c46d3\" href=\"#tab-76b56fee2f76e7c46d3\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-handshake far\" aria-hidden=\"true\" style=\"font-size:13px;\"><\/i>Assignments<\/h4><\/a><\/li><\/ul><\/div><div class=\"tab-pane fade fusion-clearfix in active\" role=\"tabpanel\" tabindex=\"0\" aria-labelledby=\"fusion-tab-76b56fee2f76e7c46d3\" id=\"tab-76b56fee2f76e7c46d3\">\n<h3>Graded assignments<\/h3>\n<p>Assignments are&nbsp;<a href=\"https:\/\/roch.sdsu.edu\/index.php\/submitting-work\/\">submitted online and must include an affidavit<\/a>.<\/p>\n<h3>Assignments<\/h3>\n<ul>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/assignments\/A1.pdf\">A1<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/assignments\/A2.pdf\">A2<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/assignments\/A3.pdf\">A3<\/a>, guidelines on <a href=\"https:\/\/roch.sdsu.edu\/cs682\/assignments\/LabReportInstructions.pdf\">lab reports<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/assignments\/A4.pdf\">A4<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/assignments\/A5.pdf\">A5<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/assignments\/A6.pdf\">A6<\/a><\/li>\n<\/ul>\n<h3>&nbsp;<\/h3>\n<p>Due dates for assignments that require work to be turned in are posted on the calendar below.&nbsp; Solution keys to most problems are available on Canvas.<iframe style=\"border: 0;\" src=\"https:\/\/calendar.google.com\/calendar\/embed?src=9kcdpp28qe0asru307thph4l7c%40group.calendar.google.com&amp;ctz=America\/Los_Angeles\" width=\"800\" height=\"600\" frameborder=\"0\" scrolling=\"no\"><\/iframe><\/p>\n<p>Use the <a href=\"https:\/\/calendar.google.com\/calendar\/ical\/9kcdpp28qe0asru307thph4l7c%40group.calendar.google.com\/public\/basic.ics\">ICAL address<\/a> to add this to a personal calendar if you wish.&nbsp; <em>Interested in AI?&nbsp; Check out the&nbsp;<a href=\"https:\/\/sdsuai.home.blog\/\">AI Club 
seminar blog<\/a>.&nbsp;&nbsp;<\/em><\/p>\n<\/div><div class=\"nav fusion-mobile-tab-nav\"><ul class=\"nav-tabs nav-justified\" role=\"tablist\" aria-orientation=\"horizontal\"><li  role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-1be485de87309fac353\" aria-selected=\"false\" tabindex=\"-1\" id=\"mobile-fusion-tab-1be485de87309fac353\" href=\"#tab-1be485de87309fac353\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-tv fas\" aria-hidden=\"true\" style=\"font-size:13px;\"><\/i>Slides<\/h4><\/a><\/li><\/ul><\/div><div class=\"tab-pane fade fusion-clearfix\" role=\"tabpanel\" tabindex=\"0\" aria-labelledby=\"fusion-tab-1be485de87309fac353\" id=\"tab-1be485de87309fac353\">\n<ol>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/00Quick-and-dirty-Python.pdf\">Quick &amp; Dirty Python<\/a> (lecture available on Canvas; some extra slides here: iterators, dataclasses)<\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/01Introduction&amp;Statistics.pdf\">Introduction, statistics, and acoustics<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/02NeuralNetsHighLevel.pdf\">Brief overview of neural networks<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/03MachineLearningConcepts.pdf\">Machine learning concepts<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/05DeepNets.pdf\">Deep networks<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/06Regularization.pdf\">Regularization<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/07Optimization.pdf\">Optimization<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/08SpeechProduction.pdf\">Speech production<\/a>, table of <a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/IPA-CMU-TIMIT-Phoneset.pdf\">phoneme names and sample pronunciation<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/09ConvolutionalNets.pdf\">Convolutional neural 
networks<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/10SequenceModeling.pdf\">Sequence modeling<\/a>, RNNs in <a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/10KerasSequenceModels.pdf\">Keras<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/11Manifolds.pdf\">Manifolds<\/a>, the <a href=\"https:\/\/www.youtube.com\/watch?v=spUNpyF58BY\">discrete Fourier transform<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/12LanguageModeling.pdf\">Language models<\/a><\/li>\n<li><a href=\"https:\/\/roch.sdsu.edu\/cs682\/slides\/13Transformers.pdf\">Transformers<\/a><\/li>\n<\/ol>\n<\/div><div class=\"nav fusion-mobile-tab-nav\"><ul class=\"nav-tabs nav-justified\" role=\"tablist\" aria-orientation=\"horizontal\"><li  role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-2f5c58aefb10609c910\" aria-selected=\"false\" tabindex=\"-1\" id=\"mobile-fusion-tab-2f5c58aefb10609c910\" href=\"#tab-2f5c58aefb10609c910\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-calendar-alt fas\" aria-hidden=\"true\" style=\"font-size:13px;\"><\/i>Schedule<\/h4><\/a><\/li><\/ul><\/div><div class=\"tab-pane fade fusion-clearfix\" role=\"tabpanel\" tabindex=\"0\" aria-labelledby=\"fusion-tab-2f5c58aefb10609c910\" id=\"tab-2f5c58aefb10609c910\">\n<p>CS 682 Fall 2023<br \/>\nPlease note that all dates are tentative. My primary concern is that you master the material and the schedule may be adjusted in either direction to optimize comprehension and scope of material.<br \/>\nReadings are given in parentheses. Chapter numbers refer to your text (Goodfellow et al., 2016), other materials have links or are provided in Canvas. Remember that the text is available online in addition to print.<\/p>\n<p>Week &#8211; Topic<\/p>\n<ol>\n<li>Aug 22 \u2013 Introduction and elementary statistics, framing, basic acoustics.\u00a0 Readings: statistics 3-3.9, framing (ch. 
9 &#8211; 9.3.2, Jurafsky and Martin, 2009), pressure, intensity, spectrograms Univ. of Rhode Island Graduate School of Oceanography and Marine Acoustics (2017). Online video: Brief introduction to Python.<\/li>\n<li>Aug 29 \u2013 Brief introduction to machine learning concepts (5) with a focus on neural networks (high-level introduction (6) and keras; we will cover these topics in detail later)<\/li>\n<li>Sep 5 \u2013 Brief introduction continued<\/li>\n<li>Sep 12 \u2013 Deep feed-forward networks (6)<\/li>\n<li>Sep 19 \u2013 Regularization (7-7.5). Paper: Dropout (Srivastava et al., 2014)<\/li>\n<li>Sep 26 \u2013 Optimization (8-8.3.2)<\/li>\n<li>Oct 3 \u2013 Professor Roch at conference, work on assignment<br \/>\nSpeech perception (2.4 of Rabiner and Juang, 1993)<\/li>\n<li>Oct 10 \u2013 Convolutional networks (9-9.3)<\/li>\n<li>Oct 17 &#8211; Sequence modeling (10-10.2.2, 10.10)<\/li>\n<li>Oct 24 \u2013 Tue, Oct 24 &#8211; midterm.\u00a0 Practical sequence modeling, Manifold learning (5.11)<\/li>\n<li>Oct 31 \u2013 Manifolds continued.<\/li>\n<li>Nov 7 \u2013 Papers: Time-domain audio processing (Ravanelli and Bengio, 2018), transformers (either Vaswani et al. 2017 or a substitute paper\/tutorial)<\/li>\n<li>Nov 14 \u2013 Professor Roch at conference (work on project)<\/li>\n<li>Nov 21 \u2013 Papers: Deformable convolutions (Dai et al., 2017), Harmonic Conv networks (Zhang et al. 2020), Multi-target learning. No class on Thanksgiving: Thu Nov 23<\/li>\n<li>Nov 28 \u2013 Paper: data2vec (Baevski 202), Language models, traditional and modern techniques (lecture slides and Irie et al. 2019)<\/li>\n<li>Dec 5 \u2013 Language models continued.<\/li>\n<\/ol>\n<p>Final exam is Thursday, December 14th from 3:30 \u2013 5:30 PM. 
No early exams will be given.<\/p>\n<\/div><div class=\"nav fusion-mobile-tab-nav\"><ul class=\"nav-tabs nav-justified\" role=\"tablist\" aria-orientation=\"horizontal\"><li  role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-a6bb6a32350da60783f\" aria-selected=\"false\" tabindex=\"-1\" id=\"mobile-fusion-tab-a6bb6a32350da60783f\" href=\"#tab-a6bb6a32350da60783f\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-book fas\" aria-hidden=\"true\" style=\"font-size:13px;\"><\/i>Materials<\/h4><\/a><\/li><\/ul><\/div><div class=\"tab-pane fade fusion-clearfix\" role=\"tabpanel\" tabindex=\"0\" aria-labelledby=\"fusion-tab-a6bb6a32350da60783f\" id=\"tab-a6bb6a32350da60783f\">\n<h1><strong>Textbooks \u00a0<\/strong><\/h1>\n<p>Since the introduction of deep learning into speech processing by Dahl <em>et al<\/em>. (2010) and Deng <em>et al<\/em>. (2010), the field has changed rapidly and deep learning is now the dominant method used in speech processing applications. \u00a0As the best textbooks on deep learning do not cover speech recognition, we will be using a deep learning book supplemented with readings on speech.<\/p>\n<p>Required:<\/p>\n<ul>\n<li>Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning (The MIT Press, Cambridge, Massachusetts), pp. xxii, 775 pages. \u00a0Available in print at the bookstore or freely available\u00a0<a href=\"http:\/\/www.deeplearningbook.org\/\">online<\/a>.<\/li>\n<\/ul>\n<p>Programming exercises will be implemented using Python 3.9 and a recent version of TensorFlow.\u00a0 While we will briefly introduce Python in class, we will not be devoting much time to how to program in Python, as computer science graduate students (and advanced undergraduates) should be able to pick up new languages fairly easily. 
\u00a0Optional books to help you with this are freely available from the SDSU Library through <a href=\"http:\/\/libguides.sdsu.edu\/safari-tech-books\">Safari<\/a>\u00a0Technical Books:<\/p>\n<ul>\n<li>Martelli, A., Ravenscroft, A., and Holden, S. (2017). Python in a Nutshell, 3rd Edition (O&#8217;Reilly Media, Inc, Sebastopol, CA) or<\/li>\n<li>Reitz, K. (2016). The Hitchhiker&#8217;s Guide to Python: Best Practices for Development (O&#8217;Reilly Media, Sebastopol)<\/li>\n<\/ul>\n<p>In addition, python.org&#8217;s <a href=\"https:\/\/docs.python.org\/3.5\/\">tutorial <\/a>is quite good, and I have put a short video for you on Canvas.<\/p>\n<h1><strong>Programming environment\u00a0<\/strong><\/h1>\n<p>Python has become one of the most popular languages for machine learning. \u00a0This is primarily due to a large number of scientific libraries such as <a href=\"http:\/\/www.numpy.org\/\">NumPy <\/a>and <a href=\"https:\/\/www.scipy.org\/\">SciPy <\/a>coupled with popular machine learning libraries such as <a href=\"https:\/\/www.tensorflow.org\/\">TensorFlow<\/a> and <a href=\"http:\/\/pytorch.org\/\">PyTorch <\/a>as well as higher-level interfaces such as <a href=\"https:\/\/keras.io\/\">keras<\/a>. \u00a0In this class, we will use NumPy, SciPy, keras, and TensorFlow. \u00a0Anaconda (Austin, TX) is a company that offers a distribution that makes installing this large collection of libraries easier.\u00a0 \u00a0In past semesters, students have preferred using their own machines over departmental resources. To install Anaconda on your machine, please follow these <a href=\"https:\/\/drive.google.com\/drive\/folders\/0B9XgkMlUiDz9RDZuMDFkTTJ0ZU0?usp=sharing\">instructions<\/a>. 
\u00a0The instructions also show you how to run a program using eclipse (preferred) or the Spyder IDE once Anaconda is installed.<\/p>\n<\/div><div class=\"nav fusion-mobile-tab-nav\"><ul class=\"nav-tabs nav-justified\" role=\"tablist\" aria-orientation=\"horizontal\"><li  role=\"presentation\"><a class=\"tab-link\" data-toggle=\"tab\" role=\"tab\" aria-controls=\"tab-73d88e952c80a0278a3\" aria-selected=\"false\" tabindex=\"-1\" id=\"mobile-fusion-tab-73d88e952c80a0278a3\" href=\"#tab-73d88e952c80a0278a3\"><h4 class=\"fusion-tab-heading\"><i class=\"fontawesome-icon fa-podcast fas\" aria-hidden=\"true\" style=\"font-size:13px;\"><\/i>About<\/h4><\/a><\/li><\/ul><\/div><div class=\"tab-pane fade fusion-clearfix\" role=\"tabpanel\" tabindex=\"0\" aria-labelledby=\"fusion-tab-73d88e952c80a0278a3\" id=\"tab-73d88e952c80a0278a3\">\n<p>About the course:<\/p>\n<p>You will master machine learning and signal processing skills. We will apply this to recognizing audio signals, but many of the skills that you will acquire are useful in many contexts such as finance, bioinformatics, control systems, etc. Upon successful completion of this class, students should be able to:<\/p>\n<ul>\n<li>Understand feature extraction including automated discovery of features.<\/li>\n<li>Have an understanding of human speech production and perception.<\/li>\n<li>Solve problems related to the classification of signals using a variety of machine learning and signal processing skills, including problems with temporal dependencies.<\/li>\n<li>Organize and write a scientific paper.<\/li>\n<li>Read, understand, and critique current research literature.<\/li>\n<\/ul>\n<p>The prerequisites for this course are:\u00a0Computer Science 310, Mathematics 254, and Statistics 551A. 
\u00a0As many CS students will not have taken Statistics 551A or linear algebra 254, this will be waived for any student who is willing to spend a bit of time learning the statistics, the basics of which will be covered briefly in class.<\/p>\n<p>Please see <a href=\"https:\/\/roch.sdsu.edu\/cs682\/CS682Syllabus.pdf\">syllabus<\/a>\u00a0for detailed course policies.<\/p>\n<\/div><\/div><\/div><div class=\"fusion-clearfix\"><\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-63","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/roch.sdsu.edu\/index.php\/wp-json\/wp\/v2\/pages\/63","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/roch.sdsu.edu\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/roch.sdsu.edu\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/roch.sdsu.edu\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/roch.sdsu.edu\/index.php\/wp-json\/wp\/v2\/comments?post=63"}],"version-history":[{"count":117,"href":"https:\/\/roch.sdsu.edu\/index.php\/wp-json\/wp\/v2\/pages\/63\/revisions"}],"predecessor-version":[{"id":614,"href":"https:\/\/roch.sdsu.edu\/index.php\/wp-json\/wp\/v2\/pages\/63\/revisions\/614"}],"wp:attachment":[{"href":"https:\/\/roch.sdsu.edu\/index.php\/wp-json\/wp\/v2\/media?parent=63"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}