Course Overview

In almost every field, there is a need to draw inferences from or make decisions based on data. The goal of this course is to provide an introduction to machine learning that is approachable to diverse disciplines and empowers students to become proficient in the foundational concepts and tools while working with interdisciplinary real-world data. You will learn to (a) structure a machine learning problem, (b) determine which algorithmic tools are applicable to a given problem, (c) apply those algorithmic tools to diverse, interdisciplinary data examples, (d) evaluate the performance of your solution, and (e) accurately interpret and communicate your results. This course is a fast-paced, applied introduction to machine learning that arms you with the basic skills you will need in practice to both conduct analyses and effectively communicate your results.

Instructor

Teaching Assistants


Class Time and Location

Meeting times:
Monday and Wednesday 10:15am - 11:30am

Meeting location:
Virtual via Zoom. Links are available on Sakai.

Office Hours and Email

Kyle Bradbury (kyle.bradbury@duke.edu)
Office Hours: See Piazza
Vanessa Tang (vanessa.tang@duke.edu)
Office Hours: See Piazza
JY Xu (jy.xu@duke.edu)
Office Hours: See Piazza

Navigating Class Resources

  • Sakai: Zoom meeting links, quizzes, grades
  • Piazza: Announcements, questions, communications
  • Gradescope: Assignment & project submission & feedback
  • Schedule site: Schedule & assignments

Assignments & Grading

Assignments, projects, & quizzes: Assignment and project details are posted on the course syllabus. For expectations and instructions on the assignments, see the Assignment Instructions. Quizzes are found on Sakai.

Grading:
  • 50% Assignments (6, each worth 8.3%)
  • 20% Quizzes (~23, each worth <1%)
  • 30% Final Project
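
As a rough illustration of how the weights above combine, the sketch below computes a final percentage from component scores. It assumes every score is on a 0-100 scale, that assignments are weighted equally and quizzes are weighted equally, and that the two lowest quizzes are dropped as described under Late Submissions. The function name and the example scores are hypothetical and do not come from the course's actual gradebook.

```python
# Hypothetical illustration of the weighting scheme; names and scores are invented.

def course_grade(assignments, quizzes, project, drop_lowest=2):
    """Combine component scores (each on a 0-100 scale) into a final percentage."""
    # Drop the lowest quiz scores before averaging (assumed to follow the quiz policy).
    kept = sorted(quizzes)[drop_lowest:]
    assignment_avg = sum(assignments) / len(assignments)
    quiz_avg = sum(kept) / len(kept)
    # Weights from the grading breakdown: 50% assignments, 20% quizzes, 30% project.
    return 0.50 * assignment_avg + 0.20 * quiz_avg + 0.30 * project


example = course_grade(
    assignments=[90, 85, 100, 95, 80, 90],  # six assignments
    quizzes=[100] * 21 + [0, 0],            # 23 quizzes, two of them missed
    project=90,
)
print(f"Final percentage: {example:.1f}")
```

In this example the two missed quizzes are the ones dropped, so they do not lower the quiz average.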

Textbook and References

Textbooks (free versions available online):
An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, 2013.
Pattern Recognition and Machine Learning by Christopher Bishop, 2006.
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016.
Reinforcement Learning: An Introduction, by Richard Sutton and Andrew Barto, 2018.

Have questions?

We welcome your questions about the course including lectures, assignments, projects, and logistics on Piazza. Email the TA or instructor about questions that specifically pertain to you as an individual. Ask questions early - questions asked close to a deadline are not guaranteed to get a response - please plan accordingly.

Prerequisites

This course moves quickly, so having a firm grasp on prerequisites is important. The prerequisites are as follows:

Detailed description

This course will begin by exploring the purpose of machine learning through a discussion of the types of questions that machine learning can answer: description of the data – “what has happened?”, prediction based on the data – “what will happen?”, or prescription – “what should happen?” Then we will discuss the tools at our disposal to answer these questions, namely supervised learning, including classification and regression; unsupervised learning, including clustering and density estimation; and lastly reinforcement learning. There will be a strong focus on how to formulate a machine learning problem. Central to that formulation will be developing an understanding of how to preprocess data for analysis (normalization, cleaning, etc.), sampling and dimensionality reduction, feature and model selection, and performance evaluation with cross-validation. The final topic of this course will be a brief overview of state-of-the-art machine learning techniques such as deep learning.
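
To make the preprocessing and cross-validation steps mentioned above concrete, here is a minimal sketch using only the Python standard library. It is not the course's assignment code. The data are synthetic, the helper names (standardize, k_fold_indices, fit_line, cross_validated_mse) are invented for illustration, and a one-feature least-squares line stands in for whatever model is being evaluated. In practice a library such as scikit-learn would typically handle these steps.

```python
import random
import statistics


def standardize(train, test):
    """Scale both splits using the mean and standard deviation of the training split only."""
    mean = statistics.mean(train)
    std = statistics.pstdev(train) or 1.0  # guard against a constant feature
    return [(x - mean) / std for x in train], [(x - mean) / std for x in test]


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b for a single feature."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def cross_validated_mse(xs, ys, k=5):
    """Average held-out mean squared error over k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        # Fit the scaler on the training fold only to avoid leaking test information.
        x_train, x_test = standardize(
            [xs[i] for i in train_idx], [xs[i] for i in held_out]
        )
        a, b = fit_line(x_train, [ys[i] for i in train_idx])
        errors = [(a * x + b - ys[i]) ** 2 for x, i in zip(x_test, held_out)]
        scores.append(statistics.mean(errors))
    return statistics.mean(scores)


# Synthetic data, invented for illustration: y = 3x + 2 plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]
mse = cross_validated_mse(xs, ys)
print(f"5-fold cross-validated MSE: {mse:.3f}")
```

The key design choice is that the normalization statistics are computed inside each fold from the training portion alone, so the held-out error is an honest estimate of performance on unseen data.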

Throughout this course, the focus will be on applying algorithms rather than diving deeply into theory. You will be asked to consider the practical issues of machine learning problem solving: challenges of applying machine learning code packages, striving for parsimony and interpretability, and ensuring model assumptions are valid for a given problem and dataset. This course will also stress the importance of team-based collaboration, the value of producing fully reproducible and validated results, and tools to help with both such as version control and code repositories.

Communicating your results. Data science solutions are only as impactful as the communicator who shares them. Throughout this course you will be working with Jupyter Notebooks. A Jupyter notebook is an interactive writing and coding tool that allows you to combine formatted text, code, output from code (including plots), and mathematical equations all in one location. Demonstrating competency in data science means (a) exhibiting a working knowledge of technical concepts including programming, statistics, and mathematics and (b) being able to clearly communicate the problem you were trying to solve or question you were trying to answer, why it matters, and how well your analysis worked. You will have opportunities to practice these skills throughout this course.

Software and Hardware Tools

Programming language: We will use Python 3.x. The Anaconda distribution is recommended and comes with the most common packages. Python continues to be one of the top programming languages, and its rich ecosystem of packages makes it a natural choice for machine learning. These include core numerical programming and plotting libraries such as NumPy, SciPy, Matplotlib, and pandas, as well as excellent packages for machine learning algorithm development and statistical modeling, including TensorFlow, PyTorch, Keras, scikit-learn, and NLTK.
Development environments: JupyterLab or Jupyter Notebook will be appropriate for most class assignments. We highly encourage you to use Visual Studio Code or Spyder for larger projects, in particular for their debugging capabilities. Many configurations may work for you, but consider branching out to some of these other tools as well.
Graphics processing units (GPUs): GPUs are the workhorses of many modern machine learning algorithms, especially any that involve neural network-based architectures. A small number of assignments will require the additional computational power that GPUs provide. For these, we will use Google Colab, which is a free notebook environment that enables access to cloud resources including GPUs. For longer sessions before timeouts, greater RAM, and better GPUs you can optionally upgrade to Colab Pro (currently $9.99 per month).

Course Policies

Academic dishonesty. Adherence to the Duke Community Standard is expected. To uphold the Duke Community Standard:
I will not lie, cheat, or steal in my academic endeavors;
I will conduct myself honorably in all my endeavors; and
I will act if the Standard is compromised.
Anyone found in violation of the Standard will be reported to the Office of Student Conduct.

Accommodations. If you need special accommodations due to physical or learning disabilities, medical needs, religious practices, or other reasons, please inform us as soon as possible so we can work to accommodate those needs.

Late Submissions. Assignments and projects are due by the start of class on the date posted. Late deliverables will ONLY be accepted at the discretion of the instructor. Any late assignment will receive a reduction of at least 20 points from the grade. Course projects will not be accepted after the deadline. Quizzes are given at the beginning of each class and students are expected to be present in class for the quiz. While quizzes cannot be made up since the answers are discussed immediately after, the lowest two quizzes will be dropped at the end of the semester for each student to accommodate necessary absences and days when we're off our game. Please reach out to the TAs or instructor as early as possible to request any special accommodations.

Collaboration. While collaboration is encouraged for the final project, assignments should contain independent work. You are welcome and encouraged to help each other, but your responses and solutions on the assignments should all be your own. No two assignments should have content that is identical, even in part. Quizzes are fully independent endeavors and you should not discuss the content with others.

Accessibility. In addition to accessibility issues experienced during the typical academic year, I recognize that remote learning may present additional challenges. Students may be experiencing unreliable wi-fi, lack of access to quiet study spaces, varied time zones, or additional responsibilities while studying at home. If you are experiencing these or other difficulties, please contact me to discuss possible accommodations.

Rules for video recording course content. Student recordings of lectures must be permitted by the instructor and shall be for private study only. Such recordings shall not be distributed to anyone else without authorization by the instructor whose lecture has been recorded. However, the instructor may arrange through the Office of Information Technology to make recorded lectures available to students enrolled in the class on such terms and conditions as he or she prescribes. Redistribution of recorded lectures is prohibited. Unauthorized distribution is a cause for disciplinary action by the Judicial Board. The full policy on recording of lectures falls under the Duke University Policy on Intellectual Property Rights, available here.

Course Pedagogy

Tenet #1: Good learning is active learning. Everyone who was good at something was once bad at it. Learning comes from practice. No amount of reading or video/lecture watching alone will help you to become good without actively engaging with the material through practice. That is why this entire course is focused on supporting you to actively apply machine learning techniques through the assignments, quizzes, and project. Concept described in Make It Stick.
Tenet #2: Desirable difficulty leads to meaningful learning. Learning is most effective when there's a degree of struggle with the material. "Requiring students to organize new information and to work harder in the initial learning period can lead to greater and deeper learning. Although this struggle, dubbed a desirable difficulty...may at first be frustrating to learner and teacher alike, ultimately it improves long-term retention" (Excerpt from A Concise Guide to Improving Student Learning: Six Evidence-Based Principles and How to Apply Them). Desirable difficulties help you build connections between concepts and learn representations of knowledge (meta-cognition) that, like an index of a book, will increase your ability to creatively connect concepts and think more deeply about the topic. This is also described in Make It Stick.
Tenet #3: Read, reflect, recall is a pattern for effective learning. Spaced retrieval and reflection are key to effective learning. When we learn something, if we don't use it, the knowledge fades. However, if we return to the material, apply it, and create with it, we increase the probability of long-term learning. This is why you will typically interact with each concept 4 times: lectures, readings, quizzes, and assignments, and at least one more time for those concepts involved in the final project. An added benefit of the frequent reflection through quizzes is that it tests your knowledge regularly, helping us to avoid the illusion of knowledge (thinking we know something when we actually do not).

Mental Health and Wellness Resources

If mental health concerns and/or stressful events negatively affect your daily emotional state, academic performance, or ability to participate in your daily activities, many resources are available to you, including the ones listed below. Duke encourages all students to access these resources, particularly as we navigate the transition and emotions associated with this time. Duke Student Government has worked with DukeReach and student advocates to create the Fall 2020 “Two-Click Support” Form, and DukeReach has expanded its drop-in hours as well.

Managing daily stress and self-care are also important to well-being. Duke offers several resources for students to both seek assistance on coursework and improve overall wellness, some of which are listed below. Please visit this site to learn more about: