Final Project

IDS 705 Principles of Machine Learning

Summary and Goals

It’s time for you to unleash your creativity in the final project for the course! This is your chance to apply everything you’ve learned so far, from experimental design and model selection to evaluation and optimization. This project gives you the freedom to explore new applications, experiment, and innovate. Creativity, critical thinking, and problem-solving will be key.

Machine learning tools are not an end in themselves, but yield value when making predictions, quantifying and describing phenomena in the world around us, and in all these ways and more helping us to make decisions that would otherwise be difficult or impossible. For this final project, you will work in teams to (1) identify a problem to solve or a question to answer, (2) apply machine learning techniques to conduct experiments to investigate the application area, (3) rigorously evaluate the performance of your approach, and (4) clearly communicate your findings to a clearly-defined stakeholder audience. The deliverables for this project are:

  1. Project proposal
  2. Final written report AND a draft report prior to final submission
  3. Github repository for your project
  4. Peer evaluation

Other topics described in this document related to the project include:

  • Learning objectives
  • Submission, evaluation, & grading
  • Project ideas
  • Frequently asked questions

Learning Objectives

This is an opportunity to creatively deploy machine learning in an application area of interest to you. The focus of your project should be a clearly defined problem with a clearly identified (hypothetical) stakeholder - a person or group who would genuinely care about that problem. A central component of your project must be a machine learning methodology. It does not have to be one that we’ve explicitly discussed in class, as you’re welcome to use the project as an opportunity to learn new topics; however, there should be a supervised learning component to your project. The objectives of this project are to…

  1. Develop deeper competency in applying machine learning methods for practical applications
  2. Gain experience in learning more about a topic beyond what was explicitly discussed
  3. Increase your experience with collaborative data science workflows
  4. Refine your ability to communicate the findings from a project to interdisciplinary audiences

In this project you will use what you’ve learned throughout this course and build on that knowledge and experience to apply the paradigms, algorithms, evaluation tools, and interpretation techniques discussed throughout the course. I strongly encourage you to pick a project that is of genuine interest in some way (e.g. the application, the tools, the dataset, etc.). Learning comes from stretching yourself: pushing into unfamiliar territory is a challenge that creates desirable difficulty, and it is through this struggle that the best learning happens. It requires perseverance, which is easiest to sustain when you bring intrinsic motivation to the challenge. Find a topic of interest and embrace the challenge!

For this project you will identify a problem you wish to solve using machine learning tools. Identify the experiment you would need to run to evaluate how well you solved it compared to existing approaches in the field, including the metrics you will use to evaluate performance.

Requirements

  • The project must involve supervised machine learning. You may also include concepts we were not able to cover in the course, but there must be a supervised learning component.

  • The project must be completable within this semester and should be appropriately scoped: we encourage you to be ambitious, but please visit office hours if you have questions about project scope.

  • Every project should involve learning more about both your application domain and the methods that you’re using. This means reading about both facets. If you’re working on a project involving diagnosis of a disease, you should read enough to understand the disease and how it manifests in the symptoms that your data may capture. You’re expected to develop some domain knowledge related to your problem and demonstrate that in the report.

  • Your project should consider the potential ethical implications of your work and describe how that was factored into your work.

  • Core requirement: You’re required to have at least one core experimental design with rigorous evaluation. That, alone, will get you to the grade of a “B”. This will require an appropriate experimental design and appropriately evaluated results with measures of uncertainty in your estimates, whenever possible.

  • Additional analyses: To get an “A” you will need to go beyond the core requirement and add two more depth components to the project that more fully investigate the problem you’re working on, keeping your (imagined) stakeholder’s needs in mind. While these will be problem-specific, below are several ideas to help you think through some possibilities:

    • Model interpretability (or explainability). If interpretability is important, could you explore interpretable models that could be deployed alongside more flexible models to evaluate the tradeoff between model performance and interpretability?
    • Training data sufficiency analysis. If you anticipate limited training data in practice, could you determine how much training data is needed to sufficiently address your problem, informing future decisions about scaling up the model? For example, vary the amount of training data and measure performance and uncertainty.
    • Model robustness when the application domain will likely be shifted from the training domain. What is the impact on generalization performance if the conditions under which the model will be applied differ from the available training data (e.g., a computer vision model trained entirely on U.S. imagery, but intended for deployment in regions of Asia)?
    • Model sensitivity to imbalanced data. What happens when the training data are not sampled to be representative of the actual target data? This could be an investigation of the impact of, and possible solutions to, imbalanced datasets.
    • Active learning for cases where collecting data is expensive. How little additional data do you need to add to your model, and how should the new samples be selected?
    • Bias detection and mitigation. Could the fairness of your algorithm be analyzed across different subsets of the data?
    • Model robustness to noise and/or adversarial attacks. How does the model stand up to accidental or purposeful mistakes in the dataset? How much can it take before performance degrades?
    • Calibration in classification models. Do the confidence scores, when interpreted as probabilities, match the actual probabilities? If this is important for the application, can any miscalibration be corrected?
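
As a concrete illustration of the last idea above, here is a minimal calibration check using scikit-learn. The data are synthetic stand-ins (via make_classification) and the model choice is illustrative, not a requirement:

```python
# Minimal sketch: check probability calibration, then try to correct it.
# Synthetic data stands in for a real project dataset (assumption for illustration).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import calibration_curve, CalibratedClassifierCV

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
prob = clf.predict_proba(X_te)[:, 1]

# Reliability data: observed frequency vs. mean predicted probability per bin.
frac_pos, mean_pred = calibration_curve(y_te, prob, n_bins=10)

# One common correction: refit with isotonic (or sigmoid/Platt) calibration.
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(random_state=0), method="isotonic", cv=5
).fit(X_tr, y_tr)
prob_cal = calibrated.predict_proba(X_te)[:, 1]
```

Plotting frac_pos against mean_pred (a reliability diagram) before and after calibration makes the result easy to communicate to a stakeholder.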

Proposal

Your team will submit a short project proposal. You will receive feedback that should be used to guide your project development and execution. Every proposal should have the title of the project and the list of team members at the top of the first page. You can find the project proposal template and instructions here. You are required to use the template for your proposal so that we can provide comments in Google Docs. Please read through and follow the instructions in the proposal document for preparation and submission.

Additionally, content from your proposal may be reused in your draft/final report and so you’re encouraged to invest in it with that in mind.

If you are looking for ideas about datasets, etc., please see the Ideas section below. Please stop by office hours if you would like to discuss specific project ideas or for any other help in selecting your project idea.

Final Report

The final project report that you submit will consist of two parts: (1) a draft project report and (2) a final report. The draft project report is your main opportunity to get detailed feedback on your report. While the draft report won’t be graded, we will provide written feedback and suggestions in the form of Google doc comments that we strongly recommend addressing in your final report.

Please find the instructions and template for the final report here.

Github Repository

Your GitHub repository should (a) contain a descriptive README.md file that explains what the repo is for and how to use the code to reproduce your work (including how to set it up to run), (b) be well commented throughout all files, (c) list all dependencies in a requirements.txt file, (d) tell the user how to get the data and include all preprocessing code, and (e) actually run (i.e., we can successfully test it) and do what it says.
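
For item (c), a requirements.txt is simply a list of the packages your code depends on, ideally pinned to the versions you actually used. The packages and versions below are illustrative, not required:

```text
numpy==1.26.4
pandas==2.2.2
scikit-learn==1.4.2
matplotlib==3.8.4
```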

Peer Evaluation

Since this is a team project, you will also receive feedback from your teammates AND reflect on your own performance in a self-evaluation. You will be evaluating your fellow team members on the following criteria:

  1. Was dependable in attending meetings to work on the project
  2. Did work accurately and completely
  3. Completed work on time
  4. Contributed positively to team discussions
  5. Helped others when needed
  6. Responded to communications in a timely manner
  7. Treated other team members respectfully
  8. Demonstrated a positive attitude about the team and its work

This evaluation is NOT based directly on the scores that you receive in the feedback, but a satisfactory peer and self-evaluation is assessed based on the level of constructiveness of the feedback you provide. More detailed, constructive feedback is more useful to help your peers better understand their strengths and areas for growth. Doing so respectfully and compassionately is a requirement. Your peers will receive anonymized versions of the feedback that you share. If the feedback you submit to your teammates does not demonstrate a meaningful effort, your final project grade will be reduced by one letter (e.g., A- to B-).

Submission, Evaluation, & Grading

You should submit each deliverable from your project through Gradescope. You will submit a link to each team deliverable; this should be submitted AS A TEAM, not through individual submissions. The project proposal and draft final report should be submitted through Gradescope as links to Google Docs (so that we can attach easy-to-respond-to comments) using the templates provided. The link to the GitHub repo should also be submitted as a link via Gradescope. The final project report, however, should be submitted as a PDF document in Gradescope.

The grading for this project will be assigned as follows:

  Component            Evaluation / Feedback Plan
  -------------------  ------------------------------------------------------------------------------
  Final Report         15 points, graded
  Team Proposal        Written or verbal feedback will be provided to help guide your project design.**
  Draft Final Report   Written feedback will be provided to help guide your final report writing.**
  Github Repository    Required for project submission to be considered complete.**
  Peer Evaluation      Required for project submission to be considered complete.**
  Total                15 points

** No points will be directly assigned. One point will be deducted from your overall final project score for each day a deliverable is late; up to 2 points may be deducted from the overall project score (out of 15 possible points) if a deliverable is unsatisfactory (i.e., if it does not represent a serious effort).

Sample projects

The following project ideas are short sketches showing what projects that generally meet the above criteria might look like. These ideas were developed with the help of AI tools to provide a sufficient breadth of project types for inspiration.

Example Project 1: Hospital Readmission Risk Prediction

Stakeholder: Hospital discharge planning teams and health insurance case managers.

Problem: Predict 30-day hospital readmission risk from patient demographics, diagnosis codes, length of stay, and prior visit history.

Short description of why stakeholders care: Hospitals face financial penalties from Medicare when patients are readmitted within 30 days of discharge. More importantly, readmissions often signal that a patient’s care plan was inadequate or that they lacked support to recover safely at home. A readmission risk model used at discharge could trigger additional follow-up calls, home health visits, or medication reviews for high-risk patients, but only if clinicians trust it.

Dataset: Diabetes 130-US Hospitals Dataset (UCI, ~100K rows)

Core requirements:

  • Model comparison and evaluation: identify a reasonable baseline model from the field and select several candidate models to compare it to. Conduct stratified k-fold cross-validation and compare the performance of the models.
  • Analyze the results thoroughly and compare performance by race, insurance type, or age; investigate disparities in performance and report how this may impact healthcare equity.
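
A minimal sketch of this comparison with scikit-learn, using synthetic imbalanced data in place of the readmission dataset (the baseline and candidate models named here are assumptions for illustration):

```python
# Sketch: compare a baseline to candidate models with stratified k-fold CV.
# Synthetic imbalanced data stands in for the readmission dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=1000, weights=[0.85, 0.15], random_state=0)

models = {
    "baseline_logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "grad_boosting": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    # Keep the fold-to-fold spread so every comparison carries uncertainty.
    results[name] = (scores.mean(), scores.std())
```

Reporting the standard deviation across folds alongside the mean keeps a measure of uncertainty attached to every model comparison.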

Additional analyses (choose at least 2):

  • Interpretability: A black-box score that a medical professional cannot explain to a patient or justify in a care plan is unlikely to be used in practice. Investigate whether interpretable models can achieve similar performance (e.g. sparse logical models, generalized additive models, decision trees, etc.). Identify features most associated with readmission risk; verify alignment with clinical risk factors and discuss your findings.
  • Calibration: A care team telling a patient they have a “72% readmission risk” needs that number to mean something. Assess probability calibration (how closely the predicted probabilities match actual probabilities). Try some approaches to correct any issues found with model calibration.
  • Model fairness audit and improvement: If performance differs by race, insurance type, or age, explore whether there are class imbalances in the data and investigate whether these might be overcome through class weighting, over- or under-sampling, etc.
  • Error cost analysis: Define a realistic cost matrix (false negatives are more dangerous than false positives) and tune the decision threshold accordingly. How would changes in cost affect the operating point, and what implications might that have for clinical applications?
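
The error cost analysis idea can be sketched as follows; the cost values and model below are made-up placeholders that a real project would justify clinically:

```python
# Sketch: tune the decision threshold under an asymmetric cost matrix.
# Costs and data are illustrative placeholders, not clinical recommendations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

COST_FN = 10.0  # missing a readmission (false negative) is costly
COST_FP = 1.0   # an unnecessary follow-up (false positive) is cheap

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
prob = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

def expected_cost(threshold):
    pred = prob >= threshold
    fn = np.sum((y_te == 1) & ~pred)
    fp = np.sum((y_te == 0) & pred)
    return COST_FN * fn + COST_FP * fp

thresholds = np.linspace(0.05, 0.95, 19)
best = min(thresholds, key=expected_cost)
```

Sweeping the costs themselves, not just the threshold, shows how sensitive the chosen operating point is to the assumptions behind the cost matrix.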

Example Project 2: Predicting Socioeconomic Outcomes from Satellite Imagery

Stakeholder: International development organizations (e.g., World Bank) and humanitarian NGOs.

Problem: Use pre-trained computer vision models to extract features from satellite images and predict poverty indicators.

Short description of why stakeholders care: Reliable poverty data in low-income countries is often years out of date or simply unavailable. Traditional surveys are expensive, slow, and require physical access to remote areas. A model that estimates poverty indicators from freely available satellite imagery could allow organizations to continuously monitor the impact of aid programs, identify communities in need, and allocate limited resources more equitably. The distribution shift and data efficiency stretch goals are directly motivated by the reality that a model trained on one region must often be deployed in another with little additional labeled data.

Dataset: SustainBench or DHS poverty data + satellite tiles

Core requirements:

  • Model comparison and evaluation: using pretrained computer vision models (e.g., CNNs or vision transformers), extract features from the relevant images and then use spatial cross-validation (ensure train/test splits do not leak geographically adjacent tiles) to estimate poverty indicators.
  • Evaluate performance over different spatial contexts/regions and explore how it differs region to region. What might the implications be for practical applications?
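
One way to sketch the spatial cross-validation requirement is scikit-learn's GroupKFold, with one group ID per region so that adjacent tiles never straddle a train/test split. The features, labels, and region IDs below are random stand-ins for pretrained image embeddings and poverty indices:

```python
# Sketch: group-based cross-validation so geographically related tiles
# never appear on both sides of a train/test split.
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_tiles = 500
X = rng.normal(size=(n_tiles, 16))             # stand-in for image features
y = X[:, 0] * 2.0 + rng.normal(size=n_tiles)   # stand-in poverty index
regions = rng.integers(0, 10, size=n_tiles)    # one group ID per spatial region

gkf = GroupKFold(n_splits=5)
scores = []
for train_idx, test_idx in gkf.split(X, y, groups=regions):
    model = Ridge().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
    # No region appears in both the train and test sets of a split:
    assert set(regions[train_idx]).isdisjoint(regions[test_idx])
```

For stricter spatial rigor you could also buffer test regions, but grouping alone already prevents the most common form of geographic leakage.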

Additional analyses (choose at least 2):

  • Distribution shift analysis: A model trained on one region must often be deployed in another with little additional labeled data. Fine-tune your model on one region, evaluate on another and quantify the performance drop; repeat this across several regions and determine the average change in performance. Discuss why this matters for deployment and how to address any performance gaps you might discover.
  • Deployment constraints: Document compute requirements: GPU hours, memory usage, and estimated cost on a cloud platform. Based on your findings, comment on how scalable this approach may be for larger geographies.
  • Explainability: Generate activation maps to visualize what image regions the model attends to; do they correspond to meaningful features (roads, vegetation, building density)? What do you learn from them?
  • Data efficiency: Train on progressively smaller labeled subsets and plot a learning curve. At what point does additional labeled data stop helping? What’s the smallest amount of data for which you can still get acceptable results? What implications does this have for practical application/deployment?
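
The data efficiency idea maps directly onto scikit-learn's learning_curve; the synthetic features below stand in for image embeddings, and the sample sizes are illustrative:

```python
# Sketch: a learning curve to see where additional labeled data stops helping.
# Synthetic data stands in for image embeddings (assumption for illustration).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve, StratifiedKFold
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1500, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
mean_val = val_scores.mean(axis=1)  # one validation score per training-set size
```

Plotting mean_val against sizes shows where the curve flattens, i.e., where additional labels stop paying for themselves.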