Kaggle Competition
Competition Link: Island in the Sun
Good luck! The competition ends February 28
Kaggle is a machine learning competition platform, and you will be competing with each other to obtain the top solution. Imagine that in this competition you and your team are competing for a summer internship at a company of your choice. For that reason, it’s not only your rank in the Kaggle competition that matters, but also your presentation of your solution and how clearly your team is able to communicate it.
Report
Through this project you will access real-world data, process the data into a form ready for machine learning algorithms, train and test multiple supervised learning techniques, and evaluate the performance of your algorithms. Your report on your solution should be composed in a way that tells the story of this competition. You will want to include the following content. The format below lays out the components that must be addressed, but the goal is to get others excited about what you did, so you’re encouraged to be creative in presenting the material in a way that engages your reader. No section of this report, unless otherwise stated, should be fewer than 250 words. Figures should be used throughout to support your narrative.
- Abstract. [150 words maximum] This should be the one paragraph that captures the significance of what you did and why you did it.
- Introduction. Provide a description of the problem and the value in finding a solution, and motivate your reader as to why they should care about this question. The idea is to get your reader excited about the solution you are about to present.
- Background. This section should cite problems that have been previously addressed that relate to your work, and the key takeaways of the studies that explored that work. The idea here is to place the problem you’re working on in context and to let the reader know that you’re not working in a knowledge vacuum. For finding relevant literature, a good starting point is Google Scholar.
- Data. Describe and visualize your data. Make sure every caption fully describes the figure. You may want to visualize the raw data and/or extracted features. What challenges are inherent to this problem? How might they be overcome? What takeaway messages can you get simply from visualizing your data?
- Methods. Describe your machine learning solution (any preprocessing, feature extraction, and classification/regression techniques) and why you made each of the choices you did. For any methods that you didn’t create yourself, please cite relevant literature. Also include a flow chart of your methodology so the reader can easily conceptualize your solution. Be sure to describe your approach to measuring generalization performance. Imagine that you are writing this section so that someone could recreate your results.
- Results. A complete performance assessment that includes your validation approach (cross validation, train/validate/test split, etc.) and the key measures of performance for the problem (e.g., ROC curves or confusion matrices). You should also compare your outcomes to one or more baselines (which could include random chance or a very simple model). This section should be supported with visualizations, including examples where your method worked well, examples where it failed, and hypotheses as to why.
- Conclusions. It’s critical to have a strong ending and not just let the energy fizzle out of the report. Many readers, if pressed for time, will simply read your abstract and your conclusions. In fact, you may want to start by writing your conclusions. Recap the problem you were studying and why. What was your approach to the solution? What are the key takeaways from your work? If the readers took nothing else away from reading your report, what would you want them to know most? Did you identify one particular approach that worked well? Was there a challenge that you faced that opens the door to working on solving a new problem? What avenues of research would you pursue next?
- Roles. Since this is a team project, we want to know what your specific contribution was to this project. Provide detail on your role and how it contributed to the competition.
- References. An alphabetical list of references cited in this work. A minimum of five references is required. We recommend using the Zotero citation manager for collecting and compiling your references.
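As one illustration of the validation and baseline comparison described under Results, the following is a minimal sketch in plain Python. It is not the required approach: the toy data, the threshold model, and every function name here (`k_fold_indices`, `fit_majority_baseline`, `fit_threshold`, `confusion_matrix`, `cross_validated_accuracy`) are hypothetical and chosen only for illustration. In practice you would likely use a library such as scikit-learn and the competition's own data and metric.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slice the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def fit_majority_baseline(train_X, train_y):
    """Baseline: always predict the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda row: most_common


def fit_threshold(train_X, train_y):
    """Very simple model: threshold halfway between the two class means."""
    class0 = [row[0] for row, label in zip(train_X, train_y) if label == 0]
    class1 = [row[0] for row, label in zip(train_X, train_y) if label == 1]
    threshold = (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2
    return lambda row: 1 if row[0] > threshold else 0


def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def cross_validated_accuracy(X, y, fit, k=5, seed=0):
    """Average validation accuracy of the model returned by `fit` over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


# Hypothetical toy data: one feature; the label is 1 when it exceeds 0.5.
rng = random.Random(42)
X = [[rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

baseline_acc = cross_validated_accuracy(X, y, fit_majority_baseline)
model_acc = cross_validated_accuracy(X, y, fit_threshold)
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")
```

Using the same folds (same `seed`) for both the baseline and the model keeps the comparison fair, and reporting the baseline alongside your result shows the reader how much of the performance is due to your method rather than to class imbalance.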
Your report will be in the form of a Jupyter notebook that is published online (provide us with the link). You will ALSO submit THREE printed PDF versions of this Jupyter notebook.
For citations in Jupyter Notebooks, we recommend author-date notation for ease of composition. For example, if the author was Jocelyn Miller, and the date of publication was 2018, then the citation would be (Miller 2018).
Peer Evaluation
Since this is a team project, you will also be evaluated by your teammates (and yourself). This is a chance to offer each other feedback and reflect on your own performance. You will be rating your fellow team members on the following criteria:
- Was dependable in attending meetings to work on the project
- Did work accurately and completely
- Completed work on time
- Contributed positively to team discussions
- Helped others when needed
- Responded to communications in a timely manner
- Treated other team members respectfully
- Demonstrated a positive attitude about the team and its work
You can download the peer evaluation form here.