Kaggle Tutorial: Competitions – Part I

This Kaggle competition is a great way to get your hands on real data science and data analysis problems.

Humpback Whale Identification

One of the major problems when learning data science is getting your hands on real problems. If you want to become a data scientist, or just learn data science, Kaggle is one of the best places to practice.

About this tutorial

In this Kaggle tutorial I’m going to show you how to get started in one of the website’s current competitions. If you want to follow along, just go to the competitions page and scroll down to the Humpback Whale Identification challenge. I’ve been playing around with this challenge for about a month now. You can check out the prizes for this competition; they go up to 10k dollars.

Breaking down

I’m going to break down this competition from the very start. We are going to go from zero to creating a model and making our submissions. In this first video I’ll show you the kernel I’ve made so that you can follow along with the videos. For those who aren’t familiar with Kaggle, these kernels are like Jupyter notebooks that you can run in the cloud. You can check out the specifications for the machine running your scripts here. And you can also check out the commits made to the kernel. The specifications are quite reasonable for running your first models.
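If you’d rather look at the machine from inside the kernel itself, here is a minimal sketch using only the Python standard library. The helper name machine_summary is made up for this example, and it only reports what Python can see (no GPU or RAM details), so treat it as a rough check rather than the official spec sheet.

```python
import os
import platform
import shutil


def machine_summary(path="."):
    """Collect a few basic facts about the machine running this script.

    `machine_summary` is a hypothetical helper for illustration only.
    """
    usage = shutil.disk_usage(path)  # total, used and free bytes for `path`
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),  # may be None if it cannot be determined
        "disk_free_gb": round(usage.free / 1e9, 1),
    }


for key, value in machine_summary().items():
    print(f"{key}: {value}")
```

Run it in the first cell of the kernel and compare the numbers with what the page lists.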

Let’s get coding

Ok, now that I’ve given you an introduction to the kernels at Kaggle, we can move on to the coding part. To make our model we’ll use PyTorch. I was quite surprised when I asked if you wanted more videos on Keras or PyTorch and you chose PyTorch, but this is great: I’ve enjoyed PyTorch much more than Keras and TensorFlow so far. Apart from PyTorch, which you can see being imported here, we’ll use the os library to work with files. We’re also going to be using pandas; we can’t miss that in our data science project (big grin). For the matrix and vector calculations we’ll be using our old friend NumPy. And to understand our dataset a bit better and visualize it, we’ll use matplotlib.
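In case you can’t see the video, here is a rough sketch of what that first cell could look like, with each library doing one tiny job. The little table is made-up stand-in data, and the column names (Image, Id) and the train folder name are my assumptions for illustration, not something read from the real competition files.

```python
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch

# Made-up stand-in for the competition's label file (assumed columns).
labels = pd.DataFrame({
    "Image": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "Id": ["whale_1", "whale_2", "whale_1", "whale_1"],
})

# os: build the path to each image ("train" is an assumed folder name).
image_dir = "train"
labels["path"] = [os.path.join(image_dir, name) for name in labels["Image"]]

# pandas: count how many images we have per whale, most frequent first.
counts = labels["Id"].value_counts()

# numpy and pytorch: move the counts into an array, then into a tensor.
count_array = np.asarray(counts.to_numpy(), dtype=np.int64)
count_tensor = torch.from_numpy(count_array)

# matplotlib: a quick bar chart of images per whale.
fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)
ax.set_xlabel("Whale Id")
ax.set_ylabel("Number of images")
ax.set_title("Images per whale (toy data)")
```

In a kernel the chart shows up under the cell; in a plain script you would add plt.show() at the end.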

What’s next

In the next tutorials we’ll move on to creating a class to handle our dataset, then do some basic preprocessing so we can create our convolutional neural networks with PyTorch. You can check out the new videos here (in the future, haha).