Predicting house prices on Kaggle: a gentle introduction to data science – Part II

In Part I of this tutorial series, we began looking at the Kaggle House Prices: Advanced Regression Techniques challenge and discussed some approaches to data exploration and visualization. Armed with a better understanding of our dataset, in this post we will discuss some of the steps needed to prepare our data for modelling. In particular, we will focus on treating missing values and encoding non-numerical data types, both of which are prerequisites for the majority of machine learning algorithms. We will also briefly touch upon feature engineering, a crucial step in building effective predictive models. So let’s get started!

Predicting house prices on Kaggle: a gentle introduction to data science – Part I

Data is ubiquitous these days and is being generated at an ever-increasing rate. Left untouched and unexplored, however, it is of little use. This post is the first in a series of tutorial articles exploring the process of moving from raw data to a predictive model. We’ll walk through the basic steps involved and discuss some of the common pitfalls along the way.
