We know our dataset inside out (Part I), the data is immaculately clean (Part II) and we’ve engineered some powerful and informative features. Finally, in this third and final part of our tutorial series, we are ready to proceed to the guts of the data science process: the modelling itself. Given the abundance of excellent machine learning libraries available, we will not delve here into developing the algorithms themselves. Rather, we will discuss how one might go about choosing and fitting one of the models already available, and how to verify whether the solution we end up with is up to task.
Continue reading
Author: lstrln
Predicting house prices on Kaggle: a gentle introduction to data science – Part II
In Part I of this tutorial series, we started having a look at the Kaggle House Prices: Advanced Regression Techniques challenge, and talked about some approaches for data exploration and visualization. Armed with a better understanding of our dataset, in this post we will discuss some of the things we need to do to prepare our data for modelling. In particular, we will focus on treating missing values and encoding non-numerical data types, both of which are prerequisites for the majority of machine learning algorithms. We will briefly touch upon feature engineering as well – a crucial step for building effective predictive models. So let’s get started!
Continue reading
An evening of data science and machine learning
Last Wednesday, akquinet tech@spree had the pleasure to host a technology evening of the Software Initiative Berlin Brandenburg (SIBB). The event was devoted to all things data and machine learning and comprised two talks, plus plenty of discussions (and pizza) afterwards.
Predicting house prices on Kaggle: a gentle introduction to data science – Part I
Data is ubiquitous these days, and being generated at an ever-increasing rate. However, left untouched and unexplored, it is of course of little use. This post will be the first in a series of tutorial articles exploring the process of moving from raw data to a predictive model. We’ll walk through the basic steps involved, and talk about some of the common pitfalls along the way.