Predicting house prices on Kaggle: a gentle introduction to data science – Part II

In Part I of this tutorial series, we began looking at the Kaggle House Prices: Advanced Regression Techniques challenge and covered some approaches to data exploration and visualization. Armed with a better understanding of our dataset, in this post we will discuss some of the steps needed to prepare our data for modelling. In particular, we will focus on treating missing values and encoding non-numerical data types, both of which are prerequisites for most machine learning algorithms. We will also briefly touch upon feature engineering, a crucial step in building effective predictive models. So let’s get started!
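To make the two preprocessing steps concrete, here is a minimal Java sketch (not the code used in the post itself): median imputation for a numeric column and one-hot encoding for a categorical one. Treating a missing category as its own level, labelled with the hypothetical name "Missing", is an assumption; it would collide with a real category of the same name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

/** Minimal preprocessing helpers: median imputation and one-hot encoding. */
final class Preprocessing {

    // Hypothetical label used for missing categorical values.
    static final String MISSING = "Missing";

    private Preprocessing() {}

    /** Replaces null entries with the median of the non-null values. */
    static double[] imputeMedian(Double[] column) {
        double[] present = Arrays.stream(column)
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .sorted()
                .toArray();
        if (present.length == 0) {
            throw new IllegalArgumentException("column has no non-missing values");
        }
        int mid = present.length / 2;
        double median = present.length % 2 == 1
                ? present[mid]
                : (present[mid - 1] + present[mid]) / 2.0;
        double[] result = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // Keep observed values, fill gaps with the median.
            result[i] = column[i] != null ? column[i] : median;
        }
        return result;
    }

    /** One-hot encodes a categorical column; null becomes its own MISSING category. */
    static Map<String, int[]> oneHot(String[] column) {
        // TreeSet gives a deterministic, sorted order of indicator columns.
        TreeSet<String> categories = new TreeSet<>();
        for (String v : column) {
            categories.add(v == null ? MISSING : v);
        }
        Map<String, int[]> encoded = new LinkedHashMap<>();
        for (String category : categories) {
            int[] indicator = new int[column.length];
            for (int i = 0; i < column.length; i++) {
                String v = column[i] == null ? MISSING : column[i];
                indicator[i] = v.equals(category) ? 1 : 0;
            }
            encoded.put(category, indicator);
        }
        return encoded;
    }
}
```

For example, `imputeMedian(new Double[] {1.0, null, 3.0, 10.0})` fills the gap with the median 3.0, and `oneHot` turns `{"RL", null, "RM", "RL"}` into three indicator columns keyed `Missing`, `RL` and `RM` (sample values are illustrative only). The median is used rather than the mean because it is less sensitive to outliers.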
Continue reading

Predicting house prices on Kaggle: a gentle introduction to data science – Part I

Data is ubiquitous these days and is being generated at an ever-increasing rate. However, left untouched and unexplored, it is of little use. This post is the first in a series of tutorial articles exploring the process of moving from raw data to a predictive model. We’ll walk through the basic steps involved and discuss some common pitfalls along the way.

Continue reading

Customizing application properties with JBoss EAP/WildFly

Developers usually have to create and deploy different versions of their application: for local development, testing, training, production, and so on.

Third-party and system dependencies that differ between these versions, such as data sources, JMS topics and mail servers, are preferably configured via the container. However, most applications also contain custom application properties, such as the current version, mail addresses, images or templates. Most of these may be static, but there are cases where you want to change application properties dynamically, i.e. without rebuilding the artifact.

In this article we describe several approaches to achieving this with the JBoss WildFly/EAP 7 application server.
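One simple approach, sketched here under assumptions and not necessarily one of those covered in the article, is to keep defaults inside the artifact and override them from a properties file outside the deployment. The sketch reads the directory from WildFly's `jboss.server.config.dir` system property (the server's configuration directory); the file name `app.properties` and the class name are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/**
 * Sketch: load application properties from a file outside the deployment,
 * so values can change without rebuilding the artifact.
 * ASSUMPTION: the file name "app.properties" is hypothetical.
 */
final class ExternalAppProperties {

    static final String FILE_NAME = "app.properties"; // hypothetical name

    private ExternalAppProperties() {}

    /** Uses the given defaults, overridden by the external file if it exists. */
    static Properties load(Properties defaults, Path configDir) throws IOException {
        // Lookups fall back to 'defaults' for keys missing from the file.
        Properties props = new Properties(defaults);
        Path file = configDir.resolve(FILE_NAME);
        if (Files.isRegularFile(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }

    /** Resolves WildFly's configuration directory, falling back to the working directory. */
    static Path configDir() {
        return Paths.get(System.getProperty("jboss.server.config.dir", "."));
    }
}
```

Because the file is read on every call to `load`, edited values take effect without rebuilding or redeploying the artifact; a real application would probably cache the result and reload it only when the file changes.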

Continue reading

WildFly 8-10 and JBoss EAP 7 verbose HTTP headers

As a developer, I am really happy to have an easy way to determine which version of a piece of software I’m running. But I do not like it when my software tells everyone its name and version, as this gives potential attackers valuable fingerprinting information.

If you use WildFly versions 8 through 10 or JBoss EAP 7, the default configuration includes some HTTP headers that are, in my opinion, too verbose. JBoss EAP 6 is not affected, by the way. The headers you get look like this:

Server: JBoss-EAP/7
X-Powered-By: Undertow/1

Getting rid of these headers is really easy, so I think every project should spend the tiny effort needed to remove them, even if the probability of an attack and its possible impact are really small.
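Before and after changing the configuration, it can help to check what a server actually sends. The following minimal Java sketch flags the two headers shown above in a response's header map; the class and method names are hypothetical, and the header list covers only `Server` and `X-Powered-By`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Flags response headers that reveal server name or version. */
final class HeaderCheck {

    // Lower-cased names of the fingerprinting headers discussed in the post.
    private static final Set<String> FINGERPRINT_HEADERS =
            new HashSet<>(Arrays.asList("server", "x-powered-by"));

    private HeaderCheck() {}

    /** Returns "Name: value" entries for every fingerprinting header present. */
    static List<String> fingerprintingHeaders(Map<String, List<String>> headers) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String name = e.getKey();
            // Skip the null key (status line); compare names case-insensitively.
            if (name != null && FINGERPRINT_HEADERS.contains(name.toLowerCase(Locale.ROOT))) {
                found.add(name + ": " + String.join(", ", e.getValue()));
            }
        }
        return found;
    }
}
```

A map of this shape is what `HttpURLConnection.getHeaderFields()` returns, including a `null` key for the status line, which the method skips. Names are compared case-insensitively because HTTP header names are case-insensitive.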

To fix the problem let’s have a look at the default configuration in the standalone.xml:

Continue reading

commons-fileupload 1.3.3 resolves deserialisation vulnerability CVE-2016-1000031

CVE-2016-1000031 is a very nasty Remote Code Execution vulnerability in commons-fileupload, an extremely widely used Apache Commons library that you might not even know is on your class path. It affects versions up to and including 1.3.2, and easy-to-use exploits are publicly available. What makes it even worse is that you do not even need to use the library: it is enough to have it on your class path and to deserialise some data. The data is the attack vector. You can find a good in-depth explanation of the vulnerability here.
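To illustrate why merely deserialising untrusted data is enough, here is a sketch of a general mitigation known as look-ahead deserialisation; it is not the fix shipped in 1.3.3. Overriding `ObjectInputStream.resolveClass` lets the stream reject a class, for example `org.apache.commons.fileupload.disk.DiskFileItem`, before any of its deserialisation code runs. The blocklist passed in is an assumption; an allowlist of expected classes, or the `ObjectInputFilter` mechanism available since Java 9, is generally the more robust choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.util.Set;

/**
 * Look-ahead deserialisation: rejects blocklisted classes when their class
 * descriptor is read, i.e. before an instance is created or readObject() runs.
 */
final class BlocklistObjectInputStream extends ObjectInputStream {

    private final Set<String> blocked; // fully qualified class names

    BlocklistObjectInputStream(InputStream in, Set<String> blocked) throws IOException {
        super(in); // reads and verifies the stream header only
        this.blocked = blocked;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        if (blocked.contains(desc.getName())) {
            throw new InvalidClassException(desc.getName(), "blocked from deserialisation");
        }
        return super.resolveClass(desc);
    }
}
```

Usage: wrap the untrusted bytes in `new BlocklistObjectInputStream(in, blocked)` and call `readObject()` as usual; a blocked class surfaces as an `InvalidClassException`.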

It took a while, but with version 1.3.3 this vulnerability is finally closed (by default).
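A quick way to audit dependencies is to compare the version string against 1.3.3. This hypothetical helper treats everything below 1.3.3 as vulnerable and ignores qualifiers such as `-SNAPSHOT`. It only looks at the version number, so it cannot tell whether the old behaviour has been deliberately re-enabled in a fixed release.

```java
/** Hypothetical helper: does a commons-fileupload version predate the 1.3.3 fix? */
final class FileUploadVersionCheck {

    private static final int[] FIXED = {1, 3, 3};

    private FileUploadVersionCheck() {}

    /** Returns true for versions below 1.3.3 (i.e. up to and including 1.3.2). */
    static boolean isVulnerable(String version) {
        String[] parts = version.split("[.-]");
        for (int i = 0; i < FIXED.length; i++) {
            // Missing components count as 0, so "1.3" is treated like "1.3.0".
            int part = i < parts.length ? parseLeadingInt(parts[i]) : 0;
            if (part != FIXED[i]) {
                return part < FIXED[i];
            }
        }
        return false; // equal to 1.3.3 in the compared components
    }

    /** Parses leading digits; non-numeric components such as "SNAPSHOT" become 0. */
    private static int parseLeadingInt(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(s.substring(0, end));
    }
}
```

For example, `isVulnerable("1.3.2")` and `isVulnerable("1.2.1")` return `true`, while `isVulnerable("1.3.3")` and `isVulnerable("1.4")` return `false`.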

There are a few things you should know about the fix, though:

Continue reading

Measuring Swift by looking at a simple web service

I recently came across a colleague’s article and figured that the Swift programming language would be a nice addition to his comparison. To fill that gap, I implemented the admittedly very simple web service in Swift and measured both its performance and its size. I then presented my results following the structure of the original article; you can find them below.

Continue reading

ActiveMQ confusion and what comes with your JBoss EAP / WildFly

People often talk about using ActiveMQ while actually referring to different brokers. That is because there are three different message brokers with ‘ActiveMQ’ in their name, which becomes pretty confusing when a project as big as WildFly starts using a broker with ‘ActiveMQ’ in its name that is not the broker known for years simply as ‘ActiveMQ’.

So there are three projects:

Continue reading