In Part I of this tutorial series, we started having a look at the Kaggle House Prices: Advanced Regression Techniques challenge, and talked about some approaches for data exploration and visualization. Armed with a better understanding of our dataset, in this post we will discuss some of the things we need to do to prepare our data for modelling. In particular, we will focus on treating missing values and encoding non-numerical data types, both of which are prerequisites for the majority of machine learning algorithms. We will briefly touch upon feature engineering as well – a crucial step for building effective predictive models. So let’s get started!
Last Wednesday, akquinet tech@spree had the pleasure to host a technology evening of the Software Initiative Berlin Brandenburg (SIBB). The event was devoted to all things data and machine learning and comprised two talks, plus plenty of discussions (and pizza) afterwards.
Data is ubiquitous these days, and being generated at an ever-increasing rate. However, left untouched and unexplored, it is of course of little use. This post will be the first in a series of tutorial articles exploring the process of moving from raw data to a predictive model. We’ll walk through the basic steps involved, and talk about some of the common pitfalls along the way.
Usually developers have to create and deploy different versions of their application: For local development, testing, training, production, …
Different third-party and system dependencies for those different versions will preferably be configured via the container, e.g. data sources, JMS, topics, mail server, etc. However, most applications also contain several custom application properties such as the current version, mail addresses, images, templates, etc. Most of them may be static, but there are cases where you want to change application properties dynamically, i.e. without rebuilding the artifact.
In this article we will describe some approaches how this goal can be approached using the JBoss WildFly/EAP7 application server.
As a developer I am really happy to have an easy way to determine which version of a software I’m running. But I do not like it if my software tells everyone its name and version, as this gives important fingerprinting information to possible attackers.
If you use WildFly versions 8 through 10 or JBoss EAP version 7 the default configuration includes some HTTP headers that are too verbose in my opinion. JBoss EAP 6 is not affected by the way. The headers you get look like this
Getting rid of these headers is really easy. So I think the tiny effort to remove these headers should be put into any project even if the probability of getting attacked and the possible impact are really small.
To fix the problem let’s have a look at the default configuration in the standalone.xml:
Hurray! Vaadin 8 has finally been released and comes up with a bunch of new features. Maybe the most important one is a new data binding concept. But the feature I’m discussing here is support for the HTML5 History API.
CVE-2016-1000031 is a vulnerabilty in the extremely widely used Apache Commons library commons-fileupload – you might not even know you’re having it on your class path. It has a very nasty Remote Code Execution vulnerability with easy to use exploits publicly available up to version 1.3.2. What makes it even worse is that you do not even need to use the library – you only need to have it on your class path and to deserialise some data. The data is the attack vector. You can find a good in detail explanation of the vulnerability here.
It did take a while but with version 1.3.3 this vulnerability is finally closed (by default).
There is some stuff that you should know about the fix though:
I ran across a colleague’s article recently and figured that the Swift programming language would be a nice addition to his comparison. In order to remedy this I implemented the admittedly very simple web service in Swift and measured both its performance and size. Then I followed the given structure of the article in terms of how to present my relevant results. You may find these subsequently.
Oftentimes people talk to each other about using ActiveMQ, but they’re actually referring to different brokers. That is because there are 3 different message brokers with ‘ActiveMQ’ in their name and this turns out to be pretty confusing when a project as big as WildFly starts to use a broker with ‘ActiveMQ’ in its name that is not the broker that was known for years under the name ‘ActiveMQ’.
So there are 3 projects: