Rich Data

The severity of Europe's reaction to Covid-19

230 lines of code (Python)

While the global pandemic unfolded, European countries followed different strategies. Some reacted radically and fast. Others are still taking their time. In this blog post I characterise the policy responses to the novel coronavirus.

Gif of the policy restriction severity in Europe


The tricky question of how long it takes for Corona cases to double

260 lines of code (Python)

The doubling time of Covid-19 cases has become one of the key metrics of the Corona pandemic. Political decision makers use this number to decide when to ease lockdown measures. In this blog post I show that different assumptions about the epidemic lead to different doubling time estimates. Which number should you trust?

Different ways of calculating the doubling time lead to vastly different estimates
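
To make the point concrete, here is a minimal sketch (not the code from the post) of two common ways to estimate a doubling time. Both assume exponential growth; the function names and the case counts are hypothetical and only serve as an illustration.

```python
import math


def doubling_time_from_ratio(cases, window=1):
    """Doubling time in days, based only on the growth between the
    last observation and the one `window` days before it."""
    growth = cases[-1] / cases[-1 - window]
    return window * math.log(2) / math.log(growth)


def doubling_time_from_fit(cases):
    """Doubling time in days, based on a least-squares line through
    log(cases) against the day index (uses all observations)."""
    n = len(cases)
    days = range(n)
    logs = [math.log(c) for c in cases]
    day_mean = sum(days) / n
    log_mean = sum(logs) / n
    slope = sum((d - day_mean) * (y - log_mean) for d, y in zip(days, logs))
    slope /= sum((d - day_mean) ** 2 for d in days)
    return math.log(2) / slope


# Hypothetical cumulative case counts with slowing growth
slowing_cases = [100, 150, 210, 270, 320]

print(doubling_time_from_ratio(slowing_cases))
print(doubling_time_from_fit(slowing_cases))
```

When growth is slowing, as in this made-up series, the last-day ratio gives a longer doubling time than the fit over the whole series, because the fit still carries the faster early growth.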


Using geographical heat maps to visualise cultural influence

100 lines of code (Python)

Streets, squares and places in general tend to be named after influential people. As a consequence, place names offer a glimpse into who shaped local cultures. In this post, I use OpenStreetMap data to visualise the influence of different historical figures in Germany.

Konrad Adenauer place names heat map


Exponentially scaling your data in order to zoom in on small differences

100 lines of code (Python)

Machine learning models benefit from zooming in on the area of a scale where most data points show differences. In this blog post I present an exponential scaler which does exactly that. It zooms in on the lower or higher end of the scale in order to focus a machine learning model on the differences that count the most.

The effect of a negative exponent
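
The idea can be sketched as follows. This is one possible form of an exponential scaler, not necessarily the formula used in the post: the function name, the `exponent` parameter and the normalisation to the range 0 to 1 are all assumptions made for illustration.

```python
import math


def exponential_scale(x, exponent, lower=0.0, upper=1.0):
    """Map x from [lower, upper] onto [0, 1] along an exponential curve.

    Assumed form: (exp(exponent * u) - 1) / (exp(exponent) - 1),
    where u is x rescaled linearly to [0, 1].
    A negative exponent stretches differences at the lower end,
    a positive exponent stretches differences at the higher end.
    """
    if exponent == 0:
        raise ValueError("exponent must be non-zero")
    unit = (x - lower) / (upper - lower)
    return math.expm1(exponent * unit) / math.expm1(exponent)


# The midpoint of the input scale ends up above 0.5 for a negative
# exponent and below 0.5 for a positive one.
print(exponential_scale(0.5, exponent=-3))
print(exponential_scale(0.5, exponent=3))
```

With a negative exponent the lower half of the input scale takes up most of the output range, so small differences there become easier for a model to pick up.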


Transforming from one scale to another

90 lines of code (Python)

Transforming data from one scale to another is a common task for a data scientist. This blog post goes beyond the options found in sklearn. I have always missed one particular scaler, so in this post I write it myself: the ScoreScaler.

Formula of ScoreScaler


Working hours of a start-up employee

200 lines of code (Python)

From November 2017 until July 2019 I worked for a seed-funded Berlin start-up. During most of this time I tracked my working hours. In this post I analyse these data.

Development of extra hours during one and a half years


The surprisingly good performance of dumb classification algorithms

140 lines of code (Python)

When evaluating binary classification algorithms, it is a good idea to have a baseline for the performance measures. In this blog post I calculate the classification performance of really dumb classifiers. These models do not use any feature information. If your own classification model performs no better than they do, there is a problem.

Summary of F1-scores of dumb classifiers
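
As a small illustration of such a baseline (a sketch with made-up labels, not the analysis from the post), the F1-score of a classifier that always predicts the positive class depends only on how common that class is. The helper `f1_score` below is written by hand for this example and returns 0 when there are no true positives.

```python
def f1_score(y_true, y_pred):
    """F1-score for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    if true_pos == 0:
        return 0.0  # convention: no true positives means an F1 of zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data set with 25% positive cases
y_true = [1] * 25 + [0] * 75

always_positive = [1] * len(y_true)  # ignores all features
always_negative = [0] * len(y_true)  # ignores all features

print(f1_score(y_true, always_positive))
print(f1_score(y_true, always_negative))
```

For the always-positive classifier, recall is 1 and precision equals the share of positive cases p, so its F1-score is 2p / (1 + p). A model that uses features should clear that bar comfortably.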


Predicting typical completion rates of online courses

140 lines of code (Python)

Massive open online courses (MOOCs) did not revolutionize education. Why? They suffer from abysmal completion rates. Most students start a MOOC without finishing it. In this blog post I take a look at what my own company's e-learning course completion rates would be if we offered standard MOOCs.


Modelling rating data correctly using ordered logistic regression

70 lines of code (Python)

Using rating data to predict how much people will like a product is trickier than it seems. Even though ratings often get treated as if they were a kind of measurement, they are actually a ranking. The difference is not just academic. In this blog post I show how using an appropriate model for such data improves prediction accuracy.
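
To show what treating ratings as ordered categories means in practice, here is a minimal sketch of the standard proportional-odds (ordered logit) probabilities. It is not the model code from the post: the function names, the cutpoints and the linear predictor values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(eta, cutpoints):
    """Probability of each rating category under an ordered logit model.

    eta:       linear predictor for one observation
    cutpoints: increasing thresholds; K - 1 cutpoints give K categories
    Uses P(rating <= k) = sigmoid(cutpoint_k - eta) and takes
    differences of neighbouring cumulative probabilities.
    """
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Hypothetical cutpoints for a 5-point rating scale
cutpoints = [-2.0, -0.5, 0.5, 2.0]

print(ordered_logit_probs(0.0, cutpoints))  # a neutral product
print(ordered_logit_probs(1.5, cutpoints))  # a better-liked product
```

The model only assumes that the categories are ordered. The distances between neighbouring ratings are set by the cutpoints, which would be estimated from the data rather than fixed to equal steps.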


Creating the right data map

280 lines of code (R)

Information with a geographical element is best visualised with a map. However, big regions tend to dominate maps regardless of their actual importance. I show possible ways around this issue and let you generate the right data map for your own purposes without needing to code.
