Starting off in data science
200 lines of code (R)
A little more than a year ago, I decided to pursue a career in data science. Today, I work as an educational data scientist for StackFuel, a small start-up in Berlin. How did I do it?
SMOTE explained for noobs - Synthetic Minority Over-sampling TEchnique line by line
130 lines of code (R)
Using a machine learning algorithm out of the box is problematic when one class in the training set dominates the other. The Synthetic Minority Over-sampling Technique (SMOTE) addresses this problem. In this tutorial I'll walk you through how SMOTE works and then how the code of the SMOTE function works.
SQL versus R - who is faster?
225 lines of code (R)
Is it worth organising your data in a database if all you are interested in is speed? It depends on what you are doing with the data. This guide teaches you where to expect a speed advantage from SQLite and where from R.
Using Python and the IMDb API to find the best Star Trek episode
100 lines of code (python)
A new Star Trek series, Discovery, is on the horizon. Time to look back at the best episodes of the franchise so far. This guide will teach you to use the IMDb API to get the answers yourself.
How close are German political parties to each other? Using R to derive the latent semantic network of German election manifestos
150 lines of code (R)
On 24 September Germans will elect a new federal parliament. In this tutorial, I text mine the main parties' election manifestos, derive the latent semantic space and visualise it to see who is closer to whom in German politics.
Using R to derive the German election manifesto word clouds
60 lines of code (R)
In just one month Germany, the most populous country in the European Union, is going to the polls. In this short tutorial, I text mine the main parties' election manifestos in order to visualise the state of German politics.
Germany discriminates against foreign grades
280 lines of code (R)
There are more than 23,000 Germans studying in the Netherlands. Many of them don’t realise that back in Germany they will be penalised. The reason is foreign grade discrimination. What can be done about it?
The Rich Data guide to beer
200 lines of code (R)
I sampled millions of beer ratings from the biggest beer rating sites around in order to answer all the questions beer lovers have. What is the best beer? What makes it so good? Where are the best beers made?
Using Python, the IMDb API, and web-scraping Rotten Tomatoes to find the best Star Trek movie
120 lines of code (python)
Star Trek is a rare positive vision of the future. Which movie captures the audience the most with this hopeful message? I learned Python to sample all the movie ratings I could find in order to answer this question. If you follow this guide, so can you.