13/04/2018 – Finding joint and specific sources of variation in linked high-dimensional data

A small GitHub repository with the R code and slides (PPT) used by Katrijn van Deun at the last MSDSlab session is available on the MSDSlab GitHub page or at: https://github.com/msdslab/MSDS-13-04-2018-RSCA

Original post:

Friday 13/04/2018 at 15:30 in room B1.09


After the high-dimensional data symposium, Katrijn van Deun of Tilburg University will give an interactive talk on ‘Finding joint and specific sources of variation in linked high-dimensional data’ for the MSDSlab members.

Attention: This presentation is on a different day and at a different time than usual.

Preparation: Bring laptop with R installed


22/03/2018 – Hidden Markov Models

Emmeke explaining the model in her graph

Here you can find the slides of Emmeke’s talk


Original announcement:

Thursday 22/03/2018 at 15:00 in room B1.09

In this meeting, Emmeke will introduce us to Hidden Markov Models.


The HMM is a very flexible model and as such is applicable to a wide variety of longitudinally collected data. For example, one can extract student behaviour states from MOOC data and investigate the composition of the different learning states, and the transitions between the different learning states. Or one can extract sleep states based on EEG measurements, and subsequently compare the duration of, and transitions between, different sleep states for patients who do and do not suffer from insomnia.
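To make the idea of extracting hidden states a bit more concrete, here is a minimal sketch of Viterbi decoding, one common way to recover the most likely state sequence from an HMM. It is not taken from Emmeke's talk: the two sleep states, the "high"/"low" EEG activity observations, and all probabilities are invented purely for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical model: all numbers below are made up for this example.
states = ["awake", "asleep"]
start_p = {"awake": 0.6, "asleep": 0.4}
trans_p = {"awake": {"awake": 0.7, "asleep": 0.3},
           "asleep": {"awake": 0.2, "asleep": 0.8}}
emit_p = {"awake": {"high": 0.9, "low": 0.1},
          "asleep": {"high": 0.2, "low": 0.8}}

decoded = viterbi(["high", "high", "low", "low", "low"],
                  states, start_p, trans_p, emit_p)
print(decoded)
```

Once the states are decoded, quantities such as the time spent in each state or the number of transitions between states can be counted directly from the sequence. In a real analysis the probabilities would be estimated from data rather than fixed by hand.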


08/03/2018 – A Probabilistic Active Learning Approach for Learning from Data with Limited Supervision

You can find the slides of the presentation Georg gave here (.pdf, 3MB).

Georg explaining the results in his graphs

Original announcement:

Thursday 08/03/2018 at 15:00 in room B1.09

The speaker for this meeting will be Georg Krempl, who will talk about an approach for learning from data with limited supervision. Here is a shortened abstract:

Machine learning has become widely used throughout commerce, science, and technology. However, the ever increasing volumes of data are contrasted by various constraints, such as limited supervision, processing or storage capacities. This requires techniques to optimise the allocation of these capacities.

Active machine learning aims to provide techniques for selecting the most insightful information (like label annotations of data instances) to be queried from oracles (like human supervisors).

In this talk, I will present our recently developed probabilistic active learning approach PAL. This decision-theoretic approach combines the fast asymptotic runtime of popular heuristics like uncertainty sampling with a direct optimisation of the expected gain in classification performance.

I will conclude this presentation by demonstrating the use of PAL in different active learning scenarios, ranging from label selection in large data pools and evolving data streams to broader settings such as active class selection.
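For readers new to active learning, the uncertainty sampling heuristic mentioned in the abstract can be sketched in a few lines. This is only the baseline heuristic, not PAL itself, and it assumes a binary classifier that already provides a posterior probability for each unlabelled instance. The instance names and probabilities are made up.

```python
def uncertainty_sampling(posteriors, budget):
    """Pick the `budget` instances whose positive-class posterior is closest to 0.5.

    `posteriors` maps an instance id to the classifier's estimated probability
    of the positive class (hypothetical values in the example below).
    """
    ranked = sorted(posteriors, key=lambda name: abs(posteriors[name] - 0.5))
    return ranked[:budget]

# Hypothetical pool of unlabelled instances with made-up posteriors.
pool = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.40, "e": 0.70}

# These are the instances we would ask the oracle (e.g. a human annotator) to label.
queries = uncertainty_sampling(pool, budget=2)
print(queries)
```

The heuristic is cheap because it only needs one pass over the posteriors, but it looks at the classifier's uncertainty alone and does not estimate how much a label would actually improve performance.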


22/02/2018 – Discovering Causal Structure with the PC Algorithm

The meeting was a success! Click here (.pdf, 2MB) to download the updated slides with a worked-out example of how the PC algorithm works, and contact Oisín Ryan (o.ryan@uu.nl) if you’d like to know more about the Causality Reading Group.

Oisín explaining the PC algorithm

Thursday 22/02/2018 at 15:00 in room B1.09

For this meeting, MSDSlab is teaming up with its sister Causality Reading Group, organized by Oisín Ryan. Oisín will explain the background and implementation of the PC algorithm and show how it can discover causal structure in a network of variables through smart use of conditional independence rules.

Source: pcalg package vignette, p.3 (Kalisch, Mächler, Colombo, Hauser, Maathuis, Bühlmann)
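As a rough illustration of how conditional independence is used, here is a minimal sketch of the first (skeleton) phase of the PC algorithm. It is not the pcalg implementation: the edge-orientation phase is left out, and instead of a statistical test it assumes a hand-written independence "oracle" for a toy chain A -> B -> C.

```python
from itertools import combinations

def pc_skeleton(nodes, independent):
    """Skeleton phase of the PC algorithm with a user-supplied independence test.

    `independent(x, y, cond)` must return True when x and y are judged
    conditionally independent given the set `cond`.
    """
    # Start from the complete undirected graph.
    adjacency = {n: set(nodes) - {n} for n in nodes}
    separating_sets = {}
    size = 0
    # Grow the conditioning-set size until no node has enough neighbours left.
    while any(len(adjacency[x]) - 1 >= size for x in nodes):
        for x in nodes:
            for y in sorted(adjacency[x]):
                candidates = sorted(adjacency[x] - {y})
                for cond in combinations(candidates, size):
                    if independent(x, y, set(cond)):
                        # Remove the edge and remember what separated the pair.
                        adjacency[x].discard(y)
                        adjacency[y].discard(x)
                        separating_sets[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adjacency, separating_sets

def chain_oracle(x, y, cond):
    # Hypothetical oracle for the toy chain A -> B -> C:
    # the only independence is A and C given B.
    return {x, y} == {"A", "C"} and "B" in cond

skeleton, sepsets = pc_skeleton(["A", "B", "C"], chain_oracle)
print(skeleton)
print(sepsets)
```

With real data the oracle would be replaced by a statistical conditional independence test, so the recovered skeleton depends on the test and its significance level.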

Activity: We will also try out the PC algorithm and its interpretation on a real dataset. If you’d like to join this activity, bring your laptop!

Preparation: install R, RStudio, and the pcalg package

Optional reading: Causation, Prediction and Search, Chapter 5 (freely available here: http://cognet.mit.edu/book/causation-prediction-and-search)

08/02/2018 – Text Mining

Ayoub introducing text mining

Thursday 08/02/2018 at 15:00 in room B1.09

This meeting, Ayoub Bagheri (M&S, UMCU) gave an introduction to text mining in general and some of his work in particular. Here’s a quote:

Text mining is the process of analyzing natural language text looking for useful and unknown patterns. In other words, text mining is the art of turning free text into numerical variables and then mining them with statistical techniques and learning algorithms.
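The step of "turning free text into numerical variables" can be sketched with a simple bag-of-words count. This is just one basic representation, not necessarily the one used in Ayoub's work, and the two example sentences are invented.

```python
import re
from collections import Counter

def document_term_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i."""
    # Lower-case each document and keep only alphabetic tokens.
    tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    vocabulary = sorted({token for tokens in tokenised for token in tokens})
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows

# Hypothetical toy documents, made up for this example.
docs = ["Patient reports chest pain.", "No chest pain, patient stable."]
vocab, matrix = document_term_matrix(docs)

print(vocab)
for row in matrix:
    print(row)
```

Each row of the matrix is now an ordinary numeric vector, so the documents can be fed into standard statistical techniques and learning algorithms.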


Preparation: none

See you there!

Introducing: MSDSlab Sessions

Data science is not only attending talks and discussing methods, but also analysing datasets and practical problem-solving. A true data scientist knows which methods to use as well as how to use them in an efficient way. This is why we are introducing MSDSlab Sessions.


In the MSDSlab Sessions, a session leader works on their own data science related project, such as web scraping, analysis building, or text mining, and others can simply join in. The session is entirely up to the participants: they may all want to work on the same thing together, or perhaps they divide a project into different parts. The sessions last one hour, with some extra time reserved for wrapping up the results.

Practical info
Thursdays at 11:00
Sjoerd Groenmangebouw B1.01