21/06/2018: Recursion

Thursday 21/06/2018 at 15:00 in room B1.09

The next MSDSlab meeting will be next week Thursday and will be presented by Erik-Jan van Kesteren of Utrecht University. He will provide a brief overview and an interactive workshop on the magical mystery that is recursion. We’ll go over the basics, when it’s useful, and we will also program some recursive functions.

 

Preparation: bring your laptop and your programming language of choice. For the illustration I will use R and RStudio.
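
For a taste of what a recursive function looks like, here is a minimal sketch. It is written in Python purely for illustration (the session itself uses R), and the function names `factorial` and `flatten` are illustrative choices, not taken from the workshop material. Each function has a base case that stops the recursion and a recursive case that calls the function on a smaller problem.

```python
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n, computed recursively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:
        # Base case: 0! and 1! are both 1, so the recursion stops here.
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)


def flatten(items: list) -> list:
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # Recursive case: a nested list is flattened by the same function.
            flat.extend(flatten(item))
        else:
            # Base case: a non-list element is kept as it is.
            flat.append(item)
    return flat
```

Calling `factorial(5)` multiplies 5 by `factorial(4)`, and so on down to the base case; `flatten` works the same way on nested lists of any depth.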

Erik-Jan.

 


11/06/2018: Learning from Partitioned Data

Lianne’s PPT can now be downloaded here.


Original post:

Monday 11/06/2018 at 15:00 in room B1.09

The next MSDSlab meeting will be on Monday (instead of Thursday) the 11th of June and will be presented by Lianne Ippel from Maastricht University. Lianne will present on two themes within the topic of learning from partitioned data.

  1. Row-by-row (streaming) learning (horizontal) and
  2. Privacy-preserving machine learning (vertical).

Preparation:

Abstract
Over the last decade, the social research workflow has greatly changed. While previously data were often collected using paper-and-pencil questionnaires, nowadays data are often collected using webpages and smartphone applications. This change in gathering data has had many consequences, though in this talk I focus in particular on the partitioning of data. I will discuss two types of partitioned data. Horizontally partitioned data implies that the same variables are available for each respondent, but not all respondents are available in one central place (e.g., streaming data). Vertically partitioned data, on the other hand, means that the same respondents are available at different sites or institutes. However, each site can have its own set of features, which might or might not be sharable with other sites, e.g., due to the sensitive nature of the features. For these non-sharable features, privacy-preserving data mining/machine learning techniques are required. Your input during this part of the talk will be much appreciated!
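
To make the row-by-row idea concrete, here is a minimal sketch in Python. It is not taken from the talk: it only illustrates the general principle of streaming estimation with the simplest possible case, a mean that is updated one observation at a time without storing earlier rows. The class name `RunningMean` and the example values are hypothetical.

```python
class RunningMean:
    """Keep a running mean that is updated one observation at a time."""

    def __init__(self) -> None:
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # current estimate of the mean

    def update(self, x: float) -> float:
        """Fold one new observation into the estimate and return it."""
        self.n += 1
        # Move the estimate toward x by a step of size 1/n;
        # no earlier observation needs to be kept in memory.
        self.mean += (x - self.mean) / self.n
        return self.mean


# Hypothetical stream of values arriving one row at a time.
stream = [2.0, 4.0, 6.0, 8.0]
estimator = RunningMean()
for value in stream:
    estimator.update(value)
```

After the loop, `estimator.mean` equals the ordinary mean of the four values, even though each value was seen only once and then discarded.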

Bio
Lianne Ippel recently started as a Postdoctoral researcher at the Institute for Data Science at Maastricht University. She received her PhD degree from Tilburg University for her thesis “Multilevel Modeling for Data Streams with Dependent Observations”, for which she won ‘Best Thesis Award’ at the General Online Research conference in Cologne (2018). Her research interests are centered around ethical and responsible use of Machine Learning and Machine Learning models in relation to methodological issues such as response style, measurement invariance, and missing data.


31/05/2018: Crowdsourcing for Medical Image Analysis

Thank you again to Veronika and everyone who was present. Veronika’s PPT can now be downloaded here.

Original post:


Thursday 31/05/2018 at 15:00 in room B1.09

The next MSDSlab meeting will be this Thursday, presented by Veronika Cheplygina (Eindhoven University of Technology), who will discuss the possibilities of crowdsourcing for medical image analysis in an interactive MSDSlab.

Preparation:

Abstract
Machine learning (ML) has vast potential in medical image analysis, improving possibilities for early diagnosis and prognosis of disease. However, ML needs large amounts of representative, annotated examples for good performance, which may not always be possible with medical images. In this talk I will discuss how crowdsourcing is being used to address this problem. I will cover several existing approaches that do this, as well as discuss (what I think is) a promising alternative. At the end there will be an opportunity to play with some data to investigate this claim.

Bio
Veronika Cheplygina has been an assistant professor at the Medical Image Analysis group, Eindhoven University of Technology, since February 2017. She received her Ph.D. from the Delft University of Technology for her thesis “Dissimilarity-Based Multiple Instance Learning” in 2015. As part of her PhD, she was a visiting researcher at the Max Planck Institute for Intelligent Systems in Tuebingen, Germany. From 2015 to 2016 she was a postdoc at the Biomedical Imaging Group Rotterdam, Erasmus Medical Center. Her research interests are centered around learning scenarios where few labels are available, such as multiple instance learning, transfer learning, and crowdsourcing. Next to research, Veronika blogs about academic life at http://www.veronikach.com

 

24/05/2018 – Grand Challenge Design for Medical Image Analysis – Sharing Data, Metrics and Ground Truth for Algorithm Evaluation

Thursday 24/05/2018 at 15:00 in room A3.08

 

Adriënne Mendrik of the Netherlands eScience Center will give a presentation on Grand Challenge Design for Medical Image Analysis. This meeting will be held at a slightly different location than usual, at Sjoerd Groenmangebouw A3.08.

Preparation: Have a look at https://grand-challenge.org/All_Challenges/ which gives an overview of all challenges organized in medical image analysis.

 

17/05/2018 – Better predictions using big(ger) data sets

Thursday 17/05/2018 at 15:00 in room B1.09

Thomas Debray from the UMCU will host the next MSDSlab. He will discuss how we can investigate, quantify and improve the generalizability of prediction models by utilizing big datasets from e-health records or meta-analyses with individual participant data.

Preparation: Have a look at the background readings

Abstract
Clinical prediction models (CPMs) are an important tool in contemporary medical decision making and are abundant in the medical literature. These models estimate the probability/risk that a certain condition is present or will occur in the future by combining information from multiple variables (predictors) from an individual, e.g. predictors from patient history, physical examination or medical testing. Unfortunately, many CPMs perform much worse than anticipated during their development. A major reason for unsatisfactory performance and limited use in clinical practice is that they are typically developed from relatively small datasets, and subsequently used in populations/settings too different from the original development population/setting, without proper validation and adaptation to the new situation.

Background literature (assessing generalizability of clinical prediction models)
All are optional. For novices I would recommend the BMJ and PLoS Med papers.

  • Riley RD, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016;353:i3140. (Riley2016a)
  • Debray TPA, et al. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol. 2015;68(3):279–89. (debray_new_2015)
  • Debray TPA, et al. Individual Participant Data (IPD) Meta-analyses of Diagnostic and Prognostic Modeling Studies: Guidance on Their Use. PLoS Med. 2015;12(10):e1001886. (Debray2015c)
  • Debray TPA, et al. A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat Med. 2013 Aug 15;32(18):3158–80. (Debray2012b)

 

03/05/2018 – Digital Humanities and Text Mining: Stylistic and Intertextual Analysis of Large Corpora

Paul’s presentation and code can now be found on the MSDSlab GitHub page.

Original post:

Thursday 03/05/2018 at 15:30 in room B1.09


Paul Vierthaler, a university lecturer in the Digital Humanities at Leiden University, will discuss the methodological approaches he takes in his research on late Imperial Chinese literature. Paul studies the relationships among historical and fictional documents written in late Ming and early Qing China (1550 to 1700) at the corpus level. To do this, he uses a variety of methods developed by linguists, computer scientists, and biologists. In his talk, Paul will cover stylometric analysis and an intertextuality detection algorithm based on the bioinformatics algorithm BLAST (Basic Local Alignment Search Tool). While this talk will ground the methodology in specific research questions, he will mainly focus on describing his approach to blending information retrieval with literary studies.

This talk will start 30 minutes later than our regular starting time!

Preparation: These are some suggested, but not essential, readings:

 

 


26/04/2018 – Visualizing (not so) Big Data

Meys’ slides are now available here.

Original post:

Thursday 26/04/2018 at 15:00 in room A3.17


Wouter Meys, of the Amsterdam-based Citizen Data Lab, will give a talk on data visualization. The Citizen Data Lab consists of ‘interdisciplinary teams of researchers, programmers, and designers working on the mapping of urban issues. They develop tools and methods for participatory data collection, visualization and interpretation.’

Preparation: Bring a laptop with R installed.