Thursday 21/06/2018 at 15:00 in room B1.09
The next MSDSlab meeting will be on Thursday next week and will be presented by Erik-Jan van Kesteren of Utrecht University. He will provide a brief overview and an interactive workshop on the magical mystery that is recursion. We’ll go over the basics, when it’s useful, and we will also program some recursive functions.
Preparation: bring your laptop and your programming language of choice. For the illustration I will use R and RStudio.
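As a warm-up for the workshop, here is a minimal sketch of what a recursive function looks like. It is not taken from the workshop materials: the workshop illustration uses R, while this sketch uses Python as one possible "language of choice", and the function names `factorial` and `flatten` are purely illustrative. Each function has a base case that stops the recursion and a recursive case that calls the function on a smaller problem.

```python
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:
        # Base case: 0! and 1! are both 1, so the recursion stops here.
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)


def flatten(items: list) -> list:
    """Flatten an arbitrarily nested list into a single flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # Recursive case: a nested list is flattened by the same function.
            flat.extend(flatten(item))
        else:
            # Base case: a non-list element is kept as it is.
            flat.append(item)
    return flat


if __name__ == "__main__":
    print(factorial(5))
    print(flatten([1, [2, [3, 4]], 5]))
```

The second function hints at when recursion is useful: the data structure itself is nested, so the natural solution is a function that calls itself on each nested part.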
Lianne’s PPT can now be downloaded here.
Monday 11/06/2018 at 15:00 in room B1.09
The next MSDSlab meeting will be on Monday (instead of Thursday) the 11th of June and will be presented by Lianne Ippel from Maastricht University. Lianne will present on two themes within the topic of learning from partitioned data:
- Row-by-row (streaming) learning (horizontal) and
- Privacy preserving machine learning (vertical).
Over the last decade, the social research workflow has changed greatly. While previously data were often collected using paper-and-pencil questionnaires, nowadays data are often collected using webpages and smartphone applications. This change in gathering data has had many consequences, though in this talk I focus in particular on the partitioning of data. I will discuss two types of partitioned data. Horizontally partitioned data means that the same variables are available for each respondent, but not all respondents are available in one central place (e.g., streaming data). Vertically partitioned data, on the other hand, means that the same respondents are available at different sites or institutes, but each site can have its own set of features, which might or might not be sharable with other sites, e.g., due to the sensitive nature of the features. For these non-sharable features, privacy-preserving data mining/machine learning techniques are required. Your input at this part of the talk will be much appreciated!
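To make the row-by-row (streaming) idea concrete, here is a minimal sketch of learning from horizontally partitioned data: a mean and variance that are updated one observation at a time, without ever storing the full dataset. It uses Welford's online algorithm, which is a standard technique and not necessarily the method covered in the talk; the class name `RunningStats` and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mean and variance updated one observation at a time (Welford's algorithm)."""

    n: int = 0          # number of observations seen so far
    mean: float = 0.0   # running mean
    m2: float = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold a single new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the updated mean.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; defined as 0.0 until at least two observations arrive."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    # Hypothetical stream: each value arrives on its own and is then discarded.
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.update(value)
    print(stats.n, stats.mean, stats.variance)
```

Only three numbers are kept in memory regardless of how many rows arrive, which is the property that makes this style of estimation suitable for data streams.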
Lianne Ippel recently started as a postdoctoral researcher at the Institute for Data Science at Maastricht University. She received her PhD degree from Tilburg University for her thesis “Multilevel Modeling for Data Streams with Dependent Observations”, for which she won the ‘Best Thesis Award’ at the General Online Research conference in Cologne (2018). Her research interests are centered around the ethical and responsible use of machine learning and machine learning models in relation to methodological issues such as response style, measurement invariance, and missing data.
Thank you again to Veronika and everyone who was present. Veronika’s PPT can now be downloaded here.
Thursday 31/05/2018 at 15:00 in room B1.09
The next MSDSlab meeting will be this Thursday with Veronika Cheplygina (Eindhoven University of Technology), who will present on the possibilities of crowdsourcing for medical image analysis in an interactive MSDSlab.
Machine learning (ML) has vast potential in medical image analysis, improving possibilities for early diagnosis and prognosis of disease. However, ML needs large amounts of representative, annotated examples for good performance, which may not always be possible with medical images. In this talk I will discuss how crowdsourcing is being used to address this problem. I will cover several existing approaches that do this, as well as discuss (what I think is) a promising alternative. At the end there will be an opportunity to play with some data to investigate this claim.
Veronika Cheplygina has been an assistant professor at the Medical Image Analysis group, Eindhoven University of Technology, since February 2017. She received her Ph.D. from the Delft University of Technology for her thesis “Dissimilarity-Based Multiple Instance Learning” in 2015. As part of her Ph.D., she was a visiting researcher at the Max Planck Institute for Intelligent Systems in Tuebingen, Germany. From 2015 to 2016 she was a postdoc at the Biomedical Imaging Group Rotterdam, Erasmus Medical Center. Her research interests are centered around learning scenarios where few labels are available, such as multiple instance learning, transfer learning, and crowdsourcing. Next to research, Veronika blogs about academic life at http://www.veronikach.com
Thursday 24/05/2018 at 15:00 in room A3.08
Adriënne Mendrik of the Netherlands eScience Center will give a presentation on Grand Challenge Design for Medical Image Analysis. This meeting will be held at a slightly different location than usual: Sjoerd Groenmangebouw, room A3.08.
Preparation: Have a look at https://grand-challenge.org/All_Challenges/ which gives an overview of all challenges organized in medical image analysis.
Thursday 17/05/2018 at 15:00 in room B1.09
Thomas Debray from the UMCU will host the next MSDSlab. He will discuss how we can investigate, quantify and improve the generalizability of prediction models by utilizing big datasets from e-health records or meta-analyses with individual participant data.
Preparation: Have a look at the background readings listed below.
Clinical prediction models (CPMs) are an important tool in contemporary medical decision making and are abundant in the medical literature. These models estimate the probability/risk that a certain condition is present or will occur in the future by combining information from multiple variables (predictors) from an individual, e.g. predictors from patient history, physical examination or medical testing. Unfortunately, many CPMs predict much worse than anticipated during their development. A major reason for unsatisfactory performance and limited use in clinical practice is that they are typically developed from relatively small datasets and subsequently used in populations/settings too different from the original development population/setting, without proper validation and adaptation to the new situation.
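For readers new to the topic, the following minimal sketch shows how a prediction model can combine several predictors into a single risk estimate. It assumes a logistic regression form, which is one common choice but is not specified in the abstract above; the function name `predicted_risk`, the predictors, and all coefficient values are hypothetical and chosen only for illustration.

```python
import math
from typing import Mapping


def predicted_risk(
    intercept: float,
    coefficients: Mapping[str, float],
    patient: Mapping[str, float],
) -> float:
    """Return the predicted probability from a logistic prediction model."""
    # Linear predictor: intercept plus the weighted sum of the patient's predictors.
    linear_predictor = intercept + sum(
        weight * patient[name] for name, weight in coefficients.items()
    )
    # The logistic function maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


if __name__ == "__main__":
    # Hypothetical coefficients: made-up values, not from any published model.
    coefficients = {"age": 0.04, "smoker": 0.7}
    patient = {"age": 60.0, "smoker": 1.0}
    print(predicted_risk(-5.0, coefficients, patient))
```

In this form, applying the model in a new population with a different baseline risk would typically require at least re-estimating the intercept, which is one simple example of the adaptation step mentioned above.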
Background literature (assessing generalizability of clinical prediction models)
All are optional. For novices I would recommend the BMJ and PLoS Med papers.
- Riley RD, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016;353:i3140. (Riley2016a)
- Debray TPA, et al. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol. 2015;68(3):279–89. (debray_new_2015)
- Debray TPA, et al. Individual Participant Data (IPD) Meta-analyses of Diagnostic and Prognostic Modeling Studies: Guidance on Their Use. PLoS Med. 2015;12(10):e1001886. (Debray2015c)
- Debray TPA, et al. A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat Med. 2013 Aug 15;32(18):3158–80. (Debray2012b)
Paul’s presentation and code can now be found on the MSDSlab GitHub page.
Thursday 03/05/2018 at 15:30 in room B1.09
Paul Vierthaler, a university lecturer at Leiden University in the Digital Humanities, will discuss the methodological approaches he takes in his research on late Imperial Chinese literature. Paul studies the relationships among historical and fictional documents written in late Ming and early Qing China (1550 to 1700) at the corpus level. To do this, he uses a variety of methods developed by linguists, computer scientists, and biologists. In his talk, Paul will cover stylometric analysis and an intertextuality detection algorithm based on the bioinformatics algorithm BLAST (Basic Local Alignment Search Tool). While this talk will ground the methodology in specific research questions, he will mainly focus on describing his approach to blending information retrieval with literary studies.
This talk will start 30 minutes later than our regular starting time!
Preparation: These are some suggested, but not essential, readings:
Meys’ slides are now available here.
Thursday 26/04/2018 at 15:00 in room A3.17
Wouter Meys of the Amsterdam-based Citizen Data Lab will give a talk on data visualization. The Citizen Data Lab consists of ‘interdisciplinary teams of researchers, programmers, and designers working on the mapping of urban issues. They develop tools and methods for participatory data collection, visualization and interpretation.’
Preparation: Bring a laptop with R installed.