Session 02 – Part 2: Rolf Biehler and Yannik Fleischer (Germany)

Bringing together statistics and computer science education: Machine learning by decision trees grounded in students’ data exploration experiences

24.11.2021, 17.30-18.30 (UTC+1)

Abstract

Trees can be used to visualize decision rules for classification, and students may already have encountered trees for different purposes in the mathematics or computer science classroom. Everyday decisions can be supported by simple decision trees. A new idea for students is to use trees for predictive modeling (classification) in multivariate data sets.

The human construction of trees has to be based on insights into the data and its context. Based on such experiences, algorithms for the automatic creation of trees can be developed and critically evaluated. Essential elements of predictive modeling can then be discussed, such as the distinction between training and test data, overfitting, the consequences of bias in the data (from random or systematic sources), and different evaluation criteria based on the confusion matrix.
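As a rough illustration of these ideas, and not the authors' teaching material or libraries, the following Python sketch builds a one-level decision tree (a "decision stump") automatically from training data and then evaluates it on separate test data using a confusion matrix. The food data, the feature meanings (fat and sugar in grams per 100 g), the labels, and all function names are invented for this example. The split criterion used here is the plain misclassification count, which is an assumption chosen for simplicity.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration only:
# (fat in g, sugar in g, label) per 100 g of a food item.
TRAIN = [
    (2, 5, "low"), (1, 12, "low"), (4, 3, "low"), (3, 20, "low"),
    (25, 2, "high"), (30, 40, "high"), (18, 10, "high"), (12, 50, "high"),
]
TEST = [(0, 8, "low"), (6, 45, "high"), (22, 5, "high"), (5, 15, "low")]


def majority(labels):
    """Return the most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows):
    """Try every feature and midpoint threshold; keep the split
    with the fewest misclassified training rows."""
    best = None
    n_features = len(rows[0]) - 1
    for f in range(n_features):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[-1] for r in rows if r[f] <= t]
            right = [r[-1] for r in rows if r[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature index, threshold, left label, right label)


def predict(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def confusion_matrix(stump, rows):
    """Count (actual label, predicted label) pairs."""
    return dict(Counter((r[-1], predict(stump, r)) for r in rows))


def accuracy(stump, rows):
    return sum(predict(stump, r) == r[-1] for r in rows) / len(rows)


stump = fit_stump(TRAIN)
print("rule:", stump)
print("training accuracy:", accuracy(stump, TRAIN))
print("test accuracy:", accuracy(stump, TEST))
print("test confusion matrix:", confusion_matrix(stump, TEST))
```

On this toy data the rule fits the training rows perfectly but misclassifies one test row, which is the kind of gap between training and test performance that motivates keeping the two sets apart. A deeper tree would apply the same search recursively within each branch.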

We have developed materials, together with guidelines for their use, for several educational levels (grades 5/6, 9/10, and 11/12). We use different computational tools. CODAP (codap.concord.org), a web-based, easy-to-use data exploration tool with a plug-in for creating decision trees, serves as a starting point. Jupyter Notebooks of various types, based on Python, require different levels of coding skill from the students. As a rule, we start with unplugged activities at all levels. For instance, we have developed a decision game with data cards for younger students. In their basic version, the Jupyter Notebooks are menu-driven. In their advanced version, students get worked examples of computational essays that they can adapt to their own data and predictive modeling problems, including adapting and enhancing the code. All notebooks use libraries for data exploration and decision tree machine learning that we have adapted for educational purposes from professional sources.

Students encounter various multivariate data sets. These include data on the nutritional values of food, data on adolescents' (social) media use, and medical data on heart disease. In addition, parking lot occupancy data from the students' own town were used, where predictive modeling is applied to help reduce parking search traffic and the related emissions.

We will present some of our materials and the first results from studies in the classroom where we used the material.

Bio Rolf Biehler

Dr. Rolf Biehler is a professor of didactics of mathematics at Paderborn University. His research interests include probability, statistics and data science education, university mathematics education, and the professional development of mathematics teachers. He was a co-founder and co-director of the Centre for Research in University Mathematics Education. He is engaged in the International Association for Statistical Education (IASE) and has worked as an editor or editorial board member for several international journals and book series in mathematics education. He is currently co-directing the Project Data Science and Big Data at School.

Bio Yannik Fleischer

Yannik Fleischer is a PhD student in mathematics education research at Paderborn University, Germany.

His main research interest is developing a concept for teaching machine learning methods in school, with a focus on decision trees, and evaluating it by developing teaching materials and examining them in practice. Since 2019, he has been teaching year-long project courses on data science in upper secondary school and developing, implementing, and evaluating teaching modules for different levels of secondary school, mainly about machine learning with decision trees.

Recording & Slides & Additional Material
