Abstract
Trees can be used to visualize decision rules for classification, and students may already have encountered trees for other purposes in the mathematics or computer science classroom. Everyday decisions can be supported by simple decision trees. A new idea for students is to use trees for predictive modeling (classification) in multivariate data sets.
The manual construction of trees has to be based on insights into the data and its context. Building on such experiences, algorithms for the automatic creation of trees can be developed and critically evaluated. Essential elements of predictive modeling can be discussed, such as the distinction between training and test data, overfitting, the consequences of bias in the data (from random or systematic sources), and different evaluation criteria based on the confusion matrix.
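The elements named above can be illustrated with a minimal sketch. It is not taken from our teaching material: the toy data (made-up sugar values with the labels "ok" and "sweet"), the helper names, and the 8/4 split are assumptions chosen only for illustration. The sketch fits a one-split tree on training data and compares the confusion matrices on training and test data.

```python
import random

# Hypothetical toy data: (sugar in g per 100 g, label). Values are made up.
DATA = [
    (2, "ok"), (4, "ok"), (5, "ok"), (7, "ok"), (9, "ok"), (11, "ok"),
    (12, "sweet"), (15, "sweet"), (18, "sweet"), (22, "sweet"),
    (30, "sweet"), (8, "sweet"),
]

def predict(threshold, sugar):
    """One-split tree: predict 'sweet' if sugar exceeds the threshold."""
    return "sweet" if sugar > threshold else "ok"

def confusion_matrix(threshold, rows):
    """Count the four outcomes, treating 'sweet' as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for sugar, label in rows:
        if predict(threshold, sugar) == "sweet":
            counts["TP" if label == "sweet" else "FP"] += 1
        else:
            counts["FN" if label == "sweet" else "TN"] += 1
    return counts

def accuracy(cm):
    """Share of correctly classified rows."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())

def fit_stump(rows):
    """Choose the observed value that gives the highest accuracy on rows."""
    candidates = sorted({sugar for sugar, _ in rows})
    return max(candidates, key=lambda t: accuracy(confusion_matrix(t, rows)))

# Split into training and test data (fixed seed so the run is repeatable).
rng = random.Random(0)
shuffled = DATA[:]
rng.shuffle(shuffled)
train, test = shuffled[:8], shuffled[8:]

threshold = fit_stump(train)
train_cm = confusion_matrix(threshold, train)
test_cm = confusion_matrix(threshold, test)
print("threshold:", threshold)
print("train:", train_cm, "accuracy:", accuracy(train_cm))
print("test: ", test_cm, "accuracy:", accuracy(test_cm))
```

A training accuracy that is clearly higher than the test accuracy would be a first hint of overfitting. Other criteria, such as sensitivity or precision, can be read from the same four counts.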
We have developed materials and teaching guidelines for several educational levels (grades 5/6, 9/10, and 11/12). We use different computational tools. As a starting point, we use CODAP (codap.concord.org), a web-based, easy-to-use data exploration tool with a plug-in for creating decision trees. Various types of Jupyter Notebooks, based on Python, require different levels of coding skills from the students. As a rule, we start with unplugged activities at all levels. For instance, we have developed a decision game with data cards for younger students. In their basic version, the Jupyter Notebooks are menu-driven. In their advanced version, students get worked examples of computational essays that they can adapt to their own data and predictive modeling problems, including adapting and extending the code. All notebooks use libraries for data exploration and decision tree machine learning that we have adapted for educational purposes from professional sources.
Students encounter various multivariate data sets. These include data on the nutritional values of food, data on adolescents' (social) media use, and medical data on heart disease. In addition, we used parking lot occupancy data from the students' town, where predictive modeling is applied to help reduce parking search traffic and the related emissions.
We will present some of our materials and the first results from studies in the classroom where we used the material.
Bio Rolf Biehler
Bio Yannik Fleischer
Yannik Fleischer is a PhD student in mathematics education research at Paderborn University, Germany.
His main research interest is developing a concept for teaching machine learning methods in school, with a focus on decision trees, and evaluating it by developing and examining teaching materials in practice. Since 2019, he has been teaching year-long project courses on data science in upper secondary school and developing, implementing, and evaluating teaching modules for different levels of secondary school, mainly on machine learning with decision trees.