Data Detectives at Work

Data exploration of multivariate data on the media use of young people

Core idea

This teaching unit is about data science content for grades 8-10 (age 14+). Using a fictitious example of an online platform that wants to tailor advertising to young people, pupils are motivated to search for traces and patterns in a data set as data detectives in order to advise the online platform.

Worksheets, PowerPoint presentations, instructions, the YOU-PB dataset, a list of variables, and an overview of the lessons are available. The CODAP software (codap.concord.org), which is freely available on the Internet, is used for data analysis.

The project works with data from over 1000 pupils who provided information on many variables in the leisure and media sector (YOU-PB stands for "youth media usage in the area of Paderborn"). The data set for this series of lessons is available in two versions: a reduced version with 50 variables and the full version with over 160 variables. We recommend the reduced data set for better clarity; interesting multivariate discoveries are still possible with it. However, the teacher can differentiate according to the ability of the class/course, or offer internal differentiation for particularly able pupils (see below).

The link to the data set used in the project (50 variables):

https://tinyurl.com/you-pb-50en

The series of lessons comprises approx. 8 lessons. In the first lessons, the students are introduced to the data and to working with CODAP. The core of the series is independent data science project work by the students in lessons 5 and 6, with presentations in lesson 7. The project work takes place in small groups in which the students work independently with the data set as data science experts. For this purpose, each group is assigned to one of four topic areas of the YOU-PB data, corresponding to the wishes of the online platform's customers (this assignment takes place in lesson 4):

  • Customer 1 wants to promote TikTok,
  • Customer 2 wants to promote LetsPlay_YouTube videos,
  • Customer 3 would like to advertise online newspapers,
  • Customer 4 would like to advertise fixed game consoles.

In the last lesson, the class reflects on the data exploration procedure, and personal and social aspects can be discussed. In addition, the topic of data cleansing, which takes up a lot of time in the work of real data scientists, can be addressed here as an excursus and worked on in CODAP.

Target group

Mathematics from grade 8 (all school types)

Computer science from grade 8 (all school types)

Recommendation: From grade 9 on

Prior knowledge

Percentages

Time scope

 8 to 10 lessons of 45 minutes each

Goals

Statistical learning objectives
  • Learners explore multivariate data
  • Learners carry out group comparisons
  • Learners use row, column, and cell percentages and interpret the results
Computer science learning objectives
  • Learners gain an understanding of the structure and functioning of IT systems
  • Learners interpret data as information
  • Learners discuss social responsibility and the impact of the use of data
Media learning goals
  • Learners use digital tools to analyze data
  • Learners create presentations and practice presenting

Lesson overview

Lesson 1: Introduction to the project and the data
Lesson Content Goals Material
1

Introduction
This lesson introduces the project "Data detectives at work". For this purpose, the complete framework of the lesson series is presented by the teacher via presentation 1. In two work phases, the students are introduced to the available survey data and the data analysis in CODAP.
The document Lesson 1 overview provides design tips for this lesson.
Students explore the data set independently using Worksheet 1.

Didactic notes
The introduction to CODAP can be given by means of a video or a demonstration by the teacher (see Lesson 1 overview).
The document Teacher notes contains an overview of how to analyze categorical variables with CODAP and possible student difficulties.

Option
At APPCamps, there is learning material on leaving data behind, which can be used as an introduction to discussing "data traces": https://drive.google.com/file/d/1eb0_qlnKP-63H8El0OIsDUg3xXgDtVUA/view

Students independently explore the YOU-PB data with CODAP

Lesson 1 overview
Presentation 1
Worksheet 1
Worksheet 1 solution
Teacher notes

List of variables YOU-PB

Instruction CODAP
or (in German)
https://youtu.be/2z5H4anfhWM
(ca. 5 min)

Lesson Content Goals Material
2-3

The aim of these two lessons is for students to gain initial experience as data science experts so that they can then work independently. The focus here is particularly on working with categorical data.
If not already done, worksheet 1 can be discussed first.

Introduction to necessary basic concepts and building up expectations
Students receive information about basic statistical terms. Using worksheet 2, the students transfer the basic statistical terms they have learned to the data set at hand.

CODAP and analysis methods
Evaluation options for different percentages (row percentages, column percentages and cell percentages; see Teacher notes) and the associated statements are developed.
The first step is to "model the data" thematically. Variables of interest are recoded so that the values are reduced from e.g. seven (daily, several times a week, ... never) to two (e.g. often, rarely). This simplifies the evaluations.
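The recoding step described above can also be sketched outside CODAP. The following Python snippet is only an illustration: the seven category labels and the decision about which of them count as "often" are assumptions made for this example, not the actual YOU-PB coding.

```python
# Hypothetical seven-level frequency scale; the real YOU-PB labels may differ.
SEVEN_LEVELS = [
    "daily",
    "several times a week",
    "once a week",
    "once every two weeks",
    "once a month",
    "less often",
    "never",
]

# Modeling decision (an assumption): only the first two levels count as "often".
OFTEN = {"daily", "several times a week"}


def recode(value):
    """Map a seven-level answer to 'often' or 'rarely'; keep missing answers missing."""
    if value is None or value == "":
        return None
    if value not in SEVEN_LEVELS:
        raise ValueError(f"unexpected category: {value!r}")
    return "often" if value in OFTEN else "rarely"


answers = ["daily", "never", "once a month", "several times a week", ""]
recoded = [recode(a) for a in answers]
print(recoded)
```

Where to draw the line between "often" and "rarely" is itself a modeling decision that can be discussed with the students.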

Overview of the worksheets
There are instructions that students can use to work out the content themselves. Alternatively, the contents of the instructions can be worked out together with the teacher. There are also worksheets that should be worked on and discussed in plenary with a view to the later project work. In the worksheets, different distributions are compared with each other. The worksheets are structured in stages. It is important to note that the relationships found apply only to the people in the sample and cannot easily be generalized, because the sample (the data) was not collected in a representative manner.

The instructions:
Instruction 1: Students learn to recode variables from seven values to two values
Instruction 2: Students learn the different evaluation options: row, column and cell percentages
Instruction 3: Students learn to hide data in order to carry out analyses for subgroups
Glossary: Important technical terms are briefly explained here
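For teachers who want to see the idea behind Instruction 3 outside CODAP: hiding cases corresponds to filtering the data before counting. The sketch below uses made-up cases and hypothetical attribute names (gender, age, tiktok); it is not taken from the YOU-PB data.

```python
from collections import Counter

# Made-up cases; attribute names and values are illustrative only.
cases = [
    {"gender": "female", "age": 14, "tiktok": "often"},
    {"gender": "male", "age": 15, "tiktok": "rarely"},
    {"gender": "female", "age": 16, "tiktok": "often"},
    {"gender": "male", "age": 14, "tiktok": "often"},
    {"gender": "female", "age": 15, "tiktok": "rarely"},
]


def subgroup(rows, **conditions):
    """Keep only the rows whose attributes match all given conditions."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


# "Hide" everyone except the female respondents, then count within the subgroup.
girls = subgroup(cases, gender="female")
counts = Counter(r["tiktok"] for r in girls)
print(len(girls), dict(counts))
```

Several conditions can be combined, for example `subgroup(cases, gender="male", age=14)`.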

The worksheets:
Worksheet 3: two binary distributions (distributions with two values) are compared with each other
Worksheet 4: a binary distribution is set in relation to a distribution with 7 values. Either recoding is used here (recommended) or precise statements must be made
Worksheet 5 (advanced): two seven-level distributions are compared with each other. Recoding should be used here
Worksheet 6 (optional, e.g. bonus): understanding percentages

Didactic notes
Presentation 2 for lessons 2 and 3 can be used as background information for the teacher or shown in class to accompany worksheets 3-6a.

Several classroom trials have shown that recoding variables (seven different values are reduced to two) is a useful procedure and at the same time addresses modeling. This is explained in Instruction 1.
We suggest starting the exploration with Instruction 1 and then continuing with Instruction 2.

One differentiation option here is again to have motivated students carry out evaluations with the binary variables and additionally with the variables with seven values (worksheets 3-5 can be completed in both ways).
Depending on the course/class, the students can work out the evaluation options themselves using the instructions and apply them to worksheets 3-5. Alternatively, the teacher can use the PowerPoint to introduce the analysis methods, and the students then work on the worksheets. This is the necessary prerequisite for independent student exploration in the following lessons. Experience has shown that students need support when working on and interpreting the percentage evaluations with row, column or cell percentages. Worksheet 6 can also provide an introduction to the discussion.
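To make the difference between the three percentages concrete, here is a small worked sketch in Python. The counts are invented for illustration and the attribute values are assumptions; CODAP computes these percentages itself, so the code only shows which total each percentage refers to.

```python
from collections import Counter

# Invented (gender, tiktok) pairs for 100 fictitious respondents.
pairs = (
    [("female", "often")] * 30
    + [("female", "rarely")] * 10
    + [("male", "often")] * 20
    + [("male", "rarely")] * 40
)

counts = Counter(pairs)
total = sum(counts.values())

# Row totals (per gender) and column totals (per usage category).
row_totals = Counter()
col_totals = Counter()
for (row, col), n in counts.items():
    row_totals[row] += n
    col_totals[col] += n


def percentages(row, col):
    """Return the row, column and cell percentage for one cell of the table."""
    n = counts[(row, col)]
    return {
        "row": 100 * n / row_totals[row],      # share within the row group
        "column": 100 * n / col_totals[col],   # share within the column group
        "cell": 100 * n / total,               # share of all respondents
    }


p = percentages("female", "often")
print(f"{p['row']:.0f}% of the female respondents use TikTok often.")
print(f"{p['column']:.0f}% of the frequent TikTok users are female.")
print(f"{p['cell']:.0f}% of all respondents are female and use TikTok often.")
```

The three sentences describe the same cell of the table but refer to different totals, which is the distinction students typically find difficult.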

Technology
In this lesson, students should also be shown how they can copy graphics from CODAP into a Word file or a PowerPoint presentation. Sharing the CODAP document via a link may also be a good way of documenting and checking students' work.

Students manipulate data by recoding the variables

Students establish relationships between two categorical variables and distinguish between row, column and cell percentages

Presentation 2 (optional)

Instruction 1
(https://youtu.be/qcK_ZZsWfbQ)
Instruction 2
Instruction 3

Worksheet 3
Worksheet 4
Worksheet 5
Worksheet 6 (bonus)

YouTube (German)
https://youtu.be/otLuX8hhtq8

Lesson Content Goals Material
4

Building expectations and asking suitable questions
In this lesson, the students are divided into small groups for the rest of the lesson series and assigned to four different content areas of the data/customers of the online platform (worksheet 7). Each group should consist of four students so that the subsequent think-pair-share phase works well.
Experience has shown that posing suitable (statistical) questions for evaluating the data is a hurdle for students, which is why another focus of this lesson is to have the students work out suitable questions in small groups using worksheet 8. The think-pair-share method is used for this. Worksheet 8 is used for a theoretical discussion of the data and the list of variables in order to build up expectations for the coming lessons. These expectations are central to the students' own data analysis and should be recorded on posters. The posters should remain visible to all students in the classroom for the rest of the lesson series.

Didactic notes
Presentation 3 summarizes the content of the last lesson and this one.
Teacher notes 4 provides instructions for this lesson.

Students ask suitable statistical questions

Worksheet 7
Worksheet 8

Presentation 3

Teacher notes 4

Posters

Lesson Content Goals Material
5-6

Project work in small groups
In these two lessons, the small groups first plan their data exploration procedure (worksheet 9). The data exploration then takes place in the small groups (worksheet 10), and the presentation of the results is prepared (with the template presentation). At the beginning of the sixth lesson, criteria for good statistical presentations can be discussed (using the instruction presentation) to support the preparation of the presentations. In lessons 5-6, the teacher is mainly available to answer questions and make suggestions, while the students work as independently as possible with CODAP and PowerPoint.

Didactic notes
Various trials have shown that two additional double lessons and, if necessary, homework can also be used for this project phase.

Students explore the data according to their own questions

Students document their explorations in PowerPoint

Students prepare a presentation

Worksheet 9
Worksheet 10

Instruction presentation

Template presentation

Lesson Content Goals Material
7

Presentations by the small groups
In this lesson, the student groups give their presentations. For each presentation, one group can be given a special feedback task to stimulate a discussion of the content (worksheet 11). Using worksheet 12, which is well suited as homework and for consolidating results, the students can check whether they can carry out a data analysis.

Students present the results of their project work

Students give feedback on other presentations

Presentations by students

Worksheet 11
Worksheet 12

Lesson Content Goals Material
8

Reflect
In this lesson, a joint reflection on the entire project takes place. The individual steps of the data analysis carried out are assigned to the stages of the PPDAC cycle (Problem, Plan, Data, Analysis, Conclusion; see Presentation 8).
Furthermore, the personal and social effects of data exploration can be addressed, and an attempt can be made to think "outside the box."

Students reflect on the data analysis process

Students reflect on the social impact of data exploration

Presentation 8

Phase Content Goals Material

For research purposes, we ask that students complete an anonymous survey at the end of the lesson series and give feedback on how they liked the lesson series. The link to the survey is: https://umfrage-ddi.cs.uni-paderborn.de/limesurvey/index.php/545222?lang=de

Lesson Content Goals Material
optional

Data cleansing as detective work
The area of data cleansing can be discussed as an excursus in a separate lesson. The CODAP environment linked below contains the uncleaned YOU-PB dataset. There is also a text field with explanations and initial steps on how data cleansing can be carried out with the help of CODAP.
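As background for the teacher, typical cleansing steps can be sketched in a few lines of Python. This is an assumption-laden illustration: the example records, the attribute names and the plausible age range of 10 to 20 are invented here and do not describe the actual problems in the uncleaned YOU-PB data.

```python
def clean_text(value):
    """Trim whitespace and unify case; treat empty strings as missing."""
    if value is None:
        return None
    text = value.strip().lower()
    return text or None


def clean_age(value, low=10, high=20):
    """Return the age as an int, or None if it is missing, not a whole number, or implausible.

    The range 10-20 is an assumption made for this sketch.
    """
    try:
        age = int(str(value).strip())
    except ValueError:
        return None
    return age if low <= age <= high else None


# Invented raw records showing typical issues: stray spaces, mixed case,
# an implausible value and a missing value.
raw = [
    {"age": "15", "gender": " Female "},
    {"age": "150", "gender": "female"},
    {"age": "", "gender": "MALE"},
]

cleaned = [
    {"age": clean_age(r["age"]), "gender": clean_text(r["gender"])} for r in raw
]
print(cleaned)
```

Whether an implausible value is set to missing, corrected, or the whole case is removed is a decision that can be discussed with the students as part of the detective work.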

http://tinyurl.com/you-pb-160en (see the text box "Detective work" at the bottom)


CODAP: Common Online Data Analysis Platform

CODAP (codap.concord.org) is free, browser-based software designed for use from grade 3 onwards. It supports students in learning and using statistics and data science. No registration is required to use the platform.

The Toolkit on this website offers plenty of learning material for working with CODAP.

Susanne Podworny's YouTube channel offers further tutorial videos on how to use CODAP (in German): https://youtube.com/playlist?list=PLzhRG7IPqbqvKH5X6TIIfF8IzATgxHjQP&feature=shared

CODAP is available in many languages. If the language is not set to your preferred language, the language can be changed at the top right.

The YOU-PB data

The YOU-PB data used here comes from a Paderborn replication study of the official JIM study, which replicates these results locally and makes them available for analysis. Since 1998, the JIM study has been conducting annual surveys on the media use of 12- to 19-year-olds in order to identify trends and developments in the digital behaviour of young people and to derive new impulses for education and culture.

The data is available in CODAP in two English versions (and four German versions).

We recommend version 1 for the classroom:

Versions of the data:

German only: 50 variables, binary coded: https://tinyurl.com/you-pb-50binaer
(Long: https://codap.concord.org/app/static/dg/de/cert/index.html#shared=https%3A%2F%2Fcfm-shared.concord.org%2F9IuhADJvA994rSs6MLct%2Ffile.json)

English: 50 variables, up to 7 categories: https://tinyurl.com/you-pb-50en
(Long: https://codap.concord.org/app/static/dg/de/cert/index.html#shared=https%3A%2F%2Fcfm-shared.concord.org%2FH94YlIcxyFEYY9Ou8ZeL%2Ffile.json)

English: 160 variables, up to 7 categories: http://tinyurl.com/you-pb-160en
(Long: https://codap.concord.org/app/static/dg/de/cert/index.html#shared=https%3A%2F%2Fcfm-shared.concord.org%2FDohqN1dGy7r2mf9iqL2j%2Ffile.json)

German only: 160 variables, binary coded: https://tinyurl.com/you-pb-160binaer
(Long: https://codap.concord.org/app/static/dg/de/cert/index.html#shared=https%3A%2F%2Fcfm-shared.concord.org%2FstEcYRejbsfAPmTy9a26%2Ffile.json)

Further information

Opportunity for differentiation

The series of lessons is designed to work with a "small" data set. This contains 50 variables and offers plenty of opportunities for exploration.

For particularly motivated students, it is also possible to work with the large data set instead, which contains all 160 variables that were collected in the survey. However, this requires a high level of commitment and careful work with the list of variables on the part of the students!

Another possibility for differentiation is to have particularly motivated students work with the "normal" variables with all seven values. The standard case should be working with binary variables, i.e. variables that have previously been recoded by the pupils, as described in Instruction 1 (recoding) in lessons 2-3.

How to deal with the tasks

In the series of lessons, a lot of work is done with worksheets. In order to document the learning process and at the same time keep motivation high, the tasks can also be worked on directly in a PowerPoint presentation. New tasks can be worked on new slides and at the end, results can be taken from the various lessons to create the final presentation.

Participation in the survey (German only)

Anyone who would like to take part in the survey with their class is welcome to do so. The data will be collected completely anonymously (clarify with the school management if necessary). The data will then be included in a new edition of the data set every year, which will remain accessible via a CODAP link.

Students can take part in the survey via this link (note: 161 questions, so dedicate sufficient time): http://go.upb.de/JIM-Umfrage http://go.upb.de/YOU-PB-Umfrage

Brief overview of the lesson contents

Citation:

Podworny, S., Frischemeier, D., Stroop, D., Biehler, R., Fleischer, Y., Schulte, C., Höper, L. & Hüsing, S. (2025). Datendetektiv:innen bei der Arbeit. https://www.prodabi.de/materialien/datendetektiv_innen/

Published on 13.02.2025

Version: 4

License note:

Creative Commons Attribution-ShareAlike (CC BY-SA 4.0)

Authors: Susanne Podworny, Daniel Frischemeier, Dietlinde Stroop,