
Table 3 Number of entries, sessions, and samples used throughout this paper. Based on reading app data

From: Both sides of the story: comparing student-level data on reading performance from administrative registers to application generated data from a reading app

| Number of | Processing step |
|---|---|
| entries | |
| 22,002,522 | raw data |
| 21,995,183 | removed empty/defect IDs |
| 6,766,166 | removed impossible/idle entries (1000>words_read>1 and 2000>words_per_minute>1 and 6010>durations_seconds>1) |
| 6,034,974 | removed entries before August 2019 and after May 2020 |
| 5,021,219 | removed words per minute less than 15 and higher than 600 (idling and skipping) |
| 4,689,763 | removed reading speed outliers according to 1 interquartile range |
| 4,462,661 | removed words per minute outliers according to 1.5 interquartile range |
| 4,218,061 | removed words read outliers according to 1.5 interquartile range |
| 4,039,967 | removed seconds duration outliers according to 1 interquartile range |
| 2,157,156 | after session detection, see Sect. 2.2 |
| sessions | |
| 215,521 | total sessions detected, see Sect. 2.2 |
| 22,056 | for comparison with national test used in Sect. 3.1 (only users: with more than 10 sessions; in grade 3, 5, and 7; active in September to November 2019) |
| 209,952 | for improvement over time used in Sect. 3.2 (removed users that have fewer than four entries in four months) |
| samples | session aggregates per user |
| 1542 | for model to predict reading speed in Sect. 3.3 |
| 1236 | training data used to fit the models |
| 308 | test data to evaluate the models |
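
The entry-level filtering rows in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation: the table does not say how the interquartile-range cut-offs were computed, so the sketch assumes the common rule of keeping values within [Q1 - k*IQR, Q3 + k*IQR], with k = 1 or 1.5 as listed. The field names `words_read`, `words_per_minute`, and `durations_seconds` are taken from the table; the helper names `in_plausible_range` and `iqr_filter` and the toy data are hypothetical.

```python
from statistics import quantiles


def in_plausible_range(entry):
    """Range filter quoted in the table: strict bounds on the three raw fields."""
    return (
        1 < entry["words_read"] < 1000
        and 1 < entry["words_per_minute"] < 2000
        and 1 < entry["durations_seconds"] < 6010
    )


def iqr_filter(entries, field, k=1.5):
    """Keep entries whose `field` lies within [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: this fence rule is a common convention, not a rule
    confirmed by the table. Quartiles use linear interpolation.
    """
    values = [e[field] for e in entries]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [e for e in entries if low <= e[field] <= high]


# Toy data (invented for illustration): eight reading entries plus one idle entry.
toy = [
    {"words_read": 200, "words_per_minute": wpm, "durations_seconds": 120}
    for wpm in (100, 110, 120, 130, 140, 150, 160, 900)
]
toy.append({"words_read": 0, "words_per_minute": 0, "durations_seconds": 3})

plausible = [e for e in toy if in_plausible_range(e)]   # drops the idle entry
kept = iqr_filter(plausible, "words_per_minute", k=1.5)  # drops the 900 wpm entry
print(len(toy), len(plausible), len(kept))
```

Applying the filters in sequence, as the table does, means each interquartile range is computed on the data that survived the previous step, so the order of the steps affects the final counts.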