Table 1 Data set characteristics

From: Behavioral attributes and financial churn prediction

| Data Set | Source | Observation Win. | Labeling Win. | # of TXs | # of Cust. | Label Sets | Churn (%) |
|---|---|---|---|---|---|---|---|
| A1 | Sample A | 07/2014–06/2015 | 07/2015–11/2015 | 8.5M / 3.3M | 55K | SB | 1.97 |
| A2 | Sample A | 07/2014–03/2015 | 04/2015–06/2015 | 6.3M / 2.4M | 53K | SB, CC, CA | 0.99 |
| B1 | Sample B | 07/2014–06/2015 | 07/2015–11/2015 | 4.2M / 2.6M | 43K | SB | 2.27 |
| B2 | Sample B | 07/2014–03/2015 | 04/2015–06/2015 | 3.1M / 1.9M | 42K | SB, CC, CA | 1.42 |

  1. Four data sets with different characteristics were generated from samples A and B. For each data set, the summary lists the sampling source; the observation window used for feature generation; the labeling window used for the churn decision of the customers; the count of all transactions, followed by the count of transactions with POS location information (# of TXs); the number of customers (# of Cust.); the label sets generated for the data set, where SB, CC, and CA stand for segmentation-based, credit card usage-based, and checking account usage-based labeling; and the percentage of churning customers (Churn (%)) according to label inac-full. The transaction and customer counts represent the state after the data filtering process.
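
The two quantities that Table 1 implies but does not state, the share of transactions carrying POS location information and the approximate number of churning customers, can be derived with a few lines of arithmetic. The following is a minimal sketch: the `DataSet` class and the function names are illustrative and not taken from the article, and because the table reports rounded counts (for example 55K customers), the derived churner counts are rough estimates only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """One row of Table 1 (hypothetical container, not from the article)."""
    name: str
    txs_all_millions: float    # all transactions, in millions (rounded in the table)
    txs_pos_millions: float    # transactions with POS location, in millions
    customers_thousands: int   # customers, in thousands (rounded in the table)
    churn_pct: float           # churn percentage according to label inac-full


# Values copied from Table 1.
TABLE_1 = [
    DataSet("A1", 8.5, 3.3, 55, 1.97),
    DataSet("A2", 6.3, 2.4, 53, 0.99),
    DataSet("B1", 4.2, 2.6, 43, 2.27),
    DataSet("B2", 3.1, 1.9, 42, 1.42),
]


def pos_share(ds: DataSet) -> float:
    """Fraction of transactions that carry POS location information."""
    return ds.txs_pos_millions / ds.txs_all_millions


def approx_churners(ds: DataSet) -> int:
    """Rough churner count implied by the rounded customer count and churn rate."""
    return round(ds.customers_thousands * 1000 * ds.churn_pct / 100)


if __name__ == "__main__":
    for ds in TABLE_1:
        print(f"{ds.name}: POS share ~ {pos_share(ds):.0%}, "
              f"churners ~ {approx_churners(ds)}")
```

Under these rounded inputs, well under half of the sample A transactions and somewhat more than half of the sample B transactions carry POS location information, and every data set contains on the order of several hundred to roughly a thousand churners, which illustrates how imbalanced the labels are.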