Data Set | Source | Observation Win. | Labeling Win. | # of TXs | # of Cust. | Label Sets | Churn (%) |
---|---|---|---|---|---|---|---|
A1 | Sample A | 07/2014–06/2015 | 07/2015–11/2015 | 8.5M / 3.3M | 55K | SB | 1.97 |
A2 | Sample A | 07/2014–03/2015 | 04/2015–06/2015 | 6.3M / 2.4M | 53K | SB, CC, CA | 0.99 |
B1 | Sample B | 07/2014–06/2015 | 07/2015–11/2015 | 4.2M / 2.6M | 43K | SB | 2.27 |
B2 | Sample B | 07/2014–03/2015 | 04/2015–06/2015 | 3.1M / 1.9M | 42K | SB, CC, CA | 1.42 |
- Summary of the four data sets generated from samples A and B, each with different characteristics. For every data set, the table lists the sampling source, the observation window used for feature generation, the labeling window used to decide customer churn, the number of all transactions followed by the number of transactions with POS location information (# of TXs), the number of customers (# of Cust.), the label sets generated for the data set, and the percentage of churning customers according to label inac-full (Churn (%)). In the label sets, SB, CC, and CA stand for segmentation-based, credit card usage-based, and checking account usage-based labeling, respectively. The transaction and customer counts reflect the state after the data filtering process.
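To make the magnitudes in the table easier to compare, the following minimal sketch derives two quantities from the reported figures: the share of transactions that carry POS location information, and an approximate count of churning customers. The class and field names (`DataSet`, `pos_share`, `approx_churners`) are hypothetical and introduced only for illustration. Because the table reports rounded values (millions of transactions, thousands of customers), the derived numbers are rough estimates, not figures taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """One row of the summary table (values are rounded as reported)."""
    name: str
    tx_all_m: float     # all transactions, in millions
    tx_pos_m: float     # transactions with POS location info, in millions
    customers_k: float  # customers, in thousands
    churn_pct: float    # churning customers, in percent

    def pos_share(self) -> float:
        # Fraction of transactions that carry POS location information.
        return self.tx_pos_m / self.tx_all_m

    def approx_churners(self) -> int:
        # Estimate only: both inputs are rounded in the table.
        return round(self.customers_k * 1000 * self.churn_pct / 100)


# Values copied from the table rows A1, A2, B1, B2.
DATA_SETS = [
    DataSet("A1", 8.5, 3.3, 55, 1.97),
    DataSet("A2", 6.3, 2.4, 53, 0.99),
    DataSet("B1", 4.2, 2.6, 43, 2.27),
    DataSet("B2", 3.1, 1.9, 42, 1.42),
]

if __name__ == "__main__":
    for ds in DATA_SETS:
        print(f"{ds.name}: POS share ~{ds.pos_share():.0%}, "
              f"churners ~{ds.approx_churners()}")
```

Under these assumptions, each data set contains on the order of several hundred to roughly a thousand churners, which illustrates how strongly imbalanced the labels are.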