Table 1 Data set characteristics

From: Behavioral attributes and financial churn prediction

| Data Set | Source | Observation Window | Labeling Window | # of TXs | # of Cust. | Label Sets | Churn (%) |
|---|---|---|---|---|---|---|---|
| A1 | Sample A | 07/2014–06/2015 | 07/2015–11/2015 | 8.5M / 3.3M | 55K | SB | 1.97 |
| A2 | Sample A | 07/2014–03/2015 | 04/2015–06/2015 | 6.3M / 2.4M | 53K | SB, CC, CA | 0.99 |
| B1 | Sample B | 07/2014–06/2015 | 07/2015–11/2015 | 4.2M / 2.6M | 43K | SB | 2.27 |
| B2 | Sample B | 07/2014–03/2015 | 04/2015–06/2015 | 3.1M / 1.9M | 42K | SB, CC, CA | 1.42 |

  1. Four data sets with different characteristics were generated from samples A and B. For each data set, the table lists the sampling source, the observation window used for feature generation, the labeling window used to decide whether a customer churned, the count of all transactions followed by the count of transactions with POS location information (# of TXs), the number of customers (# of Cust.), the label sets generated, and the percentage of churning customers according to the label inac-full (Churn (%)). In the label sets, SB, CC, and CA stand for segmentation-based, credit card usage-based, and checking account usage-based labeling. The transaction and customer counts reflect the state after the data filtering process.
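
To make the two windows concrete, the following minimal Python sketch partitions transactions into the observation and labeling windows of data set A1. The window bounds come from the table, assuming each month range covers its first through last calendar day. The function names, the `(customer_id, date)` record layout, and the toy transactions are hypothetical and not taken from the paper, and no churn labeling rule is implemented here.

```python
from datetime import date

# Window bounds for data set A1 as listed in Table 1. The table gives
# months only, so full calendar months are an assumption.
OBSERVATION = (date(2014, 7, 1), date(2015, 6, 30))
LABELING = (date(2015, 7, 1), date(2015, 11, 30))


def in_window(day, window):
    """Return True if `day` falls inside the inclusive (start, end) window."""
    start, end = window
    return start <= day <= end


def split_by_window(transactions):
    """Partition (customer_id, date) pairs into the two windows.

    Transactions outside both windows are dropped.
    """
    observed, labeling = [], []
    for customer_id, day in transactions:
        if in_window(day, OBSERVATION):
            observed.append((customer_id, day))
        elif in_window(day, LABELING):
            labeling.append((customer_id, day))
    return observed, labeling


# Hypothetical toy transactions, not data from the paper.
toy = [
    ("c1", date(2014, 9, 3)),
    ("c1", date(2015, 8, 14)),
    ("c2", date(2015, 2, 20)),
    ("c3", date(2016, 1, 5)),
]
observed, labeling = split_by_window(toy)
```

In this sketch, features would be computed only from `observed`, while `labeling` would feed whichever churn rule is applied; keeping the windows disjoint prevents information from the labeling period leaking into the features.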