Behavioral attributes and financial churn prediction

EPJ Data Science

Table 1 Data set characteristics

Data set	Source	Observation window	Labeling window	# of TXs (all / POS)	# of Cust.	Label sets	Churn (%)
A1	Sample A	07/2014–06/2015	07/2015–11/2015	8.5M / 3.3M	55K	SB	1.97
A2	Sample A	07/2014–03/2015	04/2015–06/2015	6.3M / 2.4M	53K	SB, CC, CA	0.99
B1	Sample B	07/2014–06/2015	07/2015–11/2015	4.2M / 2.6M	43K	SB	2.27
B2	Sample B	07/2014–03/2015	04/2015–06/2015	3.1M / 1.9M	42K	SB, CC, CA	1.42

Four data sets with different characteristics were generated from samples A and B. For each data set, the table lists the sampling source; the observation window used for feature generation; the labeling window used to decide whether a customer churned; the count of all transactions and of transactions with POS location information (# of TXs, given as all / POS); the number of customers (# of Cust.); the label sets generated for the data set, where SB, CC, and CA denote segmentation-based, credit card usage-based, and checking account usage-based labeling; and the percentage of churning customers in the data set (Churn (%)) according to label inac-full. The transaction and customer counts reflect the state after data filtering.
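As an illustration only, the split between observation and labeling windows and the churn percentage reported in the last column can be sketched as follows for data set A1. The day-level boundaries (first and last day of each month), the function names `assign_window` and `churn_percentage`, and the representation of labels as a customer-to-boolean mapping are assumptions; the sketch does not reproduce the inac-full labeling rule itself, it only takes precomputed labels as input.

```python
from datetime import date

# Windows for data set A1 from Table 1 (assumed: windows span whole months, bounds inclusive).
OBSERVATION = (date(2014, 7, 1), date(2015, 6, 30))
LABELING = (date(2015, 7, 1), date(2015, 11, 30))


def assign_window(tx_date):
    """Return the window a transaction date falls into, or None if it lies outside both."""
    if OBSERVATION[0] <= tx_date <= OBSERVATION[1]:
        return "observation"  # used for feature generation
    if LABELING[0] <= tx_date <= LABELING[1]:
        return "labeling"  # used for the churn decision
    return None


def churn_percentage(labels):
    """Share of churned customers in percent.

    labels: hypothetical mapping customer_id -> bool (True = churned under the chosen label set).
    """
    if not labels:
        return 0.0
    return 100.0 * sum(labels.values()) / len(labels)
```

For example, `assign_window(date(2015, 7, 1))` returns `"labeling"`, and `churn_percentage({"c1": True, "c2": False, "c3": False, "c4": False})` returns `25.0`. Keeping the two windows disjoint ensures that features are computed only from behavior that precedes the period in which churn is decided.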