Consider the dataset D2.3, which contains simulated transactional data inspired by the subscription
meal delivery service Blue Apron (https://www.blueapron.com/). The dataset records the activity of a
random sample of Blue Apron subscribers (∼22,400 individuals) during January 2019. A detailed
codebook is available in document C2.3.
Consider the following two specifications:
Specification 1:
Specification 2:
Using the Blue Apron data, perform the following tasks:
- Task 1:
a. [2 points] Estimate the two listed specifications using the churn indicator as the
outcome and implementing f() as a logistic model.
b. [1 point] Select a model based on predictive performance criteria. Justify your
decision.
c. [1 point] Use the selected model to predict churn probabilities for every customer
in the sample. Present a histogram of these probabilities.
- Task 2:
a. [2 points] Estimate the two listed specifications using MonthlyAddons as the outcome and
implementing f() as a linear regression.
b. [1 point] Select a model based on predictive performance criteria. Justify your
decision.
c. [1 point] Use the selected model to predict MonthlyAddons for every customer in
the sample. Make sure these predictions fall within the feasible range of MonthlyAddons.
Present a histogram of the predictions.
- Task 3:
a. [1 point] Export the full dataset to a csv file. The exported data must include
individual predictions for churn probabilities (Task 1) and monthly add-ons (Task 2),
each from its respective preferred specification. After the file is saved as csv,
convert it to xls or xlsx so that formulas can be saved (this last step is done outside R,
by opening the csv in a spreadsheet program and saving it in the new format).
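
The workflow across the three tasks can be sketched in base R as follows. This is only an illustrative outline under stated assumptions, not the intended solution: the data are simulated stand-ins for D2.3, the column names x1, x2, Churn, PredChurn and PredAddons are hypothetical (only MonthlyAddons comes from the task text), the two right-hand sides are placeholders for Specification 1 and Specification 2, and the lower bound of 0 for MonthlyAddons is an assumption to check against the codebook C2.3. Holdout log loss and holdout RMSE are used here as one possible choice of predictive performance criteria; AIC, BIC, or cross-validation would be reasonable alternatives.

```r
set.seed(1)

# Simulated stand-in for D2.3. All column names except MonthlyAddons are hypothetical.
n  <- 2000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$Churn         <- rbinom(n, 1, plogis(-1 + 0.8 * df$x1))
df$MonthlyAddons <- pmax(0, round(1 + 0.5 * df$x1 + 0.3 * df$x2 + rnorm(n)))

# Placeholder right-hand sides; substitute the actual Specification 1 and 2.
rhs <- list(spec1 = "x1", spec2 = "x1 + x2 + I(x1^2)")

# Hold out 30% of the rows to compare out-of-sample predictive performance.
test_idx <- sample(n, size = round(0.3 * n))
train <- df[-test_idx, ]
test  <- df[test_idx, ]

# Task 1: logistic models, compared by holdout log loss (lower is better).
log_loss <- function(y, p) -mean(y * log(p) + (1 - y) * log(1 - p))
churn_loss <- sapply(rhs, function(r) {
  fit <- glm(as.formula(paste("Churn ~", r)), data = train, family = binomial)
  log_loss(test$Churn, predict(fit, newdata = test, type = "response"))
})

# Task 2: linear models, compared by holdout RMSE (lower is better).
addon_rmse <- sapply(rhs, function(r) {
  fit <- lm(as.formula(paste("MonthlyAddons ~", r)), data = train)
  sqrt(mean((test$MonthlyAddons - predict(fit, newdata = test))^2))
})

best_churn <- names(which.min(churn_loss))
best_addon <- names(which.min(addon_rmse))

# Refit each preferred specification on the full sample and predict for every row.
fit_churn <- glm(as.formula(paste("Churn ~", rhs[[best_churn]])),
                 data = df, family = binomial)
fit_addon <- lm(as.formula(paste("MonthlyAddons ~", rhs[[best_addon]])), data = df)

df$PredChurn  <- predict(fit_churn, type = "response")
# Linear predictions can fall below zero; clamp at the assumed lower bound of 0.
df$PredAddons <- pmax(0, predict(fit_addon))

hist(df$PredChurn,  main = "Predicted churn probability", xlab = "Probability")
hist(df$PredAddons, main = "Predicted MonthlyAddons",     xlab = "Add-ons")

# Task 3: export the data together with both prediction columns.
out_file <- file.path(tempdir(), "D2.3_predictions.csv")
write.csv(df, out_file, row.names = FALSE)
```

The logistic predictions lie between 0 and 1 by construction, so only the linear predictions need a range check. The sketch writes to a temporary directory; for the actual submission, out_file would point to the desired location, and the conversion to xls or xlsx happens outside R.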