Data mining adventure

Continue your data mining adventure by doing some classification.

  1. Continue with your original dataset. Make any changes that were suggested, and link or add to the original document. All information should be accessible from previous portions of the project.
  2. Using technology, classify on one of your categorical variables. (a) Use a simple decision tree to classify your data on a categorical variable. Create a visualization of the decision tree. Make sure to produce a confusion matrix. (b) Repeat your decision tree, but use a cross-validation technique to test the accuracy. Examine variable importance and be certain to comment on the most important variables. (c) Examine the importance of each feature using a chi-square statistic or gain ratio. Create a visualization. Does this agree with what your decision trees showed?
  3. Write your report! (a) Include all items requested above. Include graphs and text about each. (b) Discuss the cross-validation process chosen. Discuss whether each model is overfit and how you can tell.
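
One possible way to approach the steps above is sketched here in Python with scikit-learn. Everything in it is an assumption made for illustration: the assignment does not prescribe a tool, the bundled iris data stands in for your own dataset, and the tree depth, split size, and 5-fold scheme are arbitrary choices rather than requirements.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data (assumption): swap in your own features and categorical target.
iris = load_iris()
X, y = iris.data, iris.target
feature_names = list(iris.feature_names)

# (a) Simple decision tree on a held-out split, plus a confusion matrix.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Text rendering of the fitted tree; sklearn.tree.plot_tree draws a
# graphical version if matplotlib is installed.
print(export_text(tree, feature_names=feature_names))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, tree.predict(X_test))
print(cm)

# (b) Same model, scored with stratified 5-fold cross-validation.
cv_scores = cross_val_score(
    DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5
)
print("Mean CV accuracy:", cv_scores.mean())

# A large gap between training and held-out accuracy hints at overfitting.
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print("Train accuracy:", train_acc, "Test accuracy:", test_acc)

# Impurity-based variable importance from the fitted tree.
importances = dict(zip(feature_names, tree.feature_importances_))

# (c) Chi-square statistic per feature (chi2 requires non-negative features).
chi2_stats, p_values = chi2(X, y)

# Side-by-side ranking to compare the two views of feature importance.
for name, stat in sorted(zip(feature_names, chi2_stats), key=lambda t: -t[1]):
    print(f"{name}: chi2={stat:.2f}, tree importance={importances[name]:.3f}")
```

For the report, the printed ranking could be turned into a bar chart, and the train versus test comparison gives one concrete way to argue whether a model is overfit: a tree that scores far higher on the data it was trained on than on held-out or cross-validated data is likely fitting noise.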

Sample Solution