PySpark DataFrame problem
Given restaurant inspection datasets for three consecutive years (dataset2016, dataset2017, and dataset2018, as uploaded on Canvas), please do the following. First, convert the files into txt or any other file type.
Get familiar with your data: create a DataFrame with a schema, name the columns, merge the data together, and show descriptive statistics for the numerical columns.
Find pairwise correlations of the numerical columns.
Check for duplicates and fill missing observations with the column means.