Project on Linear Regression and Comparative Box Plots

In California, on August 31, 2020, Superior Court Judge Brad Seligman issued an injunction against the use of SAT and ACT exam scores in admissions and scholarship decisions at any University of California campus.

Seligman stated in his ruling: “The barriers faced by students with disabilities have been greatly exacerbated by the COVID-19 epidemic, which has disrupted test-taking locations, closed schools and limited access to school counselors…”

Seligman added that little data existed to show whether the tests were even valid or reliable indicators of a student's future college performance.

Mark Rosenbaum, a Los Angeles attorney who helped file the lawsuit as director of Public Counsel's Opportunity Under Law project, stated "The historic decision puts an end to racist tests that deprived countless California students of color, students with disabilities, and students from low-income families of a fair shot at admissions to the UC system."

We would like to explore whether the above statements are hyperbolic opinions not grounded in fact, or whether there is evidence of a significant wealth and/or race gap in SAT scores.

Imagine: You are a paid statistical consultant. You have been hired to conduct an unbiased investigation of the data table below and make evidence-based findings that can be used by all parties involved in the above lawsuit. In particular, you are going to compare the math SAT scores of black test takers and white test takers, stratified by family income level. You will do this using two linear regressions and a side-by-side comparative box plot. You have been asked to report your findings in a “typical” professional paper format.

The Formal Report: Each student must turn in his or her own unique report. I would like to read your thoughts in your own words. You are allowed and encouraged to discuss your ideas with classmates, the math tutoring center, the writing center, and me during office hours. Please DO NOT email me any rough drafts. I also recommend that you never email a classmate your report file. The report will be in 12pt type, Arial or Calibri font, with 1.15 to 1.25 line spacing. Use the bolded headings below (1-8) for your report, in the exact order displayed. Embed all tables and graphs in-line (meaning place them near the text where they are referenced). Use screen capture (image clipping) for graphical displays from StatKey. Size the images to make them readable. There is no minimum or maximum length for the report. It should be as long as needed. I highly recommend that you visit the writing center and have someone review your work for clarity. We will spend significant class time discussing this project.

  1. Abstract
    ○ A brief synopsis of the major conclusions from your report. You typically write this last, although it appears first. Make sure to include your findings along with any metrics used to make determinations such as correlations. This section should be brief and to the point. This is different from an introduction.
  2. Introduction and Methods
    ○ Introduce the reader to the investigation. Do not assume that I am the only person to read this. You may assume the reader understands elementary statistical models and methods such as linear regression, box plots, etc. State the questions being investigated and describe the methods you are going to use.
  3. Correlation of Black Test Takers to Family Income
    ● Perform a linear regression in StatKey using the family income to predict the math SAT scores for black test takers. You will notice that family income is displayed by income ranges and you will need to make each of these into a single income number.
    ○ Use the top value in each range. Example: The range “less than 10,000” will become 10000. The range “10,000 to 15000” will become 15000 and so on. For the last range of “over 100,000” use 120000 for consistency.
    ● Use screen capture to clip an image of the regression model and embed the image into your report in this section.
    ● Discuss the strength of the association using the correlation coefficient and discuss what is being associated.
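As an optional cross-check on the StatKey output (StatKey remains the required tool), the range-to-number rule and the correlation coefficient described above can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the scores in the demo are made-up placeholders, NOT values from the data table. Replace them with the actual table values before drawing any conclusion.

```python
import re
from math import sqrt


def range_to_income(label):
    """Turn an income-range label into a single number (top of the range)."""
    text = label.lower().replace(",", "")
    if text.startswith("over"):
        # Assignment rule: the open-ended top range is coded as 120000.
        return 120000
    # Otherwise take the largest number in the label, i.e. the top value.
    return max(int(n) for n in re.findall(r"\d+", text))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Demo with PLACEHOLDER data (hypothetical, not from the study's table).
labels = ["less than 10,000", "10,000 to 15000", "over 100,000"]
incomes = [range_to_income(label) for label in labels]
placeholder_scores = [400, 410, 480]  # made-up numbers for illustration only

r = pearson_r(incomes, placeholder_scores)
slope, intercept = least_squares(incomes, placeholder_scores)
print(incomes, round(r, 3), slope, intercept)
```

The sign and size of r describe the direction and strength of the linear association between the coded family income and the math SAT score. They do not show that income causes the score.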

  4. Correlation of White Test Takers and Family Income
    ● Proceed as you did in item 3, but for white test takers.

  5. Comparative Box Plots
    ● Using StatKey, complete a comparative box plot analysis between the math SAT scores for black test takers and white test takers.
    ● Use screen capture as before.
    ● Discuss the differences and similarities you see in the boxes.
    ○ Discuss range, IQR, outliers, median, mean, symmetry, and anything else you find relevant.
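The quantities listed above (range, IQR, outliers, median, mean) can also be checked by hand or with a short script. The following is a minimal sketch, assuming the common textbook convention of taking quartiles as medians of the lower and upper halves and flagging outliers beyond 1.5 × IQR from the quartiles. StatKey may use a slightly different quartile convention, so small differences from its output are possible. The function names and the sample values are hypothetical.

```python
from statistics import mean, median


def box_plot_summary(values):
    """Five-number summary, mean, range, IQR, and 1.5*IQR outliers.

    Assumes at least two values. Quartiles are medians of the lower and
    upper halves; the middle value is excluded when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": data[0],
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "max": data[-1],
        "mean": mean(data),
        "range": data[-1] - data[0],
        "iqr": iqr,
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# PLACEHOLDER values for illustration only (not SAT scores from the table).
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

Running the function once per group and comparing the two summaries side by side mirrors what the comparative box plot displays. A mean well above the median, as in the placeholder list, suggests right skew or a high outlier.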

  6. Overall Conclusions
    ○ State a statistically based finding to the judge and attorney about what these data show (and don’t show). Make sure to discuss the findings in each of the items above (3, 4, and 5).
  7. Confounding Variables
    ○ Discuss (in some detail) at least two possible causal variables that may be influencing the statistics you described. You will need to link either a well-researched news article, or a research paper that supports each confounding variable. Remember, a confounding variable is something that is a potential causal effect that is “pushing” on both things at once. Consider what might be causing black test scores to be lower than white test scores even when the test takers are in the same income bracket.
  8. Personal Reflections (there is no right or wrong here, except to leave it blank)
    ○ What parts of this project were difficult for you? Were there technology issues that got in your way? Did you find the writing center/math tutoring helpful (if used)?
    ○ What did this project help you understand about these statistical tools?
    ○ What do you think this project measures about you and your understanding of the material and in what way is this measurement different from a timed exam? Do you prefer timed exams?
    ○ How could this project be improved?

The following data table comes from the meta-analysis “Race, Poverty and SAT Scores” by Ezekiel J. Dixon-Román, Howard T. Everson, and John J. McArdle, published May 2013 (ResearchGate).

Note: Here we are looking at differences between black and white test takers. Similar disparities exist between white and Latinx test takers.

Some things to consider: What does “white” actually mean? When was it first used? Why was it first used? Who was originally considered white? In today's America, is “white” interchangeable with “power-class”?

Sample Solution