PM608 Statistical Investigation
- Introduction to Advanced Statistics
- Statistical Analysis Task and Report
- Class/Group (e.g. Group A)
- Module Tutor Name
- Student ID Number
- Date of Submission
- Word count (not needed for this assignment)
I confirm that this assignment is my own work.
Where I have referred to academic sources, I have provided in-text citations and included the sources in the final reference list.
Comments and Critique
- Quality Check
The data contain three variables: the reservoir area, the water quality score (IBI), and an indicator of whether the reservoir has been chemically treated. Our goal is to explore how reservoir area and chemical treatment affect water quality. The IBI score ranges from 0 to 100, with 0 representing the poorest water quality and 100 the best. Reservoir area is measured in square kilometres (km²). To carry out the quality check, summary statistics were computed.
Overall, the IBI score has a median (Q1, Q3) of 64 (54, 75), a mean of 62.52 (SD = 16.00), and a mode of 60. When the data are split into treated and untreated reservoirs, the mean IBI score of treated reservoirs is 68.86, which is much higher than that of untreated reservoirs (56.85). The reservoir area has a median (Q1, Q3) of 26 (16, 37), a mean of 28.20 (SD = 17.22), and a mode of 31. As with the IBI score, the mean reservoir area of the treated samples (31.26) is much higher than that of the untreated samples (25.47). There are two low outliers in the IBI score, since both values lie below Q1 - 1.5 × IQR, and one high outlier in reservoir area, which lies above Q3 + 1.5 × IQR.
According to the histograms in the appendix, the IBI score is roughly bell-shaped but slightly left-skewed, while the reservoir area is right-skewed.
Lurking variables that may also affect water quality include depth, climate and air pollution, but no information on these is provided. According to the study information, some water samples were mistakenly collected at a depth of 1 metre instead of 3 metres. This mistake may affect the accuracy of our analysis, since the water quality scores of those samples are very likely to be outliers. The study method indicates that the two types of sample can be compared by treating the water before it is stored in the reservoir and after. However, it does not state whether the treatment was applied randomly or in a biased way (for example, based on water quality, with more treatment applied to water samples of worse quality).
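The 1.5 × IQR outlier rule used above can be checked directly from the reported quartiles. The following is a minimal sketch in Python; the function name `tukey_fences` is an illustrative choice, and the only inputs are the quartiles quoted in the text, not the raw data.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles as reported in the summary statistics.
ibi_lower, ibi_upper = tukey_fences(54, 75)
area_lower, area_upper = tukey_fences(16, 37)

print(ibi_lower, ibi_upper)    # 22.5 106.5
print(area_lower, area_upper)  # -15.5 68.5
```

Under these fences, an IBI score below 22.5 or a reservoir area above 68.5 would be flagged as an outlier. The upper IBI fence and the lower area fence fall outside the possible range of each variable, so they cannot flag anything.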
Comment and Critique on Regression Analysis
In Appendix C, a regression analysis of IBI score against reservoir area is performed separately for treated and untreated water samples. For the untreated samples, IBI is more strongly negatively correlated with area (correlation coefficient: -0.77, slope: -0.83) than for the treated samples (correlation coefficient: -0.56, slope: -0.41). For every 1 km² increase in area, the predicted IBI score decreases by 0.41 for treated samples and by 0.83 for untreated samples. The intercept of the model for treated samples is 81.75, which is slightly higher than that for untreated samples (77.87).
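The two fitted lines can be compared numerically. The sketch below encodes the intercepts, slopes and correlation coefficients quoted above and evaluates both lines at the overall median area of 26. The names `MODELS`, `predict_ibi` and `r_squared` are illustrative and are not taken from the appendix. It also squares each correlation coefficient, which for a simple linear regression gives the proportion of the variance in IBI explained by area.

```python
# Coefficients as reported in the text: IBI = intercept + slope * area.
MODELS = {
    "treated": {"intercept": 81.75, "slope": -0.41, "r": -0.56},
    "untreated": {"intercept": 77.87, "slope": -0.83, "r": -0.77},
}

def predict_ibi(group, area):
    """Predicted IBI score for a reservoir of the given area."""
    m = MODELS[group]
    return m["intercept"] + m["slope"] * area

def r_squared(group):
    """Coefficient of determination for a one-predictor linear model."""
    return MODELS[group]["r"] ** 2

for group in MODELS:
    print(group, round(predict_ibi(group, 26), 2), round(r_squared(group), 2))
```

At an area of 26, this gives a predicted IBI of about 71.1 for treated reservoirs and about 56.3 for untreated reservoirs, with r² of roughly 0.31 and 0.59 respectively.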
However, the model for the treated samples appears to have higher prediction accuracy, since its residuals are closer to 0 than those of the model for the untreated samples. In this respect, the model for the treated samples is more valid even though it has a lower correlation coefficient and goodness-of-fit value. It is known that some samples were collected at a depth of 1 m. An extreme outlier was detected at an area of 6 km² and an IBI score of 12, which is plausibly one of these samples, since surface water has much worse quality than deeper water. This outlier reduced the correlation coefficient and lowered the goodness of fit but did not affect the predictability.
Comment and Critique on Two-Sample Confidence Interval
Two two-sample confidence intervals were calculated (95% and 90%). These levels were chosen because the sample size was relatively large (more than 30), so the mean is known with much more precision than it would be with a small sample, and the resulting intervals are reasonably narrow. We are 95% confident that the difference in population mean IBI between treated and untreated samples falls within [5.82, 18.19], and 90% confident that it falls within the narrower interval [6.81, 17.20]. Factors that may limit the significance of the confidence intervals include sample size, percentage and population size. The larger the sample, the more confident we can be that the interval truly reflects the population.
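The relationship between the two intervals can be illustrated by working backwards from the 95% interval. The text does not say whether normal (z) or t critical values were used, so the sketch below assumes z critical values. The helper `z_critical` and the variable names are illustrative only. It recovers the implied standard error from the 95% interval and then rebuilds the 90% interval from it.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)

# 95% interval for the treated-minus-untreated mean difference, from the text.
lower95, upper95 = 5.82, 18.19

centre = (lower95 + upper95) / 2             # estimated mean difference
half_width95 = (upper95 - lower95) / 2       # margin of error at 95%
std_error = half_width95 / z_critical(0.95)  # implied standard error (z assumption)

# Rebuild the 90% interval around the same centre.
half_width90 = z_critical(0.90) * std_error
lower90, upper90 = centre - half_width90, centre + half_width90

print(round(lower90, 2), round(upper90, 2))
```

Under the z assumption, the rebuilt interval agrees with the reported 90% interval of [6.81, 17.20] to two decimal places. The centre of about 12.0 also matches the difference between the group means reported earlier (68.86 - 56.85 = 12.01).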
Recommendations and Findings
The mean IBI score differs significantly between treated and untreated samples at both the 0.05 and 0.10 significance levels, since neither the 95% nor the 90% confidence interval includes 0. Treated water samples have significantly higher water quality than untreated samples, with the mean score difference estimated to lie between 5.82 and 18.19. The model built using treated water samples is more valid and has higher prediction accuracy. In addition, water quality decreases as reservoir area increases.
Therefore, to improve water quality, it is recommended to reduce the reservoir area and to treat the water with chemicals before storing it in the reservoir.
Appendix: Statistical Tests and Calculations
- Full Numerical Summary
- Graphical Summary
- Regression Analysis
2 scatter plots + linear regression
2 residual plots
- Two-Sample Confidence Interval