## Using MS Excel to Calculate Statistics and Visualize Data:

The purpose of this assignment was to introduce methods of statistical analysis and data visualization using Microsoft Excel and the R software and language. The knowledge gained included becoming familiar with techniques of statistical analysis and learning how to display statistical data graphically in both Excel and R. The assignment also taught how to understand the extents of a given dataset, how to describe data in an objective and statistically meaningful manner, and how to use R syntax to produce results and present statistical findings.

Upon reviewing and statistically analyzing the Niagara College Glendale Campus Groundwater Well data set, several results were found; they are presented in the following paragraphs:

The water table depths ranged from -2.0m to 22.0m, grouped with a 2.0m bin width. The water well pH values ranged from 0.0 to 9.0, grouped with a 0.5 bin width (pH is a unitless measure).

The average (mean) water table depth was not the same as the median water table depth: the mean was 8.6m and the median was 6.5m. Because the mean was higher than the median, the water table depth histogram showed the data to be positively skewed.

There were no outliers in the Niagara College Glendale Campus Groundwater Well data set, because no values fell more than three standard deviations from the mean (the range that covers about 99.7% of normally distributed data). A common convention treats a data point more than three standard deviations from the mean as an outlier; however, the data should be thoroughly examined before any point is labelled an outlier. In the case of the Niagara College Glendale Campus Groundwater Well data set, because of the small sample size, each value was examined individually before deciding whether it was an outlier. Lastly, had any outliers been present in the data set, they would not have been excluded for predictive purposes.

Based on the derived statistics, if 100 new groundwater wells were added to the campus, about 68% of them would be expected to have a water table depth between 1.5m and 15.7m. This range was obtained by subtracting one standard deviation (7.1m) from the mean water table depth (8.6m) to get the minimum value, and adding one standard deviation to the mean to get the maximum value. Likewise, about 95% of the new wells would be expected to have a water table depth between -5.6m and 22.8m. This range was obtained by subtracting two standard deviations from the mean to get the minimum value, and adding two standard deviations to the mean to get the maximum value.
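The arithmetic behind these two ranges can be sketched in a few lines of Python. This is only an illustration of the mean plus or minus k standard deviations rule, using the mean (8.6m) and standard deviation (7.1m) reported for the original data set; the function name is hypothetical. The rule assumes roughly normal data, which is worth keeping in mind here because the depths are positively skewed and the 95% lower bound comes out negative.

```python
def empirical_rule_range(mean, std_dev, k):
    """Return (low, high) for mean +/- k standard deviations."""
    return (mean - k * std_dev, mean + k * std_dev)


# Values reported for the original groundwater well data set.
MEAN_DEPTH_M = 8.6
STD_DEPTH_M = 7.1

# About 68% of normally distributed values fall within 1 standard deviation,
# and about 95% fall within 2 standard deviations.
range_68 = empirical_rule_range(MEAN_DEPTH_M, STD_DEPTH_M, 1)
range_95 = empirical_rule_range(MEAN_DEPTH_M, STD_DEPTH_M, 2)

print("68% range (m):", [round(v, 1) for v in range_68])
print("95% range (m):", [round(v, 1) for v in range_95])
```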

From the scattergram comparing water depth versus pH, the relationship between the dependent and independent variables was visually linear. The Pearson r value, which describes how well the independent and dependent variables are correlated, supported this: it was -0.9 for the two data sets, indicating that the values were closely correlated. The relationship was also negative, which was evident visually on the scattergram and from both the Pearson r value (-0.9) and the slope (-0.167). Lastly, the relationship was strongly linear: the R2 value (0.9) indicated a strong correlation because it was higher than 0.60.
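For readers who want to see how the Pearson r value and the slope are related, here is a minimal sketch in plain Python. The depth and pH values are hypothetical placeholders, not the well data, and the function names are illustrative only. Both statistics share the same numerator, which is why they always have the same sign.

```python
import math


def _sums(xs, ys):
    """Return the sums of squares and cross-products about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    sxy, sxx, syy = _sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    sxy, sxx, _ = _sums(xs, ys)
    return sxy / sxx


# Hypothetical sample: pH falls as depth increases.
depth_m = [2.0, 4.0, 6.0, 8.0, 10.0]
ph = [8.0, 7.7, 7.2, 7.0, 6.5]

r = pearson_r(depth_m, ph)
print("Pearson r:", round(r, 3))
print("slope:", round(regression_slope(depth_m, ph), 3))
print("R squared:", round(r ** 2, 3))  # for a simple regression, R2 = r * r
```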

After adding the new mini-piezometer data to the Niagara College Glendale Campus Groundwater Well data set, additional results were found in the combined data; they are presented in the following paragraphs:

Both the mean and the standard deviation of the water table depth changed with the new data. The mean changed from 8.6m to 8.4m, and the standard deviation changed from 7.1m to 6.7m.

Upon reviewing the boxplot generated in R, five outliers were shown. The data points were identified as outliers because they were displayed as circles (single hollow points) on the boxplot.
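A boxplot flags outliers with a different rule than the three-standard-deviation check used earlier: by default, points lying more than 1.5 times the interquartile range beyond the quartiles are drawn individually. The sketch below illustrates that rule in Python with hypothetical values, not the well data. R's boxplot works from hinges, which can differ slightly from the quartiles computed here, so treat this as an approximation of the idea rather than a reproduction of R's output.

```python
import statistics


def boxplot_outliers(values, coef=1.5):
    """Return the values lying more than coef * IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - coef * iqr
    high_fence = q3 + coef * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical depths in metres with one unusually deep reading.
sample_depths = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
print(boxplot_outliers(sample_depths))
```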

The resultant equation of the simple regression model in R, with Depth to Water Table as the dependent variable and pH as the independent variable, was: 'Depth = -5.3793 (pH) + 43.2966'. The R2 value for the simple regression model was 0.8983 (0.9) and the adjusted R2 value was 0.8856 (0.9).
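As an illustration of how this equation would be used, the short Python sketch below plugs a pH value into the reported coefficients. The function name and the example pH of 7.0 are hypothetical; only the slope and intercept come from the model above.

```python
# Coefficients reported for the simple regression model.
SLOPE_PH = -5.3793
INTERCEPT = 43.2966


def predict_depth_simple(ph):
    """Predicted Depth to Water Table (m) from pH using the simple model."""
    return SLOPE_PH * ph + INTERCEPT


# Hypothetical input: a well with a neutral pH of 7.0.
example_depth = predict_depth_simple(7.0)
print(round(example_depth, 2))
```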

The resultant equation of the multiple regression model in R, with Depth to Water Table as the dependent variable and pH, Northing, and Easting as the independent variables, was: 'Depth = -4.507 (pH) + 4.858x10^-3 (Easting) - 6.116x10^-3 (Northing) + 2.611x10^-4'. The R2 value for the multiple regression model was 0.98 and the adjusted R2 value was 0.97.
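The adjusted R2 value penalizes R2 for the number of independent variables in the model. A minimal Python sketch of the standard formula follows. The sample size is not stated in this report, so n = 10 is an assumption, chosen only because it is consistent with the adjusted values reported for both models.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Assumption: 10 observations (not stated in the report).
ASSUMED_N = 10

simple_adj = adjusted_r2(0.8983, ASSUMED_N, 1)    # one predictor: pH
multiple_adj = adjusted_r2(0.98, ASSUMED_N, 3)    # pH, Easting, Northing

print(round(simple_adj, 4), round(multiple_adj, 2))
```

Because the penalty grows with each added variable, comparing adjusted R2 values is a fairer way to compare models with different numbers of independent variables than comparing R2 alone.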

R shows the significance of each variable by displaying asterisks (*) beside it; the more significant the variable, the more asterisks appear. In the multiple regression model, both Easting and Northing had one asterisk, so the two variables were equally significant. It was therefore concluded that no independent variables were insignificant in the multiple regression model.

Of the two models, the second (the multiple regression model) would most likely yield the best predictions for the Depth to Water Table because of its higher R2 value (0.98 versus 0.9) and higher adjusted R2 value (0.97 versus 0.9). The higher values suggested that the additional independent variables, which the simple model left out, explained more of the variation in the data and so improved the predictability of the Depth to Water Table.

Finally, none of the independent variables were strongly correlated. The only pair of variables that came close to a strong correlation was Depth and Easting at 0.5250244 (0.5), which is considered moderately correlated.

If the multiple regression model were adjusted to avoid multicollinearity, the insignificant variables should be removed one at a time. Removing one insignificant variable at a time, and re-checking the model after each removal, would help ensure that no highly correlated independent variables remain.
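One simple way to screen for multicollinearity before removing variables is to list every pair of independent variables whose correlation exceeds a chosen cut-off. The Python sketch below illustrates that idea only; the 0.7 threshold, the function names, and the sample columns are all hypothetical, and this is not the procedure that was run in R.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for pairs with |r| at or above threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical independent variables (not the well data).
predictors = {
    "pH": [6.5, 7.0, 7.2, 7.7, 8.0],
    "Easting": [10.0, 20.0, 15.0, 30.0, 25.0],
    "Northing": [5.0, 3.0, 8.0, 1.0, 7.0],
}
for name_a, name_b, r in highly_correlated_pairs(predictors):
    print(name_a, name_b, round(r, 2))
```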

Thus, the Niagara College Glendale Campus Groundwater Well data set has been reviewed and statistically analyzed, and the results have been presented as requested.

## Introduction to ArcGIS Spatial Analyst extension:

The purpose of this deliverable was to introduce the Spatial Analyst extension for ESRI’s ArcGIS ArcMap software and to gain basic familiarity with the techniques of spatial analysis. It gave an opportunity to derive new analytical data from existing datasets and to perform both a weighted multi-criteria spatial analysis and a fuzzy logic analysis. The knowledge gained included a practical understanding of the Spatial Analyst extension and how to perform the two types of analysis. The deliverable also taught how to work with various tools within the extension, such as creating TINs, converting shapefiles to rasters, and reclassifying rasters. Lastly, in the role of Niagara College’s campus geospatial environmental expert, the deliverable provided experience with weighted overlays, fuzzy memberships, and fuzzy overlays.
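To make the difference between the two analyses concrete, the sketch below shows the cell-by-cell arithmetic behind a weighted overlay and a fuzzy "AND" overlay in plain Python. It is a simplified illustration, not the ArcMap tools themselves: the tiny grids, the criterion names, the weights, and the function names are all hypothetical.

```python
def weighted_overlay(rasters, weights):
    """Cell-by-cell weighted sum of equally sized, reclassified rasters."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [
        [sum(w * r[i][j] for r, w in zip(rasters, weights)) for j in range(cols)]
        for i in range(rows)
    ]


def fuzzy_and(memberships):
    """Cell-by-cell minimum of fuzzy membership rasters (values 0 to 1)."""
    rows, cols = len(memberships[0]), len(memberships[0][0])
    return [
        [min(m[i][j] for m in memberships) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2 x 2 rasters reclassified to a 1-9 suitability scale.
slope_class = [[1, 5], [9, 3]]
landuse_class = [[3, 5], [1, 7]]
print(weighted_overlay([slope_class, landuse_class], [0.5, 0.5]))

# Hypothetical fuzzy membership rasters for the same two criteria.
slope_fuzzy = [[0.2, 0.9], [1.0, 0.5]]
landuse_fuzzy = [[0.6, 0.4], [0.1, 0.8]]
print(fuzzy_and([slope_fuzzy, landuse_fuzzy]))
```

The weighted overlay lets a strong score on one criterion compensate for a weak score on another, whereas the fuzzy "AND" keeps only the weakest membership in each cell.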

## Introduction to ArcGIS Spatial Analyst Model Builder:

The purpose of this assignment was to introduce ArcGIS Spatial Analyst Model Builder. It gave the opportunity to gain basic familiarity with Model Builder and its ability to chain various ArcMap tasks into a single workflow. The assignment also gave the opportunity to derive new data from existing data sets in order to perform both a weighted multi-criteria spatial analysis and a fuzzy logic analysis within Model Builder by creating a TIN, producing multiple rasters from that TIN, reclassifying those rasters, and combining them in either a weighted overlay or a fuzzy overlay. The knowledge gained included a practical understanding of Model Builder and how to perform the two types of analysis within the model. The deliverable also taught how to work with various tools within the Spatial Analyst extension. Lastly, the deliverable gave the chance to express creative abilities by developing a poster displaying the model that was built; with a strong graphic visualization, the poster acts as a window into how GIS techniques work.

## Geostatistical Analysis of Student Collected Spatial Data:

The purpose of this deliverable was to introduce the Geostatistical Analyst extension for ESRI’s ArcGIS ArcMap software, along with geostatistical interpolation and exploration techniques. As a result, two surfaces were produced, communicating the correlation between family household income levels and vandalism in the boroughs of New York, New York. The deliverable gave the opportunity to apply Inverse Distance Weighting (IDW) and Kriging to two sets of data in order to predict whether the rate of vandalism in a neighbourhood within a borough (expressed as the number of recorded graffiti incidents) is affected by the mean family household income level in that neighbourhood. In sum, of the two interpolation methods, the Kriging surface gave the better, more accurate findings while providing significant results regarding household income levels and vandalism in New York, New York.
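The idea behind IDW can be sketched in a few lines of Python: the predicted value at a location is a weighted average of the sampled values, with nearer samples weighted more heavily. This is a bare-bones illustration rather than the Geostatistical Analyst implementation; the sample coordinates, the graffiti counts, the function name, and the power of 2 are hypothetical choices.

```python
import math


def idw_predict(samples, x, y, power=2.0):
    """Inverse Distance Weighting estimate at (x, y).

    samples is a list of (sample_x, sample_y, value) tuples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sample_x, sample_y, value in samples:
        distance = math.hypot(x - sample_x, y - sample_y)
        if distance == 0.0:
            return value  # exactly on a sample point: return its value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical graffiti incident counts at three neighbourhood centroids.
graffiti_samples = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 4.0, 40.0)]
print(round(idw_predict(graffiti_samples, 1.0, 1.0), 1))
```

Unlike Kriging, this approach uses distance alone and does not model the spatial autocorrelation of the data, which is one reason the two surfaces can differ.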

This project was completed in partnership with Jordan Hamilton.