
Accuracy vs Precision Data Quality Analysis

In this assignment, we analyzed 50 points collected at the same location with a handheld GPS device and used them to determine the device's precision and accuracy.

I first determined the mean of the collected points ("waypoints") using the Summary Statistics tool, and found the exact coordinates of the "Average Waypoint" with the Absolute X,Y,Z tool. I then re-projected and spatially joined the layers. Lastly, I created three new fields to determine the 50th, 68th, and 90th percentiles.
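
The averaging and percentile steps above were done with ArcGIS tools. As a rough, tool-independent sketch, they can be written in plain Python. This assumes the waypoints have already been re-projected into a planar coordinate system measured in meters, so straight-line distances are meaningful. The function names, the sample coordinates, and the linear-interpolation percentile rule are illustrative choices, not what ArcGIS does internally.

```python
import math


def average_waypoint(points):
    """Mean (x, y) of projected waypoints, analogous to the "Average Waypoint"."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y


def percentile(values, p):
    """Percentile p (0-100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return ordered[lo]
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


def precision_radii(points, levels=(50, 68, 90)):
    """Distance from the average waypoint at each requested percentile."""
    cx, cy = average_waypoint(points)
    distances = [math.hypot(x - cx, y - cy) for x, y in points]
    return {p: percentile(distances, p) for p in levels}


# Hypothetical waypoints in meters, not the assignment's data.
waypoints = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
print(average_waypoint(waypoints))
print(precision_radii(waypoints))
```

Different tools use different percentile rules (nearest-rank versus interpolation), so on a small sample these radii may differ slightly from ArcGIS output.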

Map 1: GPS data point distribution and precision/accuracy analysis

My horizontal precision at the 68th percentile is 4.5 meters. The distance between the "Average Waypoint" and the true reference point, which is my horizontal accuracy, is 3.78 meters. Horizontal precision looks at the "consistency of a measurement method" and aims to provide "tightly packed results" (Bolstad, 2016). Horizontal accuracy, on the other hand, "measures how close a database representation of an object is to the true value" (Bolstad, 2016).
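
To make Bolstad's distinction concrete, here is a small hypothetical sketch. Precision is measured as the spread of the points around their own average, while accuracy is the distance from that average to the true reference point. The cluster coordinates and function names are invented for illustration, and mean distance is only one of several possible spread measures.

```python
import math


def mean_xy(points):
    """Average (x, y) of a set of projected points."""
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


def spread(points):
    """Precision proxy: mean distance of the points from their own average."""
    cx, cy = mean_xy(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)


def offset(points, reference):
    """Accuracy: distance from the points' average to the true reference point."""
    cx, cy = mean_xy(points)
    return math.hypot(cx - reference[0], cy - reference[1])


# Hypothetical cluster: tightly packed (precise) but centered about 5 m
# from a reference point at the origin (inaccurate).
cluster = [(3.0, 4.0), (3.1, 4.0), (2.9, 4.0), (3.0, 4.1), (3.0, 3.9)]
reference = (0.0, 0.0)
print(f"spread = {spread(cluster):.2f} m, offset = {offset(cluster, reference):.2f} m")
```

A small spread with a large offset is the "precise but not accurate" case; the reverse is also possible.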

My horizontal precision (4.5 meters) was larger than my horizontal accuracy: the 68th-percentile distance exceeded the 3.78-meter distance between my average waypoint and the reference point by about 0.72 meters (4.5 − 3.78). Looking at the mapped percentiles in the map above, my precision was not great, as only 56% of the mapped points fell within the 68th percentile. For optimal accuracy, the collected data should also have fallen within 3.78 meters rather than 4.5 meters.
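
Checking what share of the points falls inside a given radius, as with the 56% figure above, comes down to counting distances. A minimal sketch, assuming projected coordinates in meters; the function name, points, and radius in the example are hypothetical.

```python
import math


def share_within(points, center, radius):
    """Fraction of points whose distance from `center` is at most `radius`."""
    cx, cy = center
    inside = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)
    return inside / len(points)


# Hypothetical points 1, 2, 3 and 4 m east of the center.
points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(share_within(points, (0.0, 0.0), 2.5))  # 0.5
```

The center and radius can be whichever point and percentile distance are being checked.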

My average elevation was 27.79 meters, while the true elevation of the reference point was 22.58 meters. This means that my vertical error was +5.21 meters (27.79 − 22.58). As with my horizontal results, the average waypoint overestimated the true value. An additional 5.21 meters is pretty significant and may make the data unusable.
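
The vertical comparison is a signed difference between the average elevation and the reference elevation. A minimal sketch using the two values reported above; the function name is illustrative.

```python
def vertical_error(elevations, reference_elevation):
    """Signed vertical error: mean measured elevation minus the true elevation.

    A positive result means the GPS elevations sit above the reference.
    """
    mean_elevation = sum(elevations) / len(elevations)
    return mean_elevation - reference_elevation


# Only the average elevation (27.79 m) is reported, so it is passed as a
# single-element list; with raw data you would pass all 50 elevations.
error = vertical_error([27.79], 22.58)
print(f"{error:+.2f} m")  # +5.21 m
```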

I believe that I can use the data collected by the GPS unit, but I would need to account for a certain level of inaccuracy. I would suggest that the company that collected the handheld GPS data recalibrate its devices and give its staff refresher training to reduce user error. From a GIS analyst's point of view, some of the errors I encountered could stem from formatting errors introduced when we changed the projection. The data could also be old, and the physical marker could have shifted over time.

For the last step in this analysis, I determined the RMSE (root mean square error) and created a cumulative distribution function (CDF) graph, as seen below.

Graph 1: Cumulative Distribution Function graph of Map 1 dataset

The CDF in this graph shows the relationship between the RMSE error values (x-axis) and the cumulative percentage (CP, y-axis). It tells me how much error corresponds to a given CP. For example, at 10 CP (the 10th percentile) the error is approximately 1.2. At the median (the 50th percentile), the error is about 2.5. If we base the value on a scale out of 7, this value reduces to about 0.36 (2.5 / 7), which falls within the acceptable range of 0.2 to 0.5. Therefore, our data has an acceptable level of error, and we could proceed with the data.
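
As a sketch of this last step, the snippet below computes an RMSE from per-point error distances, builds the points of an empirical CDF (sorted errors against cumulative percentage), and divides a value by the scale of 7 used above. The error values and function names are hypothetical, and dividing by that scale simply mirrors the approach in this post rather than a single standard normalization rule.

```python
import math


def rmse(errors):
    """Root mean square error of a list of error distances."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def empirical_cdf(errors):
    """(error, cumulative percent) pairs, one per sorted error value."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(e, 100.0 * (i + 1) / n) for i, e in enumerate(ordered)]


def normalize(value, scale):
    """Express a value as a fraction of a chosen scale (7 in this post)."""
    return value / scale


# Hypothetical error distances in meters, not the assignment's data.
errors = [1.0, 2.0, 3.0, 4.0]
print(round(rmse(errors), 3))
for error, cp in empirical_cdf(errors):
    print(error, cp)
print(round(normalize(2.5, 7), 2))  # 0.36
```

Plotting the error column on the x-axis against the cumulative percent on the y-axis gives a stepped curve like the one in Graph 1.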

Other values collected for the RMSE and CDF analysis:


Sources:

Paul Bolstad. 2016. GIS Fundamentals: A First Text on Geographic Information Systems. 5th Edition. Eider Press. ISBN-13: 978-1506695877
