
Accuracy vs Precision Data Quality Analysis

In this assignment, we analyzed 50 points collected at the same location with a handheld GPS unit and used them to determine the precision and accuracy of the measurements.

I first determined the mean of the collected points ("waypoints") using the Summary Statistics tool, and found the exact coordinates of the "Average Waypoint" with the Absolute X,Y,Z tool. I then re-projected and spatially joined the layers. Lastly, I created three new fields to determine the 50th, 68th, and 90th percentile distances.
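As a rough illustration of these steps outside ArcGIS, the mean waypoint and percentile distances can be sketched in plain Python. The coordinates below are made up for the example; the real analysis used 50 re-projected waypoints.

```python
import math

# Hypothetical projected waypoint coordinates in meters (stand-ins for
# the 50 GPS points used in the actual analysis).
waypoints = [(3.0, 4.0), (0.0, 0.0), (6.0, 8.0), (1.5, 2.0)]

# Mean position ("Average Waypoint"), analogous to Summary Statistics.
mean_x = sum(x for x, _ in waypoints) / len(waypoints)
mean_y = sum(y for _, y in waypoints) / len(waypoints)

# Distance of each point from the average waypoint, sorted ascending.
dists = sorted(math.hypot(x - mean_x, y - mean_y) for x, y in waypoints)

def percentile(sorted_vals, p):
    """Nearest-rank percentile: the distance below which p% of points fall."""
    k = max(0, math.ceil(p / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]

p50, p68, p90 = (percentile(dists, p) for p in (50, 68, 90))
```

The three percentile values play the role of the three new fields created in the table; the 68th-percentile distance is the horizontal precision estimate discussed below.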

Map1: GPS datapoint distribution and precision/accuracy analysis

My horizontal precision at the 68th percentile is 4.5 meters, and the distance between the "Average Waypoint" and the true reference point is 3.78 meters. Horizontal precision describes the "consistency of a measurement method" and aims for "tightly packed results" (Bolstad, 2016). Horizontal accuracy, on the other hand, "measures how close a database representation of an object is to the true value" (Bolstad, 2016).

My horizontal precision value (4.5 meters) overestimated the error: the actual distance between my average waypoint and the reference point, 3.78 meters, is 0.72 meters less than the precision value. Looking at the mapped percentiles in the map above, my precision analysis was not great, as only 56% of the mapped points fell within the 68th-percentile circle. For the precision and accuracy results to agree, the collected data should have fallen within 3.78 meters rather than 4.5 meters.

My vertical average was 27.79 meters, while the true reference elevation was 22.58 meters, so my vertical error was +5.21 meters. As with the horizontal results, the GPS measurements overestimated the true value. An extra 5.21 meters is significant and may make the elevation data unusable.
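The vertical check is a simple bias calculation; the sketch below just restates the arithmetic from the paragraph above using the values reported in the analysis.

```python
# Vertical accuracy as mean measured elevation minus true elevation.
# Values are taken from the analysis above, not recomputed from raw data.
mean_elevation = 27.79   # mean of the 50 GPS elevations (meters)
true_elevation = 22.58   # reference benchmark elevation (meters)

vertical_bias = mean_elevation - true_elevation
print(f"vertical accuracy: {vertical_bias:+.2f} m")  # prints "+5.21 m"
```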

I believe the data collected by the GPS unit can still be used, but a certain level of inaccuracy would need to be accounted for. I would suggest that the company that collected the handheld GPS data re-calibrate their devices and run refresher training with their staff to reduce user error. From a GIS analyst's point of view, some of the error I encountered could stem from transformation issues introduced when we changed the projection. The data could also be old, and the physical marker could have shifted over time.

For the last step in this analysis, I determined the RMSE (root mean square error) and created a cumulative distribution function (CDF) graph, as seen below.

Graph 1: Cumulative Distribution Function graph of Map 1 dataset

The CDF in this graph relates the RMSE (x-axis) to the cumulative percentage (y-axis): it shows how much RMSE error has accumulated at a given cumulative percentage. For example, at the 10th percentile there is approximately 1.2 meters of RMSE, and at the median (50th percentile) the RMSE is about 2.5 meters. If we normalize this value on a scale out of 7, the RMSE reduces to 0.36, which falls within the acceptable range of 0.2 to 0.5. Therefore, the data has an acceptable level of error, and we can proceed with it.
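The RMSE and empirical-CDF step can be sketched as follows; the error values here are hypothetical stand-ins for the 50-point dataset behind Graph 1.

```python
import math

# Hypothetical horizontal errors in meters (one per GPS point).
errors = [1.0, 2.0, 2.0, 3.0, 4.0]

# RMSE: square root of the mean of the squared errors.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

# Empirical CDF: for each sorted error value, the fraction of points
# with an error at or below it.
cdf = [(e, (i + 1) / len(errors)) for i, e in enumerate(sorted(errors))]
```

Plotting the `cdf` pairs, with error on the x-axis and cumulative fraction on the y-axis, reproduces the shape of the curve in Graph 1.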

Other values collected for the RMSE and CDF analysis:


Sources:

Bolstad, P. (2016). GIS Fundamentals: A First Text on Geographic Information Systems (5th ed.). Eider Press. ISBN-13: 978-1506695877
