Data Quality

The goal of this analysis was to determine which road dataset (TIGER or Street Centerlines) was more complete for Jackson County, Oregon. Completeness in this case is defined by which dataset layer had a greater total length per grid subsection (267 grid subsections total) and greater coverage of the entire area of study. These two main points of my study were derived from the Haklay study.

“After gauging the level of positional accuracy of the OSM dataset, the next issue is the level of completeness. While Steve Coast, the founder of OSM, stated ‘it's important to let go of the concept of completeness’ (GISPro, 2007, page 22), it is important to know which areas are well covered and which are not – otherwise, the data can be assumed to be unusable.” (Haklay, 2010)

To begin with, I used the Summary Statistics tool to find the total road length of each layer within the study area. TIGER roads had a total length of 11382.7 km, while Street Centerlines had only 10873.3 km. By this first measure, TIGER is more complete.
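As a quick check on the two totals reported above, the comparison can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Summary Statistics tool itself; the variable names are my own.

```python
# Total road length per layer, in km, as reported by Summary Statistics.
tiger_km = 11382.7
sc_km = 10873.3

# Absolute gap between the layers and the gap relative to the shorter layer.
diff_km = tiger_km - sc_km
pct_of_sc = diff_km / sc_km * 100

longer = "TIGER" if diff_km > 0 else "Street Centerlines"
print(f"{longer} is longer by {diff_km:.1f} km "
      f"({pct_of_sc:.1f}% of the Street Centerlines total)")
```

The layer with the larger total is the one that counts as more complete under the total-length definition used here.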

Below are the steps I took to determine if TIGER or Street Centerlines was more complete per grid.

1.       I first reprojected the TIGER roads layer to the projection of the Street Centerlines layer, NAD_1983_StatePlane_Oregon_South_FIPS_3602_Feet_Intl, so that the two layers could be compared.

2.       I used the Clip tool to cut out any roads outside of the grid boundary areas. (I first tried the Intersect tool, but it failed to clip out the road features outside of the grid areas.) I used the TIGER or Street Centerlines layer as the input and the Grid layer as the clip feature.

3.       I used the Intersect tool on each of the TIGER and Street Centerlines (SC) layers with the Grid layer to add the gridcode to the attribute table.

4.       Then I used the summarize within tool to determine the length of road per grid per TIGER/SC layer.

5.       I spatially joined the two summarized-within layers (TIGER and SC) so that I could analyze them together.

6.       I created a new field and used Calculate Field to determine how many grids had a greater road length in the SC layer than in the TIGER layer (that is, greater completeness) and vice versa.

7.       I created a new field and plugged in the percent difference formula to find the percent difference.

8.       I changed the symbology to unique, manual to create a choropleth map of the percent difference.
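Steps 6 and 7 above come down to a per-grid comparison and a percent difference calculation. The sketch below shows one way to express them in plain Python. It assumes the common symmetric percent difference formula, |a − b| / ((a + b) / 2) × 100, which may not be the exact expression entered in Calculate Field; the function names and the sample grid lengths are hypothetical and are not values from the study.

```python
def percent_difference(a, b):
    """Symmetric percent difference between two non-negative lengths.

    Assumed formula: |a - b| / ((a + b) / 2) * 100.
    Returns 0.0 when both lengths are zero to avoid dividing by zero.
    """
    mean = (a + b) / 2
    if mean == 0:
        return 0.0
    return abs(a - b) / mean * 100


def more_complete(tiger_len, sc_len):
    """Label the layer with the greater road length in one grid cell."""
    if sc_len > tiger_len:
        return "SC"
    if tiger_len > sc_len:
        return "TIGER"
    return "TIE"


# Hypothetical per-grid road lengths in km: gridcode -> (TIGER, SC).
grids = {1: (40.0, 50.0), 2: (30.0, 30.0), 3: (12.5, 0.0)}

# For each grid, record which layer is longer and by what percent difference.
results = {
    code: (more_complete(t, s), round(percent_difference(t, s), 1))
    for code, (t, s) in grids.items()
}

for code, (winner, pct) in results.items():
    print(f"grid {code}: {winner}, percent difference {pct}")
```

Counting how many grids carry each label gives the per-grid completeness tally, and the percent difference value is what the choropleth map in step 8 would symbolize.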


I found that the Street Centerlines layer was more complete at 61.4% (164 out of 267 grids) than the TIGER dataset, which fell at 49.8% (133 out of 267 grids). While Street Centerlines was more complete on a per-grid basis, it should be noted that TIGER had greater overall completeness of area coverage for Jackson County.

Map 1: Percent Difference for Completeness Analysis of Jackson County, Oregon. 

Source: 

Haklay, M. (2010). How Good is Volunteered Geographical Information? A Comparative Study of OpenStreetMap and Ordnance Survey Datasets. Environment and Planning B: Planning and Design, 37(4), 682–703. https://doi.org/10.1068/b35097
