Geography 5201: Final Project

Final Project: The Non-Traditional City Index

Purpose

This project was inspired by my friends and classmates, a group that includes many graduating and recently-graduated students. Many people within my friend group are planning to move to other cities within the next few years, and while some are travelling to take a new job or meet up with family, others are more uncertain about their future location. Part of this uncertainty stems from the lack of effective and transparent city ranking systems available to the public: when it comes to looking up city information, most people either have to rely on unclear and arbitrary "top 10" lists, or else laboriously dig up statistics from a variety of different sources.

In an era when cities market themselves to attract investment and the young "creative class", it is important for the public to have access to clear, objective information that describes the differences between them. This project fills that need: it allows users to differentiate between cities based on their personal needs, using clear and objective metrics, and then visualize those differences with an easily-understood interactive map. Additionally, this project includes a number of unconventional indices and a diverse group of urban areas, which may encourage users to widen their search.

How it Works:

The Non-Traditional City Index, or City MetaScore, is composed of a number of demographic, housing, and locational indicators that are selected by the user. The user can select an indicator for inclusion by checking a box to the left of the indicator. Each individual indicator is linked to three variables for each city: the actual, numeric value for that indicator; a rank, out of 147; and a normalized value, where the best value for that indicator is normalized to 1, the worst value for that indicator is normalized to 0, and everything else is somewhere in between. Some indicators are location-based, and describe whether or not certain points of interest can be found within a given distance of the city center. These indicators are binary (0/1): either the urban area is within range of the given features, or it is not.
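
The normalization and ranking described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes that values between the best and worst are scaled linearly, and the function names and the `higherIsBetter` flag are hypothetical.

```javascript
// Min-max normalization of one indicator across all cities (a sketch).
// `values` holds the raw indicator value for each city.
// `higherIsBetter` is a hypothetical flag: true when the largest raw value
// is the best one (indicators "ranked from highest to lowest").
function normalizeIndicator(values, higherIsBetter) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min;
  return values.map(function (v) {
    // If every city has the same value, no city is better or worse;
    // returning 0 here is an arbitrary choice for this sketch.
    if (range === 0) return 0;
    const scaled = (v - min) / range;
    return higherIsBetter ? scaled : 1 - scaled;
  });
}

// Rank 1 is the best value; tied cities share the same rank.
function rankIndicator(values, higherIsBetter) {
  return values.map(function (v) {
    const better = values.filter(function (other) {
      return higherIsBetter ? other > v : other < v;
    });
    return better.length + 1;
  });
}
```

For three made-up raw values of 10, 20, and 30 where higher is better, this yields normalized values of 0, 0.5, and 1, and ranks of 3, 2, and 1.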

The user also assigns a weight for each indicator that he or she wishes to include in the City MetaScore. The weight value determines how important each indicator is in the calculation of the final score. Weights are proportional: an indicator with a weight of 6 is considered twice as important as an indicator with a weight of 3, and three times as important as an indicator with a weight of 2. Note that weights are NOT rankings of importance: the same weight can be assigned to multiple indicators, and higher-numbered weights are assigned MORE importance than lower-numbered weights.

To calculate the MetaScore for each city, an algorithm takes the normalized (0-1) value for each selected indicator and multiplies it by the weight for that indicator. The algorithm then adds all of the weighted indicators together into a single score. Next, these summed scores are normalized from 0 to 1 across all 147 cities. Finally, the algorithm color-codes cities based on their normalized scores.
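
The calculation can be sketched in a few lines of JavaScript. The data layout (`normalized`, `weights`) and the function name are assumptions made for illustration; the map's real code may organize the data differently.

```javascript
// cities:  array of { name, normalized: { indicatorKey: value in [0, 1] } }
// weights: { indicatorKey: weight } for the indicators the user selected
// Both shapes are hypothetical, chosen to keep the sketch short.
function computeMetaScores(cities, weights) {
  const keys = Object.keys(weights);
  // Step 1: weighted sum of the selected indicators for each city.
  const sums = cities.map(function (city) {
    return keys.reduce(function (total, key) {
      return total + city.normalized[key] * weights[key];
    }, 0);
  });
  // Step 2: rescale the sums so the best city gets 1 and the worst gets 0.
  const min = Math.min(...sums);
  const max = Math.max(...sums);
  const range = max - min;
  return cities.map(function (city, i) {
    const metaScore = range === 0 ? 0 : (sums[i] - min) / range;
    return { name: city.name, metaScore: metaScore };
  });
}

// Usage with three made-up cities and two made-up indicators.
const exampleScores = computeMetaScores(
  [
    { name: "A", normalized: { young: 1, rent: 0 } },
    { name: "B", normalized: { young: 0, rent: 1 } },
    { name: "C", normalized: { young: 0.5, rent: 0.5 } }
  ],
  { young: 6, rent: 3 }
);
```

With a weight of 6 on `young` and 3 on `rent`, the summed scores are 6, 3, and 4.5, which rescale to MetaScores of 1, 0, and 0.5 for cities A, B, and C.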

When the MetaScore is calculated, at least one urban area will have a MetaScore of 1. According to the algorithm, this urban area is the best possible match for the user based on his or her selected criteria. At least one other urban area will have a MetaScore of 0. According to the algorithm, this urban area is the worst possible match for the user based on his or her selected criteria. All other cities will fall somewhere on the 0-1 scale, where higher scores (symbolized with blue) indicate a better match, and lower scores (symbolized with red) indicate a worse match.
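
One way to turn a MetaScore into a map color is a continuous red-to-blue ramp, sketched below. This is an assumption for illustration: the text above only says that blue marks higher scores and red marks lower ones, so the actual map may use a small number of discrete color classes instead. The function name is hypothetical.

```javascript
// Map a MetaScore in [0, 1] to a hex color: 0 is pure red, 1 is pure blue.
function scoreToColor(score) {
  // Clamp so that out-of-range input still produces a valid color.
  const s = Math.min(1, Math.max(0, score));
  const red = Math.round(255 * (1 - s));
  const blue = Math.round(255 * s);
  const hex = function (n) {
    return n.toString(16).padStart(2, "0");
  };
  return "#" + hex(red) + "00" + hex(blue);
}
```

The returned string could be used as the `fillColor` of a Leaflet path style, so that each city is redrawn in its new color whenever the scores are recalculated.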

Creating the Map

My workflow for creating the map:

  • I downloaded all of the raw data (described below) and loaded it into ArcGIS.
  • Using a variety of joins and processing functions in ArcMap, I condensed the data into a single shapefile. The shapefile contained geographies for the top 147 urban areas in the United States (by population), along with an attribute table listing relevant information for each urban area. The locations of national parks, lakes, and coastline were coded into three variables each: binary functions that determined whether the features were present within 20 miles, 50 miles, or 100 miles of the city.
  • The shapefile was converted into GeoJSON using the ogr2ogr command-line converter.
  • The resulting GeoJSON file was nearly 18 MB, far too large for rapid loading in a web application. I dramatically reduced the size of the GeoJSON file using MapShaper, a free online tool. By reducing the number of points within the city file geometry, I decreased the size of the JSON structure to 2 MB.
  • I placed the GeoJSON structure within a JavaScript file and assigned it as the variable "cities". I included both this JavaScript file and the final mapping file as scripts in my final map page.
  • I created a JavaScript page that initially displayed the United States and the unranked cities along with an input form, then ranked the cities on command and symbolized them based on their scores. The ranking function also adds pop-up text that users can interact with.
  • I created a separate PHP page within the same directory as the map, then created a function to write URLs for that page within the JavaScript map. Thanks to this PHP page, users can generate interactive detail pages that list relevant information about each of the cities.
  • Because the ranking function is triggered by a mouse click, users can rank and re-rank the urban areas multiple times without refreshing the page.
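
The URL-writing step can be sketched as below. The page name `details.php`, the parameter names, and the function name are all hypothetical; the sketch only shows the general idea of packing a city's attributes into a query string that a PHP page can read.

```javascript
// Build a link to a detail page, passing city attributes in the query string.
// URLSearchParams handles the escaping of spaces, commas, and other characters.
function buildDetailUrl(pagePath, cityProps) {
  const params = new URLSearchParams();
  Object.keys(cityProps).forEach(function (key) {
    params.append(key, String(cityProps[key]));
  });
  return pagePath + "?" + params.toString();
}

// Hypothetical usage: a link that could be placed in a city's pop-up content.
const exampleLink = buildDetailUrl("details.php", {
  name: "Columbus, OH",
  rank: 3
});
```

Because the values travel in the URL, the detail page needs no database: everything it displays is supplied by the map at the moment the link is written.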

The map is displayed using the following framework:

  • The home page and input form are written in HTML.
  • The map and interactive data layers are written in JavaScript with Leaflet. The JavaScript code accesses the HTML input form using jQuery with AJAX.
  • The map data is stored as JSON and is represented on the map as an L.GeoJSON object.
  • After ranking the cities, users can view details about each city in a separate page by clicking on a link in the city's pop-up content. The details page is created using the PHP $_GET superglobal array, which reads the data passed through the URL.

Details

Geography and Mapping Tools

The "cities" listed in this index were downloaded from the 2014 TIGER/Line dataset provided by the US Census Bureau, under the "Urban Areas" categorization. The urban areas included in this index all had a population of over 250,000 according to the 2010 Census; this rule applies to 147 total urban areas in the United States, including 145 urban areas in the contiguous US and one urban area each in Alaska and Hawaii. The cities are represented on the map using a four-class proportional circle symbology that is related to their relative population sizes.

The map is displayed using the Leaflet open-source package for JavaScript, and the interactive component is facilitated using jQuery. The background map is provided courtesy of OpenStreetMap through a Creative Commons ShareAlike license.

The Indicators

Demographic:

All of these indicators were derived from 2010 Census data aggregated by Urban Area, located and downloaded with the American FactFinder website. Meanings of specific indicators can be found below:

  • Total Population - Highest to Lowest: The total population of each urban area, from highest to lowest. Unlike most other indicators on this list, the Total Population indicator was normalized on a logarithmic scale (ln(x)) to account for the much higher population of a few outliers.
  • Total Population - Lowest to Highest: The total population of each urban area, from lowest to highest. Unlike most other indicators on this list, the Total Population indicator was normalized on a logarithmic scale (ln(x)) to account for the much higher population of a few outliers.
  • Similar Age Groups - 18 to 34: The proportion of people in each urban area between the ages of 18 and 34, ranked from highest to lowest.
  • Similar Age Groups - 18 to 29: The proportion of people in each urban area between the ages of 18 and 29, ranked from highest to lowest.
  • Similar Age Groups - 22 to 29: The proportion of people in each urban area between the ages of 22 and 29, ranked from highest to lowest.
  • Similar Age Groups - 22 to 24: The proportion of people in each urban area between the ages of 22 and 24, ranked from highest to lowest.
  • Similar Age Groups - Under 18: The proportion of people in each urban area below the age of 18, ranked from highest to lowest.
  • Household Type - Single Occupant: The proportion of households with only one member in the urban area, ranked from highest to lowest.
  • Household Type - Non-family (all): The proportion of households in the urban area not composed of a family, ranked from highest to lowest.
  • Household Type - Family (all): The proportion of households in the urban area composed of a family, ranked from highest to lowest.
  • Racial and ethnic minority representation: The proportion of non-white and/or Hispanic people in an urban area, ranked from highest to lowest.
  • Minority group presence - Hispanic (all races): The proportion of people in an urban area who identify as Hispanic, ranked from highest to lowest.
  • Minority group presence - Black & African American: The proportion of people in an urban area who identify as black or African-American, ranked from highest to lowest.
  • Minority group presence - Asian: The proportion of people in an urban area who identify as Asian, ranked from highest to lowest.
  • Minority group presence - Native Hawaiian & Pacific Islander: The proportion of people in an urban area who identify as native Hawaiian or Pacific Islander, ranked from highest to lowest.
  • Minority group presence - Native American & Alaskan: The proportion of people in an urban area who identify as First-Nations, Native American, or Native Alaskan, ranked from highest to lowest.
  • Minority group presence - Other Races: The proportion of people in an urban area who identify with another race not listed on the census form, ranked from highest to lowest.
  • Minority group presence - Multi-racial: The proportion of people in an urban area who identify with more than one race, ranked from highest to lowest.
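
The logarithmic normalization used for the two Total Population indicators can be sketched as follows. The exact formula is an assumption: this version applies the same best-is-1, worst-is-0 scaling to ln(x) rather than to x itself. The function name and the `higherIsBetter` flag are hypothetical.

```javascript
// Normalize populations on a natural-log scale (a sketch).
// higherIsBetter = true  corresponds to "Highest to Lowest";
// higherIsBetter = false corresponds to "Lowest to Highest".
function normalizeLog(values, higherIsBetter) {
  const logs = values.map(function (v) {
    return Math.log(v); // natural logarithm, ln(x)
  });
  const min = Math.min(...logs);
  const max = Math.max(...logs);
  const range = max - min;
  return logs.map(function (l) {
    if (range === 0) return 0; // all populations equal: arbitrary choice
    const scaled = (l - min) / range;
    return higherIsBetter ? scaled : 1 - scaled;
  });
}
```

For made-up populations of 1,000, 10,000, and 100,000, the middle city lands at about 0.5 on the log scale. On a linear scale it would land at about 0.09, which shows how a few very large urban areas would otherwise push almost every other city toward the bottom of the range.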

Housing:

The "Homes owned vs rented" indicator was derived from 2010 Census data on tenure, located and downloaded with the American FactFinder website. The housing and rental price indicators were derived from publicly-available Zillow housing data. The housing price data was sorted by city and manually matched with the urban areas used for the index. In cases where urban areas were composed of two or more cities, the values were averaged for each of the cities making up the urban area. Each of the housing expenses was determined by averaging all of the monthly values in 2014. Meanings of specific indicators can be found below:

  • Homes owned vs rented - Owned: The proportion of all housing units in the urban area that are owned by the occupant, ranked from highest to lowest.
  • Homes owned vs rented - Rented: The proportion of all housing units in the urban area that are rented by the occupant, ranked from highest to lowest.
  • Housing Prices - Bottom third: The average 2014 housing price in the Zillow "bottom tier" category, ranked from lowest to highest.
  • Housing Prices - Median: The average 2014 housing price in the Zillow "middle tier" category, ranked from lowest to highest.
  • Housing Prices - Top third: The average 2014 housing price in the Zillow "top tier" category, ranked from lowest to highest.
  • Average rental price (all units): The average monthly rental cost for all units in the urban area in 2014, according to the Zillow rent index.
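
The averaging behind the price indicators can be sketched as below. The order of operations is an assumption: this version first averages the 2014 monthly values for each city, then takes an unweighted average across the cities that make up the urban area. The function names and data layout are hypothetical, and the numbers in the usage example are made up.

```javascript
// Arithmetic mean of a non-empty array of numbers.
function mean(numbers) {
  const total = numbers.reduce(function (sum, n) {
    return sum + n;
  }, 0);
  return total / numbers.length;
}

// monthlyByCity: { cityName: [monthly values for 2014] }
// Returns one value for the whole urban area.
function urbanAreaAverage(monthlyByCity) {
  const cityMeans = Object.keys(monthlyByCity).map(function (city) {
    return mean(monthlyByCity[city]);
  });
  return mean(cityMeans);
}

// Hypothetical two-city urban area, shortened to two months per city.
const exampleAverage = urbanAreaAverage({
  "City A": [100, 200],
  "City B": [300, 500]
});
```

Here the two city averages are 150 and 400, so the urban area is assigned 275. An unweighted average treats a small suburb and a large core city equally, which is worth keeping in mind when reading the price indicators for multi-city urban areas.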

Location-based:

Each of the three location-based indicators was derived from geographic data using a number of Geoprocessing tools in ArcGIS. The geographic data for US National Parks and National Forests was downloaded from the National Data Clearinghouse at Data.gov. The geographic data for major lakes and coastlines in North America was downloaded from the Natural Earth Large Scale vector dataset at naturalearthdata.org.

In order to determine the location of National Parks, coastline, and major lakes in relation to each urban area, three buffers were created around the centroid of each urban area. These three buffer layers, with radii of 20, 50, and 100 miles, were each intersected with the three geographies of interest using the ArcGIS "Intersect" function. The output of these nine intersects included only buffers that overlapped with the features of interest. I then joined each buffer-intersect back to the original shapefile in order to determine which cities were located less than 20 miles, 50 miles, and 100 miles from the features of interest.
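
The buffer analysis itself was done in ArcGIS, but the resulting 0/1 variables can be approximated in JavaScript with a different technique: measuring the great-circle (haversine) distance from the city centroid to sample points on each feature. The sketch below is that substitute, not a reproduction of the ArcGIS workflow. It assumes each feature has been reduced to a list of points, and all names are hypothetical.

```javascript
// Great-circle distance in miles between two latitude/longitude points.
function haversineMiles(lat1, lon1, lat2, lon2) {
  const R = 3958.8; // mean Earth radius in miles
  const toRad = function (deg) {
    return (deg * Math.PI) / 180;
  };
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// center: { lat, lon } of the urban area centroid
// featurePoints: array of { lat, lon } sampled from parks, coast, or lakes
// Returns the three binary (0/1) variables for one feature type.
function withinRanges(center, featurePoints) {
  const distances = featurePoints.map(function (p) {
    return haversineMiles(center.lat, center.lon, p.lat, p.lon);
  });
  // Math.min of an empty list is Infinity, so no range is satisfied.
  const nearest = Math.min(...distances);
  return {
    within20: nearest <= 20 ? 1 : 0,
    within50: nearest <= 50 ? 1 : 0,
    within100: nearest <= 100 ? 1 : 0
  };
}
```

A point-based check can miss a feature whose sampled points are far apart, so the buffer-and-intersect approach described above is the more reliable of the two for polygons and coastlines.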

Other indicators used for the City MetaScore are normalized on a scale from 0 to 1, with most cities having a score somewhere in the middle. The locational indicators, however, are binary: each city is either in range of a given feature (1) or not (0). A locational indicator therefore always contributes either its full weight or nothing to a city's summed score, while a demographic or housing indicator contributes only part of its weight for most cities. As a result, locational indicators tend to have a stronger effect on the overall City MetaScore, and should generally be assigned a lower weight than the demographic and housing indicators.

The locational indicators are described in more detail below:

  • National Parks: Describes whether or not any national parks or national forests can be found within 20mi, 50mi, or 100mi of the center of the urban area.
  • The Coast: Describes whether or not the city center is located within 20mi, 50mi, or 100mi of the coastline. The coastline does NOT include the US border with other countries or the Great Lakes.
  • Major Lakes: Describes whether or not the city center is located within 20mi, 50mi, or 100mi of any major lakes. Major lakes are included in the Natural Earth "lakes and reservoirs" dataset, and include most lakes in North America with an area greater than 1 square mile. The Great Lakes are included in this dataset, but oceans are NOT included.