Data analysis with client tools: QGIS
Thus far in class we’ve dealt with “fluid Earth” data sets, i.e., data as a function of time, depth, and space. The tools we used were Matlab and Python, and the applications were things like time-series analysis, trends, etc. Now we will look at a new type of data, one that relies on position.
This new type of data model is referred to as geospatial data, and we will learn geographic information system (GIS) tools, specifically QGIS. For these types of studies the data are different, as are the typical questions posed and the software used to investigate the data. For example, we’ve looked at sea level as a function of time and asked whether (or to what degree) there was a linear trend. The input data were ASCII columns or NetCDF, and we used Matlab or Python to work with the data. Now we will use geo-referenced data to ask questions like “what is the nearest hospital” or “what percentage area of coral reefs would be affected by a 2-degree rise in SST”.
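To make the first kind of question concrete, a nearest-feature query can be sketched in a few lines of Python. This is only an illustration: the hospital names and coordinates are made up, and the haversine formula assumes a spherical Earth. In practice a GIS tool does this for you.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest(point, features):
    """Return the feature (name, lat, lon) closest to point (lat, lon)."""
    return min(features, key=lambda f: haversine_km(point[0], point[1], f[1], f[2]))

# Hypothetical hospital locations: (name, latitude, longitude)
hospitals = [
    ("Hospital A", 21.31, -157.86),
    ("Hospital B", 21.40, -157.74),
    ("Hospital C", 21.65, -158.05),
]
here = (21.30, -157.82)
closest = nearest(here, hospitals)
print(closest[0])
```

The key point is that the answer depends only on position, not on time, which is what separates this kind of question from the time-series work done earlier.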
These types of questions require a different type of data, for example, maps, features (e.g., roads and buildings), and shapes. These GIS data generally fall into two categories, rasters and vectors. Both contain geospatial information (e.g., lat/lon, or distance x/y from some origin). Raster files are like images: relatively straightforward files that store some feature as a function of x and y, for example an aerial image where each pixel (lat/lon) is a certain color. Common raster formats include JPEG, TIFF, GeoTIFF, etc.
Vector files are more complex and include features (e.g., roads) along with geospatial information. They can also include information in a database. One example vector data model is a shapefile. Shapefiles typically come as a set, in the sense that there are several different files with different extensions. An example shapefile may contain country information, e.g., countries.shp. In addition to geographic boundaries, it could contain information on population, currency, GDP, etc. So the “countries” shapefile would also have a database file (countries.dbf), a projection file (countries.prj) and so on.
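As a small illustration of the “set of files” idea, the sketch below lists the companion files one would expect next to countries.shp. The .shx index file is not mentioned above but is part of the standard shapefile set. The role descriptions are informal summaries, not official definitions.

```python
from pathlib import Path

# Extensions that commonly accompany a .shp file
SIDECAR_EXTENSIONS = {
    ".shp": "feature geometry",
    ".shx": "geometry index",
    ".dbf": "attribute table (database)",
    ".prj": "projection / coordinate reference system",
}

def shapefile_set(shp_path):
    """Map each expected companion file of a shapefile to its role."""
    base = Path(shp_path)
    return {base.with_suffix(ext).name: role for ext, role in SIDECAR_EXTENSIONS.items()}

files = shapefile_set("countries.shp")
for name, role in files.items():
    print(f"{name}: {role}")
```

When copying or sharing a shapefile, all of these files need to travel together.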
With these new data models and formats we will be using a new software tool called QGIS. The de facto standard in GIS application software is made by ESRI, e.g., ArcGIS. The ESRI products carry a large license fee, while QGIS is free (and open source). QGIS is available for a variety of platforms (Mac, PC, Linux, etc.). Note that upgrades happen fairly often, so some features described here might be different in older (or newer) versions.
QGIS has a graphical user interface (GUI) that, when started, has different panels for different functions. The figure below shows QGIS started with two sample layers, a vector file with airport locations (airports) and another one with country boundaries (cntry02). Ordinarily QGIS starts with a blank canvas.
The desktop can be divided into six different panels. Across the top (1) are the main function menus. All the QGIS options and functions are available via pull-down menus. Below this are shortcut icons (2). These provide quick access to some of the more common functions (e.g., file open, file save, load layer, etc.). Along the left (3) are similar icons for all the layer import functions (e.g., load vector layer, load raster layer, etc.). All the functions accessed via the icons in (2) and (3) are also available in the pull-downs (1).
The layers loaded appear in the “table of contents” (4). The TOC lists the layer name along with its symbology (symbol and color). There is a checkbox next to each layer that controls whether the layer is displayed. It should also be noted that these layers are drawn in the order listed, i.e., the first layer will be displayed on top of all others; the last layer listed is drawn on the bottom. The resulting layers and features are drawn in the “Map View” canvas (5). Finally, the map coordinates and projection are given in the bottom panel (6).
The first step will be to load data layers. The two in the figure above are found in cntry02.shp and airports.shp. Note these are both shapefiles, and can be loaded by clicking the small icon that looks like a “V” in panel (3), or by selecting “vector” then “load vector layer” from the top panel (1). Once loaded, these two will appear in the table of contents panel (4) and be displayed in the map canvas panel (5).
Note that the symbology may differ from the figure above, but this is completely configurable. There are many options and functions in QGIS, and we can’t cover them all. Here we will just look at some of the common features available when working with shapefiles.
The first thing to cover is the file-level operations and map controls. These are given as icons across the top of the QGIS window.
Again, everything is configurable, so these menu items may differ for different users. The set on the left includes file operations: “new file”, “file open”, “save”, etc. The center icons are used to control the map. The first, when clicked, allows the map to be moved around the canvas (click-hold-drag). The next are options for zoom. Perhaps most helpful is “zoom to layer extent”, which helps when “lost”.
The right-most icons include options for interacting with the data. The first (an “I” in a blue circle) will show feature attributes. Others allow for searching/displaying/interacting with data in the file.
For most of these, it is important to remember that clicking the icon activates that function.
One way to control these features is by editing the parameters in the shapefile. Right-clicking on a particular layer in the table of contents provides the following menu.
There are two main entries that we will make use of: “Properties” and “Open Attribute Table”. The first allows the user to change things like colors, symbols, etc., but also to add text labels. The second allows interaction with the underlying database. This includes things like queries, sorting, etc. These are described in more detail next. The remaining options can also be quite helpful, for example to remove layers, duplicate layers and so on.
Properties. The properties selection leads to a second menu that includes options to change general settings, styles (colors, symbols), text fields and more. The figure below shows this menu. Perhaps the most useful are “style” and “labels”. Under “style” users can change the symbols, layer transparency, and colors. There is additionally a pull-down menu for symbol type. Here, the options become “single symbol” (e.g., a dot), “categorized” (set levels), and “graduated”. For example, to color shade the countries by some value in the database, select graduated. The options then change to “column”, whereby the column in the database is chosen (e.g., to color shade by population pick “POP_CNTRY”). There is also an item for “classes” (i.e., number of levels to shade) and color ramp. Once selected, it is important to then click the button marked “Classify”; this will assign the color values to the number of classes identified (otherwise nothing shows up). When done, click on “Apply” and “OK” to view the result.
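What the “Classify” step does can be sketched roughly as follows: the value range of the chosen column is split into the requested number of classes, and each feature is assigned to one of them (and hence to one color on the ramp). The sketch assumes equal-width classes, which is only one of the modes QGIS offers. The population numbers are invented.

```python
def equal_interval_breaks(values, n_classes):
    """Upper bounds of n_classes equal-width bins spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * (i + 1) for i in range(n_classes)]

def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1  # guard against floating-point round-off at the top

# Hypothetical POP_CNTRY values (millions) for four countries
populations = {"A": 1.0, "B": 12.0, "C": 48.0, "D": 101.0}
breaks = equal_interval_breaks(list(populations.values()), n_classes=4)
classes = {name: classify(pop, breaks) for name, pop in populations.items()}
print(breaks)
print(classes)
```

Note that with skewed data (a few very large values) equal-width classes put most features in the lowest class, which is one reason to try other classification modes.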
Open Attribute Table. The menu selection (right-click on layer) for open attribute table is available with certain file types (e.g., shapefiles) that have associated databases. When open, the attribute table appears like a spreadsheet. Most columns can be sorted by simply clicking on the column header (first click sorts up, second click sorts down). It should also be noted that the table rows, if selected, will be highlighted in the map. This is a way to find values from the table on the map. Conversely, items in the table can be “found” by clicking on the map. To do this, the “select single feature” menu item needs to be highlighted.
Geospatial or georeferenced data are usually used in mapping or geographic applications. While GIS tools can read ASCII tables (e.g., temperatures at various latitude and longitude points), most GIS data are either raster files or vector files.
Raster files are essentially tables, with the rows and columns being lat/lon locations and the “cell” or data value in each row/column representing some measurement or observation. Raster formats are varied, but the more popular ones are .jpg and .tif.
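The link between a cell's row/column index and its map position can be sketched with a simple origin-plus-cell-size rule. This assumes a north-up grid with square cells. The grid values and corner coordinates are hypothetical.

```python
def cell_center(row, col, x_origin, y_origin, cell_size):
    """Map coordinates (x, y) of the center of raster cell (row, col).

    Assumes a north-up grid whose origin is the upper-left corner,
    so x grows with the column index and y shrinks with the row index.
    """
    x = x_origin + (col + 0.5) * cell_size
    y = y_origin - (row + 0.5) * cell_size
    return x, y

# Hypothetical 3 x 4 raster of temperatures, 1-degree cells,
# upper-left corner at lon = -160, lat = 23
grid = [
    [24.1, 24.3, 24.6, 24.8],
    [25.0, 25.2, 25.5, 25.7],
    [25.9, 26.1, 26.4, 26.6],
]
lon, lat = cell_center(row=1, col=2, x_origin=-160.0, y_origin=23.0, cell_size=1.0)
print(lon, lat, grid[1][2])
```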
Vector files are more complex and contain more information than a point measurement. In some cases, for example shapefiles, the information is spread over a series of files. As an example, a “shapefile” containing information about countries (countries.shp) would include a database file (countries.dbf), a geographic projection (countries.prj) and so on.
It’s still difficult to deal with more dimensions in GIS applications, for example data that depend on time as well as space. In addition, the types of analyses done are usually very different. As with GIS applications, many geophysical data analyses involve ASCII table data. Another very common format is NetCDF, a self-describing, machine-independent binary format. Matlab, another common tool, also has its own binary format.
Common raster file formats
Joint Photographic Experts Group: JPEG, JPG
Graphics Interchange Format: GIF
Portable Network Graphics: PNG
Tagged Image File Format: TIFF, TIF
Georeferenced TIFF: GeoTIFF
Device Independent Bitmap: BMP
Docs
Portable Document Format: PDF
PostScript: PS
Vector formats
Binaries: shape, netcdf, mat, hdf, grib
QGIS:
Overview of program
How to start, stop
Page layout, top/left/pulldown menus
Example loading
Load countries
Right click properties
Symbology: lines, fill, etc.
Labels
Right click open attribute table
Load airports
Move up/down in TOC
Turn on/off
Save it
Go to the Geoportal and find your data of interest. Here I’m looking at the General Plan of Kauai County. http://geoportal.hawaii.gov/datasets/general-plan-county-of-kauai
Click the API button drop-down and copy the GeoService link. DO NOT use the OGC WFS link, which will not pull in the symbology set.
In QGIS, add an ArcGISFeatureServer connection. The GeoService link that you copied will be something like this: http://geodata.hawaii.gov/arcgis/rest/services/ParcelsZoning/MapServer/7/query?outFields=*&where=1%3D1. But you’ll want to keep only the text up to “…MapServer” and remove everything after it. Otherwise you won’t be able to see any layers when you add the service.
The link to paste into the URL box: http://geodata.hawaii.gov/arcgis/rest/services/ParcelsZoning/MapServer
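The trimming step can also be done programmatically. The helper below is a hypothetical convenience function (not part of QGIS) that cuts a copied GeoService link back to the part ending in “MapServer”.

```python
def service_root(geoservice_url, marker="MapServer"):
    """Trim an ArcGIS REST query URL back to the service root ending in marker."""
    index = geoservice_url.find(marker)
    if index == -1:
        raise ValueError(f"'{marker}' not found in URL")
    return geoservice_url[: index + len(marker)]

copied = ("http://geodata.hawaii.gov/arcgis/rest/services/"
          "ParcelsZoning/MapServer/7/query?outFields=*&where=1%3D1")
root = service_root(copied)
print(root)
```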
Get Oahu DEM from NOAA
Zoom to Oahu, look for DEM (5GB file!); add to cart, checkout, etc.
Get email with download
Load file into QGIS as GeoTIFF
Duplicate layer and rename as “hillshade”
Right-click properties style
Set Band rendering type to “hillshade”
Band 1
Altitude 45 degrees
Azimuth 315
Z-factor: reset to 111100
Multidirectional checked
Brightness to 25, Saturation 0, Contrast 10
Resampling zoomed in bilinear, out average
Click apply and OK
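To see what the altitude and azimuth settings control, here is a minimal sketch of hillshading for a single surface patch using simple Lambertian shading (brightness proportional to the cosine of the angle between the surface normal and the light direction). This is an illustration under stated assumptions, not the QGIS renderer itself. The function name and slope values are hypothetical, and the multidirectional option is not reproduced.

```python
import math

def hillshade(dz_dx, dz_dy, altitude_deg=45.0, azimuth_deg=315.0, z_factor=1.0):
    """Lambertian shade (0-255) for a surface with the given slopes.

    dz_dx: elevation change per unit distance toward the east
    dz_dy: elevation change per unit distance toward the north
    Azimuth is measured clockwise from north; altitude is above the horizon.
    """
    alt = math.radians(altitude_deg)
    az = math.radians(azimuth_deg)
    # Unit vector pointing from the surface toward the light source
    light = (math.sin(az) * math.cos(alt), math.cos(az) * math.cos(alt), math.sin(alt))
    # Surface normal; z_factor rescales the vertical gradients
    nx, ny, nz = -z_factor * dz_dx, -z_factor * dz_dy, 1.0
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    cos_incidence = (nx * light[0] + ny * light[1] + nz * light[2]) / norm
    return 255.0 * max(0.0, cos_incidence)

flat = hillshade(0.0, 0.0)
facing_light = hillshade(1.0, -1.0)   # slope descends toward the northwest
facing_away = hillshade(-1.0, 1.0)    # slope descends toward the southeast
print(round(flat), round(facing_light), round(facing_away))
```

With the light in the northwest (azimuth 315), slopes facing northwest come out brighter than flat ground and slopes facing southeast fall into shadow, which is what gives the terrain its relief.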
Zoom to Diamond Head
Right click on DEM
Render type: Singleband pseudocolor
Make sure min/max values are correct for your DEM (15 to 143)
Color ramp: Use drop down menu >> Select all Color Ramps >> BrBG (this is the one I’m using). To invert color ramp, right click inside the color box >> Invert Color
Blending mode: Multiply (this produces better effects than Normal blending with transparency)
Resampling: again use Bilinear or Cubic to get smoother effect
Optional transparency: change the transparency if you want. I have it at 50%.
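The effect of multiply blending on the colored DEM over the hillshade can be sketched per pixel: each color channel of the top layer is scaled by the corresponding channel of the layer beneath. The colors below are hypothetical, and the sketch assumes the usual 8-bit multiply formula (top × bottom / 255).

```python
def multiply_blend(top, bottom):
    """Multiply two 8-bit RGB colors channel by channel."""
    return tuple(round(t * b / 255) for t, b in zip(top, bottom))

dem_color = (140, 81, 10)   # hypothetical color from the DEM color ramp
shade = (128, 128, 128)     # hypothetical mid-gray hillshade pixel
blended = multiply_blend(dem_color, shade)
print(blended)
```

White hillshade pixels leave the DEM color unchanged and darker ones darken it, so the relief shows through without washing out the colors.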
Create 3-D map view
In QGIS, go to View menu >> New 3D Map View
In the 3D Map 1 window: click the Configure button
Elevation: select your DEM or the hillshade
Make any changes to the other settings if you want
In the 3D Map 1 window: hold down the shift key and the left mouse button to zoom in/out and rotate
What is nursing home capacity by Borough in NYC?
Load nursing homes (OEM_NursingHomes)
Load New York Boroughs (nybb)
Can’t see the nursing homes; how to change? Click and drag the layer in the TOC
How to add text labels for Boroughs? properties, text
Want to find how many homes in each borough
Processing toolbox
On the right, select “Vector general”, then “Join attributes by location (summary)” [double-click]
Parameters:
Input layer: nybb
Join layer: OEM_NursingHomes
Check intersects
Field to summarize: capacity
Summaries to calculate: sum
RUN (get new layer)
Right click new layer properties symbology
Categorized
Column capacity sum
Color ramp
“classify” “apply” OK
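Conceptually, the “Join attributes by location (summary)” step tests which polygon each point falls in and then aggregates a field per polygon. A minimal sketch, using a ray-casting point-in-polygon test and made-up stand-ins for the borough polygons and nursing home points (all names, coordinates and capacities are hypothetical):

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def summarize_by_polygon(polygons, points, field):
    """For each polygon, count the points inside it and sum their field values."""
    summary = {name: {"count": 0, "sum": 0} for name in polygons}
    for point in points:
        for name, ring in polygons.items():
            if point_in_polygon(point["x"], point["y"], ring):
                summary[name]["count"] += 1
                summary[name]["sum"] += point[field]
                break  # assume polygons do not overlap
    return summary

# Hypothetical stand-ins for nybb (polygons) and OEM_NursingHomes (points)
boroughs = {
    "West": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "East": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
homes = [
    {"x": 1, "y": 1, "capacity": 120},
    {"x": 4, "y": 3, "capacity": 80},
    {"x": 7, "y": 2, "capacity": 200},
]
result = summarize_by_polygon(boroughs, homes, "capacity")
print(result)
```

The count answers “how many homes in each borough” and the sum answers the capacity question; the new layer produced by QGIS carries such summaries as extra columns.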
Rainfall difference?
Load three vectors:
ccsm_polygons.shp
ppt_20C3M-180.0_180.0_-90.0_90.0.shp lat/lon and present day rainfall
ppt_SRESA2-180.0_180.0_-90.0_90.0.shp lat/lon and 20-year annual average rainfall
Select Vector >> Data Management Tools >> Join by location
Input layer is CCSM polygons
Join layer is 20C3M
Take attributes of first feature
RUN
Select Vector >> Data Management Tools >> Join by location
Input layer is joined layer
Join layer is SRES
Take attributes of first feature
Optionally add layer prefix
RUN (takes a while)
Unselect all the other layers, focus on new one
Open attribute table
Select edit mode, then calculator (field calculator)
Create new name (ppt_change)
Real number, precision 3
Select fields and values
Double click precip209912, then minus, then 199912
Click OK and edit mode off
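The field calculator step amounts to adding a new column computed row by row from two existing ones. A minimal sketch over a made-up attribute table; the full field names (precip199912, precip209912) and the values are assumptions for illustration:

```python
def add_difference_field(rows, new_field, later, earlier, precision=3):
    """Add new_field = later - earlier to every attribute row, rounded to precision."""
    for row in rows:
        row[new_field] = round(row[later] - row[earlier], precision)
    return rows

# Hypothetical attribute rows after the two joins; the field names are assumed
table = [
    {"id": 1, "precip199912": 2.5, "precip209912": 3.125},
    {"id": 2, "precip199912": 4.0, "precip209912": 3.25},
]
add_difference_field(table, "ppt_change", "precip209912", "precip199912")
changes = [row["ppt_change"] for row in table]
print(changes)
```

Positive values mean more rainfall in the later period, negative values less.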
Attributes of new field properties
Categorized gives too many classes; choose Graduated
Add raster layer population2000
Select properties, pseudocolor, click missing, run from 10,000 to 100,000
Population impacted by 5-degree temperature change?
Load temperature change (vector)
Properties symbology Graduated
Column: tas_change
Symbol simple symbol no pen
Colormap: Magma; invert
Levels: 20
Apply; okay
Load countries file (vector)
Properties symbology Simple fill, fill style no brush
Load population (raster) population2000.tif
Right-click on temp, select filter
Double click tas_change
Click >=
Enter number (e.g., 4)
Okay
Right-click on temp, Export >> Save Features As
Shapefile
Give name, e.g., 4deg_change
Raster >> Extraction >> Clip Raster by Mask Layer
Input layer: population
Mask layer: 4deg_change
Run
Clipped (mask) properties symbology
Singleband pseudocolor
Min:100, max:10000
Color ramp: viridis
Clip out of range values
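The logic of this last exercise (filter by temperature change, then clip the population raster with the result) can be sketched on two small grids. As a simplifying assumption, both data sets are treated here as grids on the same cells, whereas in QGIS the mask is a vector layer. All numbers are invented.

```python
def population_in_mask(population, temp_change, threshold):
    """Sum population over cells where temp_change >= threshold.

    Both grids are lists of rows with identical shape; None marks no data.
    """
    total = 0
    for pop_row, temp_row in zip(population, temp_change):
        for pop, dt in zip(pop_row, temp_row):
            if pop is not None and dt is not None and dt >= threshold:
                total += pop
    return total

# Hypothetical 2 x 3 grids on the same cells
population = [[1000, 5000, None],
              [200, 8000, 300]]
tas_change = [[3.2, 4.1, 4.5],
              [4.0, 3.9, 5.2]]
impacted = population_in_mask(population, tas_change, threshold=4)
print(impacted)
```

Changing the threshold (e.g., from 4 to 5) shows how sensitive the impacted population is to the chosen temperature change.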