name: inter-slide class: left, middle, inverse {{ content }} --- name: layout-general layout: true class: left, middle <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: red; } </style>
--- class: middle, left, inverse # Exploratory Data Analysis XIII Spatial Data ### 2021-04-12 #### [EDA Master I MIDS et MFA](http://stephane-v-boucheron.fr/courses/eda) #### [Stéphane Boucheron](http://stephane-v-boucheron.fr) --- class: middle, inverse ## <svg style="height:0.8em;top:.04em;position:relative;fill:white;" viewBox="0 0 576 512"><path d="M0 117.66v346.32c0 11.32 11.43 19.06 21.94 14.86L160 416V32L20.12 87.95A32.006 32.006 0 0 0 0 117.66zM192 416l192 64V96L192 32v384zM554.06 33.16L416 96v384l139.88-55.95A31.996 31.996 0 0 0 576 394.34V48.02c0-11.32-11.43-19.06-21.94-14.86z"/></svg> ### [Motivation](#bigpic) ### [](#) --- template: inter-slide name: bigpic ## Why spatial data? --- ### ```r q = opq(bbox = "Gueret, France") q = add_osm_feature(q, key = "highway") dat = osmdata_sf(q) lines = dat$osm_lines pol = dat$osm_polygons pol = st_cast(pol, "MULTILINESTRING") pol = st_cast(pol, "LINESTRING") lines = rbind(lines, pol) lines = lines[, "highway"] lines = st_transform(lines, 32636) plot(lines, key.pos = 4, key.width = lcm(4), main = "") ``` --- ### Mapview ```r states <- st_read("DATA/geo_data/USA_2_GADM_fips.shp") ``` ``` ## Reading layer `USA_2_GADM_fips' from data source `/Users/stephaneboucheron/MEDOCS/TEACHING_SLIDES/EDA/DATA/geo_data/USA_2_GADM_fips.shp' using driver `ESRI Shapefile' ## Simple feature collection with 3103 features and 4 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -124.7628 ymin: 24.52042 xmax: -66.94889 ymax: 49.3833 ## Geodetic CRS: WGS 84 ``` ```r mapview(states, zcol = "NAME_1") ```
--- exclude: true ```r # knitr::include_url("http://spatial.ly/2012/02/great-maps-ggplot2/") ``` --- template: inter-slide ## Rasters --- ### `stars` - spatial data structure for single band and multi-band rasters (`class stars`) - access the cell values and other properties of rasters - read and write raster data --- ### Definition > A raster is a matrix or an array, representing a rectangular area on the surface of the earth. > To associate the matrix or the array with the particular area it represents, the raster has some additional spatial properties, on top of the non-spatial properties that any ordinary matrix or array has: - Non-spatial properties + Values + Dimensions (rows, columns, layers) - Spatial properties + Extent + Coordinate Reference System (CRS) + (Resolution) > a matrix is used to store single-band raster values in a stars raster object [What is raster data?](https://desktop.arcgis.com/en/arcmap/10.3/manage-data/raster-and-images/what-is-raster-data.htm) --- ### Coordinate Reference System (CRS) > The Coordinate Reference System (CRS) is the particular system that “associates” the raster coordinates (which are just pairs of x/y values) to geographic locations. Raster resolution is the size of a raster cell, in the x and y directions. The resolution is listed in parentheses because it can be calculated given the extent and the number of rows and columns --- ### Raster File Formats Commonly used raster file formats can be divided in two groups. - “Simple” raster file formats, such as GeoTIFF, are single-band or multi-band rasters with a geo-referenced extent, as discussed above. - “Complex” raster file formats, such as HDF, contain additional complexity, such as more than three dimensions, and/or metadata, such as band names, time stamps, units of measurement, and so on. --- ### - [An example of an HDF file structure](http://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud) - --- ### Volcano contour plot ```r image(volcano, col = terrain.colors(30), asp = ncol(volcano) / nrow(volcano)) contour(volcano, add = TRUE) ``` <div class="figure"> <img src="cm-13-EDA_files/figure-html/unnamed-chunk-5-1.png" alt=" Volcano image with contours" width="504" /> <p class="caption"> Volcano image with contours</p> </div> -- ### Topographic Information on Auckland's Maunga Whau Volcano #### Description: Maunga Whau (Mt Eden) is one of about 50 volcanos in the Auckland volcanic field. This data set gives topographic information for Maunga Whau on a 10m by 10m grid. Usage: `volcano` Format: A `matrix` with 87 rows and 61 columns, rows corresponding to grid lines running east to west and columns to grid lines running south to north. Source: Digitized from a topographic map by Ross Ihaka. .fr[From the documentation `?volcano`] --- ### Loading rasters ```r s <- stars::read_stars("DATA/geo_data/dem.tif") r <- stars::read_stars("DATA/geo_data/MOD13A3_2000_2019.tif") names(r) <- "NDVI" ``` --- ### About `stars` [`stars` vignettes](https://r-spatial.github.io/stars/articles/) Package stars provides infrastructure for data cubes, array data with labeled dimensions, with emphasis on arrays where some of the dimensions relate to time and/or space. Spatial data cubes are arrays with one or more spatial dimensions. Raster data cubes have at least two spatial dimensions that form the raster tesselation. Vector data cubes have at least one spatial dimension that may for instance reflect a polygon tesselation, or a set of point locations. Conversions between the two (rasterization, polygonization) are provided. Vector data are represented by simple feature geometries (packages sf). Tidyverse methods are provided. --- ```r attributes(r)[[2]] ``` ``` ## from to offset delta refsys point values x/y ## x 1 167 3239946 926.625 unnamed FALSE NULL [x] ## y 1 485 3717158 -926.625 unnamed FALSE NULL [y] ## band 1 233 NA NA NA NA NULL ``` --- ```r plot(r[,,,2]) ``` <img src="cm-13-EDA_files/figure-html/unnamed-chunk-8-1.png" width="504" /> ```r # mapview::mapview(r[,,,2]) ``` --- ```r # cubeview(r) ``` --- ```r st_crs(r) ``` ``` ## Coordinate Reference System: ## User input: unnamed ## wkt: ## PROJCRS["unnamed", ## BASEGEOGCRS["unnamed ellipse", ## DATUM["unknown", ## ELLIPSOID["unnamed",6371007.181,0, ## LENGTHUNIT["metre",1, ## ID["EPSG",9001]]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433, ## ID["EPSG",9122]]]], ## CONVERSION["Sinusoidal", ## METHOD["Sinusoidal"], ## PARAMETER["Longitude of natural origin",0, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8802]], ## PARAMETER["False easting",0, ## LENGTHUNIT["metre",1], ## ID["EPSG",8806]], ## PARAMETER["False northing",0, ## LENGTHUNIT["metre",1], ## ID["EPSG",8807]]], ## CS[Cartesian,2], ## AXIS["easting",east, ## ORDER[1], ## LENGTHUNIT["metre",1, ## ID["EPSG",9001]]], ## AXIS["northing",north, ## ORDER[2], ## LENGTHUNIT["metre",1, ## ID["EPSG",9001]]]] ``` ```r st_bbox(r) ``` ``` ## xmin ymin xmax ymax ## 3239946 3267745 3394692 3717158 ``` --- ### Subsetting raster objects The stars subset operator `[` works as follows: - The 1st index selects attributes - The 2nd index selects the first dimension, typically `[x]`, i.e., raster **columns** - The 3rd index selects the second dimension, typically `[y]`, i.e., raster **rows** - The 4th index selects the _thirds_ dimension (if there is one), typically band, i.e., _raster layers_ (Subsequent indices, if any, select the remaining dimensions) ```r attributes(r[,1,1,1]) ``` ``` ## $names ## [1] "NDVI" ## ## $dimensions ## from to offset delta refsys point values x/y ## x 1 1 3239946 926.625 unnamed FALSE NULL [x] ## y 1 1 3717158 -926.625 unnamed FALSE NULL [y] ## band 1 1 NA NA NA NA NULL ## ## $class ## [1] "stars" ``` ```r str(r[,10,100,1]) ``` ``` ## List of 1 ## $ NDVI: num [1, 1, 1] NA ## - attr(*, "dimensions")=List of 3 ## ..$ x :List of 7 ## .. ..$ from : int 10 ## .. ..$ to : int 10 ## .. ..$ offset: num 3239946 ## .. ..$ delta : num 927 ## .. ..$ refsys:List of 2 ## .. .. ..$ input: chr "unnamed" ## .. .. ..$ wkt : chr "PROJCRS[\"unnamed\",\n BASEGEOGCRS[\"unnamed ellipse\",\n DATUM[\"unknown\",\n ELLIPSOID[\"| __truncated__ ## .. .. ..- attr(*, "class")= chr "crs" ## .. ..$ point : logi FALSE ## .. ..$ values: NULL ## .. ..- attr(*, "class")= chr "dimension" ## ..$ y :List of 7 ## .. ..$ from : int 100 ## .. ..$ to : int 100 ## .. ..$ offset: num 3717158 ## .. ..$ delta : num -927 ## .. ..$ refsys:List of 2 ## .. .. ..$ input: chr "unnamed" ## .. .. ..$ wkt : chr "PROJCRS[\"unnamed\",\n BASEGEOGCRS[\"unnamed ellipse\",\n DATUM[\"unknown\",\n ELLIPSOID[\"| __truncated__ ## .. .. ..- attr(*, "class")= chr "crs" ## .. ..$ point : logi FALSE ## .. ..$ values: NULL ## .. ..- attr(*, "class")= chr "dimension" ## ..$ band:List of 7 ## .. ..$ from : int 1 ## .. ..$ to : int 1 ## .. ..$ offset: num NA ## .. ..$ delta : num NA ## .. ..$ refsys: chr NA ## .. ..$ point : logi NA ## .. ..$ values: NULL ## .. ..- attr(*, "class")= chr "dimension" ## ..- attr(*, "raster")=List of 3 ## .. ..$ affine : num [1:2] 0 0 ## .. ..$ dimensions : chr [1:2] "x" "y" ## .. ..$ curvilinear: logi FALSE ## .. ..- attr(*, "class")= chr "stars_raster" ## ..- attr(*, "class")= chr "dimensions" ## - attr(*, "class")= chr "stars" ``` --- template: inter-slide ## Vectors --- ### Data structures for vector layers: - points, - lines and - polygons #### Spatial and non-spatial properties of vector layers #### Transform a layer from one Coordinate Reference System (CRS) to another ??? We will use the following R packages: sf mapview stars --- ### Definition: Vector layers Vector layers are essentially sets of _geometries_ associated with non-spatial attributes The geometries are sequences of one or more _point coordinates_, possibly connected to form _lines_ or _polygons_ The non-spatial attributes come in the form of a table. --- ### Commonly used vector layer file formats - binary formats, such as the ESRI Shapefile (.shp) - plain text formats, such as + .GeoJSON (.json or .geojson) + GPS Exchange Format (.gpx) Vector layers are also frequently kept in a spatial database, such as PostgreSQL/PostGIS. --- ### `sp` Classes for vector layers (legacy) The `sp` package defines 6 main classes for vector layers One for each geometry type (points, lines, polygons) One for geometry only and one for geometry with attributes | Class | Geometry type | Attributes |---------------|--------|------| | SpatialPoints | Points | - | | SpatialPointsDataFrame | Points | data.frame| | SpatialLines | Lines | - | | SpatialLinesDataFrame | Lines | data.frame | | SpatialPolygons | Polygons | - | | SpatialPolygonsDataFrame | Polygons | data.frame | --- ### `sf` `sf` has become the standard package for working with vector data in R, practically replacing `sp`, `rgdal`, and `rgeos` > The Simple Features standard defines several types of geometries, of which seven are most commonly used in the world of GIS and spatial data analysis > When working with spatial databases, Simple Features are commonly specified as Well Known Text (WKT). A subset of simple features forms the GeoJSON standard --- ### a hierarchical class system with three classes (Table 7.3): - Class `sfg` —a single geometry - Class `sfc` —a geometry column, which is a set of `sfg` geometries + CRS information - Class `sf` —a layer, which is an `sfc` geometry column inside a `data.frame` with non-spatial attributes --- ### ```r a <- st_polygon(list(cbind(c(0,0,7.5,7.5,0), c(0,-1,-1,0,0)) ) ) a ``` ``` ## POLYGON ((0 0, 0 -1, 7.5 -1, 7.5 0, 0 0)) ``` ```r # plot(a, col="yellow", border = "blue", lwd = .5) ``` --- template: inter-slide ## --- --- template: inter-slide ## References --- ### <svg style="height:0.8em;top:.04em;position:relative;fill:code;" viewBox="0 0 640 512"><path d="M255.03 261.65c6.25 6.25 16.38 6.25 22.63 0l11.31-11.31c6.25-6.25 6.25-16.38 0-22.63L253.25 192l35.71-35.72c6.25-6.25 6.25-16.38 0-22.63l-11.31-11.31c-6.25-6.25-16.38-6.25-22.63 0l-58.34 58.34c-6.25 6.25-6.25 16.38 0 22.63l58.35 58.34zm96.01-11.3l11.31 11.31c6.25 6.25 16.38 6.25 22.63 0l58.34-58.34c6.25-6.25 6.25-16.38 0-22.63l-58.34-58.34c-6.25-6.25-16.38-6.25-22.63 0l-11.31 11.31c-6.25 6.25-6.25 16.38 0 22.63L386.75 192l-35.71 35.72c-6.25 6.25-6.25 16.38 0 22.63zM624 416H381.54c-.74 19.81-14.71 32-32.74 32H288c-18.69 0-33.02-17.47-32.77-32H16c-8.8 0-16 7.2-16 16v16c0 35.2 28.8 64 64 64h512c35.2 0 64-28.8 64-64v-16c0-8.8-7.2-16-16-16zM576 48c0-26.4-21.6-48-48-48H112C85.6 0 64 21.6 64 48v336h512V48zm-64 272H128V64h384v256z"/></svg> Packages - rgeos - sp - rgdal - raster - sf - gstat - geosphere - spdep - osmdata --- ### <svg style="height:0.8em;top:.04em;position:relative;fill:code;" viewBox="0 0 448 512"><path d="M448 360V24c0-13.3-10.7-24-24-24H96C43 0 0 43 0 96v320c0 53 43 96 96 96h328c13.3 0 24-10.7 24-24v-16c0-7.5-3.5-14.3-8.9-18.7-4.2-15.4-4.2-59.3 0-74.7 5.4-4.3 8.9-11.1 8.9-18.6zM128 134c0-3.3 2.7-6 6-6h212c3.3 0 6 2.7 6 6v20c0 3.3-2.7 6-6 6H134c-3.3 0-6-2.7-6-6v-20zm0 64c0-3.3 2.7-6 6-6h212c3.3 0 6 2.7 6 6v20c0 3.3-2.7 6-6 6H134c-3.3 0-6-2.7-6-6v-20zm253.4 250H96c-17.7 0-32-14.3-32-32 0-17.6 14.4-32 32-32h285.4c-1.9 17.1-1.9 46.9 0 64z"/></svg> Books - [Introduction to Spatial Data Programming with R](http://132.72.155.230:3838/r/) - [rspatial.org](https://rspatial.org) - [Spatial Data Science](https://spatialanalysis.github.io/tutorials/) - [Using Spatial Data with R](https://cengel.github.io/R-spatial/) - [Geocomputation with R](https://geocompr.robinlovelace.net/) - [Spatial Data Science](https://www.r-spatial.org/book/) See extended bibliography at [Introduction to Spatial Data Programming with R by M. Dorman](http://132.72.155.230:3838/r/) --- ### Tutorials See [Tutorials list from M. Dorman](http://132.72.155.230:3838/r/#courses) --- class: middle, center, inverse background-image: url('./img/pexels-cottonbro-3171837.jpg') background-size: cover # The End