Chapter 8 Plotting spatial data
Together with timelines, maps belong to the most powerful graphs, perhaps because we can immediately relate where we are, or have been, on the space of the plot. Two recent books on visualisation (Healy 2018, Wilke (2019)) contain chapters on visualising geospatial data or maps. Here, we will not try to preach the do’s and don’ts of maps, but rather point out a number of possibilities how to do things, point out challenges along the way and ways to mitigate them.
8.1 Every plot is a projection
The world is round, but plotting devices are flat. As mentioned in section 7.3.1, any time we visualise, in any way, the world on a flat device, we project: we convert angular, geodetic coordinates into Cartesian coordinates. This includes the cases where we think we “do nothing” (figure 8.1, left), or where show the world “as it is”, e.g. as seen from space (figure 8.1, right).
The left plot of figure 8.1 was obtained by
library(sf) library(rnaturalearth) w <- ne_countries(scale = "medium", returnclass = "sf") plot(st_geometry(w))
and we see that this is the default projection for data with geodetic coordinates, as indicated by
st_crs(w) #> Coordinate Reference System: #> EPSG: 4326 #> proj4string: "+proj=longlat +datum=WGS84 +no_defs"
The projection taken here is the equirectangular (or equidistant cylindrical) projection, which maps longitude and latitude linear to the x and y axis, keeping an aspect ratio of 1. Were we to do this for smaller areas not on the equator, it makes sense to choose a plot ratio such that one distance unit E-W equals one distance unit N-S on the center of the plotted area.
We can also carry out this projection before plotting. Say we want to do this for Germany, then after loading the (rough) country outline, we use
st_transform to project:
DE = st_geometry(ne_countries(country = "germany", returnclass = "sf")) DE.eqc = st_transform(DE, "+proj=eqc +lat_ts=51.14 +lon_0=90w")
st_transform takes an
sfc object, and as second argument the projection. This can either be a number of a known EPSG projection, e.g. listed at http://spatialreference.org/ref/epsg/ , or a string describing the projection (
+proj=...) with further parameters. The parameter here is
lat_ts, the latitude of true scale (i.e., one length unit N-S equals one length unit E-W), which was here chosen as the middle of the bounding box latitudes
mean(st_bbox(DE)[c("ymin", "ymax")]) #>  51.1
When we now plot both maps (figure 8.2), they look the same up to their values along the axes: degrees for geodetic (left), and metres for Cartesian coordinates.
par(mfrow = c(1, 2)) plot(DE, axes = TRUE) plot(DE.eqc, axes = TRUE)
8.1.1 What is a good projection for my data?
There is unfortunately no silver bullet here. Projections that maintain all distances do not exist; only globes do. The mostly used projections try to preserve
- areas (equal area),
- directions (conformal, e.g. Mercator),
- some properties of distances (e.g. equirectangular preserves distances along meridians, azimuthal equidistant preserves distances to a central point)
or some compromise of these. Parameters of projections decide what is shown in the center of a map and what on the fringes, which areas are up and which are down, and which areas are most enlarged. All these choices are in the end political decisions.
It is often entertaining and at times educational to play around with the different projections and understand their consequences. When the primary purpose of the map however is not to entertain or educate projection varieties, it may be preferrable to choose a well-known or less surprising projection, and move the discussion which projection should be preferred to a decision process on its own.
8.1.2 Does projection always work?
No. Look for instance at the figure 8.1, right. Countries like the USA are half out-of-sight. Where is the California coast line drawn?
The PROJ string used here was
"+proj=ortho +lat_0=30 +lon_0=-10" and we can easily check what happens to a polygon that crosses the visible area by setting both parameters to 0:
sq = rbind(c(-89,0), c(-89,1), c(-91,1), c(-91,0), c(-89,0)) pol = st_sfc(st_polygon(list(sq)), crs = 4326) (pol.o = st_transform(pol, "+proj=ortho +lat_0=0 +lon_0=0"))[] #> POLYGON ((-6377166 0, -6376194 111314, -6377166 0)) st_is_valid(pol.o, NA_on_exception=FALSE) #> Error in CPL_geos_is_valid(st_geometry(x), as.logical(NA_on_exception)): Evaluation error: IllegalArgumentException: Invalid number of points in LinearRing found 3 - must be 0 or >= 4.
where we see that the polygon is not nicely cut along the visibility line, but that the invisible points are simply dropped. This leads in this case to an invalid geometry, and may in the case of 8.2 lead to straight lines that do not follow the map border circle.
How was figure 8.2 created? By using a rather ugly script that used a projected half-sphere circle to first cooky-cut the part of the countries that would remain visible on this projection. The script is available, and so is its output.
8.2 Plotting points, lines, polygons, grid cells
Since maps are just a special form of plots of statistical data, the usual rules hold. Frequently occuring challenges include:
- polygons may be very small, and vanish when plotted
- depending on the data, polygons for different features may well overlap, and be visible only partially; using transparent fill colors may help indentify them
- when points are plotted with symbols, they may easily overlap and be hidden; density maps (chapter 13) may be more helpful
- lines may be hard to read when coloured and may overlap regardless line width
When plotting polygons filled with colors, one has the choice to plot polygon boundaries, or to suppress these. If polygon boundaries draw too much attention, an alternative is to colour them in a grey tone, or another color that doesn’t interfere with the fill colors. When suppressing boundaries entirely, polygons with (nearly) identical colors will melt together. If the property indicating the fill color is constant over the region, such as land cover type, this is OK. If the property is an aggregation, the region over which it was aggregated gets lost. Especially for extensive variables, e.g. the amount of people living in a polygon, this strongly misleads. But even with polygon boundaries, using filled polygons for such variables may not be a good idea.
The use of continuous color scales for continuously varying variables may look attractive, but is often more fancy than useful:
- it impracticle to match a color on the map with a legend value
- colors ramps often stretch non-linearly over the value range
Only for cases where the identification of values is less important than the continuity of the map, such as the coloring of a high resolution digital terrain model, it does serve its goal.
8.3 Class intervals
When plotting continuous geometry attributes using a limited set of colors (or symbols), classes need to be made from the data. The R package
classInt (Bivand 2018a) provides a number of methods to do so. Using it is quite simple:
library(classInt) r = rnorm(100) classIntervals(r) #> style: quantile #> one of 1.49e+10 possible partitions of this variable into 8 classes #> [-2.61,-1.26) [-1.26,-0.356) [-0.356,-0.131) [-0.131,0.091) #> 13 12 13 12 #> [0.091,0.433) [0.433,0.623) [0.623,1.11) [1.11,2.76] #> 12 13 12 13 classIntervals(r)$brks #>  -2.612 -1.257 -0.356 -0.131 0.091 0.433 0.623 1.113 2.755
and it takes argument
n for the number of intervals, and a
style that can be one of “fixed”, “sd”, “equal”, “pretty”, “quantile”, “kmeans”, “hclust”, “bclust”, “fisher”, or “jenks”. Style “pretty” may not obey
n; if if
n is missing, ‘nclass.Sturges’ is used; two other methods are available for choosing
8.4 Poles and datelines
Given the linestring
(ls = st_sfc(st_linestring(rbind(c(-179.5, 52), c(179.5, 52))), crs = 4326)) #> Geometry set for 1 feature #> geometry type: LINESTRING #> dimension: XY #> bbox: xmin: -180 ymin: 52 xmax: 180 ymax: 52 #> epsg (SRID): 4326 #> proj4string: +proj=longlat +datum=WGS84 +no_defs #> LINESTRING (-180 52, 180 52)
How long a distance does it span? Let’s see:
st_length(ls) #> 68677 [m]
which seems sensible. But does
ls actually intersect with the dateline?
dateline = st_sfc(st_linestring(rbind(c(180, 51), c(180, 53))), crs = 4326) st_intersects(ls, dateline) #> although coordinates are longitude/latitude, st_intersects assumes that they are planar #> Sparse geometry binary predicate list of length 1, where the predicate was `intersects' #> 1: (empty)
… it seems not? How can this be? The warning said it all: if
ls is not in spherical coordinates, it means it follows the 52th parallel, crossing
(0, 52) half way. This is as if we drew a straight line between the two points on the left figure of 8.1, almost across the complete 52N parallel.
Where do these inconsistencies come from? The software
sf is built upon (see the C/C++ libraries box in figure 1.3) is a GIS software stack that originally targeted flat, 2D geometries. The simple feature standard assumes straight lines between points, but great circle segments are not straight.
The functions that deal with spherical geometry, such as
st_length, use PostGIS extensions in liblwgeom that were added later on to PostGIS, without rewriting the entire geometry core for geodetic coordinates. More recent systems, including Google’s S21, BigQuery GIS2 and Ubers H33 were written from scratch with global data in mind, and work exclusively with geodetic coordinates.
st_wrap_dateline function can be convenient,
(ls.w = st_wrap_dateline(ls))[] #> MULTILINESTRING ((-180 52, -180 52), (180 52, 180 52))
as it cuts any geometry crossing the dateline into MULTI-geometries of which the sub-geometries touch on, but no longer cross the dateline. This is in particular convenient for plotting geodetic coordinates using naive approaches such as that of figure 8.1 left, where they would have crossed the entire plotting area. Note that by cutting the line at
st_wrap_dateline does not follow a great circle; for this, it should be preceded by
st_segmentize, as e.g. in
(ls.w2 = st_wrap_dateline(st_segmentize(ls, units::set_units(30, km))))[] #> MULTILINESTRING ((-180 52, -180 52, -180 52, -180 52), (180 52, 180 52, 180 52))
Also note that bounding boxes like
st_bbox(ls.w) #> xmin ymin xmax ymax #> -180 52 180 52
simply take the coordinate ranges, and are pretty much meaningless as descriptors of the extent of a geometry for geometries that cross the dateline.
Similar notions hold for the poles; a polygon enclosing the North pole
pole = st_sfc(st_polygon(list(rbind(c(0,80), c(120,80), c(240,80), c(0,80)))), crs = 4326)
does not include the pole
st_intersects(pole, st_sfc(st_point(c(0,90)), crs = 4326)) #> although coordinates are longitude/latitude, st_intersects assumes that they are planar #> Sparse geometry binary predicate list of length 1, where the predicate was `intersects' #> 1: (empty)
(Cartesian interpretation) but has a positive area
st_area(pole) #> 1.63e+12 [m^2]
indicating again a geodetic interpretation.
8.5 Graticules and other navigation aids
Graticules are lines on a map that follow constant latitude or longitude values. On figure 8.1 left they are drawn in grey. Graticules are often drawn in maps to give reference where something is. In our first map in figure 1.1 we can read that the area plotted is near 35\(^o\) North and 80\(^o\) West. Had we plotted the lines in the projected coordinate system, they would have been straight and their actual numbers would not have been very informative, apart from giving an interpretation of size or distances when the unit is known, and familiar to the map reader. Graticules, by that, also shed light on which projection was used: equirectangular or Mercator projections will have straight vertical and horizontal lines, conic projections have straight but diverging meridians, equal area may have curved meridians
The real navigation aid on figure 8.1 and most other maps are geographical features like the state outline, country outlines, coast lines, rivers, roads, railways and so on. If these are added sparsely and sufficiently, graticules can as well be omitted. In such cases, maps look good without axes, tics, and labels, leaving up a lot of plotting space to be filled with actual map data.
Healy, Kieran. 2018. Data Visualization, a Practical Introduction. Princeton University Press. http://socviz.co/index.html.
Wilke, Claus O. 2019. Fundamentals of Data Visualization. O’Reilly Media, Inc. https://serialmentor.com/dataviz/.
Bivand, Roger. 2018a. ClassInt: Choose Univariate Class Intervals. https://CRAN.R-project.org/package=classInt.