Chapter 3 Geometries

Having learned how we describe spaces, we can define how geometries can be described in these space. This chapter will mostly explain the geometries for simple features, and introduce the three classes sfg, sfc and sf for single geometries, geometry sets, and geometry sets with associated attributes.

3.1 Simple feature geometry types

Simple feature geometries are a way to describe the geometries of features. By features we mean things that have a geometry, some time properties, and other attributes. The main application of simple feature geometries is to describe two-dimensional geometries by points, lines, or polygons. The “simple” adjective refers to the fact that the line or polygon geometries are represented by sequences of points connected with straight lines.

Simple features access is a standard (Herring 2011, Herring (2010), ISO (2004)) for describing simple feature geometries that includes

  • a class hierarchy
  • a set of operations
  • binary and text encodings

We will now discuss the seven most common simple feature geometry types. Although in practice we will most often import spatial data from external sources (files, databases, web services), we will create them here from scratch using simple constructor functions.

3.1.1 The big seven

The most commonly used simple features geometries, used to represent a single feature are:

type description
POINT single point geometry
MULTIPOINT set of points
LINESTRING single linestring (two or more points connected by straight lines)
MULTILINESTRING set of linestrings
POLYGON exterior ring with zero or more inner rings, denoting holes
MULTIPOLYGON set of polygons
GEOMETRYCOLLECTION set of the geometries above

Points in a geometry contain at least two coordinates: x and y, in that order.

3.1.2 Valid geometries

Valid geometries obey the following properties:

  • linestrings shall not self-intersect
  • polygon rings shall be closed (the last point equals the first)
  • polygon holes (inner rings) shall be inside their exterior ring
  • polygon inner rings shall maximally touch the exterior ring in single points, not over a line
  • a polygon ring shall not repeat its own path

If this is not the case, the geometry concerned is not valid.

3.1.3 Z and M

In addition to X and Y coordinates, Single points (vertices) of simple feature geometries can have

  • a Z coordinate, denoting altitude, and/or
  • an M value, denoting some “measure”

The M attribute shall be a property of the vertex. It sounds attractive to encode a time stamp in it, e.g. to pack trajectories in LINESTRINGs. These become however invalid once the trajectory self-intersects.

Both Z and M are found relatively rarely, and software support to do something useful with them is (still) rather rare.

3.1.4 Ten further geometry types

There are 10 more geometry types which are more rare, but increasingly find implementation:

type description
CIRCULARSTRING The CIRCULARSTRING is the basic curve type, similar to a LINESTRING in the linear world. A single segment requires three points, the start and end points (first and third) and any other point on the arc. The exception to this is for a closed circle, where the start and end points are the same. In this case the second point MUST be the center of the arc, ie the opposite side of the circle. To chain arcs together, the last point of the previous arc becomes the first point of the next arc, just like in LINESTRING. This means that a valid circular string must have an odd number of points greated than 1.
COMPOUNDCURVE A compound curve is a single, continuous curve that has both curved (circular) segments and linear segments. That means that in addition to having well-formed components, the end point of every component (except the last) must be coincident with the start point of the following component.
CURVEPOLYGON Example compound curve in a curve polygon: CURVEPOLYGON(COMPOUNDCURVE(CIRCULARSTRING(0 0,2 0, 2 1, 2 3, 4 3),(4 3, 4 5, 1 4, 0 0)), CIRCULARSTRING(1.7 1, 1.4 0.4, 1.6 0.4, 1.6 0.5, 1.7 1) )
MULTICURVE A MultiCurve is a 1-dimensional GeometryCollection whose elements are Curves, it can include linear strings, circular strings or compound strings.
MULTISURFACE A MultiSurface is a 2-dimensional GeometryCollection whose elements are Surfaces, all using coordinates from the same coordinate reference system.
CURVE A Curve is a 1-dimensional geometric object usually stored as a sequence of Points, with the subtype of Curve specifying the form of the interpolation between Points
SURFACE A Surface is a 2-dimensional geometric object
POLYHEDRALSURFACE A PolyhedralSurface is a contiguous collection of polygons, which share common boundary segments
TIN A TIN (triangulated irregular network) is a PolyhedralSurface consisting only of Triangle patches.
TRIANGLE A Triangle is a polygon with 3 distinct, non-collinear vertices and no interior boundary

Note that CIRCULASTRING, COMPOUNDCURVE and CURVEPOLYGON are not described in the SFA standard, but in the SQL-MM part 3 standard. The descriptions above were copied from the PostGIS manual.

3.1.5 Encodings

Part of the simple feature standard are two encodings: a text and a binary encoding. The text strings POINT (0 1) and so on indicate text encodings, also known as well-known text (WKT) encodings, of simple feature geometries. They are meant to be human-readable.

3.2 Simple features in sf

This section describes the implementation of simple feature geometries in package sf. It will first explain how single simple feature geometries, explained in the previous section, are represented in R objects of class sfg. Next, it will explain how sets of simple feature geometry objects are collected in a list of class sfc. This list acts as a geometry list-column in data.frame objects, of class sf.

3.2.1 sfg: simple feature geometry

Point sets are stored as numeric matrix, with 2 (XY), 3 (XYZ or XYM) or 4 (XYZM) columns, with a points in each row. Individual simple feature geometry objects are implemented as:

  • numeric vector for POINT,
  • numeric matrix for MULTIPOINT and LINESTRING
  • list of numeric matrices for MULTILINESTRING and POLYGON
  • list of lists of numeric matrices for MULTIPOLYGON
  • list of (typed) geometries for GEOMETRYCOLLECTION

All other geometry types follow this, using the simplest possible option. Note that matrices can have zero points, and lists can have zero elements, in which case we have empty geometries; more about this in section 3.2.1.2.

Objects have a class indicating their dimension, type, and a superclass (sfg: simple feature geometry), and have no other attributes than their S3 class:

(pt = st_point(c(0,1)))
#> POINT (0 1)
attributes(pt)
#> $class
#> [1] "XY"    "POINT" "sfg"

We see that in addition to sfg the class attribute has two values:

  • XY telling the dimension of the point(s), can also be XYZ, XYM or XYZM
  • POINT revealing the geometry type.

Examples of XYZ and XYM and XYZM geometries are found here:

system.file("shape/storms_xyz_feature.shp", package="sf") %>%
    st_read()
#> Reading layer `storms_xyz_feature' from data source `/home/edzer/R/x86_64-pc-linux-gnu-library/3.5/sf/shape/storms_xyz_feature.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 71 features and 1 field
#> geometry type:  LINESTRING
#> dimension:      XYZ
#> bbox:           xmin: -102 ymin: 8.3 xmax: 0 ymax: 59.5
#> epsg (SRID):    NA
#> proj4string:    NA
system.file("shape/storms_xyzm_feature.shp", package="sf") %>% # badly named!
    st_read()
#> Reading layer `storms_xyzm_feature' from data source `/home/edzer/R/x86_64-pc-linux-gnu-library/3.5/sf/shape/storms_xyzm_feature.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 71 features and 1 field
#> geometry type:  LINESTRING
#> dimension:      XYM
#> bbox:           xmin: -102 ymin: 8.3 xmax: 0 ymax: 59.5
#> epsg (SRID):    NA
#> proj4string:    NA
(pzm = st_point(c(1,2,3,4)))
#> POINT ZM (1 2 3 4)

A MULTIPOINT or a LINESTRING can be created by a matrix

(m1 = rbind(c(8, 1), c(2, 5), c(3, 2)))
#>      [,1] [,2]
#> [1,]    8    1
#> [2,]    2    5
#> [3,]    3    2
(mp = st_multipoint(m1))
#> MULTIPOINT (8 1, 2 5, 3 2)
(ls = st_linestring(m1))
#> LINESTRING (8 1, 2 5, 3 2)

Although these geometries contain the same points, they have entirely different meaning: the point set is a zero-dimensional, the line a one-dimensional geometry:

st_dimension(mp)
#> [1] 0
st_dimension(ls)
#> [1] 1

A MULTILINESTRING can be constructed from a list of matrices, representing vertices:

m2 = rbind(c(22,20), c(18, 15))
(mls = st_multilinestring(list(m1, m2)))
#> MULTILINESTRING ((8 1, 2 5, 3 2), (22 20, 18 15))

A POLYGON consists of an outer ring, followed by zero or more inner rings that denote holes in the outer ring:

(ring1 = rbind(c(0,0), c(4,0), c(4,4), c(0,4), c(0,0)))
#>      [,1] [,2]
#> [1,]    0    0
#> [2,]    4    0
#> [3,]    4    4
#> [4,]    0    4
#> [5,]    0    0
(p1 = st_polygon(list(ring1)))
#> POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))
(ring2 = ring1 + 5)
#>      [,1] [,2]
#> [1,]    5    5
#> [2,]    9    5
#> [3,]    9    9
#> [4,]    5    9
#> [5,]    5    5
(ring3 = (ring1[5:1,] / 4) + 6)
#>      [,1] [,2]
#> [1,]    6    6
#> [2,]    6    7
#> [3,]    7    7
#> [4,]    7    6
#> [5,]    6    6
(p2 = st_polygon(list(ring2, ring3)))
#> POLYGON ((5 5, 9 5, 9 9, 5 9, 5 5), (6 6, 6 7, 7 7, 7 6, 6 6))

A MULTIPOLYGON can be constructed as a list of lists of matrices:

(mpol = st_multipolygon(list(list(ring1), list(ring2, ring3))))
#> MULTIPOLYGON (((0 0, 4 0, 4 4, 0 4, 0 0)), ((5 5, 9 5, 9 9, 5 9, 5 5), (6 6, 6 7, 7 7, 7 6, 6 6)))

And finally, a GEOMETRYCOLLECTION can be constructed from a list of typed geometries:

st_geometrycollection(list(pt, mp, ls, mpol))
#> GEOMETRYCOLLECTION (POINT (0 1), MULTIPOINT (8 1, 2 5, 3 2), LINESTRING (8 1, 2 5, 3 2), MULTIPOLYGON (((0 0, 4 0, 4 4, 0 4, 0 0)), ((5 5, 9 5, 9 9, 5 9, 5 5), (6 6, 6 7, 7 7, 7 6, 6 6))))

3.2.1.1 WKT, WKB encodings

By default, package sf prints the same number of digits as R, but this can be manipulated:

st_point(c(1/3, 2/3))
#> POINT (0.333 0.667)
print(st_point(c(1/3, 2/3)), digits = 16)
#> POINT (0.3333333333333333 0.6666666666666666)
print(st_point(c(1/3, 2/3)), digits = 3)
#> POINT (0.333 0.667)

An encoding that is more useful for machine-to-machine communication is well-known binary. An example of a round-trip R \(\rightarrow\) binary \(\rightarrow\) R is

(wkb = st_as_binary(st_point(c(1/3, 2/3))))
#>  [1] 01 01 00 00 00 55 55 55 55 55 55 d5 3f 55 55 55 55 55 55 e5 3f
st_as_sfc(wkb)[[1]]
#> POINT (0.333 0.667)

Object r is a raw vector, which is little useful in R. Binary conversion is used to communicate geometries to external libraries (GDAL, GEOS, liblwgeom) and spatial databases because it is fast and lossless. Whenever there is a choice, binary encoding should be prefered over text encoding.

3.2.1.2 simple, valid, empty

Methods st_is_simple and st_is_valid help detect non-simple and non-valid geometries:

st_is_simple(st_linestring(rbind(c(0,0), c(1,1), c(1,0), c(0,1)))) # self-intersects
#> [1] FALSE
st_is_valid(st_polygon(list(rbind(c(1,1), c(0,0), c(1,1), c(2,2), c(2,1), c(1,1))))) # repeats
#> [1] FALSE

A very important concept in the feature geometry framework is that of the empty geometry. We can think of an empty geometry as similar to the NA value in R vectors: it is a placeholder, but a usable value is not available. Empty geometries arise naturally when we do geometrical operations (chapter 5), for instance when we want to know where two disjoint geometries coincide:

(e = st_intersection(st_point(c(0,0)), st_point(c(1,1))))
#> GEOMETRYCOLLECTION EMPTY

It is not entirely clear what the benefit is of having typed empty geometries, but according to the simple feature standard they are. They are detected by

st_is_empty(e)
#> [1] TRUE

3.2.1.3 Conversion between geometry types

Up to the extent that a conversion is feasible, we can convert simple feature geometries using the st_cast generic:

methods(st_cast)
#>  [1] st_cast.CIRCULARSTRING*     st_cast.COMPOUNDCURVE*     
#>  [3] st_cast.CURVE*              st_cast.GEOMETRYCOLLECTION*
#>  [5] st_cast.LINESTRING*         st_cast.MULTILINESTRING*   
#>  [7] st_cast.MULTIPOINT*         st_cast.MULTIPOLYGON*      
#>  [9] st_cast.MULTISURFACE*       st_cast.POINT*             
#> [11] st_cast.POLYGON*            st_cast.sf*                
#> [13] st_cast.sfc*                st_cast.sfc_CIRCULARSTRING*
#> see '?methods' for accessing help and source code

Conversion is required e.g. to be able to plot curved geometries. CURVE, COMPOUNDCURVE and CIRCULARSTRING have st_cast methods to cast them to LINESTRING; MULTISURFACE has an st_cast method to MULTIPOLYGON. An example, needed for plotting, is

(ls <- st_as_sfc("CIRCULARSTRING(0 0,1 0,1 1)") %>% st_cast("LINESTRING"))
#> Geometry set for 1 feature 
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 0 ymin: -0.207 xmax: 1.21 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#> LINESTRING (0 0, 0.0361 -0.0337, 0.0745 -0.0647...
plot(ls, axes = TRUE)

It is convenient in other cases to analyse the point pattern from a set of vertices in a linestring. However,

rbind(c(0,0), c(1,1), c(1,0), c(0,1)) %>% st_linestring() %>% st_cast("POINT")
#> Warning in st_cast.LINESTRING(., "POINT"): point from first coordinate only
#> POINT (0 0)

does not what we expect, because it will convert a single geometry into a new single geometry. We can convert to a MULTIPOINT

rbind(c(0,0), c(1,1), c(1,0), c(0,1)) %>% st_linestring() %>% st_cast("POINT")
#> Warning in st_cast.LINESTRING(., "POINT"): point from first coordinate only
#> POINT (0 0)

but if we want to have a set of points, we need to work with sets (section 3.2.2) first, because we want a set with another cardinality:

(p <- rbind(c(0,0), c(1,1), c(1,0), c(0,1)) %>% st_linestring() %>% st_sfc() %>% st_cast("POINT"))
#> Geometry set for 4 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#> POINT (0 0)
#> POINT (1 1)
#> POINT (1 0)
#> POINT (0 1)

3.2.1.4 GEOMETRYCOLLECTION handling

Single features can have a geometry that consists of several subgeometries of different type, held in a GEOMETRYCOLLECTION. This may sound like looking for trouble, but these arise rather naturally when looking for intersections. For instance, the intersection of two LINESTRING geometries may be the combination of a LINESTRING and a POINT. Putting this intersection into a single feature geometry needs a GEOMETRYCOLLECTION.

In case we end up with GEOMETRYCOLLECTION objects, the next question is often what to do with them. One thing we can do is extract elements from them:

pt <- st_point(c(1, 0))
ls <- st_linestring(matrix(c(4, 3, 0, 0), ncol = 2))
poly1 <- st_polygon(list(matrix(c(5.5, 7, 7, 6, 5.5, 0, 0, -0.5, -0.5, 0), ncol = 2)))
poly2 <- st_polygon(list(matrix(c(6.6, 8, 8, 7, 6.6, 1, 1, 1.5, 1.5, 1), ncol = 2)))
multipoly <- st_multipolygon(list(poly1, poly2))

j <- st_geometrycollection(list(pt, ls, poly1, poly2, multipoly))

st_collection_extract(j, "POLYGON")
#> Geometry set for 3 features 
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 5.5 ymin: -0.5 xmax: 8 ymax: 1.5
#> epsg (SRID):    NA
#> proj4string:    NA
#> MULTIPOLYGON (((5.5 0, 7 0, 7 -0.5, 6 -0.5, 5.5...
#> MULTIPOLYGON (((6.6 1, 8 1, 8 1.5, 7 1.5, 6.6 1)))
#> MULTIPOLYGON (((5.5 0, 7 0, 7 -0.5, 6 -0.5, 5.5...
st_collection_extract(j, "POINT")
#> POINT (1 0)
st_collection_extract(j, "LINESTRING")
#> LINESTRING (4 0, 3 0)

which sometimes results in a geometry set, sometimes in single geometries.

3.2.2 sfc: sets of geometries

Rather than handling geometries individually, we typically handle them as sets. Package sf provides a dedicated class for this, called sfc (for simple feature geometry list column). We can create such a list column with constructor function st_sfc:

(sfc = st_sfc(st_point(c(0,1)), st_point(c(-3,2)), crs = 4326))
#> Geometry set for 2 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -3 ymin: 1 xmax: 0 ymax: 2
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> POINT (0 1)
#> POINT (-3 2)

The default report from the print method for sfc gives

  • the number of features geometries
  • the feature geometry type (here: POINT)
  • the feature geometry dimension (here: XY)
  • the bounding box for the set
  • the coordinate reference system for the set (epsg and proj4string: see chapter 7.3)
  • the first few geometries, as (abbreviated) WKT

The class of the geometry list-column,

class(sfc)
#> [1] "sfc_POINT" "sfc"

is again a combination of a specific class, and a superclass. In addition to a class, the object has further attributes

attributes(sfc) %>% names() %>% setdiff("class")
#> [1] "precision" "bbox"      "crs"       "n_empty"

which are used to record for the whole set:

  • a precision value (section 5.4)
  • the bounding box enclosing all geometries (for x and y)
  • a coordinate reference system (section 7.3)
  • the number of empty geometries contained in the set

This means that all these properties are defined for the set, and not for geometries individually.

As we’ve seen above, sets of geometries arise when we tear apart compound geometries, as in

(p <- rbind(c(0,0), c(1,1), c(1,0), c(0,1)) %>% st_linestring() %>% st_sfc() %>% st_cast("POINT"))
#> Geometry set for 4 features 
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#> POINT (0 0)
#> POINT (1 1)
#> POINT (1 0)
#> POINT (0 1)

Here, st_sfc creates a set of one LINESTRING, and the resulting set has size 4:

length(p)
#> [1] 4

Going the other way around, we need st_combine to combine geometries into one:

p %>% st_combine
#> Geometry set for 1 feature 
#> geometry type:  MULTIPOINT
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#> MULTIPOINT (0 0, 1 1, 1 0, 0 1)
p %>% st_combine %>% st_cast("LINESTRING")
#> Geometry set for 1 feature 
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#> LINESTRING (0 0, 1 1, 1 0, 0 1)

More general, in practice we will almost always work with sets of geometries, because in spatial data we typically associate an observation with a feature, which has a geometry, and we work with sets of observations.

sfc objects are lists with each entry being an sfg object:

p[[2]]
#> POINT (1 1)

and we will use these lists as list columns in data.frame or tibble objects to represent simple features with geometries in a list column. These objects are of class sf (section 3.2.3).

3.2.2.1 Feature sets with mixed geometries

Sets of simple features also consist of features with heterogeneous geometries. In this case, the geometry type of the set is GEOMETRY:

(g = st_sfc(st_point(c(0,0)), st_linestring(rbind(c(0,0), c(1,1)))))
#> Geometry set for 2 features 
#> geometry type:  GEOMETRY
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#> POINT (0 0)
#> LINESTRING (0 0, 1 1)

These can be filtered by using st_is

g %>% st_is("LINESTRING")
#> [1] FALSE  TRUE

or, when working with sf objects,

st_sf(g) %>% filter(st_is(., "LINESTRING"))
#> Simple feature collection with 1 feature and 0 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> epsg (SRID):    NA
#> proj4string:    NA
#>                       g
#> 1 LINESTRING (0 0, 1 1)

3.2.3 sf: geometries with attributes

sf objects are tibble or data.frame objects with feature geometries in a list column, and an sf class added:

sf = st_sf(sfc)
class(sf)
#> [1] "sf"         "data.frame"

Although there is nothing against simply using data.frames or tibbles with sfc list columns, a number of methods have been written for sf objects that make life even more convenient, including plot methods to create maps.

In addition to the usual data.frame attributes, sf objects have two more attributes:

sf %>% attributes() %>% names() %>% setdiff(c("row.names", "class", "names"))
#> [1] "sf_column" "agr"

They are:

  • sf_column: a length one character vector with the name of the (active) geometry list-column. Note that sf objects may contain multiple geometry list-columns, but the one named here is used for all operations, as the “active” geometry.
  • agr: attribute-geometry relationships; this encodes for each of the attributes how it relates to the geometry (in case of a non-point geometry): is it constant throughout the geometry like a soil type, is it an aggregate over the geometry like a population count, or does it identify the geometry like a state name? This is explained in more depth in section 6.1.

3.3 Tesselations: coverages, rasters

A common case in spatial data analysis is that an area is split (tesselated) in a number of non-overlapping regions. Although this can be modelled by a sequence of simple feature geometries (polygons), it is hard to guarantee for a set of simple feature polygons that they overlap nowhere, or that there are no gaps between them.

More fundamental ways of storing such polygons use a topological model, examples of this are found in geographic information systems like GRASS GIS or ArcGIS. Topological models store every boundary between polygons only once, and register which polygon is on either side of a boundary.

A simpler approach, associated with the term raster data, is to tesselate each spatial dimension \(d\) into regular cells, formed e.g. by left-closed and right-open intervals \(d_i\): \[\begin{equation} d_i = d_0 + [~ i \cdot \delta, (i+1) \cdot \delta~) \end{equation}\]

with \(d_0\) an offset, \(\delta\) the interval (cell or pixel) size, and where the cell index \(i\) is an arbitrary but consecutive set of integers. The \(\delta\) value is often taken negative for the \(y\)-axis (Northing), indicating that raster row numbers increasing Southwards correspond to \(y\)-coordinates increasing Northwards.

In arbitrary polygon tesselations, assigning points to polygons when they fall on a boundary shared by two polygons is ambiguous. Using left-closed “[” and right-open “)” intervals in regular tesselations removes this ambiguity.

Tesselating the time dimension in this way is very common, and reflects the implicit assumption underlying time series packages such as xts in R. Different models can be combined: one could use simple feature polygons to tesselate space, and combine this with a regular tesselation of time in order to cover a space-time vector datacube. Raster data and data cubes are discussed in chapter 4.

3.4 Networks

Spatial networks are typically composed of linear (LINESTRING) elements, but possess further topological properties describing the network coherence:

  • start and endpoints of a linestring may be connected to other linestring start or end points, forming a set of nodes and edges
  • edges may be directed, and allow for connections (flow, transport) in only one way.

Several R packages (osmar, stplanr) have (limited) functionality available for constructing network objects, and working with them, e.g. computing shortest or fastest routes through a network.

3.5 Geometries on the sphere

Geometries on the sphere are geometries made from geodetic coordinates. The easiest of these concern small regions near the equator not covering the date line, because in that case we can ignore all problems and still do a good job. Reality is nastier.

The concept of a bounding box, defined from the coordinate ranges breaks easily down, e.g. when crossing the date line:

pts = rbind(c(-179,0), c(179,0), c(179,1), c(-179,1), c(-179,0))
date_line = st_sfc(st_polygon(list(pts)), crs = 4326)
st_bbox(date_line) %>% st_as_sfc()
#> Geometry set for 1 feature 
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: -179 ymin: 0 xmax: 179 ymax: 1
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> POLYGON ((-179 0, 179 0, 179 1, -179 1, -179 0))

or when a polygon contains one of the poles:

pts = rbind(c(0,89), c(120,89), c(240,89), c(0,89))
pole = st_sfc(st_polygon(list(pts)), crs = 4326)
st_bbox(pole) %>% st_as_sfc()
#> Geometry set for 1 feature 
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 89 xmax: 240 ymax: 89
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> POLYGON ((0 89, 240 89, 240 89, 0 89, 0 89))

where we see that in both cases the st_bbox implied polygon does not cover the area intended.

3.5.1 Straight lines?

The simple feature model assumes that linestrings and polygons are formed of points connected by straight lines. When however representing the Earth surface, what does a straight line mean? The simple feature standard does not help much, here: it assumes Cartesian space. Technically speaking, a straight line between two points on a sphere exists, but it crosses the sphere, which is not very practical in most cases. The most common case is to use great circle segments to connect points: the shortest path that follows the surface of the sphere or ellipsoid. This means that (with longitude latitude coordinates) the line between POINT(0 50) and POINT(10 50) does not cross POINT(5 50). It also means that the line between points on opposite sides of the sphere is ambiguous. Also, the direction of a great circle segment, when defined as the angle it has with meridians, is not constant.

3.5.2 Ring direction for polygons

The simple feature standard is not conclusive about the direction of points in a ring. It points out that exterior rings should be counter clockwise, when seen from above, and interior rings (holes) clockwise, but for instance st_is_valid does not invalidate clockwise exterior rings:

st_is_valid(st_polygon(list(rbind(c(0,0), c(0,1), c(1,1), c(0,0)))))
#> [1] TRUE

This may have several reasons: a lot of data may come with wrong ring directions, and the distinction between exterior and interior rings is already unambiguous by their order: the first is exterior, anything following is interior.

On the sphere, any polygon divides the sphere surface in two finite areas, meaning there is no longer an unambiguous “inside” vs. “outside”: does the polygon with longitude latitude coordinates POLYGON((0 0, 120 0, 240 0, 0 0)) denote the northern or the southern hemisphere? One can still go two directions here:

  • assume that in practice polygons never divide the Earth in two equal halves, and take the smaller area as the “inside”
  • decide strongly about ring direction, e.g. counter-clockwise (following the ring, standing on the Earth, the left-side of the ring denotes the polygon interior)

Package sf comes with a large number of functions that work both for projected (Cartesian) data as for data defined in spherical coordinates. Whenever it makes assumptions of Cartesian coordinates for spherical coordinates it emits a warning. This is discussed further in section 5.6.

References

Herring, John R. 2011. “OpenGIS Implementation Standard for Geographic Information-Simple Feature Access-Part 1: Common Architecture.” Open Geospatial Consortium Inc, 111. http://portal.opengeospatial.org/files/?artifact_id=25355.

Herring, John R. 2010. “OpenGIS Implementation Standard for Geographic Information-Simple Feature Access-Part 2: SQL Option.” Open Geospatial Consortium Inc. http://portal.opengeospatial.org/files/?artifact_id=25354.

ISO. 2004. Geographic Information – Simple Feature Access – Part 1: Common Architecture. https://www.iso.org/standard/40114.html.