Chapter 7 Reference Systems

Data are not just numbers, they are numbers with a context”; “In data analysis, context provides meaning(Cobb and Moore 1997)

7.1 Units of measurement

7.1.1 Quantities

The VIM (International vocabulary of metrology, BIPM et al. (2012)) defines a quantity as a “property of a phenomenon, body, or substance, where the property has a magnitude that can be expressed as a number and a reference”, where “[a] reference can be a measurement unit, a measurement procedure, a reference material, or a combination of such.”

One could argue whether all data is constitued of quantities, but there is no need to argue that proper data handling requires that numbers are accompanied by information on what the numbers mean, and what they are about.

A measurement system consist of base units for base quantities, and derived units for derived quantities. The SI system of units (Bureau International des Poids et Mesures 2006) consist of the seven base units length (metre, m), mass (kilogram, kg), time (second, s), electric current (ampere, A), thermodynamic temperature (kelvin, K), amount of substance (mole, mol), and luminous intensity (candela, cd). Derived units are composed of products of integer powers of base units; exampes are speed (\(\mbox{m}~\mbox{s}^{-1}\)) and density (\(\mbox{kg}~\mbox{m}^{-3}\)).

Many data variables have units that are not expressed as SI base units or derived units. Hand (2004) discusses many such measurement scales, e.g. those used to measure intelligence in social sciences, in the context of measurement units.

7.1.2 Unitless measures

The special case of unitless units can refer to either cases where units cancelled out (e.g. mass fraction: kg/kg, or angle measured in rad: m/m) or to cases where objects or events were counted (e.g. 5 apples). Adding an angle to a count of apples would not make sense; adding 5 apples to 3 oranges may make sense if the result is reinterpreted, e.g. as pieces of fruit. Flater (2018) discusses systems for proper handling of unitless quantities; handling counts could for instance link to domain-specific ontologies pointing out which things were counted, and perhaps identifying super-classes, like fruit.

7.1.3 Units in R

The units R package (Pebesma, Mailund, and Hiebert 2016b) provides units of measurement support for R, by interfacing the udunits2 units database and C library. It allows for setting, converting and deriving units:

library(units)
(a = set_units(1:3, m))
#> Units: [m]
#> [1] 1 2 3
a_km = set_units(1:3, km)
a + a_km
#> Units: [m]
#> [1] 1001 2002 3003
b = set_units(4:6, s)
a / b
#> Units: [m/s]
#> [1] 0.25 0.40 0.50

and raises errors in case of meaningless operations

a + b
#> Error: cannot convert s into m

7.1.4 Datum

For many quantities, the natural origin of values is zero. This works for amounts, and differences between amounts results in meaningful negative values. For locations and times, differences have a natural zero interpretation: length and duration. Absolute location (position) and time need a fixed origin, from which we can meaningfully measure other absolute space-time points: a datum. For space, a datum involves more than one dimension. The combination of a datum and a measurement unit (scale) is a a reference system. The next two sections will deal with temporal and spatial reference systems, and how they are handled in R.

7.2 Temporal Reference Systems

R has two native classes for time-related data: POSIXt and Date, which are used for specifying dates, and times.

7.2.1 Date

Date objects are numeric vectors of class Date, which contain the number of days since (or in case negative: before) Jan 1, 1970:

(d = as.Date("1970-02-01"))
#> [1] "1970-02-01"
as.numeric(d)
#> [1] 31

We see that the print method as well as the as.Date method use ISO 8601 (ISO8601 2000) character strings, which is standard used to read and write dates. We can also modify this to local conventions by specifying a format:

(d = as.Date("01.02.1970", format = "%d.%m.%Y"))
#> [1] "1970-02-01"
format(d, format = "%d.%m.%Y")
#> [1] "01.02.1970"

The default for format may depend on the locale settings of the computer used. The help page of ?as.Date contains further discussion of date systems, and calendars used.

7.2.2 POSIXt

POSIXt is an R native class for specifying times. It has two subclasses: POSIXct represents time as a numeric, representing decimal seconds since 1970-01-01 00:00 UTC, and POSIXlt contains the same information as a list with all time components (second, minute, hour, day of month, month, year) in list components:

t = as.POSIXct("1970-01-01 01:00:00", tz = "UTC")
as.numeric(t)
#> [1] 3600
names(unclass(as.POSIXlt(t)))
#> [1] "sec"   "min"   "hour"  "mday"  "mon"   "year"  "wday"  "yday"  "isdst"
unclass(as.POSIXlt(t))$hour
#> [1] 1

7.2.3 Time zones

Time zones can be seen as local modifiers of time: where time as numeric value is stored with respect to UTC (universal coordinated time), a local time zone is used to format it, and a time zone can be specified for creation, and formatting:

(t = as.POSIXct("1970-01-01 00:00:00", tz = "PST"))
#> [1] "1970-01-01 PST"

this adds a time zone modifier to t that redefines the time origin, as

as.numeric(t)
#> [1] 0

To convert POSIXt to Date we can use as.Date; this converts to the local date. Converting back to POSIXct looses the time of day and time zone information.

(t = as.POSIXct("1970-01-01 23:00:00", tz = "PST"))
#> [1] "1970-01-01 23:00:00 PST"
as.Date(t)
#> [1] "1970-01-01"
format(as.POSIXct(as.Date(t)), tz = "UTC")
#> [1] "1970-01-01"

Working with local time zones is sometimes confusing when the data we work with were not referenced to this time zone. It may help to set

Sys.setenv(TZ="UTC")

at the start of an R script. The effect is, at lease on some platforms, that R thinks it is working in a UTC time zone. This way, the scripts will produce identical outputs, no matter in which time zone it is run.

7.3 Coordinate Reference Systems

We follow Lott (2015) when defining the following concepts (italics indicate literal quoting):

  • a coordinate system is a set of mathematical rules for specifying how coordinates are to be assigned to points
  • a datum is a parameter or set of parameters that define the position of the origin, the scale, and the orientation of a coordinate system,
  • a geodetic datum is a datum describing the relationship of a two- or three-dimensional coordinate system to the Earth, and
  • a coordinate reference system is a coordinate system that is related to an object by a datum; for geodetic and vertical datums, the object will be the Earth.

A readable text that further explains these concepts is Iliffe and Lott (2008).

Essentially it boils down to the Earth not following a regular shape. The topography of the Earth is of course known to vary strongly, but also the surface formed by constant gravity at mean sea level, the geoid, is irregular. A commonly used model that is fit to the geoid is an ellipsoid of revolution, which is an ellipse with two identical minor axes. This model can be fit locally to be highly precise, can be fixed for particular tectonic plates (like ETRS89), or can be globally fit (like WGS84).

The definitions above state that coordinates in degrees longitude and latitude can only have a meaning, i.e. can only be understood as Earth coordinates when the datum they relate to is given.

Recomputing coordinates in a new datum is called coordinate transformation.

7.3.1 Projections

When we look at geographical data on a paper map or a screen, or on any flat device, we see the values projected – they are no longer arranged on an a sphere or ellipsoid. Even if we plot degrees longitude/latitude data on a flat x/y plane with unit aspect ratio, we use a projection, called plate carrée.

Note that even for projected data, the data that were projected are associated with a reference ellipsoid (datum). Going from one projection to the other without changing datum is called coordinate conversion, and usually passes through the geodetic coordinates of the datum involved; up to numerical errors this process is lossless and invertible.

7.3.2 Describing Coordinate Reference Systems

Lott (2015) describes a standard for encoding coordinate reference system using well known text; the standard is referred to as WKT2. GDAL and PROJ support this encoding (FIXME: verify this is true by the time this book goes into print).

Traditionally, PROJ used a string representation to encode coordinate reference systems (datums) and projections, called the proj4string. Some of these come from a catalogue (originally) compiled by the European Petroleum Survey Group (now: International Association of Oil & Gas Producers), and have a so-called epsg code.

Package sf provides a crs class which is initialised either by an epsg code, like

st_crs(4326)
#> Coordinate Reference System:
#>   EPSG: 4326 
#>   proj4string: "+proj=longlat +datum=WGS84 +no_defs"

or by a PROJ string, like

st_crs("+proj=longlat")
#> Coordinate Reference System:
#>   EPSG: 4326 
#>   proj4string: "+proj=longlat +ellps=WGS84 +no_defs"

and can be missing valued

st_crs()
#> Coordinate Reference System: NA

A number of methods are available for crs objects:

methods(class = "crs")
#> [1] $          is.na      Ops        print      st_as_text st_crs    
#> see '?methods' for accessing help and source code

The Ops methods are convenience functions, e.g.

st_crs(4326) == st_crs("+proj=longlat")
#> [1] FALSE

but there not all cases semantically identical crs objects will yield equality in this test.

st_as_text converts a crs object into WKT, we print it using cat:

cat(st_as_text(st_crs(4326), pretty = TRUE))
#> GEOGCS["WGS 84",
#>     DATUM["WGS_1984",
#>         SPHEROID["WGS 84",6378137,298.257223563,
#>             AUTHORITY["EPSG","7030"]],
#>         AUTHORITY["EPSG","6326"]],
#>     PRIMEM["Greenwich",0,
#>         AUTHORITY["EPSG","8901"]],
#>     UNIT["degree",0.0174532925199433,
#>         AUTHORITY["EPSG","9122"]],
#>     AUTHORITY["EPSG","4326"]]

It should be noted that at the time of writing this, a new draft standard for WKT (informally called WKT2, Lott (2015)) is rapidly being implemented in GDAL and PROJ, and can be expected in R once these changes appear in released versions.

st_crs is also a generic, with methods for

methods(st_crs)
#>  [1] st_crs.bbox*      st_crs.character* st_crs.crs*      
#>  [4] st_crs.default*   st_crs.numeric*   st_crs.Raster*   
#>  [7] st_crs.sf*        st_crs.sfc*       st_crs.Spatial*  
#> [10] st_crs.stars*    
#> see '?methods' for accessing help and source code

The method for sfc objects can drill further into the underlying data, by adding an argument; a few of these are printed by

st_crs(st_sfc(crs = 4326), parameters = TRUE)[c(1:4, 8)]
#> $SemiMajor
#> 6378137 [m]
#> 
#> $SemiMinor
#> 6356752 [m]
#> 
#> $InvFlattening
#> [1] 298
#> 
#> $units_gdal
#> [1] "degree"
#> 
#> $ud_unit
#> 1 [°]

where we see that udunits and GDAL units are integrated. The major axis lengths (SemiMajor) and inverse flattening are used e.g. to compute great circle distances on the ellipsoid, using the algorithm from Karney (2013). This algorithm is implemented in PROJ, and interfaced by lwgeom::st_geod_distance, which is called from sf::st_distance when objects have geodetic coordinates.

References

Cobb, George W., and David S. Moore. 1997. “Mathematics, Statistics and Teaching.” The American Mathematical Monthly 104 (9): 801–23. http://www.jstor.org/stable/2975286.

BIPM, IEC, ILAC IFCC, IUPAP IUPAC, and OIML ISO. 2012. “The International Vocabulary of Metrology–Basic and General Concepts and Associated Terms (Vim), 3rd Edn. Jcgm 200: 2012.” JCGM (Joint Committee for Guides in Metrology). https://www.bipm.org/en/publications/guides/.

Bureau International des Poids et Mesures. 2006. The International System of Units (SI), 8th Edition. Organisation Intergouvernementale de la Convention du Mètre. https://www.bipm.org/en/publications/si-brochure/download.html.

Hand, David J. 2004. Measurement: Theory and Practice. A Hodder Arnold Publication.

Flater, David. 2018. “Architecture for Software-Assisted Quantity Calculus.” Computer Standards & Interfaces 56: 144–47. doi:10.1016/j.csi.2017.10.002.

Pebesma, Edzer, Thomas Mailund, and James Hiebert. 2016b. “Measurement Units in R.” The R Journal 8 (2): 486–94. doi:10.32614/RJ-2016-061.

ISO8601, ISO. 2000. “Data Elements and Interchange Formats–Information Interchange–Representation of Dates and Times.” ISO/Tc154.

Lott, Roger. 2015. “Geographic Information-Well-Known Text Representation of Coordinate Reference Systems.” Open Geospatial Consortium. http://docs.opengeospatial.org/is/12-063r5/12-063r5.html.

Iliffe, Jonathan, and Roger Lott. 2008. Datums and Map Projections for Remote Sensing, Gis, and Surveying. Whittles Pub. CRC Press, Scotland, UK.

Karney, Charles FF. 2013. “Algorithms for Geodesics.” Journal of Geodesy 87 (1). Springer: 43–55. https://link.springer.com/content/pdf/10.1007/s00190-012-0578-z.pdf.