R data structures

This chapter provides a minimal set of R basics that may make it easier to read this book. A more comprehensive treatment of R basics is given in (Wickham 2014a), chapter 2.

As pointed out by (Chambers 2016), everything that exists in R is an object. This includes objects that make things happen, such as language objects or functions, but also the more basic “things”, such as data objects.

21.4 Homogeneous vectors

Data objects contain data, and possibly metadata. Data is always in the form of a vector, which can be of different types. We can find the type with typeof, and the vector length with length. Vectors are created by c, which combines individual elements:

typeof(1:10)
#> [1] "integer"
length(1:10)
#> [1] 10
typeof(1.0)
#> [1] "double"
length(1.0)
#> [1] 1
typeof(c("foo", "bar"))
#> [1] "character"
length(c("foo", "bar"))
#> [1] 2
typeof(c(TRUE, FALSE))
#> [1] "logical"

Vectors of this kind can only have a single type.
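
As a small illustration of the single-type rule: when c is given elements of different types, it does not fail but coerces them to the most general type among them. The variable names below are made up for this sketch.

```r
# c() coerces mixed inputs to one common type, in the order
# logical -> integer -> double -> character
int_dbl = c(1L, 2.5)    # integer and double
num_chr = c(1, "a")     # double and character
lgl_int = c(TRUE, 2L)   # logical and integer

typeof(int_dbl)
#> [1] "double"
typeof(num_chr)
#> [1] "character"
typeof(lgl_int)
#> [1] "integer"
num_chr
#> [1] "1" "a"
```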

Note that vectors can have length zero, as in

i = integer(0)
typeof(i)
#> [1] "integer"
i
#> integer(0)
length(i)
#> [1] 0
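
Zero-length vectors matter in practice, for instance when looping over indexes. The following sketch (with made-up variable names) shows why seq_along is usually a safer way to generate indexes than 1:length(x):

```r
empty = integer(0)

# 1:length(empty) is 1:0, which counts down from 1 to 0
1:length(empty)
#> [1] 1 0
# seq_along() returns a zero-length index for a zero-length vector
seq_along(empty)
#> integer(0)

# a loop over seq_along(empty) therefore runs zero times
n_iter = 0
for (k in seq_along(empty)) n_iter = n_iter + 1
n_iter
#> [1] 0
```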

We can retrieve (or in assignments: replace) elements in a vector using [ or [[:

a = c(1,2,3)
a[2]
#> [1] 2
a[[2]]
#> [1] 2
a[2:3]
#> [1] 2 3
a[2:3] = c(5,6)
a
#> [1] 1 5 6
a[[3]] = 10
a
#> [1]  1  5 10

where the difference is that [ can operate on an index range (or multiple indexes), and [[ operates on a single vector value.
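
A short sketch of this difference, using a made-up vector v: [ accepts several positions, negative positions, or a logical condition, whereas [[ raises an error when asked for more than one element of an atomic vector. The error is caught with tryCatch here so that the example runs to the end.

```r
v = c(10, 20, 30)

v[c(1, 3)]   # several positions at once
#> [1] 10 30
v[-1]        # negative index: everything except the first element
#> [1] 20 30
v[v > 15]    # logical index: elements for which the condition holds
#> [1] 20 30

# [[ selects exactly one element, so an index of length two is an error
res = tryCatch(v[[2:3]], error = function(e) "error")
res
#> [1] "error"
```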

21.5 Heterogeneous vectors: list

An additional vector type is the list, which can combine any types in its elements:

l <- list(3, TRUE, "foo")
typeof(l)
#> [1] "list"
length(l)
#> [1] 3

For lists, there is a further distinction between [ and [[: the single [ always returns a list, and [[ returns the contents of a list element:

l[1]
#> [[1]]
#> [1] 3
l[[1]]
#> [1] 3

For replacement, one can use [ when providing a list, and [[ when providing a new value:

l[1:2] = list(4, FALSE)
l
#> [[1]]
#> [1] 4
#> 
#> [[2]]
#> [1] FALSE
#> 
#> [[3]]
#> [1] "foo"
l[[3]] = "bar"
l
#> [[1]]
#> [1] 4
#> 
#> [[2]]
#> [1] FALSE
#> 
#> [[3]]
#> [1] "bar"

In case list elements are named, as in

l = list(first = 3, second = TRUE, third = "foo")
l
#> $first
#> [1] 3
#> 
#> $second
#> [1] TRUE
#> 
#> $third
#> [1] "foo"

we can use names as in l[["second"]] and this can be abbreviated to

l$second
#> [1] TRUE
l$second = FALSE
l
#> $first
#> [1] 3
#> 
#> $second
#> [1] FALSE
#> 
#> $third
#> [1] "foo"

This is convenient, but also requires name look-up in the names attribute (see below).
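
Two further differences between $ and [[ are worth knowing; the sketch below uses a made-up list cfg. For lists, $ accepts a unique abbreviation of a name (partial matching), while [[ matches names exactly by default. In addition, $ takes the name literally, so a name stored in a variable needs [[.

```r
cfg = list(first = 3, second = TRUE, third = "foo")

cfg$sec           # partial match on "second"
#> [1] TRUE
cfg[["sec"]]      # exact matching: no element is called "sec"
#> NULL

# a name held in a variable works with [[ but not with $
key = "third"
cfg[[key]]
#> [1] "foo"
cfg$key           # looks for an element literally named "key"
#> NULL
```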

21.5.1 NULL and removing list elements

NULL is the null value in R; it is special in the sense that it doesn’t work in simple comparisons:

3 == NULL # not FALSE!
#> logical(0)
NULL == NULL # not even TRUE!
#> logical(0)

but has to be treated specially, using is.null:

is.null(NULL)
#> [1] TRUE
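
The zero-length result of a comparison with NULL becomes a problem inside if, which needs a condition of length one. A minimal sketch, with a made-up variable nothing and the error caught by tryCatch so that the example runs to the end:

```r
nothing = NULL

length(nothing)         # NULL has length zero
#> [1] 0
length(c(1, NULL, 2))   # c() silently drops NULL
#> [1] 2

# nothing == 3 has length zero, which if() cannot use as a condition
res = tryCatch(
  if (nothing == 3) "equal" else "different",
  error = function(e) "error"
)
res
#> [1] "error"

# testing with is.null avoids the problem
if (is.null(nothing)) "no value" else "has a value"
#> [1] "no value"
```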

When we want to remove one or more list elements, we can do so by creating a new list that does not contain the elements to be removed, as in

l = l[c(1,3)] # remove second, implicitly
l
#> $first
#> [1] 3
#> 
#> $third
#> [1] "foo"

but we can also assign NULL to the element we want to eliminate:

l$second = NULL
l
#> $first
#> [1] 3
#> 
#> $third
#> [1] "foo"
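
Because assigning NULL removes an element, keeping an element whose value is NULL takes a different form: assign list(NULL) using single [. A sketch with a made-up list opts:

```r
opts = list(first = 3, second = TRUE, third = "foo")

opts["second"] = list(NULL)   # keep the element, set its value to NULL
length(opts)
#> [1] 3
is.null(opts[["second"]])
#> [1] TRUE

opts[["second"]] = NULL       # remove the element altogether
names(opts)
#> [1] "first" "third"
```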

21.6 Attributes

We can glue arbitrary metadata objects to data objects, as in

a = 1:3
attr(a, "some_meta_data") = "foo"
a
#> [1] 1 2 3
#> attr(,"some_meta_data")
#> [1] "foo"

and this can be retrieved, or replaced by

attr(a, "some_meta_data")
#> [1] "foo"
attr(a, "some_meta_data") = "bar"
attr(a, "some_meta_data")
#> [1] "bar"

In essence, the attributes of an object form a named list, and we can get or set the complete list by

attributes(a)
#> $some_meta_data
#> [1] "bar"
attributes(a) = list(some_meta_data = "foo")
attributes(a)
#> $some_meta_data
#> [1] "foo"

A number of attributes are treated specially by R, see e.g. ?attributes.
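
As an illustration of this special treatment (with made-up object names): names and dim have their own accessor functions, and subsetting a vector keeps its names but drops attributes that R does not treat specially.

```r
v = c(x = 1, y = 2)
identical(names(v), attr(v, "names"))
#> [1] TRUE

m = matrix(1:6, nrow = 2)
identical(dim(m), attr(m, "dim"))
#> [1] TRUE

# subsetting keeps names, but the user-defined attribute is dropped
w = c(x = 1, y = 2, z = 3)
attr(w, "some_meta_data") = "foo"
attributes(w[1:2])
#> $names
#> [1] "x" "y"
```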

21.6.1 object class and class attribute

Every object in R “has a class”, meaning that class(obj) returns a character vector with the class of obj. Some objects have an implicit class, e.g. vectors

class(1:3)
#> [1] "integer"
class(c(TRUE, FALSE))
#> [1] "logical"
class(c("TRUE", "FALSE"))
#> [1] "character"

but we can also set the class explicitly, either by using attr or by using class on the left-hand side of an assignment:

a = 1:3
class(a) = "foo"
a
#> [1] 1 2 3
#> attr(,"class")
#> [1] "foo"
class(a)
#> [1] "foo"
attributes(a)
#> $class
#> [1] "foo"

in which case the newly set class overrides the earlier implicit class. This way, we can add methods for class foo, e.g. by

print.foo = function(x, ...) print(paste("an object of class foo with length", length(x)))
print(a)
#> [1] "an object of class foo with length 3"

Such methods are generally intended to make software more usable, but at the same time they may make the objects more opaque. It is sometimes useful to see what an object “is made of” by printing it after the class attribute is removed, as in

unclass(a)
#> [1] 1 2 3
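
The class attribute may hold more than one class name, in which case methods are looked up from the first name to the last. The sketch below uses a made-up generic, describe, which is not part of R or of any package; it only serves to show the dispatch order and the effect of unclass.

```r
# 'describe' is a hypothetical generic, defined here for illustration only
describe = function(x, ...) UseMethod("describe")
describe.default = function(x, ...) "a plain object"
describe.foo = function(x, ...) paste("a foo of length", length(x))

obj = 1:3
class(obj) = c("bar", "foo")   # most specific class first

inherits(obj, "foo")
#> [1] TRUE
describe(obj)            # no describe.bar exists, so describe.foo is used
#> [1] "a foo of length 3"
describe(unclass(obj))   # without the class attribute: the default method
#> [1] "a plain object"
```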

As a more elaborate example, consider the case where a polygon is made using package sf:

library(sf)
p = st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,0))))
p
#> POLYGON ((0 0, 1 0, 1 1, 0 0))

which prints the well-known-text form; to understand what the data structure is like, we can use

unclass(p)
#> [[1]]
#>      [,1] [,2]
#> [1,]    0    0
#> [2,]    1    0
#> [3,]    1    1
#> [4,]    0    0

21.6.2 the dim attribute

The dim attribute sets the matrix or array dimensions:

a = 1:8
class(a)
#> [1] "integer"
attr(a, "dim") = c(2,4) # or: dim(a) = c(2,4)
class(a)
#> [1] "matrix" "array"
a
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    3    5    7
#> [2,]    2    4    6    8
attr(a, "dim") = c(2,2,2) # or: dim(a) = c(2,2,2)
class(a)
#> [1] "array"
a
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    5    7
#> [2,]    6    8
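
The printed output shows that the elements fill the matrix column by column. A sketch of what this means for indexing, with a made-up object m: element [i, j] of a matrix with nr rows sits at position (j - 1) * nr + i of the underlying vector, and removing dim gives back that vector.

```r
m = 1:8
dim(m) = c(2, 4)

m[2, 3]              # row 2, column 3
#> [1] 6
m[(3 - 1) * 2 + 2]   # the same element, by its position in the vector
#> [1] 6

dim(m) = NULL        # removing dim gives back the plain vector
m
#> [1] 1 2 3 4 5 6 7 8
```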

21.7 various names attributes

Named vectors carry their names in a names attribute. We saw examples for lists above; an example for a numeric vector is:

a = c(first = 3, second = 4, last = 5)
a["second"]
#> second 
#>      4
attributes(a)
#> $names
#> [1] "first"  "second" "last"

Other name attributes include, e.g., the dimnames of matrices or arrays:

a = matrix(1:4, 2, 2)
dimnames(a) = list(rows = c("row1", "row2"), cols = c("col1", "col2"))
a
#>       cols
#> rows   col1 col2
#>   row1    1    3
#>   row2    2    4
attributes(a)
#> $dim
#> [1] 2 2
#> 
#> $dimnames
#> $dimnames$rows
#> [1] "row1" "row2"
#> 
#> $dimnames$cols
#> [1] "col1" "col2"

Data.frame objects have rows and columns, and both have names:

df = data.frame(a = 1:3, b = c(TRUE, FALSE, TRUE))
attributes(df)
#> $names
#> [1] "a" "b"
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#> [1] 1 2 3
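
The names attribute holding the column names is a hint that a data.frame is a list with one element per column. A brief sketch, with a made-up object d:

```r
d = data.frame(a = 1:3, b = c(TRUE, FALSE, TRUE))

typeof(d)    # a data.frame is stored as a list ...
#> [1] "list"
length(d)    # ... with one element per column
#> [1] 2
d[["b"]]     # so list-style access returns a column
#> [1]  TRUE FALSE  TRUE
is.list(unclass(d))
#> [1] TRUE
```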

21.8 using structure

When programming, the pattern of adding or modifying attributes before returning an object is extremely common, an example being:

f = function(x) {
   a = create_obj(x) # call some other function
   attributes(a) = list(class = "foo", meta = 33)
   a
}

The last two statements can be contracted to

f = function(x) {
   a = create_obj(x) # call some other function
   structure(a, class = "foo", meta = 33)
}

where function structure adds, replaces, or (in case of value NULL) removes attributes from the object in its first argument.
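
One difference between the two forms is worth a sketch (object names are made up): structure leaves attributes it was not asked to change in place, whereas assigning to attributes replaces the complete set, which for a matrix means that dim is lost.

```r
m = matrix(1:4, 2, 2)

# structure() adds to the attributes that are already there
s = structure(m, meta = 33)
names(attributes(s))
#> [1] "dim"  "meta"

# a NULL value removes an attribute
names(attributes(structure(s, meta = NULL)))
#> [1] "dim"

# assigning to attributes() replaces the whole set, so dim is lost here
r = m
attributes(r) = list(meta = 33)
names(attributes(r))
#> [1] "meta"
```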

References

Wickham, Hadley. 2014a. Advanced R. CRC Press. http://adv-r.had.co.nz/.

Chambers, John. 2016. Extending R. CRC Press.