A Quick Introduction to iNEXT.3D via
Examples
Anne Chao, K.H. Hu,
2023-09-06
## install iNEXT.3D package from CRAN
# install.packages("iNEXT.3D") # coming soon
## install the latest version from github
install.packages('devtools')
library(devtools)
install_github('AnneChao/iNEXT.3D')
## import packages
library(iNEXT.3D)
library(ggplot2)
In this document, here provide a quick introduction demonstrating how
to run the package iNEXT.3D
(iNterpolation and
EXTrapolation in three Dimensions). iNEXT.3D
has several
main functions: iNEXT3D
, ggiNEXT3D
,
AO3D
, ggAO3D
, estimate3D
, and
DataInfo3D.
iNEXT.3D
is the extension of R package iNEXT
(Hsieh et al., 2016). iNEXT.3D
focuses on three measures of
Hill numbers of order q: species richness (q = 0
), Shannon
diversity (q = 1
, the exponential of Shannon entropy) and
Simpson diversity (q = 2
, the inverse of Simpson
concentration) and extend Hill number to three dimensions: taxonomic
diversity (TD), phylogenetic diversity (PD), and functional diversity
(FD) under Hill-Chao family frame work (Chao et al., 2019). Besides,
iNEXT.3D
also provide statistic estimation for three
dimensions biodiversity (Chao et al., 2021). For each diversity measure,
iNEXT.3D
uses the observed sample of abundance or incidence
data (called the “reference sample”) to compute diversity estimates and
the associated confidence intervals for the following two types of
rarefaction and extrapolation (R/E):
- Sample-size-based R/E sampling curves:
iNEXT3D
computes
diversity estimates for rarefied and extrapolated samples up to an
appropriate size. This type of sampling curve plots the diversity
estimates with respect to sample size.
- Coverage-based R/E sampling curves:
iNEXT3D
computes
diversity estimates for rarefied and extrapolated samples with sample
completeness (as measured by sample coverage) up to an appropriate
coverage. This type of sampling curve plots the diversity estimates with
respect to sample coverage.
iNEXT.3D
also plots the above two types of sampling
curves and a sample completeness curve by ggiNEXT3D
. The
sample completeness curve provides a bridge between these two types of
curves.
How to cite
If you publish your work based on results from iNEXT.3D package, you
should make references to the following Online reference:
- Chao, A., Henderson, P. A., Chiu, C.-H., Moyes, F., Hu, K.-H.,
Dornelas, M and. Magurran, A. E. (2021). Measuring temporal change in
alpha diversity: a framework integrating taxonomic, phylogenetic and
functional diversity and the iNEXT.3D standardization. Methods in
Ecology and Evolution, 12, 1926-1940.
SOFTWARE NEEDED TO RUN iNEXT.3D IN R
HOW TO RUN iNEXT.3D:
The iNEXT.3D
package can be downloaded from Anne Chao’s
iNEXT.3D_github using
the following commands. For a first-time installation, an additional
visualization extension package (ggplot2
) must be installed
and loaded.
## install iNEXT.3D package from CRAN
# install.packages("iNEXT.3D") # coming soon
## install the latest version from github
install.packages('devtools')
library(devtools)
install_github('AnneChao/iNEXT.3D')
## import packages
library(iNEXT.3D)
library(ggplot2)
An online version of iNEXT.3D(https://chao.shinyapps.io/iNEXT_3D/) is also available
for users without R background.
MAIN FUNCTION: iNEXT3D()
First describe the main function iNEXT3D()
with default
arguments:
iNEXT3D(data, diversity = 'TD', q = c(0,1,2), datatype = "abundance",
size = NULL, endpoint = NULL, knots = 40, nboot = 50, conf = 0.95, nT = NULL,
PDtree = NULL, PDreftime = NULL, PDtype = 'meanPD',
FDdistM, FDtype = 'AUC', FDtau = NULL)
The arguments of this function are briefly described below, and will be
explained in more details by illustrative examples in later text. This
main function computes diversity estimates of order q, the sample
coverage estimates and related statistics for K (if
knots = K
) evenly-spaced knots (sample sizes) between size
1 and the
endpoint
, where the endpoint is described below.
Each knot represents a particular sample size for which diversity
estimates will be calculated. By default,
endpoint
= double
the reference sample size for abundance data or double the total
sampling units for incidence data. For example, if
endpoint = 10
,
knot = 4
, diversity estimates
will be computed for a sequence of samples with sizes (1, 4, 7,
10).
data
|
- For
datatype = “abundance” , data can be input as a
vector of species abundances (for a single assemblage),
matrix/data.frame (species by assemblages), or a list of species
abundance vectors.
- For
datatype = “incidence_freq” , data can be input as a
vector of incidence frequencies (for a single assemblage),
matrix/data.frame (species by assemblages), or a list of incidence
frequencies; the first entry in all types of input must be the number of
sampling units in each assemblage.
- For
datatype = “incidence_raw” , data can be input as a
list of matrix/data.frame (species by sampling units); data can also be
input as a matrix/data.frame by merging all sampling units across
assemblages based on species identity; in this case, the number of
sampling units (nT , see below) must be input.
|
diversity
|
selection of diversity type: ‘TD’ = Taxonomic diversity,
‘PD’ = Phylogenetic diversity, and ‘FD’ =
Functional diversity.
|
q
|
a numerical vector specifying the diversity orders. Default is
c(0, 1, 2) .
|
datatype
|
data type of input data: individual-based abundance data (datatype
= “abundance” ), sampling-unit-based incidence frequencies data
(datatype = “incidence_freq” ), or species by sampling-units
incidence matrix (datatype = “incidence_raw” ) with all
entries being 0 (non-detection) or 1 (detection).
|
size
|
an integer vector of sample sizes (number of individuals or sampling
units) for which diversity estimates will be computed. If
NULL , then diversity estimates will be computed for those
sample sizes determined by the specified/default endpoint
and knots .
|
endpoint
|
an integer specifying the sample size that is the endpoint
for rarefaction/extrapolation. If NULL, then endpoint
= double reference sample size.
|
knots
|
an integer specifying the number of equally-spaced knots
(say K, default is 40) between size 1 and the endpoint ;each
knot represents a particular sample size for which diversity estimate
will be calculated. If the endpoint is smaller than the
reference sample size, then iNEXT3D() computes only the
rarefaction esimates for approximately K evenly spaced
knots . If the endpoint is larger than the
reference sample size, then iNEXT3D() computes rarefaction
estimates for approximately K/2 evenly spaced knots between
sample size 1 and the reference sample size, and computes extrapolation
estimates for approximately K/2 evenly spaced knots between
the reference sample size and the endpoint .
|
nboot
|
a positive integer specifying the number of bootstrap replications when
assessing sampling uncertainty and constructing confidence intervals.
Enter 0 to skip the bootstrap procedures. Default is 50.
|
conf
|
a positive number < 1 specifying the level of confidence interval.
Default is 0.95.
|
nT
|
(required only when datatype = “incidence_raw” and input
data is matrix/data.frame) a vector of nonnegative integers specifying
the number of sampling units in each assemblage. If assemblage names are
not specified, then assemblages are automatically named as
“assemblage1”, “assemblage2”,…, etc.
|
PDtree
|
(required only when diversity = “PD” ), a phylogenetic tree
in Newick format for all observed species in the pooled assemblage.
|
PDreftime
|
(required only when diversity = “PD” ), a vector of
numerical values specifying reference times for PD. Default is
NULL (i.e., the age of the root of PDtree).
|
PDtype
|
(required only when diversity = “PD” ), select PD type:
PDtype = “PD” (effective total branch length) or
PDtype = “meanPD” (effective number of equally divergent
lineages). Default is “meanPD” , where meanPD =
PD/tree depth .
|
FDdistM
|
(required only when diversity = “FD” ), a species pairwise
distance matrix for all species in the pooled assemblage.
|
FDtype
|
(required only when diversity = “FD” ), select FD type:
FDtype = “tau_values” for FD under specified threshold
values, or FDtype = “AUC” (area under the curve of
tau-profile) for an overall FD which integrates all threshold values
between zero and one. Default is “AUC” .
|
FDtau
|
(required only when diversity = “FD” and FDtype =
“tau_values” ), a numerical vector between 0 and 1 specifying tau
values (threshold levels). If NULL (default), then
threshold is set to be the mean distance between any two individuals
randomly selected from the pooled assemblage (i.e., quadratic entropy).
|
GRAPHIC DISPLAYS: FUNCTION ggiNEXT3D()
The function ggiNEXT3D()
, which extends
ggplot2
with default arguments, is described as
follows:
ggiNEXT3D(outcome, type = 1:3, facet.var = "Assemblage", color.var = "Order.q")
Here outcome
is the object of iNEXT3D()
’s
output. Three types of curves are allowed for different diversity
dimensions:
Sample-size-based R/E curve (type = 1
): This curve
plots diversity estimates with confidence intervals as a function of
sample size.
Sample completeness curve (type = 2
): This curve
plots the sample coverage with respect to sample size.
Coverage-based R/E curve (type = 3
): This curve
plots the diversity estimates with confidence intervals as a function of
sample coverage.
The argument facet.var = "Order.q"
or
facet.var = "Assemblage"
is used to create a separate plot
for each value of the specified variable. For example, the following
code displays a separate plot of the diversity order q. The
ggiNEXT3D()
function is a wrapper with package
ggplot2
to create a R/E curve in a single line of code. The
figure object is of class "ggplot"
, so can be manipulated
by using the ggplot2
tools.
When facet.var = "Assemblage"
in ggiNEXT3D
function, it creates a separate plot for each assemblage and the
different color lines represent each diversity order. Sample-size-based
R/E curve (type = 1
) as below:
# Sample-size-based R/E curves, separating by "assemblage""
ggiNEXT3D(out.TD, type = 1, facet.var = "Assemblage")

When facet.var = "Order.q"
in ggiNEXT3D
function, it creates a separate plot for each diversity order and the
different color lines represent each assemblage. Sample-size-based R/E
curve (type = 1
) as below:
# Sample-size-based R/E curves, separating by "Order.q"
ggiNEXT3D(out.TD, type = 1, facet.var = "Order.q")

The following command return the sample completeness (sample
coverage) curve (type = 2
) in which different colors are
used for the three assemblages.
ggiNEXT3D(out.TD, type = 2, facet.var = "Order.q", color.var = "Assemblage")

The following commands return the coverage-based R/E sampling curves
in which different colors are used for the three assemblages
(facet.var = "Assemblage"
) and for three diversity orders
(facet.var = "Order.q"
).
ggiNEXT3D(out.TD, type = 3, facet.var = "Assemblage")

ggiNEXT3D(out.TD, type = 3, facet.var = "Order.q")

EXAMPLE for INCIDENCE-RAW DATA
Incidence raw data is allowed for three diversity dimensions. For
illustration, use the Hinkley’s fish data (the dataset is included in
the package) at three time periods (1981-1985, 1987-1991 and 2015-2019).
This data set (fish
) includes a list with three matrices;
each matrix is a species-by-sampling-unit data. Here only use taxonomic
diversity (TD) as demonstration below.
data("fish")
head(fish$data$`1981-1985`)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
Agonus_cataphractus 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 1 1 0
Alosa_fallax 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 0 1 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 0 0 0 0 1
Ammodytes_marinus 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ammodytes_tobianus 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Anguilla_anguilla 1 1 1 1 0 0 0 1 0 1 0 1 0 1 1 0 0 0 1 0 0 1 0 1 1 1 1 0 0 0 1 0 1 1 0 0 1 1 0 1 0 0 0 0 1 1 1 0 1 1 1 1 1 1 0 0 0 0 0 0
Aphia_minuta 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0
# fish.tree = fish$tree ## for PD
# fish.dis = fish$dist ## for FD
out.raw <- iNEXT3D(data = fish$data, diversity = "TD",
q = c(0, 1, 2), datatype = "incidence_raw", nboot = 30)
ggiNEXT3D(out.raw, type = 1)

ggiNEXT3D(out.raw, type = 2)

ggiNEXT3D(out.raw, type = 3)

EXAMPLE for INCIDENCE-FREQUENCY DATA
We transform Hinkley’s fish data (incidence-raw data) to
incidence-frequency data (incidence_freq
). The first row
must be the total number of sampling units, followed by the species
incidence frequencies in each assemblage. Here only use taxonomic
diversity (TD) for demonstration below.
fish.freq = cbind(c( ncol(fish$data$`1981-1985`), rowSums(fish$data$`1981-1985`) ),
c( ncol(fish$data$`1987-1991`), rowSums(fish$data$`1987-1991`) ),
c( ncol(fish$data$`2015-2019`), rowSums(fish$data$`2015-2019`) ))
rownames(fish.freq)[1] = "sample units"
colnames(fish.freq) = names(fish$data)
head(fish.freq)
1981-1985 1987-1991 2015-2019
sample units 60 60 60
Agonus_cataphractus 14 19 21
Alosa_fallax 29 22 18
Ammodytes_marinus 0 0 0
Ammodytes_tobianus 4 2 8
Anguilla_anguilla 30 29 8
Note that incidence-frequency data
(datatype = "incidence_freq
) is allowed only for diversity
class: "TD"
, "FD"
. If "PD"
required, use incidence-raw data instead. The following commands return
three R/E sampling curves for fish data. The argument
color.var = "Order.q"
is used to display curves in
different colors for diversity order.
out.incfreq <- iNEXT3D(data = fish.freq, diversity = "TD",
q = c(0, 1, 2), datatype = "incidence_freq",nboot = 30)
# Sample-size-based R/E curves
ggiNEXT3D(out.incfreq, type = 1, color.var = "Order.q")

# Sample completeness curves
ggiNEXT3D(out.incfreq, type = 2)

# Coverage-based R/E curves
ggiNEXT3D(out.incfreq, type = 3, color.var = "Order.q")

POINT ESTIMATION FUNCTION: estimate3D()
estimate3D(data, diversity = "TD", q = c(0, 1, 2), datatype = "abundance",
base = "coverage", level = NULL, nboot = 50, conf = 0.95,
nT = NULL, PDtree, PDreftime = NULL, PDtype = "meanPD",
FDdistM, FDtype = "AUC", FDtau = NULL)
estimate3D
is used to compute three diversity dimensions
(TD, PD, FD) estimates with q = 0, 1, 2 under any specified level of
sample size (when base = "size"
) or sample coverage (when
base = "coverage"
) for either abundance data
(datatype = "abundance"
) or incidence data
(datatype = "incidence_freq"
or
"incidence_raw"
). If level = NULL
, this
function computes the diversity estimates for the minimum sample size
among all samples extrapolated to double reference sizes (when
base = "size"
) or the minimum sample coverage among all
samples extrapolated to double reference sizes (when
base = "coverage"
).
For example, the following command returns the taxonomic diversity
(‘TD’) with a specified level of sample coverage = 99.5% for the dunes
data. For some assemblages, this coverage value corresponds to the
rarefaction part whereas the others correspond to extrapolation.
estimate3D(dunes$data, diversity = 'TD', q = c(0,1,2), datatype = "abundance",
base = "coverage", level = 0.995)
Assemblage m Method Order.q SC qD s.e. qD.LCL qD.UCL
1 EM 637.4836 Extrapolation 0 0.995 18.692936 3.7151524 11.411371 25.974501
2 EM 637.4836 Extrapolation 1 0.995 9.400018 0.4263251 8.564436 10.235600
3 EM 637.4836 Extrapolation 2 0.995 7.268513 0.3410917 6.599986 7.937041
4 MO 732.4154 Rarefaction 0 0.995 37.191467 1.3095147 34.624866 39.758069
5 MO 732.4154 Rarefaction 1 0.995 19.428728 0.4145147 18.616294 20.241162
6 MO 732.4154 Rarefaction 2 0.995 13.855022 0.4035675 13.064045 14.646000
7 TR 853.3564 Rarefaction 0 0.995 41.115774 2.6535592 35.914893 46.316654
8 TR 853.3564 Rarefaction 1 0.995 23.815218 0.5957004 22.647667 24.982769
9 TR 853.3564 Rarefaction 2 0.995 18.531144 0.5996717 17.355809 19.706479
EMPIRICAL AND ASYMPTOTIC DIVERSITY FUNCTION: AO3D
AO3D(
data, diversity = "TD", q = seq(0, 2, 0.2), datatype = "abundance",
nboot = 50, conf = 0.95, nT = NULL, method = c("Asymptotic", "Observed"),
PDtree, PDreftime = NULL, PDtype = "meanPD",
FDdistM, FDtype = "AUC", FDtau = NULL
)
The function AO3D()
compute three diversity dimensions
(TD, PD, FD) for empirical (observed) diversity and estimated asymptotic
diversity with any diversity order. For example, the following commands
returns empirical and asymptotic taxonomic diversity (‘TD’) for dunes
data, along with its confidence interval at diversity order q from 0 to
2. Here only show the first ten rows.
out1 <- AO3D(dunes$data, diversity = 'TD', datatype = "abundance",
method = c("Asymptotic", "Observed"), nboot = 30, conf = 0.95)
head(out1, 10)
Order.q qD s.e. qD.LCL qD.UCL Assemblage Method
1 0.0 21.487936 3.9145582 13.815543 29.160329 EM Asymptotic
2 0.2 16.936433 2.0626563 12.893701 20.979165 EM Asymptotic
3 0.4 13.885407 1.0562767 11.815142 15.955671 EM Asymptotic
4 0.6 11.862781 0.5891924 10.707985 13.017577 EM Asymptotic
5 0.8 10.496859 0.4118876 9.689575 11.304144 EM Asymptotic
6 1.0 9.541765 0.3605125 8.835173 10.248357 EM Asymptotic
7 1.2 8.847719 0.3517639 8.158274 9.537164 EM Asymptotic
8 1.4 8.325449 0.3555980 7.628489 9.022408 EM Asymptotic
9 1.6 7.920890 0.3628292 7.209758 8.632022 EM Asymptotic
10 1.8 7.600095 0.3706320 6.873669 8.326520 EM Asymptotic
GRAPHIC DISPLAYS FUNCTION: ggAO3D()
ggAO3D(outcome, profile = "q")
ggAO3D
plots q-profile, time-profile, and tau-profile
based on ggplot2
. Here outcome
is the object
from the function AO3D
, and profile
is a
profile selection versus to diversity. Default is
profile = "q"
. Note that profile = "time"
is
allowed for only when diversity = “PD” and profile = "tau"
profile is allowed for only when diversity = "FD"
and
FDtype = "tau_values"
.
# q-profile curves
ggAO3D(out1,profile = "q")

The argument profile = "time"
in ggAO3D
function creates a separate plot for each diversity order q. Therefore
the different assemblages will be represented by different color
lines.
# time-profile curves, separating by "Order.q"
data(data.inc)
data <- data.inc$data
tree <- data.inc$tree
nT <- data.inc$nT
out2 <- AO3D(data, diversity = 'PD', q = c(0, 1, 2), datatype = "incidence_raw",
nT = nT, nboot = 30, method = c("Asymptotic", "Observed"),
PDtree = tree, PDreftime = seq(0.1, 82.8575, length.out = 40))
ggAO3D(out2, profile = "time")

The argument profile = "tau"
in ggAO3D
function creates a separate plot for each diversity order q. Therefore
the different assemblages will be represented by different color
lines.
# tau-profile curves, separating by "Order.q"
data <- dunes$data
distM <- dunes$dist
out3 <- AO3D(data, diversity = 'FD', q = c(0, 1, 2), datatype = "abundance", nboot = 30, method = c("Asymptotic", "Observed"),
FDtau = seq(0, 1, 0.1), FDdistM = distM, FDtype = 'tau_values')
ggAO3D(out3, profile = "tau")

License
The iNEXT.3D package is licensed under the GPLv3. To help refine
iNEXT.3D
, your comments or feedback would be welcome
(please send them to Anne Chao or report an issue on the iNEXT.3D github
iNEXT.3D_github.
References
Chao, A., Chiu, C.-H., Villéger, S., Sun, I.-F., Thorn, S., Lin,
Y.-C., Chiang, J. M. and Sherwin, W. B. (2019). An attribute-diversity
approach to functional diversity, functional beta diversity, and related
(dis)similarity measures. Ecological Monographs, 89, e01343.
10.1002/ecm.1343.
Chao, A., Henderson, P. A., Chiu, C.-H., Moyes, F., Hu, K.-H.,
Dornelas, M and Magurran, A. E. (2021). Measuring temporal change in
alpha diversity: a framework integrating taxonomic, phylogenetic and
functional diversity and the iNEXT.3D standardization. Methods in
Ecology and Evolution, 12, 1926-1940.
T.C. Hsieh, K. H. Ma, and Chao, A. (2016). iNEXT: An R package
for rarefaction and extrapolation of species diversity (Hill numbers).
Methods in Ecology and Evolution, 7, 1451-1456.