| Title: | Quantifying phenotypic spaces |
|---|---|
| Description: | Facilitates quantifying phenotypic trait spaces and comparing spaces across groups. |
| Authors: | Marcelo Araya-Salas, Karan Odom & Alejandro Rico-Guevara |
| Maintainer: | Marcelo Araya-Salas <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.1.1 |
| Built: | 2026-05-11 05:46:03 UTC |
| Source: | https://github.com/maRce10/PhenotypeSpace |
binary_triangular_matrix creates binary triangular matrices representing categorical data in a distance matrix form
binary_triangular_matrix(group, labels = NULL)
group |
Character vector or factor containing categories to be represented as a pairwise binary matrix. Several observations per category (for at least some categories) are required. |
labels |
Character vector or factor containing labels to be used for rows/columns in the output matrix. Optional. Default is NULL. |
The function creates binary triangular matrices representing categorical data in a pairwise distance matrix form. Such matrices represent group membership by assigning 0 to pairs of observations that belong to the same category (individual, group, population) and 1 to those belonging to different categories. Binary pairwise matrices can be useful to evaluate the association between a categorical and a continuous variable (represented as pairwise distances) using a Mantel test (as in Araya-Salas et al. 2019).
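The 0/1 coding described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation; the object names (`groups`, `obs_labels`, `bin_mat`) are hypothetical.

```r
# Hypothetical base-R illustration; not the code used by binary_triangular_matrix()
groups <- paste0("G", rep(1:3, each = 2))
obs_labels <- paste(groups, 1:6, sep = "-")

# TRUE where two observations belong to different categories, converted to 0/1
bin_mat <- outer(groups, groups, FUN = "!=") * 1
dimnames(bin_mat) <- list(obs_labels, obs_labels)

# 0 = same category, 1 = different category
bin_mat
```

The resulting square matrix is symmetric with zeros on the diagonal, so either triangle carries the full membership information.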
A pairwise distance matrix that represents group membership. See details.
Marcelo Araya-Salas ([email protected])
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
Araya-Salas M, G Smith-Vidaurre, D Mennill, P González-Gómez, J Cahill & T Wright. 2019. Social group signatures in hummingbird displays provide evidence of co-occurrence of vocal and visual learning. Proceedings of the Royal Society B. 286: 20190666.
Smouse PE, Long JC, Sokal RR. 1986 Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst. Zool. 35, 627–632.
distance_to_rectangular, rectangular_to_triangular
    {
      # create 3 groups, each with 2 observations
      groups <- paste0("G", rep(1:3, each = 2))

      # create binary matrix
      binary_triangular_matrix(group = groups)

      # create binary matrix using labels
      binary_triangular_matrix(group = groups, labels = paste(groups, 1:6, sep = "-"))
    }
distance_to_rectangular converts distance (triangular) matrices to rectangular matrices using multidimensional scaling.
distance_to_rectangular(distance.matrix, labels = names(distance.matrix), n.dimensions = 2, metric = TRUE, ...)
distance.matrix |
Distance matrix (i.e. object of class 'dist'). Can be created using the function |
labels |
Character vector or factor containing labels to be used for rows/columns in the output data frame. Default is names(distance.matrix). |
n.dimensions |
Integer vector of length 1 indicating the number of dimensions to represent distances in a new space. Default is 2. |
metric |
Logical argument to control if Metric (a.k.a. Classical, |
... |
Additional arguments to be passed to |
The function is a simple wrapper over two multidimensional scaling functions (isoMDS and cmdscale). It simplifies running multidimensional scaling and formats the output so it can be used with other functions in the package.
A data frame with the new dimensions representing the position of observations in a new n-dimension space. If metric = FALSE the output data frame is embedded in a list that also includes the stress value.
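As a rough sketch of what the metric case involves, classical MDS can be run directly with stats::cmdscale and the coordinates stored in a data frame. The simulated data and the output column names (`labels`, `MDS1`, `MDS2`) are assumptions made for illustration and may differ from what distance_to_rectangular returns.

```r
# Hypothetical illustration of metric (classical) MDS with base R
set.seed(1)

# 10 simulated observations measured on 4 traits
traits <- matrix(rnorm(40), ncol = 4)
d <- dist(traits)

# project the pairwise distances onto 2 dimensions
mds <- stats::cmdscale(d, k = 2)

# format as a rectangular data frame (column names are assumptions)
rect <- data.frame(
  labels = seq_len(nrow(mds)),
  MDS1 = mds[, 1],
  MDS2 = mds[, 2]
)

head(rect)
```

The non-metric case would use MASS::isoMDS instead, which also reports a stress value.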
Marcelo Araya-Salas ([email protected])
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
distance_to_rectangular, rectangular_to_triangular
    {
      data("example_space")

      dist_example <- dist(example_space[example_space$group %in% c("G1", "G2"), c("dimension_1", "dimension_2")])

      # convert into a 2-dimension space
      rect_example <- distance_to_rectangular(distance.matrix = dist_example, metric = TRUE)

      head(rect_example)

      # convert into a 2-dimension space with non-metric MDS
      rect_example <- distance_to_rectangular(distance.matrix = dist_example, metric = FALSE, maxit = 3)
    }
example_space is a data frame with 1550 observations of a simulated phenotypic tri-dimensional space including 5 groups.
data(example_space)
An object of class data.frame with 1550 rows and 5 columns.
Marcelo Araya Salas, PhenotypeSpace
    {
      # load data
      data("example_space")

      # plot space in 2 dimensions
      xs <- tapply(example_space$dimension_1, example_space$group, mean)
      ys <- tapply(example_space$dimension_2, example_space$group, mean)
      plot(example_space[, c("dimension_1", "dimension_2")], col = example_space$color, pch = 20, cex = 1.8)
      text(xs, ys, labels = names(xs), cex = 2.5)

      # check that plotly is installed, then load it
      rlang::check_installed("plotly")
      library(plotly)

      # plot space in 3 dimensions
      plot_ly(
        data = example_space,
        x = ~dimension_1, y = ~dimension_2, z = ~dimension_3,
        type = "scatter3d", mode = "markers",
        alpha = 0.8, marker = list(size = 4),
        color = ~group, colors = unique(example_space$color)
      )
    }
plot_space plots bidimensional trait spaces
plot_space(X, dimensions, indices, basecex = 1, title = NULL, colors = c("#3E4A89FF", "#35B779FF"), point.colors = colors, point.alpha = 0.7, point.cex = 1, background.indices = NULL, pch = 1, labels = c("sub-space", "total space"), legend.pos = "topright", density.alpha = 0.6, ...)
X |
Data frame containing columns for the dimensions of the phenotypic space (numeric) of the total trait space. This is required so the extent of the plotting area represents the overall trait space in which the sub-space is found. |
dimensions |
Character vector of length 2 with the names of the columns containing the dimensions of the phenotypic space. |
indices |
Numeric vector with the indices of the rows in 'X' to be used as sub-space for plotting. |
basecex |
Numeric vector of length 1 controlling the relative size of the axis labels and tick labels, legend and title. Legend and title are multiplied by 1.5 ( |
title |
Character vector of length 1 to be used as the plot title. Default is NULL. |
colors |
Character vector with the colors to use for density plotting. 2 values must be supplied if 'background.indices' is supplied. Default is c("#3E4A89FF", "#35B779FF"). |
point.colors |
Character vector with the colors to use for point plotting. 2 values must be supplied if 'background.indices' is supplied. Default is the same as 'colors'. |
point.alpha |
Numeric vector of length 1 >= 0 and <= 1 with the alpha value for color transparency. Default is 0.7. If 0, points are not plotted. |
point.cex |
Numeric vector of length 1 controlling the relative size of the points. Default is 1. If 0, points are not plotted. |
background.indices |
Numeric vector with the indices of the rows in 'X' to be used as background trait space for plotting. Points from 'indices' will be plotted on top of these points. |
pch |
Either an integer specifying a symbol or a single character to be used as the default in plotting points. See |
labels |
Character vector with the labels to be used in the legend. Not used if |
legend.pos |
Controls the position of the legend. Can take the following values: "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright" (default), "right" and "center". If |
density.alpha |
Numeric vector of length 1 >= 0 and <= 1 with the alpha value for color transparency to be used in the highest density regions. Lower density regions will gradually increase in transparency starting from the supplied value. Default is 0.6. If 0 densities are not plotted. |
... |
Additional arguments to be passed to |
The function plots a sub-group of data (i.e. sub-space) within the overall trait space. The total trait space can also be plotted in the background. By default both points and kernel densities are shown. Graphs are returned in the active graphic device.
A single panel plot in the active graphic device.
Marcelo Araya-Salas ([email protected])
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
distance_to_rectangular, rectangular_to_triangular
    {
      data("example_space")

      # no background
      plot_space(X = example_space, dimensions = c("dimension_1", "dimension_2"),
        indices = which(example_space$group == "G2"))

      # add background
      plot_space(X = example_space, dimensions = c("dimension_1", "dimension_2"),
        indices = which(example_space$group == "G2"),
        background.indices = which(example_space$group != "G2"))

      # change legend labels
      plot_space(X = example_space, dimensions = c("dimension_1", "dimension_2"),
        indices = which(example_space$group == "G2"),
        background.indices = which(example_space$group != "G2"),
        labels = c("G2", "trait space"))

      # change legend position
      plot_space(X = example_space, dimensions = c("dimension_1", "dimension_2"),
        indices = which(example_space$group == "G2"),
        background.indices = which(example_space$group != "G2"),
        labels = c("G2", "trait space"), legend.pos = "left")

      # with title
      plot_space(X = example_space, dimensions = c("dimension_1", "dimension_2"),
        indices = which(example_space$group == "G2"),
        background.indices = which(example_space$group != "G2"),
        labels = c("G2", "trait space"), legend.pos = "bottomleft", title = "G2")
    }
rarefact_space_similarity
rarefact_space_similarity(formula, data, n = NULL, replace = FALSE, seed = NULL, cores = 1, pb = TRUE, iterations = 30, ...)
formula |
An object of class "formula" (or one that can be coerced to that class). Must follow the form |
data |
Data frame containing columns for the dimensions of the phenotypic space (numeric) and a categorical or factor column with group labels. |
n |
Integer vector of length 1 indicating the number of samples to be used for rarefaction (i.e. how many samples per group will be gathered at each iteration). Default is the minimum sample size across groups. |
replace |
Logical argument to control if sampling is done with replacement. Default is FALSE. |
seed |
Integer vector of length 1 setting the seed (see |
cores |
Numeric vector of length 1. Controls whether parallel computing is applied by specifying the number of cores to be used. Default is 1 (i.e. no parallel computing). |
pb |
Logical argument to control if progress bar is shown. Default is TRUE. |
iterations |
Integer vector of length 1. Controls the number of times the rarefaction routine is iterated. Default is 30. |
... |
Additional arguments to be passed to |
The function applies a rarefaction sub-sampling procedure for evaluating pairwise space similarity (internally using space_similarity). The spread and shape of a phenotypic space might change as a function of the number of samples. Hence, ideally, spaces should be compared between groups of similar sample sizes. Rarefaction allows comparing groups with unbalanced sample sizes by iteratively re-sampling observations at random, using the same number of samples across groups.
A data frame containing the mean, minimum, maximum and standard deviation of the similarity metric across iterations for each pair of groups. If the similarity metric is not symmetric (e.g. the proportional area of A that overlaps B is not necessarily the same as the area of B that overlaps A, see space_similarity) separate columns are supplied for the two comparisons.
Marcelo Araya-Salas ([email protected])
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
space_similarity, rectangular_to_triangular
    {
      # load data
      data("example_space")

      # get proportion of space that overlaps (try with more iterations on your own data)
      prop_overlaps <- rarefact_space_similarity(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "proportional.overlap",
        iterations = 5)

      # get minimum convex polygon overlap for each group (non-symmetric)
      mcp_overlaps <- rarefact_space_similarity(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        iterations = 5)

      # convert to non-symmetric triangular matrix
      rectangular_to_triangular(mcp_overlaps, symmetric = FALSE)
    }
rarefact_space_size
rarefact_space_size(formula, data, n = NULL, replace = FALSE, seed = NULL, cores = 1, pb = TRUE, iterations = 30, ...)
formula |
An object of class "formula" (or one that can be coerced to that class). Must follow the form |
data |
Data frame containing columns for the dimensions of the phenotypic space (numeric) and a categorical or factor column with group labels. |
n |
Integer vector of length 1 indicating the number of samples to be used for rarefaction (i.e. how many samples per group will be gathered at each iteration). Default is the minimum sample size across groups. |
replace |
Logical argument to control if sampling is done with replacement. Default is FALSE. |
seed |
Integer vector of length 1 setting the seed (see |
cores |
Numeric vector of length 1. Controls whether parallel computing is applied by specifying the number of cores to be used. Default is 1 (i.e. no parallel computing). |
pb |
Logical argument to control if progress bar is shown. Default is TRUE. |
iterations |
Integer vector of length 1. Controls the number of times the rarefaction routine is iterated. Default is 30. |
... |
Additional arguments to be passed to |
The function applies a rarefaction sub-sampling procedure for evaluating space size (internally using space_size). The size of a phenotypic space might change as a function of the number of samples. Hence, ideally, spaces should be compared between groups of similar sample sizes. Rarefaction allows comparing groups with unbalanced sample sizes by iteratively re-sampling observations at random, using the same number of samples across groups.
A data frame containing the mean, minimum, maximum and standard deviation of the space size across iterations for each group.
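The rarefaction idea can be sketched in base R: at each iteration, draw the same number of observations from every group, compute a size metric, and then summarize across iterations. The sketch below uses the area of the minimum convex polygon as the size metric. The helper `hull_area`, the simulated data and the output column names are hypothetical, and the package's own routine may differ in its details.

```r
# Hypothetical illustration of rarefied space size; not the package's implementation

# area of the minimum convex polygon of a 2-column matrix (shoelace formula)
hull_area <- function(xy) {
  h <- xy[grDevices::chull(xy), , drop = FALSE]
  x <- h[, 1]
  y <- h[, 2]
  nxt <- c(2:nrow(h), 1) # index of the next hull vertex
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

set.seed(42)

# simulated 2-dimension space with unbalanced groups (200 vs 40 observations)
dat <- data.frame(
  group = rep(c("A", "B"), times = c(200, 40)),
  d1 = rnorm(240),
  d2 = rnorm(240)
)

n <- min(table(dat$group)) # samples per group at each iteration
iterations <- 30

# one column per iteration, one row per group
sizes <- replicate(iterations, {
  sapply(split(dat, dat$group), function(g) {
    sub <- g[sample(nrow(g), n), ] # sampling without replacement
    hull_area(as.matrix(sub[, c("d1", "d2")]))
  })
})

# summarize across iterations for each group
rarefied <- data.frame(
  group = rownames(sizes),
  mean = rowMeans(sizes),
  min = apply(sizes, 1, min),
  max = apply(sizes, 1, max),
  sd = apply(sizes, 1, sd)
)

rarefied
```

Because group B is always sampled in full when sampling without replacement at its own sample size, its size does not vary across iterations in this sketch; only the larger group is effectively sub-sampled.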
Marcelo Araya-Salas ([email protected])
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
rarefact_space_similarity, space_size_difference
    {
      # load data
      data("example_space")

      # get rarefied MCP space size
      # (try with more iterations on your own data)
      rarefact_space_size(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "mcp")

      # rarefied MST
      rarefact_space_size(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "mst")
    }
rarefact_space_size_difference
rarefact_space_size_difference(formula, data, n = NULL, replace = FALSE, seed = NULL, cores = 1, pb = TRUE, iterations = 30, ...)
formula |
An object of class "formula" (or one that can be coerced to that class). Must follow the form |
data |
Data frame containing columns for the dimensions of the phenotypic space (numeric) and a categorical or factor column with group labels. |
n |
Integer vector of length 1 indicating the number of samples to be used for rarefaction (i.e. how many samples per group will be gathered at each iteration). Default is the minimum sample size across groups. |
replace |
Logical argument to control if sampling is done with replacement. Default is FALSE. |
seed |
Integer vector of length 1 setting the seed (see |
cores |
Numeric vector of length 1. Controls whether parallel computing is applied by specifying the number of cores to be used. Default is 1 (i.e. no parallel computing). |
pb |
Logical argument to control if progress bar is shown. Default is TRUE. |
iterations |
Integer vector of length 1. Controls the number of times the rarefaction routine is iterated. Default is 30. |
... |
Additional arguments to be passed to |
The function applies a rarefaction sub-sampling procedure for evaluating pairwise space size differences (internally using space_size_difference). The size of a phenotypic space might change as a function of the number of samples. Hence, ideally, spaces should be compared between groups of similar sample sizes. Rarefaction allows comparing groups with unbalanced sample sizes by iteratively re-sampling observations at random, using the same number of samples across groups.
A data frame containing the mean, minimum, maximum and standard deviation of the space size difference across iterations for each pair of groups.
Marcelo Araya-Salas ([email protected])
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
rarefact_space_similarity, space_size_difference
    {
      # load data
      data("example_space")

      # get rarefied size difference using MCP (try with more iterations on your own data)
      mcp_size_diff <- rarefact_space_size_difference(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "mcp",
        iterations = 5)

      # convert to non-symmetric triangular matrix
      rectangular_to_triangular(mcp_size_diff, symmetric = FALSE)
    }
rectangular_to_triangular converts rectangular pairwise matrices, such as those output by many PhenotypeSpace functions, into triangular pairwise matrices.
rectangular_to_triangular(X, distance = TRUE, symmetric = TRUE)
X |
Data frame containing three columns. The first two columns must contain group labels which will appear as row names (1st column) and column names (2nd column) in the output triangular matrix. The third column (and fourth column if |
distance |
Logical argument to control if the input data contains pairwise distances (dissimilarities) or similarities. If |
symmetric |
Logical argument to define if values are duplicated on both off-diagonal triangles (a symmetric triangular matrix, |
The function converts rectangular pairwise matrices, such as those output by many PhenotypeSpace functions, into triangular pairwise matrices. It takes a data frame in which each observation (row) contains the pairwise value and related labels of the 'groups' being compared. The first two columns must contain group labels which will appear as row names (1st column) and column names (2nd column) in the output triangular matrix. The third column (and fourth column if symmetric = FALSE) must have the numeric values to be included in the output triangular matrix.
A pairwise triangular matrix in which labels from the first group column in 'X' are shown in the columns and labels from the second group are shown in the rows. If symmetric = TRUE the same information is shown below and above the diagonal.
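The conversion for the symmetric case can be sketched in base R as filling a square matrix from a long-format table of pairs. The data frame `pairs_df`, its column names and its values are made up for illustration; the actual function may order rows and columns differently and handles the non-symmetric case with a fourth column.

```r
# Hypothetical illustration; not the code used by rectangular_to_triangular()

# long-format pairwise table: two label columns plus one value column
pairs_df <- data.frame(
  group.1 = c("G1", "G1", "G2"),
  group.2 = c("G2", "G3", "G3"),
  overlap = c(0.2, 0.5, 0.1)
)

groups <- union(pairs_df$group.1, pairs_df$group.2)
tri <- matrix(NA_real_, nrow = length(groups), ncol = length(groups),
  dimnames = list(groups, groups))

# symmetric case: the same value goes below and above the diagonal
tri[cbind(pairs_df$group.1, pairs_df$group.2)] <- pairs_df$overlap
tri[cbind(pairs_df$group.2, pairs_df$group.1)] <- pairs_df$overlap

# a group is fully similar to itself (this would be 0 for distances)
diag(tri) <- 1

tri
```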
Marcelo Araya-Salas ([email protected])
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
distance_to_rectangular, binary_triangular_matrix
    {
      # load data
      data("example_space")

      # get proportion of space that overlaps
      prop_overlaps <- space_similarity(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "proportional.overlap")

      # get symmetric triangular matrix
      rectangular_to_triangular(prop_overlaps)

      # get minimum convex polygon overlap for each group (non-symmetric)
      mcp_overlaps <- space_similarity(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "mcp.overlap")

      # get a non-symmetric triangular matrix
      rectangular_to_triangular(mcp_overlaps, symmetric = FALSE)
    }
space_similarity estimates pairwise similarities of phenotype spaces
space_similarity(formula, data, cores = 1, method = "mcp.overlap", pb = TRUE, outliers = 0.95, pairwise.scale = FALSE, distance.method = "Euclidean", seed = NULL, ...)
formula |
An object of class "formula" (or one that can be coerced to that class). Must follow the form |
data |
Data frame containing columns for the dimensions of the phenotypic space (numeric) and a categorical or factor column with group labels. |
cores |
Numeric vector of length 1. Controls whether parallel computing is applied by specifying the number of cores to be used. Default is 1 (i.e. no parallel computing). |
method |
Character vector of length 1. Controls the (dis)similarity metric used to compare the phenotypic sub-spaces of two groups at a time. Seven built-in metrics are available, which quantify either pairwise sub-space overlap ('similarity') or pairwise distance between bi-dimensional sub-spaces ('dissimilarity'):
In addition, machine learning classification models can be used to quantify dissimilarity as a measure of how discriminable two groups are. These models can use more than two dimensions to represent phenotypic spaces. The following classification models can be used: "AdaBag", "avNNet", "bam", "C5.0", "C5.0Cost", "C5.0Rules", "C5.0Tree", "gam", "gamLoess", "glmnet", "glmStepAIC", "kernelpls", "kknn", "lda", "lda2", "LogitBoost", "msaenet", "multinom", "nnet", "null", "ownn", "parRF", "pcaNNet", "pls", "plsRglm", "pre", "qda", "randomGLM", "rf", "rFerns", "rocc", "rotationForest", "rotationForestCp", "RRF", "RRFglobal", "sda", "simpls", "slda", "smda", "snn", "sparseLDA", "svmLinear2", "svmLinearWeights", "treebag", "widekernelpls" and "wsrf". See https://topepo.github.io/caret/train-models-by-tag.html for details on each of these models. Additional arguments can be passed using |
pb |
Logical argument to control if progress bar is shown. Default is TRUE. |
outliers |
Numeric vector of length 1. A value between 0 and 1 controlling the proportion of outlier observations to be excluded. Outliers are determined as those farthest away from the sub-space centroid. Ignored when using machine learning methods. |
pairwise.scale |
Logical argument to control if pairwise phenotypic spaces are scaled (i.e. z-transformed) prior to similarity estimation. If so ( |
distance.method |
Character vector of length 1 indicating the method to be used for measuring distances (hence only applicable when distances are calculated). Available distance measures are: "Euclidean" (default), "Manhattan", "supremum", "Canberra", "Wave", "divergence", "Bray", "Soergel", "Podani", "Chord", "Geodesic" and "Whittaker". If a similarity measure is used similarities are converted to distances. |
seed |
Integer number containing the random number generator (RNG) state for random number generation in order to make results from the machine learning stochastic methods replicable. |
... |
Additional arguments to be passed to |
The function quantifies pairwise similarity between phenotypic sub-spaces. The built-in methods quantify similarity as the overlap (similarity, or machine learning based discriminability) or distance (dissimilarity) between groups. Machine learning methods implemented in the caret package function train are available to assess the similarity of spaces as the proportion of observations that are incorrectly classified. In this case group overlaps are the class-wise errors (if available) while the mean overlap is calculated as 1 - model accuracy.
A data frame containing the similarity metric for each pair of groups. If the similarity metric is not symmetric (e.g. the proportional area of A that overlaps B is not necessarily the same as the area of B that overlaps A) separate columns are supplied for the two comparisons.
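The classification-based overlap (1 - model accuracy) can be illustrated with a deliberately simple stand-in classifier. The sketch below swaps caret's train for a base-R nearest-centroid rule evaluated on a held-out half of simulated data, so it only shows how misclassification translates into overlap; all object names and the simulated data are hypothetical.

```r
# Hypothetical illustration: overlap as 1 - accuracy using a nearest-centroid
# classifier (a stand-in for the caret models; not the package's implementation)
set.seed(7)

# two simulated groups whose centroids are 2 units apart on each dimension
dat <- data.frame(
  group = rep(c("A", "B"), each = 100),
  d1 = c(rnorm(100, mean = 0), rnorm(100, mean = 2)),
  d2 = c(rnorm(100, mean = 0), rnorm(100, mean = 2))
)

# hold out half of the observations for testing
test_rows <- sample(nrow(dat), nrow(dat) / 2)
train <- dat[-test_rows, ]
test_X <- as.matrix(dat[test_rows, c("d1", "d2")])

# group centroids estimated from the training half (one row per group)
centroids <- t(sapply(split(train[, c("d1", "d2")], train$group), colMeans))

# Euclidean distance from each test observation to each centroid
d_to_centroid <- sapply(rownames(centroids), function(g) {
  sqrt(rowSums(sweep(test_X, 2, centroids[g, ])^2))
})

# predicted group = closest centroid
pred <- colnames(d_to_centroid)[max.col(-d_to_centroid)]

accuracy <- mean(pred == dat$group[test_rows])
overlap <- 1 - accuracy # proportion of misclassified observations
overlap
```

Well separated groups yield an overlap close to 0, while groups occupying the same region approach the error rate expected by chance.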
Marcelo Araya-Salas ([email protected])
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
rarefact_space_similarity, space_size_difference
    {
      # load data
      data("example_space")

      # get proportion of space that overlaps
      prop_overlaps <- space_similarity(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "proportional.overlap")

      # get symmetric triangular matrix
      rectangular_to_triangular(prop_overlaps)

      # get minimum convex polygon overlap for each group (non-symmetric)
      mcp_overlaps <- space_similarity(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "mcp.overlap")

      # convert to non-symmetric triangular matrix
      rectangular_to_triangular(mcp_overlaps, symmetric = FALSE)

      # check available distance measures
      summary(proxy::pr_DB)

      # get Euclidean distances (default)
      area_dist <- space_similarity(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "distance",
        distance.method = "Euclidean")

      # get Canberra distances
      area_dist <- space_similarity(
        formula = group ~ dimension_1 + dimension_2,
        data = example_space,
        method = "distance",
        distance.method = "Canberra")

      ## using machine learning classification methods
      # check if caret package and needed dependencies are available
      rlang::check_installed("caret")
      rlang::check_installed("randomForest")

      # random forest on 3-dimension data, using repeated CV resampling with 5 repeats
      # extract data subset
      sub_data <- example_space[example_space$group %in% c("G1", "G2", "G3"), ]

      # set method parameters
      ctrl <- caret::trainControl(method = "repeatedcv", repeats = 5)

      # get similarities ("overlap")
      space_similarity(
        formula = group ~ dimension_1 + dimension_2 + dimension_3,
        data = sub_data,
        method = "rf",
        trControl = ctrl,
        tuneLength = 4,
        seed = 123
      )

      # single C5.0 tree using boot resampling
      ctrl <- caret::trainControl(method = "boot")

      space_similarity(
        formula = group ~ dimension_1 + dimension_2,
        data = sub_data,
        method = "C5.0Tree",
        trControl = ctrl,
        tuneLength = 3
      )
    }
space_size
space_size(formula, data, cores = 1, method = "mcp", pb = TRUE, outliers = 0.95, ...)
| Argument | Description |
|---|---|
| formula | An object of class "formula" (or one that can be coerced to that class). Must follow the form |
| data | Data frame containing columns for the dimensions of the phenotypic space (numeric) and a categorical or factor column with group labels. |
| cores | Numeric vector of length 1. Controls whether parallel computing is applied by specifying the number of cores to be used. Default is 1 (i.e. no parallel computing). |
| method | Character vector of length 1. Controls the method to be used for quantifying space size. Three metrics are available: |
| pb | Logical argument to control if progress bar is shown. Default is |
| outliers | Numeric vector of length 1. A value between 0 and 1 controlling the proportion of outlier observations to be excluded. Outliers are determined as those farthest away from the sub-space centroid. |
| ... | Additional arguments to be passed to |
The function quantifies the size of the phenotypic sub-spaces.
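The `outliers` argument excludes the observations farthest from the sub-space centroid before the size is computed. A minimal base R sketch of that idea follows. It assumes Euclidean distance to the centroid and assumes the value is the proportion of observations retained (so 0.95 keeps the closest 95%). The helper `trim_outliers` and its `keep` argument are hypothetical and not part of PhenotypeSpace; the package's internal procedure may differ.

```r
# Hypothetical helper (not part of PhenotypeSpace): keep the observations
# closest to the centroid of a single sub-space.
trim_outliers <- function(x, keep = 0.95) {
  x <- as.matrix(x)
  centroid <- colMeans(x)
  # Euclidean distance from each observation to the centroid
  d <- sqrt(rowSums(sweep(x, 2, centroid)^2))
  # ASSUMPTION: `keep` is the proportion of observations retained
  x[d <= stats::quantile(d, probs = keep), , drop = FALSE]
}

# toy two-dimensional sub-space
set.seed(42)
pts <- cbind(dimension_1 = rnorm(100), dimension_2 = rnorm(100))
trimmed <- trim_outliers(pts, keep = 0.95)
nrow(trimmed)
```

Trimming in this way makes size estimates less sensitive to a few extreme observations, at the cost of slightly underestimating the full extent of the sub-space.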
A data frame containing the phenotypic space size for each group.
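To illustrate what a per-group size table can look like, the sketch below computes the area of the minimum convex polygon around each group with base R. It assumes a two-dimensional space and that "size" means polygon area; the helper `mcp_area`, the toy data and the output column names are hypothetical, and the package's own `"mcp"` method may be implemented differently.

```r
# Hypothetical helper (not part of PhenotypeSpace): area of the minimum
# convex polygon around a set of 2D points.
mcp_area <- function(x, y) {
  hull <- grDevices::chull(x, y)        # indices of the hull vertices, in order
  hx <- x[hull]
  hy <- y[hull]
  nxt <- c(seq_along(hull)[-1], 1)      # index of the next vertex, wrapping around
  # shoelace formula; abs() makes the result independent of vertex direction
  abs(sum(hx * hy[nxt] - hx[nxt] * hy)) / 2
}

# toy data with two groups (ASSUMPTION: same column layout as the examples)
set.seed(1)
toy <- data.frame(
  group = rep(c("G1", "G2"), each = 30),
  dimension_1 = c(rnorm(30, sd = 1), rnorm(30, sd = 3)),
  dimension_2 = c(rnorm(30, sd = 1), rnorm(30, sd = 3))
)

# one size per group, returned as a data frame
sizes <- sapply(split(toy, toy$group),
                function(g) mcp_area(g$dimension_1, g$dimension_2))
size_df <- data.frame(group = names(sizes), size = unname(sizes))
size_df
```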
Marcelo Araya-Salas
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
rarefact_space_size, space_size_difference, rarefact_space_size_difference
{
# load data
data("example_space")

# plot data
xs <- tapply(example_space$dimension_1, example_space$group, mean)
ys <- tapply(example_space$dimension_2, example_space$group, mean)
plot(example_space[, c("dimension_1", "dimension_2")],
  col = example_space$color, pch = 20, cex = 1.8)
text(xs, ys, labels = names(xs), cex = 2.5)

# MCP spaces
space_size(
  formula = group ~ dimension_1 + dimension_2,
  data = example_space,
  method = "mcp")

# MST
space_size(
  formula = group ~ dimension_1 + dimension_2,
  data = example_space,
  method = "mst")
}
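The `"mst"` method used in the last example refers to a minimum spanning tree. As a rough illustration, the sketch below computes the total edge length of a minimum spanning tree with Prim's algorithm in base R. It assumes Euclidean distances between observations and that the size is the summed edge length; the helper `mst_length` and the toy data are hypothetical, and the package may rely on a different implementation.

```r
# Hypothetical helper (not part of PhenotypeSpace): total edge length of the
# minimum spanning tree over Euclidean distances, using Prim's algorithm.
mst_length <- function(x) {
  d <- as.matrix(stats::dist(x))
  n <- nrow(d)
  if (n < 2) return(0)
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # start the tree at the first point
  best <- d[1, ]                         # cheapest link from the tree to each point
  total <- 0
  for (i in seq_len(n - 1)) {
    best[in_tree] <- Inf                 # ignore points already in the tree
    nxt <- which.min(best)               # closest point outside the tree
    total <- total + best[nxt]
    in_tree[nxt] <- TRUE
    best <- pmin(best, d[nxt, ])         # update cheapest links
  }
  unname(total)
}

# toy data with two groups (ASSUMPTION: same column layout as the examples)
set.seed(1)
toy <- data.frame(
  group = rep(c("G1", "G2"), each = 30),
  dimension_1 = c(rnorm(30, sd = 1), rnorm(30, sd = 3)),
  dimension_2 = c(rnorm(30, sd = 1), rnorm(30, sd = 3))
)

# one tree length per group
mst_sizes <- sapply(
  split(toy[, c("dimension_1", "dimension_2")], toy$group),
  mst_length
)
mst_sizes
```

Unlike a convex polygon area, a tree length is defined for any number of dimensions, which is one reason to consider it for spaces with more than two dimensions.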
space_size_difference
space_size_difference(formula, data, cores = 1, method = "mcp", pb = TRUE, outliers = 0.95, ...)
| Argument | Description |
|---|---|
| formula | An object of class "formula" (or one that can be coerced to that class). Must follow the form |
| data | Data frame containing columns for the dimensions of the phenotypic space (numeric) and a categorical or factor column with group labels. |
| cores | Numeric vector of length 1. Controls whether parallel computing is applied by specifying the number of cores to be used. Default is 1 (i.e. no parallel computing). |
| method | Character vector of length 1. Controls the method to be used for quantifying space size. Three metrics are available: |
| pb | Logical argument to control if progress bar is shown. Default is |
| outliers | Numeric vector of length 1. A value between 0 and 1 controlling the proportion of outlier observations to be excluded. Outliers are determined as those farthest away from the sub-space centroid. |
| ... | Additional arguments to be passed to |
The function estimates the pairwise size difference in phenotypic space as a simple subtraction between the sizes of two spaces. As such it can be seen as an additional metric of similarity complementing those found in space_similarity.
A data frame containing the space size difference for each pair of groups.
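The pairwise subtraction described above can be sketched in a few lines of base R. The sizes below are made-up values, the output column names are hypothetical, and the sketch assumes a signed difference (first group minus second group); the package's actual output layout and sign convention may differ.

```r
# Hypothetical space sizes for three groups (made-up values)
sizes <- c(G1 = 4.2, G2 = 7.5, G3 = 1.3)

# all unordered pairs of groups, one pair per row
pairs <- t(utils::combn(names(sizes), 2))

# ASSUMPTION: difference is size of group.1 minus size of group.2
size_diff <- data.frame(
  group.1 = pairs[, 1],
  group.2 = pairs[, 2],
  size.difference = unname(sizes[pairs[, 1]] - sizes[pairs[, 2]])
)
size_diff
```

With `k` groups this yields `k * (k - 1) / 2` rows, one per pair of groups.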
Marcelo Araya-Salas
Araya-Salas, M., K. Odom & A. Rico-Guevara. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
space_size, space_similarity, rarefact_space_size_difference
{
# load data
data("example_space")

# MCP size difference
mcp_size_diff <- space_size_difference(
  formula = group ~ dimension_1 + dimension_2,
  data = example_space,
  method = "mcp")

# MST size difference
mst_size_diff <- space_size_difference(
  formula = group ~ dimension_1 + dimension_2,
  data = example_space,
  method = "mst")
}