In genetic evaluation, animals are often compared across herds, flocks, regions, years, or other management units (MUs). Those comparisons are not equally reliable in all data sets. When management units are weakly linked by pedigree or genomic relationships, differences in estimated breeding values across units become less precise.
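The connectedness package addresses that problem. It computes pairwise connectedness between MUs from the same ingredients that define the evaluation context: the records, the fixed-effects structure, the variance components, and the relationships among animals.
The package currently supports connectedness computed from a pedigree-based inverse relationship matrix ("Ainv"), a genomic one ("Ginv"), a combined pedigree and genomic one ("Hinv"), or a user-supplied inverse kernel ("custom").
The package reports two complementary metrics: the prediction error variance of the difference between MUs (PEVD) and the coefficient of determination of the same contrast (CD).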
Both metrics are computed under the contrast approach through mixed model equations. That point is important: the package is not combining unrelated metrics, but two connected summaries derived from the same contrast framework. This is one of the reasons why the package can also exploit ideas such as temporal overlap between units when records are restricted to a common time window.
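For orientation, the two metrics are usually written as follows in the connectedness literature. The notation is an assumption for illustration and is not taken from the package: x is a contrast vector between two MUs (its elements sum to zero), C^{uu} is the random-effect block of the inverse of the mixed model coefficient matrix, and K is the relationship kernel. The package's internal scaling of its numerator and denominator may differ from this textbook form.

```latex
\mathrm{PEVD}(\mathbf{x}) = \mathbf{x}^{\top} \mathbf{C}^{uu} \mathbf{x} \, \sigma^2_e ,
\qquad
\mathrm{CD}(\mathbf{x}) = 1 - \frac{\mathrm{PEVD}(\mathbf{x})}{\sigma^2_a \, \mathbf{x}^{\top} \mathbf{K} \mathbf{x}}
```

Written this way, CD compares the prediction error variance of the contrast with the genetic variance of that same contrast, so the two metrics share a numerator but are read on different scales.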
If desired, PEVD can be returned scaled by the additive
genetic variance by setting scale_pevd = TRUE in
compute_connectedness().
For most users, the main entry point is
compute_connectedness().
A minimal pedigree-based analysis looks like this:
library(connectedness)
res <- compute_connectedness(
  data = data,
  animal_col = "animal",
  mu_col = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a = 50,
  sigma2e = 100,
  relationship = "Ainv",
  pedigree = pedigree
)
print(res)
plot(res, which = "CD")
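plot(res, which = "PEVD")

That is the shortest mental model of the package: supply the records, the fixed-effects model, the variance components, and a relationship structure, then inspect the resulting CD and PEVD.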
The package separates the problem into two layers.
These inputs define what comparison problem you want to study:
  data: records used to define the animals and management units involved
  animal_col: animal identifier column
  mu_col: management unit column
  fixed_formula: fixed-effects structure
  sigma2a, sigma2e: variance components
These inputs define how animals are connected genetically:
  relationship = "Ainv" requires a pedigree
  relationship = "Ginv" requires a genotype matrix X and an index linking row order to animal IDs
  relationship = "Hinv" requires pedigree + genotypes + genotyped_idx
  relationship = "custom" uses a supplied inverse kernel
So the workflow is intentionally explicit: the user chooses not only the data, but also the type of connectedness they want to quantify.
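To make the genomic option more concrete, the sketch below shows one common way to turn a 0/1/2 genotype matrix into an inverse genomic kernel in base R. It is an illustration only: the centering by observed allele frequencies, the scaling by 2 * sum(p * (1 - p)), and the 5% blending with the identity matrix are assumptions, and the package's own construction of "Ginv" may differ.

```r
# Toy genotype matrix: 4 animals x 4 markers, coded 0/1/2
X <- matrix(
  c(0, 1, 2, 1,
    1, 1, 2, 0,
    2, 1, 0, 1,
    1, 2, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A1", "A2", "B1", "B2"), NULL)
)

# Observed allele frequencies and centered genotypes
p <- colMeans(X) / 2
Z <- sweep(X, 2, 2 * p)

# Genomic relationship matrix (ASSUMPTION: scaled by 2 * sum(p * (1 - p)))
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

# Centering makes G singular, so blend it with the identity before inverting
# (ASSUMPTION: 5% weight on the identity, chosen only for illustration)
G_blend <- 0.95 * G + 0.05 * diag(nrow(G))
Ginv <- solve(G_blend)
```

The blending step matters because a centered G has no inverse on its own; any small positive weight on the identity would do for this toy case.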
compute_connectedness() returns an object of class "connectedness". The most important components are:
  CD: matrix of pairwise CD contrast values
  PEVD: matrix of pairwise PEVD contrast values
  qK: denominator of the contrast under the chosen kernel
  qC: prediction error numerator of the contrast
  n_target: number of target animals per MU
  relationship: the inverse relationship matrix used
  overlap: temporal overlap table when a time window is used
If scale_pevd = TRUE, the PEVD matrix is returned on the scale PEVD / sigma2a.
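The example below contrasts two small pedigree scenarios: a weak scenario, in which each MU uses its own sire, and a strong scenario, in which a sire is shared across MUs.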
The goal is not biological realism, but to illustrate how the package
responds to a simple change in pedigree structure and why
PEVD and CD do not always move in the same
direction.
pedigree_weak <- data.frame(
  animal = c("S1", "S2", "D1", "D2", "A1", "A2", "B1", "B2"),
  sire   = c("0", "0", "0", "0", "S1", "S1", "S2", "S2"),
  dam    = c("0", "0", "0", "0", "D1", "D2", "D1", "D2")
)
data_weak <- data.frame(
  animal = c("A1", "A2", "B1", "B2"),
  MU     = c("MU1", "MU1", "MU2", "MU2"),
  sex    = c("M", "F", "M", "F")
)
res_weak <- compute_connectedness(
  data = data_weak,
  animal_col = "animal",
  mu_col = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a = 50,
  sigma2e = 100,
  relationship = "Ainv",
  pedigree = pedigree_weak
)

The weak-versus-strong comparison is informative, but the two metrics do not need to move in parallel.
In this toy example, the stronger pedigree link (a shared sire across
MUs) reduces PEVD, from 33.33 in the weak scenario to 20 in
the strong scenario. This reflects a more precisely estimated contrast
between the two management units.
At the same time, the stronger link also reduces the kernel-based
denominator of the contrast (qK), from 1 in the weak
scenario to 0.5 in the strong scenario. In other words, the two MUs
become more connected, but also less genetically distinct. Because
CD is interpreted relative to that denominator, it can
decrease even when PEVD decreases.
So, in this example, stronger connectedness improves precision of the
contrast (lower PEVD), but also reduces the expected
genetic variability underlying that contrast (lower qK),
which leads to a lower CD.
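The numbers quoted above can be checked with plain linear algebra in base R. The sketch below is not the package's code. It assumes one record per animal, the mean contrast between the two MUs, the textbook form CD = 1 - PEVD / (sigma2a * qK), and, for the strong scenario, a hypothetical pedigree in which all four recorded animals share sire S1 while the dams stay as in the weak scenario.

```r
# Additive relationships among the recorded animals (A1, A2, B1, B2).
# Weak scenario, read off pedigree_weak: half-sibs through the sire within
# an MU and through the dam across MUs.
A_weak <- matrix(
  c(1.00, 0.25, 0.25, 0.00,
    0.25, 1.00, 0.00, 0.25,
    0.25, 0.00, 1.00, 0.25,
    0.00, 0.25, 0.25, 1.00),
  nrow = 4, byrow = TRUE
)

# ASSUMPTION: hypothetical strong scenario with sire S1 for all four animals
A_strong <- matrix(
  c(1.00, 0.25, 0.50, 0.25,
    0.25, 1.00, 0.25, 0.50,
    0.50, 0.25, 1.00, 0.25,
    0.25, 0.50, 0.25, 1.00),
  nrow = 4, byrow = TRUE
)

contrast_summary <- function(A, sigma2a = 50, sigma2e = 100) {
  n <- nrow(A)
  X <- cbind(1, c(0, 1, 0, 1))      # intercept and sex (M, F, M, F)
  x <- c(0.5, 0.5, -0.5, -0.5)      # mean(MU1) - mean(MU2)
  G <- sigma2a * A
  V <- G + sigma2e * diag(n)        # one record per animal, so Z = I
  Vinv <- solve(V)
  # Projection that absorbs the fixed effects
  P <- Vinv - Vinv %*% X %*% solve(t(X) %*% Vinv %*% X) %*% t(X) %*% Vinv
  PEV <- G - G %*% P %*% G          # prediction error variance of u
  qK <- drop(t(x) %*% A %*% x)
  PEVD <- drop(t(x) %*% PEV %*% x)
  # ASSUMPTION: textbook contrast form of CD
  c(qK = qK, PEVD = PEVD, CD = 1 - PEVD / (sigma2a * qK))
}

summary_weak <- contrast_summary(A_weak)
summary_strong <- contrast_summary(A_strong)
round(rbind(weak = summary_weak, strong = summary_strong), 2)
```

Under these assumptions the sketch gives qK = 1, PEVD = 33.33 and CD = 0.33 for the weak scenario, and qK = 0.5, PEVD = 20 and CD = 0.2 for the strong one. PEVD falls by 40% while the denominator sigma2a * qK is halved, so the ratio PEVD / (sigma2a * qK) rises and CD drops.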
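A practical way to think about the four options is: "Ainv" when only a pedigree is available, "Ginv" when the animals of interest are genotyped, "Hinv" when a pedigree is combined with genotypes on a subset of animals, and "custom" when you already have your own inverse kernel.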
X <- matrix(
  c(0, 1, 2, 1,
    1, 1, 2, 0,
    2, 1, 0, 1,
    1, 2, 1, 0),
  nrow = 4,
  byrow = TRUE
)
rownames(X) <- c("A1", "A2", "B1", "B2")
animal_index <- setNames(seq_len(nrow(X)), rownames(X))
res_G <- compute_connectedness(
  data = data_strong,
  animal_col = "animal",
  mu_col = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a = 50,
  sigma2e = 100,
  relationship = "Ginv",
  X = X,
  animal_index = animal_index
)

renum <- renum_pedigree(pedigree_strong, verbose = FALSE)
genotyped_idx <- renum$new_id[match(rownames(X), renum$animal)]
res_H <- compute_connectedness(
  data = data_strong,
  animal_col = "animal",
  mu_col = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a = 50,
  sigma2e = 100,
  relationship = "Hinv",
  pedigree = pedigree_strong,
  X = X,
  genotyped_idx = genotyped_idx
)

The package can optionally restrict the analysis to a common time window and report where MU pairs actually overlap in time.
data_time <- transform(
  data_strong,
  year = c(2020, 2021, 2020, 2021)
)
res_time <- compute_connectedness(
  data = data_time,
  animal_col = "animal",
  mu_col = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a = 50,
  sigma2e = 100,
  relationship = "Ainv",
  pedigree = pedigree_strong,
  year_col = "year",
  year_window = c(2020, 2021),
  min_records_per_year = 1
)
plot(res_time, which = "overlap")

This is a useful extension of the contrast logic: connectedness is not only a function of pedigree or genomics, but also of which units are effectively represented in the same time horizon.
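One plausible reading of temporal overlap can be sketched in base R: restrict the records to the window, count records per MU and year, and count the years in which both members of an MU pair meet the minimum. The records data frame below is hypothetical, and the layout of the package's own overlap table is not assumed to match this output.

```r
# Hypothetical records: three MUs observed between 2020 and 2022
records <- data.frame(
  MU   = c("MU1", "MU1", "MU1", "MU2", "MU2", "MU3"),
  year = c(2020, 2021, 2021, 2021, 2022, 2022)
)
year_window <- c(2020, 2021)
min_records_per_year <- 1

# Keep only records inside the window
keep <- records$year >= year_window[1] & records$year <= year_window[2]
in_window <- records[keep, ]

# Records per MU and year, as a plain integer matrix
counts <- unclass(table(in_window$MU, in_window$year))

# TRUE where an MU has enough records in a given year
present <- counts >= min_records_per_year

# Number of years shared by each pair of MUs (diagonal: years per MU)
overlap <- tcrossprod(present * 1)
overlap
```

In this toy data MU1 and MU2 share a single year (2021), and MU3 drops out entirely because all of its records fall outside the window.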
The package provides a direct plotting method for a first visual inspection.
A quick visual check is often the easiest way to identify MU pairs that are well connected, weakly connected, or effectively isolated.