Introduction to connectedness

What problem does this package solve?

In genetic evaluation, animals are often compared across herds, flocks, regions, years, or other management units (MUs). Those comparisons are not equally reliable in all data sets. When management units are weakly linked by pedigree or genomic relationships, differences in estimated breeding values across units become less precise.

The connectedness package addresses that problem. It computes pairwise connectedness between MUs from the same ingredients that define the evaluation context:

the relationship structure among animals,
the assignment of animals to management units,
the fixed-effects design,
and the assumed variance components.

The package currently supports connectedness computed from:

A-inverse (pedigree-based connectedness),
G-inverse (genomic connectedness),
H-inverse (combined pedigree-genomic connectedness), and
a custom inverse kernel supplied by the user.

Which metrics does the package compute?

The package reports two complementary metrics:

PEVD contrast: prediction error variance of the difference between two MUs, evaluated as a contrast between them. Lower values indicate stronger connectedness.
CD contrast: coefficient of determination for the same contrasts. Higher values indicate stronger connectedness.

Both metrics are computed under the contrast approach through the mixed model equations. That point matters: the package does not combine unrelated metrics, but reports two related summaries derived from the same contrast framework. It is also one reason the package can use ideas such as temporal overlap between units when records are restricted to a common time window.

If desired, PEVD can be returned scaled by the additive genetic variance by setting scale_pevd = TRUE in compute_connectedness().
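As background, the usual contrast formulation can be written as follows. This is the textbook form, stated here as an assumption about what the package computes rather than taken from its source. Here x is a contrast vector between two MUs, K is the relationship matrix implied by the chosen kernel (A, G, or H), and C^{uu} is the animal-effect block of the inverse of the mixed model coefficient matrix.

```latex
\mathrm{PEVD}(x) = \sigma_e^2 \, x^\top C^{uu} x ,
\qquad
\mathrm{CD}(x) = 1 - \frac{\sigma_e^2 \, x^\top C^{uu} x}{\sigma_a^2 \, x^\top K x}
              = 1 - \frac{\mathrm{PEVD}(x) / \sigma_a^2}{x^\top K x}
```

Written this way, the scaled PEVD is simply PEVD(x) / \sigma_a^2, and CD compares it with the quadratic form x'Kx.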

Quick start

For most users, the main entry point is compute_connectedness().

A minimal pedigree-based analysis looks like this:

library(connectedness)

# `data` and `pedigree` below are placeholders for your own data frames
res <- compute_connectedness(
  data          = data,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ainv",
  pedigree      = pedigree
)

print(res)
plot(res, which = "CD")
plot(res, which = "PEVD")

That is the shortest mental model of the package:

provide the animals, MUs, and fixed-effect structure,
choose the relationship representation,
obtain pairwise CD and PEVD between MUs.

Inputs: what does the function need?

The package separates the problem into two layers.

1. Evaluation context

These inputs define what comparison problem you want to study:

data: records used to define the animals and management units involved,
animal_col: animal identifier column,
mu_col: management unit column,
fixed_formula: fixed-effects structure,
sigma2a, sigma2e: variance components.
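In a standard animal model, the two variance components enter the mixed model equations through their ratio. The short sketch below uses the values from the quick start; lambda and h2 are local variable names chosen for illustration, not package arguments.

```r
sigma2a <- 50    # additive genetic variance
sigma2e <- 100   # residual variance

lambda <- sigma2e / sigma2a              # variance ratio used in the MME
h2     <- sigma2a / (sigma2a + sigma2e)  # implied heritability

lambda  # 2
h2      # 0.3333333
```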

2. Relationship structure

These inputs define how animals are connected genetically:

relationship = "Ainv" requires a pedigree,
relationship = "Ginv" requires a genotype matrix X and an index linking row order to animal IDs,
relationship = "Hinv" requires pedigree + genotypes + genotyped_idx,
relationship = "custom" uses a supplied inverse kernel.

So the workflow is intentionally explicit: the user chooses not only the data, but also the type of connectedness they want to quantify.

Outputs: what do you get back?

compute_connectedness() returns an object of class "connectedness". The most important components are:

CD: matrix of pairwise CD contrast values,
PEVD: matrix of pairwise PEVD contrast values,
qK: denominator of the contrast under the chosen kernel,
qC: prediction error numerator of the contrast,
n_target: number of target animals per MU,
relationship: the inverse relationship matrix used,
overlap: temporal overlap table when a time window is used.

If scale_pevd = TRUE, the PEVD matrix is returned on the scale PEVD / sigma2a.

Example

The example below contrasts two small pedigree scenarios:

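a weakly connected setting, where each MU has its own sire (S1 in MU1, S2 in MU2) and the two MUs are linked only through the shared dams D1 and D2,
a more connected setting, where a single sire (S1) is shared across both MUs in addition to the shared dams.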

The goal is not biological realism, but to illustrate how the package responds to a simple change in pedigree structure and why PEVD and CD do not always agree on whether connectedness has improved.

Scenario 1: weaker connectedness

pedigree_weak <- data.frame(
  animal = c("S1", "S2", "D1", "D2", "A1", "A2", "B1", "B2"),
  sire   = c("0",  "0",  "0",  "0",  "S1", "S1", "S2", "S2"),
  dam    = c("0",  "0",  "0",  "0",  "D1", "D2", "D1", "D2")
)

data_weak <- data.frame(
  animal = c("A1", "A2", "B1", "B2"),
  MU     = c("MU1", "MU1", "MU2", "MU2"),
  sex    = c("M", "F", "M", "F")
)

res_weak <- compute_connectedness(
  data          = data_weak,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ainv",
  pedigree      = pedigree_weak
)

Scenario 2: stronger connectedness through a shared sire

pedigree_strong <- data.frame(
  animal = c("S1", "D1", "D2", "A1", "A2", "B1", "B2"),
  sire   = c("0",  "0",  "0",  "S1", "S1", "S1", "S1"),
  dam    = c("0",  "0",  "0",  "D1", "D2", "D1", "D2")
)

data_strong <- data.frame(
  animal = c("A1", "A2", "B1", "B2"),
  MU     = c("MU1", "MU1", "MU2", "MU2"),
  sex    = c("M", "F", "M", "F")
)

res_strong <- compute_connectedness(
  data          = data_strong,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ainv",
  pedigree      = pedigree_strong
)

Compare the two results

res_weak$CD
res_weak$PEVD
res_weak$qK

res_strong$CD
res_strong$PEVD
res_strong$qK

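The weak-versus-strong comparison is informative, but the two metrics need not point to the same conclusion: a drop in PEVD signals stronger connectedness, whereas a drop in CD signals weaker connectedness, and here both drop.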

In this toy example, the stronger pedigree link (a shared sire across MUs) reduces PEVD, from 33.33 in the weak scenario to 20 in the strong scenario. This reflects a more precisely estimated contrast between the two management units.

At the same time, the stronger link also reduces the kernel-based denominator of the contrast (qK), from 1 in the weak scenario to 0.5 in the strong scenario. In other words, the two MUs become more connected, but also less genetically distinct. Because CD is interpreted relative to that denominator, it can decrease even when PEVD decreases.

So, in this example, stronger connectedness improves precision of the contrast (lower PEVD), but also reduces the expected genetic variability underlying that contrast (lower qK), which leads to a lower CD.
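The numbers quoted above can be checked by hand with base R only. The sketch below does not call the package. It assumes that the contrast is the difference between the two MU means, that MU itself is not fitted as a fixed effect, and that CD = 1 - PEVD / (sigma2a * qK). The helper names tabular_A and contrast_metrics are hypothetical. Under those assumptions it reproduces the PEVD and qK values reported for both scenarios.

```r
# Additive relationship matrix by the tabular method.
# Assumes parents are listed before offspring and "0" codes an unknown parent.
tabular_A <- function(ped) {
  n <- nrow(ped)
  s <- match(ped$sire, ped$animal)  # NA when the sire is unknown
  d <- match(ped$dam,  ped$animal)  # NA when the dam is unknown
  A <- matrix(0, n, n, dimnames = list(ped$animal, ped$animal))
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {
      a_js <- if (is.na(s[i])) 0 else A[j, s[i]]
      a_jd <- if (is.na(d[i])) 0 else A[j, d[i]]
      A[i, j] <- A[j, i] <- 0.5 * (a_js + a_jd)
    }
    f_i <- if (is.na(s[i]) || is.na(d[i])) 0 else 0.5 * A[s[i], d[i]]
    A[i, i] <- 1 + f_i
  }
  A
}

# qK, PEVD and CD for the MU1-versus-MU2 mean contrast (illustrative only).
contrast_metrics <- function(ped, sigma2a = 50, sigma2e = 100) {
  ids <- c("A1", "A2", "B1", "B2")          # animals with records
  pos <- match(ids, ped$animal)
  A   <- tabular_A(ped)
  n   <- nrow(A)
  Z   <- diag(n)[pos, , drop = FALSE]       # maps records to animals
  X   <- cbind(1, c(0, 1, 0, 1))            # intercept + sex (M, F, M, F)
  lambda <- sigma2e / sigma2a
  # Mixed model coefficient matrix and its animal-effect inverse block
  C <- rbind(cbind(crossprod(X),    crossprod(X, Z)),
             cbind(crossprod(Z, X), crossprod(Z) + lambda * solve(A)))
  Cuu <- solve(C)[-(1:2), -(1:2)]
  x <- numeric(n)
  x[pos] <- c(0.5, 0.5, -0.5, -0.5)         # mean(MU1) - mean(MU2)
  qK   <- drop(t(x) %*% A %*% x)
  pevd <- sigma2e * drop(t(x) %*% Cuu %*% x)
  c(qK = qK, PEVD = pevd, CD = 1 - pevd / (sigma2a * qK))
}

pedigree_weak <- data.frame(
  animal = c("S1", "S2", "D1", "D2", "A1", "A2", "B1", "B2"),
  sire   = c("0",  "0",  "0",  "0",  "S1", "S1", "S2", "S2"),
  dam    = c("0",  "0",  "0",  "0",  "D1", "D2", "D1", "D2")
)
pedigree_strong <- data.frame(
  animal = c("S1", "D1", "D2", "A1", "A2", "B1", "B2"),
  sire   = c("0",  "0",  "0",  "S1", "S1", "S1", "S1"),
  dam    = c("0",  "0",  "0",  "D1", "D2", "D1", "D2")
)

weak   <- contrast_metrics(pedigree_weak)
strong <- contrast_metrics(pedigree_strong)
round(weak, 2)    # qK = 1,   PEVD = 33.33, CD = 0.33
round(strong, 2)  # qK = 0.5, PEVD = 20,    CD = 0.2
```

In the strong scenario the prediction error shrinks from 33.33 to 20, but the variance of the contrast itself (sigma2a * qK) shrinks from 50 to 25, so the ratio between them rises and CD falls.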

Optional PEVD scaling

res_strong_scaled <- compute_connectedness(
  data          = data_strong,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ainv",
  pedigree      = pedigree_strong,
  scale_pevd    = TRUE
)

res_strong_scaled$PEVD
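As a quick arithmetic check that does not call the package, the scaled value expected for the strong scenario follows directly from the unscaled PEVD of 20 quoted earlier and sigma2a = 50:

```r
sigma2a     <- 50
pevd_strong <- 20                      # unscaled PEVD for the strong scenario
pevd_scaled <- pevd_strong / sigma2a   # PEVD expressed in units of sigma2a

pevd_scaled  # 0.4
```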

Choosing among Ainv, Ginv, Hinv, and custom

A practical way to think about the four options is:

use Ainv when the target notion of connectedness is pedigree-based,
use Ginv when realized genomic links are the main object of interest,
use Hinv when you want the connectedness measure to align with a combined pedigree-genomic evaluation context,
use custom when your inverse kernel is built elsewhere.
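As an illustration of an inverse kernel "built elsewhere", the sketch below constructs a genomic relationship matrix with VanRaden's first method from the same toy genotypes used in the G-inverse example, blends it with the identity so that it can be inverted, and inverts it. It uses base R only. The 0.05 blending weight is an arbitrary illustrative choice, and the name K_inv is hypothetical. The argument through which compute_connectedness() accepts a custom kernel is not shown here, so the sketch stops at building the matrix.

```r
# Genotypes coded 0/1/2: rows are animals, columns are markers
X <- matrix(
  c(0, 1, 2, 1,
    1, 1, 2, 0,
    2, 1, 0, 1,
    1, 2, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A1", "A2", "B1", "B2"), NULL)
)

p  <- colMeans(X) / 2                          # observed allele frequencies
Zc <- sweep(X, 2, 2 * p)                       # centre each marker column
G  <- tcrossprod(Zc) / (2 * sum(p * (1 - p)))  # VanRaden method 1

# G is singular after centring (its rows sum to zero),
# so blend it with the identity before inverting.
G_blend <- 0.95 * G + 0.05 * diag(nrow(G))
K_inv   <- solve(G_blend)
dimnames(K_inv) <- dimnames(G)
```

Whatever construction is used, the row and column order of the inverse kernel has to match the animal identifiers expected by the analysis.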

Example: G-inverse

X <- matrix(
  c(0, 1, 2, 1,
    1, 1, 2, 0,
    2, 1, 0, 1,
    1, 2, 1, 0),
  nrow = 4,
  byrow = TRUE
)
rownames(X) <- c("A1", "A2", "B1", "B2")

animal_index <- setNames(seq_len(nrow(X)), rownames(X))

res_G <- compute_connectedness(
  data          = data_strong,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ginv",
  X             = X,
  animal_index  = animal_index
)

Example: H-inverse

renum <- renum_pedigree(pedigree_strong, verbose = FALSE)
genotyped_idx <- renum$new_id[match(rownames(X), renum$animal)]

res_H <- compute_connectedness(
  data          = data_strong,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Hinv",
  pedigree      = pedigree_strong,
  X             = X,
  genotyped_idx = genotyped_idx
)
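For orientation, the H-inverse used in single-step evaluations is usually written as below, where the subscript 2 denotes genotyped animals and A_{22} is the pedigree relationship block among them. This is the standard textbook form; whether the package applies additional blending or scaling of G is not stated here and should not be assumed.

```latex
H^{-1} = A^{-1} +
\begin{bmatrix}
0 & 0 \\
0 & G^{-1} - A_{22}^{-1}
\end{bmatrix}
```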

Temporal overlap: why the contrast framework helps

The package can optionally restrict the analysis to a common time window and report where MU pairs actually overlap in time.

data_time <- transform(
  data_strong,
  year = c(2020, 2021, 2020, 2021)
)

res_time <- compute_connectedness(
  data                 = data_time,
  animal_col           = "animal",
  mu_col               = "MU",
  fixed_formula        = ~ 1 + sex,
  sigma2a              = 50,
  sigma2e              = 100,
  relationship         = "Ainv",
  pedigree             = pedigree_strong,
  year_col             = "year",
  year_window          = c(2020, 2021),
  min_records_per_year = 1
)

plot(res_time, which = "overlap")

This is a useful extension of the contrast logic: connectedness is not only a function of pedigree or genomics, but also of which units are effectively represented in the same time horizon.
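The idea of temporal overlap can be sketched in base R without the package. The code below counts, for each pair of MUs, the years inside the window in which both units have at least min_records_per_year records. It assumes the window bounds are inclusive. The layout of the overlap table actually returned by the package may differ, and the object names are illustrative.

```r
data_time <- data.frame(
  animal = c("A1", "A2", "B1", "B2"),
  MU     = c("MU1", "MU1", "MU2", "MU2"),
  year   = c(2020, 2021, 2020, 2021)
)
year_window          <- c(2020, 2021)
min_records_per_year <- 1

# Keep only records inside the (inclusive) time window
in_window <- data_time[data_time$year >= year_window[1] &
                       data_time$year <= year_window[2], ]

# Records per MU and year, then flag MU-years with enough records
counts <- unclass(table(in_window$MU, in_window$year))
active <- counts >= min_records_per_year

# Number of years in which both MUs of a pair are active
overlap <- tcrossprod(active * 1)
overlap["MU1", "MU2"]  # 2
```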