Introduction to connectedness

What problem does this package solve?

In genetic evaluation, animals are often compared across herds, flocks, regions, years, or other management units (MUs). Those comparisons are not equally reliable in all data sets. When management units are weakly linked by pedigree or genomic relationships, differences in estimated breeding values across units become less precise.

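The connectedness package addresses that problem. It computes pairwise connectedness between MUs from the same ingredients that define the evaluation context: the animals and their MU assignments, the fixed-effect structure, the variance components, and the relationship structure among animals.

The package currently supports connectedness computed from a pedigree-based relationship inverse ("Ainv"), a genomic relationship inverse ("Ginv"), a combined pedigree and genomic inverse ("Hinv"), or a user-supplied structure ("custom").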

Which metrics does the package compute?

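The package reports two complementary metrics: the prediction error variance of the difference (PEVD) between MUs and the coefficient of determination (CD) of that difference.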

Both metrics are computed under the contrast approach through the mixed model equations. That point is important: the package does not combine unrelated metrics; it reports two related summaries derived from the same contrast framework. This is also one reason the package can account for temporal overlap between units when records are restricted to a common time window.
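For reference, the contrast approach is usually written as below. This is the standard textbook form, stated here as an assumption about notation rather than as the package's internal code: x is a contrast vector between two MUs, C^{22} is the random-effect block of the inverse of the mixed model coefficient matrix, K is the relationship kernel, and lambda is the ratio of residual to additive genetic variance.

```latex
\mathrm{PEVD}(\mathbf{x}) = \sigma_e^2 \, \mathbf{x}^\top \mathbf{C}^{22} \mathbf{x},
\qquad
\mathrm{CD}(\mathbf{x})
  = 1 - \lambda \, \frac{\mathbf{x}^\top \mathbf{C}^{22} \mathbf{x}}{\mathbf{x}^\top \mathbf{K} \mathbf{x}}
  = 1 - \frac{\mathrm{PEVD}(\mathbf{x})}{\sigma_a^2 \, \mathbf{x}^\top \mathbf{K} \mathbf{x}},
\qquad
\lambda = \frac{\sigma_e^2}{\sigma_a^2}.
```

Under this form, both metrics share the same quadratic form in C^{22}; CD additionally divides by the quadratic form in K, so it depends on how genetically distinct the two MUs are.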

If desired, PEVD can be returned scaled by the additive genetic variance by setting scale_pevd = TRUE in compute_connectedness().

Quick start

For most users, the main entry point is compute_connectedness().

A minimal pedigree-based analysis looks like this:

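library(connectedness)

# `data` and `pedigree` are placeholders for your own data frames;
# concrete versions are built in the Example section.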

res <- compute_connectedness(
  data          = data,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ainv",
  pedigree      = pedigree
)

print(res)
plot(res, which = "CD")
plot(res, which = "PEVD")

That is the shortest mental model of the package:

  1. provide the animals, MUs, and fixed-effect structure,
  2. choose the relationship representation,
  3. obtain pairwise CD and PEVD between MUs.

Inputs: what does the function need?

The package separates the problem into two layers.

1. Evaluation context

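These inputs define what comparison problem you want to study: the data frame (data), the animal and MU columns (animal_col, mu_col), the fixed-effect structure (fixed_formula), and the variance components (sigma2a, sigma2e).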

2. Relationship structure

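These inputs define how animals are connected genetically: the relationship type (relationship) together with the inputs it requires. In the examples in this document, "Ainv" takes pedigree, "Ginv" takes X and animal_index, and "Hinv" takes pedigree, X, and genotyped_idx.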

So the workflow is intentionally explicit: the user chooses not only the data, but also the type of connectedness they want to quantify.

Outputs: what do you get back?

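compute_connectedness() returns an object of class "connectedness". The components used in the examples in this document are CD and PEVD (the pairwise metrics between MUs) and qK (the kernel-based denominator of the contrast).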

If scale_pevd = TRUE, the PEVD matrix is returned on the scale PEVD / sigma2a.

Example

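The example below contrasts two small pedigree scenarios: one in which the two MUs use different sires (weaker connectedness), and one in which all recorded animals share a single sire (stronger connectedness).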

The goal is not biological realism, but to illustrate how the package responds to a simple change in pedigree structure and why PEVD and CD do not always move in the same direction.

Scenario 1: weaker connectedness

pedigree_weak <- data.frame(
  animal = c("S1", "S2", "D1", "D2", "A1", "A2", "B1", "B2"),
  sire   = c("0",  "0",  "0",  "0",  "S1", "S1", "S2", "S2"),
  dam    = c("0",  "0",  "0",  "0",  "D1", "D2", "D1", "D2")
)

data_weak <- data.frame(
  animal = c("A1", "A2", "B1", "B2"),
  MU     = c("MU1", "MU1", "MU2", "MU2"),
  sex    = c("M", "F", "M", "F")
)

res_weak <- compute_connectedness(
  data          = data_weak,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ainv",
  pedigree      = pedigree_weak
)

Scenario 2: stronger connectedness through a shared sire

pedigree_strong <- data.frame(
  animal = c("S1", "D1", "D2", "A1", "A2", "B1", "B2"),
  sire   = c("0",  "0",  "0",  "S1", "S1", "S1", "S1"),
  dam    = c("0",  "0",  "0",  "D1", "D2", "D1", "D2")
)

data_strong <- data.frame(
  animal = c("A1", "A2", "B1", "B2"),
  MU     = c("MU1", "MU1", "MU2", "MU2"),
  sex    = c("M", "F", "M", "F")
)

res_strong <- compute_connectedness(
  data          = data_strong,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ainv",
  pedigree      = pedigree_strong
)
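To make the difference between the two pedigrees concrete, the additive relationships among the recorded animals can be worked out with the classical tabular method. The base R sketch below is for illustration only and is not the package's code; tabular_A() is a hypothetical helper that assumes parents are listed before their offspring and that "0" marks an unknown parent.

```r
pedigree_weak <- data.frame(
  animal = c("S1", "S2", "D1", "D2", "A1", "A2", "B1", "B2"),
  sire   = c("0",  "0",  "0",  "0",  "S1", "S1", "S2", "S2"),
  dam    = c("0",  "0",  "0",  "0",  "D1", "D2", "D1", "D2"),
  stringsAsFactors = FALSE
)
pedigree_strong <- data.frame(
  animal = c("S1", "D1", "D2", "A1", "A2", "B1", "B2"),
  sire   = c("0",  "0",  "0",  "S1", "S1", "S1", "S1"),
  dam    = c("0",  "0",  "0",  "D1", "D2", "D1", "D2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: numerator relationship matrix by the tabular method.
tabular_A <- function(ped) {
  n <- nrow(ped)
  s <- match(ped$sire, ped$animal)  # NA when the sire is unknown ("0")
  d <- match(ped$dam,  ped$animal)  # NA when the dam is unknown ("0")
  A <- matrix(0, n, n, dimnames = list(ped$animal, ped$animal))
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {
      # Relationship of i with an older animal j: average over i's parents.
      a_js <- if (is.na(s[i])) 0 else A[j, s[i]]
      a_jd <- if (is.na(d[i])) 0 else A[j, d[i]]
      A[i, j] <- A[j, i] <- 0.5 * (a_js + a_jd)
    }
    # Diagonal: 1 plus the inbreeding coefficient of i.
    f_i <- if (is.na(s[i]) || is.na(d[i])) 0 else 0.5 * A[s[i], d[i]]
    A[i, i] <- 1 + f_i
  }
  A
}

mu1 <- c("A1", "A2")
mu2 <- c("B1", "B2")

A_weak   <- tabular_A(pedigree_weak)
A_strong <- tabular_A(pedigree_strong)

# Average additive relationship between animals in different MUs.
across_weak   <- mean(A_weak[mu1, mu2])    # 0.125
across_strong <- mean(A_strong[mu1, mu2])  # 0.375
```

In the weak pedigree the only links across MUs are maternal half-sibs; in the strong pedigree every across-MU pair is at least a paternal half-sib, and two pairs are full sibs.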

Compare the two results

res_weak$CD
res_weak$PEVD
res_weak$qK

res_strong$CD
res_strong$PEVD
res_strong$qK

Comparing the weak and strong scenarios shows that PEVD and CD do not have to move in the same direction.

In this toy example, the stronger pedigree link (a shared sire across MUs) reduces PEVD, from 33.33 in the weak scenario to 20 in the strong scenario. This reflects a more precisely estimated contrast between the two management units.

At the same time, the stronger link also reduces the kernel-based denominator of the contrast (qK), from 1 in the weak scenario to 0.5 in the strong scenario. In other words, the two MUs become more connected, but also less genetically distinct. Because CD is interpreted relative to that denominator, it can decrease even when PEVD decreases.

So, in this example, stronger connectedness improves precision of the contrast (lower PEVD), but also reduces the expected genetic variability underlying that contrast (lower qK), which leads to a lower CD.
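The numbers in this comparison can be reproduced by hand from the mixed model equations. The base R sketch below is not the package's implementation. It assumes that the contrast is the difference between the two MU means, that the fixed effects are exactly ~ 1 + sex, and that CD is defined as 1 - PEVD / (sigma2a * qK). The relationship matrices among the four recorded animals are written out by hand from the two pedigrees, and contrast_metrics() is a hypothetical helper.

```r
# Additive relationships among A1, A2 (MU1) and B1, B2 (MU2).
A_weak <- matrix(c(1.00, 0.25, 0.25, 0.00,
                   0.25, 1.00, 0.00, 0.25,
                   0.25, 0.00, 1.00, 0.25,
                   0.00, 0.25, 0.25, 1.00), nrow = 4, byrow = TRUE)
A_strong <- matrix(c(1.00, 0.25, 0.50, 0.25,
                     0.25, 1.00, 0.25, 0.50,
                     0.50, 0.25, 1.00, 0.25,
                     0.25, 0.50, 0.25, 1.00), nrow = 4, byrow = TRUE)

# Hypothetical helper: PEVD, qK, and CD of the MU1 - MU2 contrast.
contrast_metrics <- function(A, sigma2a = 50, sigma2e = 100) {
  X <- model.matrix(~ 1 + sex, data.frame(sex = c("M", "F", "M", "F")))
  Z <- diag(4)                      # one record per animal
  lambda <- sigma2e / sigma2a

  # Coefficient matrix of the mixed model equations.
  C <- rbind(cbind(crossprod(X),    crossprod(X, Z)),
             cbind(crossprod(Z, X), crossprod(Z) + lambda * solve(A)))

  # Random-effect block of the inverse.
  p   <- ncol(X)
  C22 <- solve(C)[-(1:p), -(1:p)]

  x    <- c(0.5, 0.5, -0.5, -0.5)   # mean of MU1 minus mean of MU2
  pevd <- sigma2e * drop(t(x) %*% C22 %*% x)
  qK   <- drop(t(x) %*% A %*% x)
  c(PEVD = pevd, qK = qK, CD = 1 - pevd / (sigma2a * qK))
}

m_weak   <- contrast_metrics(A_weak)    # PEVD 33.33, qK 1,   CD 0.333
m_strong <- contrast_metrics(A_strong)  # PEVD 20,    qK 0.5, CD 0.2
```

Under these assumptions the sketch returns the PEVD and qK values quoted above, and CD falls from about 0.33 to 0.2: PEVD drops by 40%, but qK drops by 50%, so the ratio PEVD / (sigma2a * qK) increases.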

Optional PEVD scaling

res_strong_scaled <- compute_connectedness(
  data          = data_strong,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ainv",
  pedigree      = pedigree_strong,
  scale_pevd    = TRUE
)

res_strong_scaled$PEVD
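As a rough check of what the scaling does, the arithmetic can be done by hand with the values quoted earlier for the strong scenario (PEVD = 20, qK = 0.5) and sigma2a = 50. The last line assumes the standard definition CD = 1 - PEVD / (sigma2a * qK), which this document does not spell out.

```r
sigma2a     <- 50
pevd_strong <- 20    # PEVD quoted for the strong scenario
qK_strong   <- 0.5   # qK quoted for the strong scenario

# Scaled PEVD is unit-free and on the same scale as qK.
pevd_scaled <- pevd_strong / sigma2a          # 0.4

# Assumed relation: CD compares scaled PEVD with qK.
cd_assumed <- 1 - pevd_scaled / qK_strong     # 0.2
```

Scaling makes PEVD comparable across traits or analyses with different additive genetic variances.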

Choosing among Ainv, Ginv, Hinv, and custom

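A practical way to think about the four options is by the information available: "Ainv" when only a pedigree is available, "Ginv" when marker genotypes are available for the animals of interest, "Hinv" when a pedigree is combined with genotypes on some of the animals, and "custom" when you want to supply your own relationship structure.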

Example: G-inverse

X <- matrix(
  c(0, 1, 2, 1,
    1, 1, 2, 0,
    2, 1, 0, 1,
    1, 2, 1, 0),
  nrow = 4,
  byrow = TRUE
)
rownames(X) <- c("A1", "A2", "B1", "B2")

animal_index <- setNames(seq_len(nrow(X)), rownames(X))

res_G <- compute_connectedness(
  data          = data_strong,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Ginv",
  X             = X,
  animal_index  = animal_index
)
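This document does not say how the package turns the marker matrix X into a genomic relationship matrix. One common choice is VanRaden's first method, sketched below in base R as an assumption and not as the package's implementation. The sketch also shows why some regularisation is usually needed before inversion: centring the markers makes G singular. The blending weight of 0.05 is an arbitrary illustrative value.

```r
X <- matrix(
  c(0, 1, 2, 1,
    1, 1, 2, 0,
    2, 1, 0, 1,
    1, 2, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A1", "A2", "B1", "B2"), NULL)
)

p <- colMeans(X) / 2                          # observed allele frequencies
W <- sweep(X, 2, 2 * p)                       # centre each marker column
G <- tcrossprod(W) / (2 * sum(p * (1 - p)))   # VanRaden method 1

# Centred columns sum to zero, so G has rank below the number of animals.
G_rank <- qr(G)$rank

# Blend with the identity (illustrative weight) so the matrix is invertible.
G_blend <- 0.95 * G + 0.05 * diag(nrow(G))
Ginv    <- solve(G_blend)
```

In practice G is often blended with the pedigree relationship matrix rather than with the identity; the identity is used here only to keep the sketch short.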

Example: H-inverse

renum <- renum_pedigree(pedigree_strong, verbose = FALSE)
genotyped_idx <- renum$new_id[match(rownames(X), renum$animal)]

res_H <- compute_connectedness(
  data          = data_strong,
  animal_col    = "animal",
  mu_col        = "MU",
  fixed_formula = ~ 1 + sex,
  sigma2a       = 50,
  sigma2e       = 100,
  relationship  = "Hinv",
  pedigree      = pedigree_strong,
  X             = X,
  genotyped_idx = genotyped_idx
)

Temporal overlap: why the contrast framework helps

The package can optionally restrict the analysis to a common time window and report where MU pairs actually overlap in time.

data_time <- transform(
  data_strong,
  year = c(2020, 2021, 2020, 2021)
)

res_time <- compute_connectedness(
  data                 = data_time,
  animal_col           = "animal",
  mu_col               = "MU",
  fixed_formula        = ~ 1 + sex,
  sigma2a              = 50,
  sigma2e              = 100,
  relationship         = "Ainv",
  pedigree             = pedigree_strong,
  year_col             = "year",
  year_window          = c(2020, 2021),
  min_records_per_year = 1
)

plot(res_time, which = "overlap")

This is a useful extension of the contrast logic: connectedness is not only a function of pedigree or genomics, but also of which units are effectively represented in the same time horizon.
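The idea of temporal overlap can be illustrated without the package. The base R sketch below counts, for each pair of MUs, the number of years inside the window in which both MUs have at least min_records_per_year records. It is an assumption about what "overlap" means here, not the package's implementation, and the object names are illustrative.

```r
data_time <- data.frame(
  animal = c("A1", "A2", "B1", "B2"),
  MU     = c("MU1", "MU1", "MU2", "MU2"),
  year   = c(2020, 2021, 2020, 2021)
)
year_window          <- c(2020, 2021)
min_records_per_year <- 1

# Keep only records inside the common time window.
in_window <- data_time$year >= year_window[1] &
  data_time$year <= year_window[2]
kept <- data_time[in_window, ]

# Records per MU and year, then a 0/1 indicator of "represented".
counts      <- unclass(table(kept$MU, kept$year))
represented <- (counts >= min_records_per_year) * 1

# MU x MU matrix: number of years in which both MUs are represented.
shared_years <- tcrossprod(represented)
```

In the toy data both MUs have one record in each of 2020 and 2021, so the off-diagonal entry of shared_years is 2. A pair with zero shared years would have no temporal overlap within the window.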

Visualization

The package provides a direct plotting method for a first visual inspection.

plot(res_strong, which = "CD")
plot(res_strong, which = "PEVD")
plot(res_strong, which = "all")

A quick visual check is often the easiest way to identify MU pairs that are well connected, weakly connected, or effectively isolated.