Skip to content

Commit 617baf0

Browse files
Copilotosorensen
andcommitted
Verified example runs successfully and all tests pass
Co-authored-by: osorensen <21175639+osorensen@users.noreply.github.com>
1 parent 7c48df5 commit 617baf0

File tree

375 files changed

+43296
-0
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

375 files changed

+43296
-0
lines changed
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
Package: BayesMallowsSMC2
2+
Type: Package
3+
Title: Nested Sequential Monte Carlo for the Bayesian Mallows Model
4+
Version: 0.1.1
5+
Authors@R: c(person("Oystein", "Sorensen",
6+
email = "oystein.sorensen.1985@gmail.com",
7+
role = c("aut", "cre"),
8+
comment = c(ORCID = "0000-0003-0724-3542")))
9+
Maintainer: Oystein Sorensen <oystein.sorensen.1985@gmail.com>
10+
Description: Provides nested sequential Monte Carlo algorithms for performing
11+
sequential inference in the Bayesian Mallows model, which is a widely used
12+
probability model for rank and preference data. The package implements the
13+
SMC² (Sequential Monte Carlo Squared) algorithm for handling sequentially
14+
arriving rankings and pairwise preferences, including support for complete
15+
rankings, partial rankings, and pairwise comparisons. The methods are based
16+
on Sørensen (2025) <doi:10.1214/25-BA1564>.
17+
License: GPL-3
18+
Encoding: UTF-8
19+
LazyData: true
20+
Roxygen: list(markdown = TRUE)
21+
RoxygenNote: 7.3.2
22+
LinkingTo: Rcpp, RcppArmadillo
23+
Imports: Rcpp
24+
Depends: R (>= 4.1.0)
25+
Suggests: testthat (>= 3.0.0), label.switching (>= 1.8)
26+
Config/testthat/edition: 3
27+
NeedsCompilation: yes
28+
Packaged: 2026-01-21 09:58:44 UTC; runner
29+
Author: Oystein Sorensen [aut, cre] (<https://orcid.org/0000-0003-0724-3542>)
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
# Generated by roxygen2: do not edit by hand
2+
3+
export(compute_sequentially)
4+
export(precompute_topological_sorts)
5+
export(set_hyperparameters)
6+
export(set_smc_options)
7+
importFrom(Rcpp,sourceCpp)
8+
useDynLib(BayesMallowsSMC2, .registration = TRUE)
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
## usethis namespace: start
2+
#' @importFrom Rcpp sourceCpp
3+
## usethis namespace: end
4+
NULL
5+
6+
## usethis namespace: start
7+
#' @useDynLib BayesMallowsSMC2, .registration = TRUE
8+
## usethis namespace: end
9+
NULL
Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
2+
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
3+
4+
#' Precompute All Topological Sorts
5+
#'
6+
#' This function precomputes all topological sorts for a given preference matrix and
7+
#' saves them to a specified output directory. It ensures the output directory exists
8+
#' and creates it if it does not.
9+
#'
10+
#' @param prefs A matrix representing the preference relations. This matrix
11+
#' must have two columns, the first of which represents the preferred item
12+
#' and the second of which represents the disfavored item.
13+
#' @param n_items An integer specifying the number of items to sort.
14+
#' @param save_frac Number between 0 and 1 specifying which fraction of sorts to save.
15+
#'
16+
#' @details
17+
#' The function generates all possible topological sorts for the provided preference matrix
18+
#' and saves approximately `save_frac` of the sorts in a matrix which is returned.
19+
#'
20+
#' @return This function returns the number of topological sorts.
21+
#'
22+
#' @export
23+
#' @examples
24+
#' # Extract preferences from user 1 in the included example data.
25+
#' prefs <- pairwise_preferences[
26+
#' pairwise_preferences$user == 1, c("top_item", "bottom_item"), drop = FALSE]
27+
#'
28+
#' # Generate all topological sorts, but don't save them:
29+
#' sorts <- precompute_topological_sorts(
30+
#' prefs = as.matrix(prefs),
31+
#' n_items = 5,
32+
#' save_frac = 0
33+
#' )
34+
#' # Number of sorts
35+
#' sorts$sort_count
36+
#' # Empty matrix
37+
#' sorts$sort_matrix
38+
#'
39+
#' # Generate all topological sorts and save them:
40+
#' sorts <- precompute_topological_sorts(
41+
#' prefs = as.matrix(prefs),
42+
#' n_items = 5,
43+
#' save_frac = 1
44+
#' )
45+
#' # Number of sorts
46+
#' sorts$sort_count
47+
#' # Matrix with all of them
48+
#' sorts$sort_matrix
49+
#'
50+
precompute_topological_sorts <- function(prefs, n_items, save_frac) {
51+
.Call(`_BayesMallowsSMC2_precompute_topological_sorts`, prefs, n_items, save_frac)
52+
}
53+
54+
run_smc <- function(input_timeseries, input_prior, input_options, input_sort_matrices, input_sort_counts) {
55+
.Call(`_BayesMallowsSMC2_run_smc`, input_timeseries, input_prior, input_options, input_sort_matrices, input_sort_counts)
56+
}
57+
Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
#' Compute the Bayesian Mallows model sequentially
2+
#'
3+
#'
4+
#' @param data A dataframe containing partial rankings or pairwise preferences.
5+
#' If `data` contains complete or partial rankings, it must have the following
6+
#' columns:
7+
#'
8+
#' \itemize{
9+
#' \item `timepoint`: a numeric vector denoting the timepoint, starting at 1.
10+
#' \item `user`: a vector identifying the user.
11+
#' \item `item1`: ranking of item 1.
12+
#' \item `item2`: ranking of item 2.
13+
#' \item etc.
14+
#' }
15+
#'
16+
#' If data contains pairwise preferences, it must have the following
17+
#' structure:
18+
#'
19+
#' \itemize{
20+
#' \item `timepoint`: a numeric vector denoting the timepoint, starting at 1.
21+
#' \item `user`: a vector identifying the user.
22+
#' \item `top_item`: identifier for the preferred item.
23+
#' \item `bottom_item`: identifier for the dispreferred item.
24+
#' }
25+
#'
26+
#' @param hyperparameters A list returned from [set_hyperparameters()].
27+
#' @param smc_options A list returned from [set_smc_options()]
28+
#' @param topological_sorts A list returned from
29+
#' [precompute_topological_sorts()]. Only used with preference data, and
30+
#' defaults to `NULL`.
31+
#'
32+
#' @return An object of class BayesMallowsSMC2.
33+
#' @export
34+
#'
35+
#' @examples
36+
#' # Compute the model sequentially with complete rankings
37+
#' mod <- compute_sequentially(
38+
#' complete_rankings,
39+
#' hyperparameters = set_hyperparameters(n_items = 5),
40+
#' smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1)
41+
#' )
42+
#'
43+
compute_sequentially <- function(
44+
data,
45+
hyperparameters = set_hyperparameters(),
46+
smc_options = set_smc_options(),
47+
topological_sorts = NULL
48+
){
49+
rank_columns <- grepl("item[0-9]+", colnames(data))
50+
preference_columns <- grepl("top\\_item|bottom\\_item", colnames(data))
51+
52+
if(any(rank_columns)) {
53+
input_timeseries <- split(data, f = ~ timepoint) |>
54+
lapply(split, f = ~ user) |>
55+
lapply(function(x) lapply(x, function(y) as.numeric(y[rank_columns])))
56+
57+
if(any(is.na(data[rank_columns]))) {
58+
attr(input_timeseries, "type") <- "partial rankings"
59+
} else {
60+
attr(input_timeseries, "type") <- "complete rankings"
61+
}
62+
sort_matrices <- sort_counts <- list()
63+
} else if(sum(preference_columns) == 2) {
64+
if(is.null(topological_sorts)) {
65+
stop("topological_sorts must be provided with preference data.")
66+
}
67+
input_timeseries <- split(data, f = ~ timepoint) |>
68+
lapply(split, f = ~ user) |>
69+
lapply(function(x) lapply(x, function(y) as.matrix(y[preference_columns])))
70+
attr(input_timeseries, "type") <- "pairwise preferences"
71+
72+
sort_matrices <- lapply(topological_sorts, function(x) {
73+
lapply(x, function(y) y$sort_matrix)
74+
})
75+
76+
sort_counts <- lapply(topological_sorts, function(x) {
77+
lapply(x, function(y) y$sort_count)
78+
})
79+
} else {
80+
stop("Something wrong with data")
81+
}
82+
83+
if(max(table(data$user)) > 1 &&
84+
attr(input_timeseries, "type") != "pairwise preferences") {
85+
stop("Updated users not supported.")
86+
}
87+
88+
ret <- run_smc(input_timeseries, hyperparameters, smc_options,
89+
sort_matrices, sort_counts)
90+
91+
class(ret) <- "BayesMallowsSMC2"
92+
ret
93+
}
94+
Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
#' Simulated Data
2+
#'
3+
#' @format ## `complete_rankings`
4+
#' A data frame with 100 rows and 7 columns:
5+
#' \describe{
6+
#' \item{timepoint}{Timepoint}
7+
#' \item{user}{User id}
8+
#' \item{item1, item2, item3, item4, item5}{Ranking given to the item.}
9+
#' }
10+
"complete_rankings"
11+
12+
#' Simulated Data with Missing Values
13+
#'
14+
#' @format ## `partial_rankings`
15+
#' A data frame with 100 rows and 7 columns:
16+
#' \describe{
17+
#' \item{timepoint}{Timepoint}
18+
#' \item{user}{User id}
19+
#' \item{item1, item2, item3, item4, item5}{Ranking given to the item.}
20+
#' }
21+
"partial_rankings"
22+
23+
#' Simulated Data with Pairwise Preferences
24+
#'
25+
#' @format ## `pairwise_preferences`
26+
#' A data frame with 400 rows and 4 columns:
27+
#' \describe{
28+
#' \item{timepoint}{Timepoint}
29+
#' \item{user}{User id}
30+
#' \item{top_item}{Item preferred in the comparison.}
31+
#' \item{bottom_item}{Item disfavored in the comparison.}
32+
#' }
33+
"pairwise_preferences"
34+
35+
#' Simulated Two-Component Mixtures Data
36+
#'
37+
#' @format ## `mixtures`
38+
#' A data frame with 400 rows and 7 columns:
39+
#' \describe{
40+
#' \item{timepoint}{Timepoint}
41+
#' \item{user}{User id}
42+
#' \item{item1, item2, item3, item4, item5}{Ranking given to the item.}
43+
#' }
44+
"mixtures"
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
#' Set hyperparameters
2+
#'
3+
#' @param n_items Integer defining the number of items.
4+
#' @param alpha_shape Shape of gamma prior for alpha.
5+
#' @param alpha_rate Rate of gamma prior for alpha.
6+
#' @param cluster_concentration Concentration parameter of Dirichlet distribution for cluster probabilities.
7+
#' @param n_clusters Integer defining the number of clusters.
8+
#'
9+
#' @return A list
10+
#' @export
11+
#'
12+
set_hyperparameters <- function(
13+
n_items, alpha_shape = 1, alpha_rate = .5, cluster_concentration = 10,
14+
n_clusters = 1) {
15+
if(missing(n_items)) stop("n_items must be provided")
16+
as.list(environment())
17+
}
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
#' Set SMC options
2+
#'
3+
#' @param n_particles Number of particles
4+
#' @param n_particle_filters Initial number of particle filters for each
5+
#' particle
6+
#' @param max_particle_filters Maximum number of particle filters.
7+
#' @param resampling_threshold Effective sample size threshold for resampling
8+
#' @param doubling_threshold Threshold for particle filter doubling. If the
9+
#' acceptance rate of the rejuvenation step falls below this threshold, the
10+
#' number of particle filters is doubled. Defaults to 0.2.
11+
#' @param max_rejuvenation_steps Maximum number of rejuvenation steps. If the
12+
#' number of unique particles has not exceeded half the number of particles
13+
#' after this many steps, the rejuvenation is still stopped.
14+
#' @param metric Metric
15+
#' @param resampler resampler
16+
#' @param latent_rank_proposal latent rank proposal
17+
#' @param verbose Boolean
18+
#' @param trace Logical specifying whether to save static parameters at each
19+
#' timestep.
20+
#' @param trace_latent Logical specifying whether to sample and save one
21+
#' complete set of latent rankings for each particle and each timepoint.
22+
#'
23+
#' @return A list
24+
#' @export
25+
#'
26+
set_smc_options <- function(
27+
n_particles = 1000, n_particle_filters = 50, max_particle_filters = 10000,
28+
resampling_threshold = n_particles / 2, doubling_threshold = .2,
29+
max_rejuvenation_steps = 20,
30+
metric = "footrule", resampler = "multinomial",
31+
latent_rank_proposal = "uniform", verbose = FALSE,
32+
trace = FALSE, trace_latent = FALSE) {
33+
as.list(environment())
34+
}
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
2+
<!-- README.md is generated from README.Rmd. Please edit that file -->
3+
4+
# BayesMallowsSMC2
5+
6+
<!-- badges: start -->
7+
8+
[![R-CMD-check](https://github.com/osorensen/BayesMallowsSMC2/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/osorensen/BayesMallowsSMC2/actions/workflows/R-CMD-check.yaml)
9+
<!-- badges: end -->
10+
11+
BayesMallowsSMC2 provides functions for performing sequential inference
12+
in the Bayesian Mallows model using the SMC$^{2}$ algorithm.
13+
14+
## Installation
15+
16+
You can install the development version of BayesMallowsSMC2 from
17+
[GitHub](https://github.com/) with:
18+
19+
``` r
20+
# install.packages("devtools")
21+
devtools::install_github("osorensen/BayesMallowsSMC2")
22+
```
23+
24+
## Usage
25+
26+
This package is under development, and is not yet well documented. For
27+
examples on how to use it, see the code in the OSF repository
28+
<https://osf.io/pquk4/>.
Binary file not shown.

0 commit comments

Comments
 (0)