| Title: | Bayesian State-Space Aggregation of Brazilian Presidential Polls |
|---|---|
| Description: | A set of dynamic measurement models to estimate latent vote shares from noisy polling sources. The models build on Jackman (2009, ISBN: 9780470011546) and feature specialized methods for bias adjustment based on past performance and correction for asymmetric errors based on candidate political alignment. |
| Authors: | Rafael N. Magalhães [aut, cre] |
| Maintainer: | Rafael N. Magalhães <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.3 |
| Built: | 2026-06-01 21:11:43 UTC |
| Source: | https://github.com/rnmag/agregr |
Defines configuration parameters for the poll aggregator, including Stan settings and election details.
configurar_agregador(
  pesquisas = NULL,
  resultado_eleicao_passada = NULL,
  resultado_eleicao_atual = NULL,
  historico_pesquisas = NULL,
  candidaturas_1t = NULL,
  candidaturas_2t = NULL,
  direita_eleicao_atual = NULL,
  direita_eleicao_passada = "Bolsonaro",
  esquerda_eleicao_atual = NULL,
  esquerda_eleicao_passada = "Lula",
  eleicao_passada_primeiro_turno = "2/10/2022",
  eleicao_passada_segundo_turno = "30/10/2022",
  stan_cores = pmin(parallel::detectCores(), 4),
  stan_chains = 4,
  stan_warmup = 500,
  stan_sampling = 500,
  stan_init = 0.1,
  stan_adapt_delta = 0.99,
  saida_bases_tratadas = "resultados_agregador/bases_tratadas",
  saida_modelos_brutos = "resultados_agregador/modelos_brutos"
)
| Argument | Description |
|---|---|
| pesquisas | Path to a CSV file or URL containing current poll data. If NULL (default), a GitHub raw URL is used. |
| resultado_eleicao_passada | Path to a CSV file containing results from the previous election. If NULL (default), a GitHub raw URL is used. |
| resultado_eleicao_atual | Path to a CSV file containing results for the current election (useful for the retrospective model). If NULL (default), a GitHub raw URL is used. |
| historico_pesquisas | Path to a CSV/RDS file containing historical poll data. If NULL (default), uses the package's internal dataset. |
| candidaturas_1t | Character vector of candidates in the 1st round. If NULL, uses default candidates. |
| candidaturas_2t | Character vector of candidates in the 2nd round. If NULL, uses default candidates. |
| direita_eleicao_atual | Character vector of right-wing candidates in the current race. If NULL, uses default candidates. The model can compensate for institutes' errors against right-wing candidates in the last election. |
| direita_eleicao_passada | Name of the right-wing candidate in the previous election. |
| esquerda_eleicao_atual | Character vector of left-wing candidates in the current race. If NULL, uses default candidates. The model can compensate for institutes' errors against left-wing candidates in the last election. |
| esquerda_eleicao_passada | Name of the left-wing candidate in the previous election. |
| eleicao_passada_primeiro_turno | Date of the previous 1st round in day/month/year format (e.g., "2/10/2022"). |
| eleicao_passada_segundo_turno | Date of the previous 2nd round in day/month/year format (e.g., "30/10/2022"). |
| stan_cores | Number of CPU cores for Stan to use. |
| stan_chains | Number of MCMC chains. |
| stan_warmup | Number of warmup iterations per chain. |
| stan_sampling | Number of sampling iterations per chain. |
| stan_init | Initial value for Stan parameters. |
| stan_adapt_delta | Target acceptance rate for Stan's NUTS algorithm. |
| saida_bases_tratadas | Directory where treated data will be saved. |
| saida_modelos_brutos | Directory where raw model objects will be saved. |
A list of configuration parameters.
# Create custom Stan settings
cfg_custom <- configurar_agregador(
  stan_warmup = 100,
  stan_sampling = 100
)
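As a rough illustration of how the defaults in the signature can be read, the base R sketch below parses the two election dates as day/month/year and counts the post-warmup draws implied by the default chain settings. It does not call the package; whether configurar_agregador() parses dates with exactly this format string is an assumption.

```r
# Defaults copied from the configurar_agregador() signature.
# Assumption: the strings are day/month/year.
primeiro_turno <- as.Date("2/10/2022", format = "%d/%m/%Y")
segundo_turno  <- as.Date("30/10/2022", format = "%d/%m/%Y")

# Days between the two rounds of the previous election.
dias_entre_turnos <- as.numeric(segundo_turno - primeiro_turno)

# Post-warmup draws kept across all chains: chains x sampling iterations.
stan_chains   <- 4
stan_sampling <- 500
total_draws   <- stan_chains * stan_sampling

print(format(primeiro_turno, "%Y-%m-%d"))
print(dias_entre_turnos)
print(total_draws)
```

Lowering stan_warmup and stan_sampling, as in the example above, shortens run time at the cost of fewer draws per chain.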
Defines configuration parameters for graphics, including colors, fonts, and dimensions.
configurar_grafico(
  fonte = "Fira Sans",
  cores_candidaturas = NULL,
  simbolos = NULL,
  graf_largura = 2918,
  graf_altura = 1913,
  graf_unidade = "px",
  graf_dpi = 320,
  dir_grafico = "resultados_agregador/graficos"
)
| Argument | Description |
|---|---|
| fonte | Font family (default: "Fira Sans"). |
| cores_candidaturas | Named vector or list of colors for candidates. Can be a partial override. |
| simbolos | Named vector or list of symbols for methodologies. Can be a partial override. |
| graf_largura | Width of saved plots. |
| graf_altura | Height of saved plots. |
| graf_unidade | Unit for dimensions ("px", "in", "cm", "mm"). |
| graf_dpi | DPI for saved plots. |
| dir_grafico | Directory to save plots. |
A list of graphic configuration parameters.
# Alternative colors for use in the config_grafico argument in a plot
config_custom <- configurar_grafico(
  cores_candidaturas = c(Lula = "darkred")
)
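The "partial override" behaviour described for cores_candidaturas and simbolos can be pictured with base R's utils::modifyList(). This is only a sketch of the idea: the default palette below is hypothetical, and the package may merge overrides differently.

```r
# Hypothetical default palette; the package's real defaults are not shown here.
cores_padrao <- list(Lula = "red", Bolsonaro = "blue")

# User-supplied partial override, as in configurar_grafico(cores_candidaturas = ...).
cores_usuario <- c(Lula = "darkred")

# modifyList() replaces only the names present in the override
# and leaves every other entry untouched.
cores_finais <- utils::modifyList(cores_padrao, as.list(cores_usuario))
str(cores_finais)
```

Only the entry for Lula changes; candidates not named in the override keep their default color.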
Defines hyperparameters for the specific Bayesian models.
configurar_prioris(nome = "Viés Relativo com Pesos", ...)
| Argument | Description |
|---|---|
| nome | Name of the model. Options: "Viés Relativo com Pesos", "Viés Relativo sem Pesos", "Viés Empírico", "Retrospectivo" and "Naive". |
| ... | Named arguments to override default hyperparameters (e.g., sd_mu_priori = 0.2). |
A list of model parameters.
These hyperparameters control the strength of assumptions regarding latent state evolution, institute bias, and non-sampling errors.
Variable names refer to the model notation described in https://rnmag.github.io/agregR/index.html#conceptual-framework
Recommended reading: https://github.com/stan-dev/stan/wiki/prior-choice-recommendations
State Model - Level ($\mu$)
mu_priori: Prior mean for the latent vote share at the start of the series.
sd_mu_priori: Prior uncertainty for the initial latent vote.
Default values: The model starts with a wide, nearly flat prior of N(0.5, 0.5), allowing the data to quickly dominate inference.
omega_eta_priori: Prior mean for the level volatility ($\omega_\eta$).
sd_omega_eta_priori: Prior uncertainty for the level volatility.
Default values: With omega_eta_priori = 0.002 and sd_omega_eta_priori = 0.0001, the model assumes a baseline drift of approx. 1.1 percentage points over a month ($0.002 \times \sqrt{30} \approx 0.011$).
Higher values: The latent vote ($\mu$) can jump more from one day to the next. The model adapts more quickly to new polls but becomes more "jittery".
Lower values: The model assumes the public opinion level is more stable over time, resulting in smoother curves.
State Model - Trend ($\nu$)
nu_priori: Prior mean for the initial trend (daily growth rate).
sd_nu_priori: Prior uncertainty for the initial trend.
Default values: With nu_priori = 0 and sd_nu_priori = 0.001, the model expects an initial trend within roughly ±0.2 percentage points per day ($2 \times 0.001$).
omega_zeta_priori: Prior mean for the trend volatility ($\omega_\zeta$).
sd_omega_zeta_priori: Prior uncertainty for the trend volatility.
Default values: With omega_zeta_priori = 0 and sd_omega_zeta_priori = 0.00001, the model assumes a nearly linear evolution and lets the trend shift rapidly (accelerate) only under strong evidence.
Higher values: The trend ($\nu$) can change direction or magnitude rapidly.
Lower values: The trend is assumed to be more constant over time (more linear evolution).
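To make the effect of the level and trend volatilities concrete, the sketch below simulates a generic local linear trend in base R. It assumes Gaussian daily innovations and the update order shown in the comments; it is not the package's Stan model, and the function name simulate_state is made up for this illustration.

```r
# Simulate a local linear trend on the 0-1 vote-share scale.
# Assumed dynamics (not taken from the package source):
#   nu[t] = nu[t-1] + zeta,             zeta ~ N(0, omega_zeta)
#   mu[t] = mu[t-1] + nu[t-1] + eta,    eta  ~ N(0, omega_eta)
simulate_state <- function(n_days, mu0 = 0.5, nu0 = 0,
                           omega_eta = 0.002, omega_zeta = 0) {
  mu <- numeric(n_days)
  nu <- numeric(n_days)
  mu[1] <- mu0
  nu[1] <- nu0
  for (t in seq_len(n_days)[-1]) {
    nu[t] <- nu[t - 1] + rnorm(1, 0, omega_zeta)
    mu[t] <- mu[t - 1] + nu[t - 1] + rnorm(1, 0, omega_eta)
  }
  data.frame(day = seq_len(n_days), mu = mu, nu = nu)
}

set.seed(1)
calm    <- simulate_state(60, omega_eta = 0.001)
jittery <- simulate_state(60, omega_eta = 0.01)

# Day-to-day changes are larger when the level volatility is higher.
print(sd(diff(calm$mu)))
print(sd(diff(jittery$mu)))

# With both volatilities at zero, the path is a straight line in time.
linear <- simulate_state(10, nu0 = 0.001, omega_eta = 0, omega_zeta = 0)
print(linear$mu)
```

Plotting calm$mu and jittery$mu against day shows the smooth versus "jittery" behaviour described above.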
Institute Bias ($\delta$)
delta_priori: Mean expected bias for institutes. Default is 0, except in "Viés Empírico", where it is anchored on past performance.
sd_delta_priori: Scale of the bias prior.
Default values: With delta_priori = 0 and sd_delta_priori = 0.02, the model assumes that 95% of institutes have a bias within roughly ±4 percentage points ($2 \times 0.02$).
Higher values: Allow for larger, more variable biases across institutes.
Lower values: Constrain institutes to have similar biases (shrinkage toward the anchor).
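The shrinkage effect of a smaller sd_delta_priori can be illustrated with a textbook conjugate normal update for a single institute. This is a simplified stand-in, assuming a known per-poll standard deviation; the function posterior_bias and the poll numbers are hypothetical, and the package estimates bias jointly in Stan rather than with this formula.

```r
# Conjugate normal update for one institute's bias (illustration only).
# Prior: delta ~ N(prior_mean, prior_sd^2).
# Data: n polls whose mean deviation is obs_mean, each with known SD obs_sd.
posterior_bias <- function(obs_mean, n, obs_sd, prior_mean = 0, prior_sd = 0.02) {
  prior_prec <- 1 / prior_sd^2
  data_prec  <- n / obs_sd^2
  (prior_prec * prior_mean + data_prec * obs_mean) / (prior_prec + data_prec)
}

# A hypothetical institute whose 5 polls sit 3 points above the anchor.
loose <- posterior_bias(obs_mean = 0.03, n = 5, obs_sd = 0.03, prior_sd = 0.02)
tight <- posterior_bias(obs_mean = 0.03, n = 5, obs_sd = 0.03, prior_sd = 0.005)

print(c(loose = loose, tight = tight))
```

The tighter prior pulls the estimated bias much closer to the anchor of 0, which is the shrinkage the "Lower values" note refers to.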
Non-Sampling Error ($\tau$)
tau_priori: Mean expected magnitude of errors not explained by sampling or house effects. In weighted models, this is replaced by the empirical RMSE from past elections.
sd_tau_priori: Prior uncertainty for non-sampling error.
Default values: With tau_priori = 0.02 and sd_tau_priori = 0.02, the model assumes a baseline of 2 percentage points of "noise" in each poll, with a prior standard deviation of another 2 percentage points that allows larger values.
Higher values: The model treats polls as less precise, widening the credible intervals of the latent state.
Lower values: The model trusts polling precision more, leading to tighter intervals and potentially more sensitivity to outliers.
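The default values quoted above are on the 0-1 vote-share scale. A quick way to translate them into percentage points is sketched below; it assumes a Gaussian random walk for the level and normal priors for trend and bias, which may not match the package's exact prior families.

```r
# Defaults quoted in the text (vote shares on the 0-1 scale).
omega_eta_priori <- 0.002
sd_nu_priori     <- 0.001
sd_delta_priori  <- 0.02

# Accumulated SD of a Gaussian random walk after 30 daily steps, in points.
drift_month_pp <- 100 * omega_eta_priori * sqrt(30)

# Approximate 95% band (two standard deviations) for the initial daily trend.
trend_band_pp <- 100 * 2 * sd_nu_priori

# Half-width of the 95% prior interval for institute bias under a normal prior.
bias_band_pp <- 100 * qnorm(0.975) * sd_delta_priori

print(round(c(drift = drift_month_pp, trend = trend_band_pp, bias = bias_band_pp), 2))
```

Re-running the same arithmetic with custom values gives a feel for how much movement a given prior allows before passing it to configurar_prioris().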
# Get default parameters for the "Naive" model
naive_params <- configurar_prioris(nome = "Naive")

# Get parameters for "Naive" and override a default value
custom_params <- configurar_prioris(nome = "Naive", sd_mu_priori = 0.2)
Generates a plot of the aggregated poll results over time.
grafico_agregador(
  bd,
  salvar = FALSE,
  config_grafico = configurar_grafico(),
  dir_saida = NULL,
  ...
)
| Argument | Description |
|---|---|
| bd | The results object returned by rodar_agregador(). |
| salvar | Logical. If TRUE, saves the plot to disk. |
| config_grafico | A list of graphic parameters created by configurar_grafico(). |
| dir_saida | Output directory for the saved plot if salvar is TRUE. |
| ... | Additional arguments. |
A ggplot2 object.
if (instantiate::stan_cmdstan_exists()) {
  result <- rodar_agregador(
    data_inicio = "01/01/2025",
    turno = 2,
    cenario = "Lula vs Bolsonaro"
  )

  # Standard plot
  std_plot <- grafico_agregador(result)

  # Altering candidate colors
  custom_plot <- grafico_agregador(
    result,
    config_grafico = configurar_grafico(
      cores_candidaturas = c(Lula = "yellow")
    )
  )
}
Generates a plot comparing prior and posterior distributions for candidates or bias.
grafico_priori_posteriori(
  bd,
  candidaturas,
  tipo = "Viés",
  salvar = FALSE,
  config_agregador = configurar_agregador(),
  config_grafico = configurar_grafico(),
  config_prioris = configurar_prioris(bd$nome_modelo),
  dir_saida = NULL
)
| Argument | Description |
|---|---|
| bd | The results object returned by rodar_agregador(). |
| candidaturas | A character vector of candidate names to include in the plot. |
| tipo | The type of data to plot: "Viés" (for institute bias) or "Percentual" (for candidate vote share). |
| salvar | Logical. If TRUE, saves the plot to disk. |
| config_agregador | A list of configuration parameters created by configurar_agregador(). |
| config_grafico | A list of graphic parameters created by configurar_grafico(). |
| config_prioris | A list of model hyperparameters created by configurar_prioris(). |
| dir_saida | Output directory for the saved plot if salvar is TRUE. |
A ggplot2 object.
if (instantiate::stan_cmdstan_exists()) {
  result <- rodar_agregador(
    data_inicio = "01/01/2025",
    turno = 2,
    cenario = "Lula vs Bolsonaro"
  )

  # Prior vs Posterior plot for institute bias
  std_plot <- grafico_priori_posteriori(
    result,
    tipo = "Viés",
    candidaturas = c("Lula", "Bolsonaro")
  )

  # Altering candidate colors
  custom_plot <- grafico_priori_posteriori(
    result,
    candidaturas = c("Lula", "Bolsonaro"),
    config_grafico = configurar_grafico(
      cores_candidaturas = c(Lula = "yellow")
    )
  )
}
Generates a plot visualizing the bias of polling institutes.
grafico_vies(
  bd,
  candidaturas,
  salvar = FALSE,
  config_grafico = configurar_grafico(),
  dir_saida = NULL,
  ...
)
| Argument | Description |
|---|---|
| bd | The results object returned by rodar_agregador(). |
| candidaturas | A character vector of candidate names to include in the plot. |
| salvar | Logical. If TRUE, saves the plot to disk. |
| config_grafico | A list of graphic parameters created by configurar_grafico(). |
| dir_saida | Output directory for the saved plot if salvar is TRUE. |
| ... | Additional arguments. |
A ggplot2 object.
if (instantiate::stan_cmdstan_exists()) {
  result <- rodar_agregador(
    data_inicio = "01/01/2025",
    turno = 2,
    cenario = "Lula vs Bolsonaro"
  )

  # Standard bias plot
  std_plot <- grafico_vies(
    result,
    candidaturas = c("Lula", "Bolsonaro")
  )

  # Altering candidate colors
  custom_plot <- grafico_vies(
    result,
    candidaturas = c("Lula", "Bolsonaro"),
    config_grafico = configurar_grafico(
      cores_candidaturas = c(Lula = "yellow")
    )
  )
}
A dataset containing historical electoral polls compiled by Poder360. This dataset is used to calculate empirical priors for the models.
historico_pesquisas_poder360
A data frame with columns:
Election year
Office being contested
Condition (e.g., stimulated)
Entity that paid for the poll
Date of the poll
Reference date for the poll
Description of the electoral scenario
Unique ID for the candidate
Unique ID for the scenario
Unique ID for the poll
Name of the polling institute
Upper margin of error
Lower margin of error
Candidate name
City name (if applicable)
Official registration number
Entity where the poll was registered
Voting intention percentage
Sample size
Political party abbreviation
State abbreviation
Poll type
Vote type (Total, Valid, etc.)
Election round (1 or 2)
Poder360 via Base dos Dados
Main function to run the state-space model for poll aggregation.
rodar_agregador(
  bd = NULL,
  data_inicio = NULL,
  data_fim = Sys.Date(),
  cargo = "Presidente",
  ambito = "Brasil",
  cenario = NULL,
  turno,
  modelo = "Viés Relativo com Pesos",
  config_agregador = NULL,
  config_prioris = NULL,
  salvar = FALSE,
  dir_saida = NULL
)
| Argument | Description |
|---|---|
| bd | Data frame or path to a CSV file containing poll data. |
| data_inicio | Start date for the analysis (mandatory). |
| data_fim | End date for the analysis. |
| cargo | The office/position being contested (e.g., "Presidente"). Current data only contains presidential polls, but the package supports expansion for other offices. |
| ambito | The geographical scope (e.g., "Brasil"). Current data only contains national polls, but the package supports expansion for state races. |
| cenario | The specific electoral scenario. Mandatory for the second round. |
| turno | The election round (1 or 2). |
| modelo | The name of the model to run. Options: "Viés Relativo com Pesos" (default), "Viés Relativo sem Pesos", "Viés Empírico", "Retrospectivo" and "Naive". |
| config_agregador | A list of configuration parameters created by configurar_agregador(). |
| config_prioris | A list of model hyperparameters created by configurar_prioris(). |
| salvar | Logical. If TRUE, saves the results to disk. |
| dir_saida | Output directory for saved files if salvar is TRUE. |
A list containing the model name, estimated votes, institute bias, and the raw model object.
The aggregator supports five types of Bayesian state-space models, each with specific assumptions about institute bias and non-sampling errors:
1. Viés Relativo com Pesos (Default)
Assumption: Institute biases are relative to the average of all institutes (latent "truth" is anchored to the consensus).
Bias ($\delta$): Calculated relative to the mean bias of all institutes.
Weights: Uses past election performance to weight the non-sampling error. Institutes with larger historical errors have less influence on the current estimate.
Use case: Best for general forecasting when historical data is available.
2. Viés Relativo sem Pesos
Assumption: Same as above, but treats all institutes as having equal potential quality a priori.
Bias ($\delta$): Calculated relative to the mean bias.
Weights: None. All institutes share the same prior for non-sampling error.
Use case: When historical data is unreliable or when a "fresh start" assumption is desired.
3. Viés Empírico
Assumption: Institute biases are anchored to their specific historical performance.
Bias ($\delta$): Prior means are set to the bias observed in the previous election (directional error).
Weights: Uses past performance for non-sampling error, similar to the "Com Pesos" model.
Use case: When institutes are expected to repeat their specific past directional errors (e.g., consistently underestimating a specific wing).
4. Retrospectivo
Assumption: The true election result is known and used as the final anchor for the state-space model.
Method: Runs the model "backwards" or constrained by the final result to estimate the true path of public opinion.
Use case: Post-election analysis to diagnose institute performance and calculate accurate biases for future calibration.
5. Naive
Assumption: Polls have no bias and no non-sampling error.
Method: A random walk model where the only source of uncertainty is the sampling error.
Use case: Baseline comparison. Assumes "polls are perfect" within their margin of error.
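Several of the models above rely on two summaries of past performance: a directional bias and an overall error size (RMSE). The base R sketch below shows how the two differ, using made-up polls for three hypothetical institutes; the package's own calculation from historico_pesquisas_poder360 may differ in detail.

```r
# Hypothetical final polls from a past election (shares on the 0-1 scale).
past_polls <- data.frame(
  institute = c("A", "A", "B", "B", "C", "C"),
  poll      = c(0.47, 0.49, 0.52, 0.54, 0.53, 0.47),
  result    = 0.50
)
past_polls$error <- past_polls$poll - past_polls$result

# Directional bias: mean signed error per institute.
bias <- tapply(past_polls$error, past_polls$institute, mean)

# RMSE: typical size of the error regardless of direction.
rmse <- tapply(past_polls$error, past_polls$institute,
               function(e) sqrt(mean(e^2)))

print(round(cbind(bias, rmse), 4))
```

Institute C illustrates why both quantities matter: its errors cancel out to a bias of zero, yet its RMSE is as large as a consistently biased institute's, so a weighted model would still trust it less.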
The config_prioris argument allows customization of the model's hyperparameters with the configurar_prioris() function.
# Running the default model for a second round scenario
if (instantiate::stan_cmdstan_exists()) {
  result <- rodar_agregador(
    data_inicio = "01/01/2025",
    turno = 2,
    cenario = "Lula vs Bolsonaro"
  )

  # Tuning Stan, changing the model and altering specific priors
  custom_result <- rodar_agregador(
    data_inicio = "01/01/2025",
    turno = 2,
    cenario = "Lula vs Bolsonaro",
    modelo = "Viés Relativo sem Pesos",
    config_agregador = list(stan_chains = 1, stan_warmup = 200),
    config_prioris = list(tau_priori = 0.01)
  )
}