List Randomization Analysis Plan for Danyang
Context and goals
- Outcome: for each list experiment question, clinicians report a count of scenarios they are uncomfortable with. Higher counts indicate more discomfort (stigma).
- Identification: the difference in mean counts between the “sensitive” list (which includes the stigmatized item) and the “control” list recovers the prevalence of discomfort with the sensitive item (a direct difference-in-means sketch follows this list).
- Goals:
- Estimate baseline stigma per question.
- Improve precision via clinician controls, and account for within-facility correlation by clustering standard errors at the facility level.
- Assess impact of training via an interaction between list assignment and training.
- Increase power by stacking all three questions and including question fixed effects.
- When the same clinicians are observed at baseline and endline, use ANCOVA (control for baseline outcomes) to further increase power.
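As a design-based sanity check of the identification logic above, the prevalence can also be computed directly as a difference in means. A minimal sketch, assuming the long data frame df_long defined in the next section:
library(dplyr)
# Difference in means per question at baseline: sensitive minus control list
df_long |>
  filter(t == 0) |>
  group_by(q) |>
  summarise(
    prevalence_hat = mean(y[L == 1]) - mean(y[L == 0]),
    .groups = "drop"
  )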
Variable and data structure
Preferred long format (one row per clinician × time × question):
- y: integer count of uncomfortable scenarios for that list (dependent variable).
- L: list indicator; 1 if assigned to sensitive list; 0 if assigned to control list.
- q: question identifier (1, 2, 3).
- t: time indicator; 0 for baseline, 1 for endline.
- T: training indicator; 1 if clinician received training, 0 otherwise.
- X: vector of clinician covariates (e.g., age, rank).
- facility_id: facility cluster identifier.
- clinician_id: clinician identifier.
- y_baseline: baseline outcome for the same clinician, question, and list assignment (used when modeling endline only with baseline control).
- Optional cross-question controls when analyzing a single question q:
- y_other1, y_other2: the clinician’s counts on the other two questions (at the same time).
If your current data are wide (e.g., y_q1, y_q2, y_q3), the code below includes a step to pivot to long.
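For concreteness, a few hypothetical rows in the long format (illustrative values only) might look like this:
library(tibble)
# Hypothetical rows illustrating the long schema (values are made up)
df_long_example <- tribble(
  ~clinician_id, ~facility_id, ~t, ~q, ~L, ~T, ~y, ~age, ~rank,
  101,           1,            0,  1,  1,  0,  3,  41,   "senior",
  101,           1,            0,  2,  0,  0,  1,  41,   "senior",
  101,           1,            0,  3,  1,  0,  2,  41,   "senior"
)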
Regression specifications
Notation: \(i\) clinician, \(f\) facility, \(q \in \{1,2,3\}\) question, \(t \in \{0,1\}\) time.
- Baseline stigma per question (no controls)
- For a single question \(q\) at baseline: \[y_{iq0} = \alpha_q + \beta_q L_{iq0} + \varepsilon_{iq0}, \quad \text{SEs clustered at facility } f(i)\]
- Interpretation: \(\beta_q\) estimates the prevalence of discomfort for the sensitive item in question \(q\).
- Baseline stigma with clinician controls
- \[y_{iq0} = \alpha_q + \beta_q L_{iq0} + \gamma^\top X_i + \varepsilon_{iq0}\]
- \(X_i\) could include age, rank, and/or the clinician’s counts on the other two questions at baseline: \[y_{iq0} = \alpha_q + \beta_q L_{iq0} + \rho_1 y_{i q' 0} + \rho_2 y_{i q'' 0} + \varepsilon_{iq0}, \quad q' \neq q'' \neq q\]
- Cluster SEs at facility.
- Training effect at endline (cross-sectional)
- For endline only (t = 1): \[y_{iq1} = \alpha_q + \beta_q L_{iq1} + \delta T_i + \theta \big(L_{iq1}\times T_i\big) + \gamma^\top X_i + \varepsilon_{iq1}\]
- Interpretation: \(\theta\) is the training effect on the sensitive item for question \(q\).
- Training effect at endline with baseline control (ANCOVA)
- When same clinicians observed at baseline: \[y_{iq1} = \alpha_q + \beta_q L_{iq1} + \delta T_i + \theta \big(L_{iq1}\times T_i\big) + \rho\, y_{iq0} + \gamma^\top X_i + \varepsilon_{iq1}\]
- \(\rho\) absorbs baseline differences and typically improves precision.
- Stacked model across questions (more power)
- Stack all questions and include question fixed effects \(\mu_q\): \[y_{iqt} = \alpha + \beta L_{iqt} + \mu_q + \varepsilon_{iqt}\]
- With training interaction (endline or pooled): \[y_{iqt} = \alpha + \beta L_{iqt} + \delta T_i + \theta \big(L_{iqt}\times T_i\big) + \mu_q + \varepsilon_{iqt}\]
- With a baseline control at the question level when the same clinicians are observed at baseline: \[y_{iq1} = \alpha + \beta L_{iq1} + \delta T_i + \theta \big(L_{iq1}\times T_i\big) + \rho\, y_{iq0} + \mu_q + \varepsilon_{iq1}\]
- In all cases, cluster SEs at facility. Optionally include additional \(X_i\).
Sample R code
The code below uses fixest for clustered standard errors and fixed effects. Replace column names as needed.
Setup and data prep
# Packages
library(dplyr)
library(tidyr)
library(fixest) # feols, cluster-robust SE, fixed effects
library(broom) # tidy summaries (optional)
# Assume your analysis data are in a data frame named df_long (long) or df_wide (wide)
# Required columns for long format:
# clinician_id, facility_id, t (0=baseline,1=endline), q (1/2/3),
# y (count), L (0/1), T (0/1), age, rank
# If your data are wide (y_q1, y_q2, y_q3), pivot to long:
# Example schema in wide:
# df_wide: clinician_id, facility_id, t, T, age, rank, L_q1, L_q2, L_q3, y_q1, y_q2, y_q3
df_long <- df_wide |>
pivot_longer(
cols = c(y_q1, y_q2, y_q3, L_q1, L_q2, L_q3),
names_to = c(".value", "q"),
names_pattern = "(y|L)_q(\\d)"
) |>
mutate(
q = as.integer(q),
t = as.integer(t),
L = as.integer(L),
T = as.integer(T)
)
# If you already have long data, rename consistently:
# df_long <- df_long |> rename(y = your_count_col, L = your_list_indicator_col, ...)
# Create baseline outcome for ANCOVA models (endline rows get their baseline y merged in)
df_long <- df_long |>
group_by(clinician_id, q) |>
mutate(y_baseline = y[t == 0][1]) |>
ungroup()
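# Optional sanity check (assumes at most one baseline row per clinician x question):
# any rows returned here are duplicates that would make y_baseline ambiguous;
# clinicians with no baseline row will simply get NA in y_baseline.
df_long |>
  filter(t == 0) |>
  count(clinician_id, q) |>
  filter(n > 1)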
# Optional: when analyzing a single question q, create "other question" controls at the same time
make_other_controls <- function(d) {
  d |>
    group_by(clinician_id, t) |>
    mutate(
      # counts on the other two questions for the same clinician and time point
      y_other1 = case_when(
        q == 1 ~ y[q == 2][1],
        q == 2 ~ y[q == 1][1],
        q == 3 ~ y[q == 1][1]
      ),
      y_other2 = case_when(
        q == 1 ~ y[q == 3][1],
        q == 2 ~ y[q == 3][1],
        q == 3 ~ y[q == 2][1]
      )
    ) |>
    ungroup()
}
1) Baseline stigma per question (no controls)
# Filter to baseline, single question q0 (e.g., q = 1)
q0 <- 1
base_q <- df_long |> filter(t == 0, q == q0)
m1 <- feols(
y ~ L,
data = base_q,
cluster = ~ facility_id
)
summary(m1)
Interpretation: the coefficient on L is the estimated prevalence for the sensitive item in question q0.
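If a confidence interval is needed for reporting, fixest’s coeftable() and confint() return the cluster-robust SEs and 95% CIs from the fitted model (a sketch):
coeftable(m1)   # estimate, cluster-robust SE, t-stat, p-value
confint(m1)     # 95% confidence intervals using the same clustered VCOV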
2) Baseline stigma with clinician controls
# Add age and rank; optionally add other question counts
# If you created y_other1, y_other2 for the same time, include them
# Example without other-question controls:
m2 <- feols(
y ~ L + age + rank,
data = base_q,
cluster = ~ facility_id
)
summary(m2)
# Example with other-question controls (ensure they refer to same time/baseline)
base_q_with_others <- make_other_controls(df_long) |> filter(t == 0, q == q0)
m2b <- feols(
y ~ L + age + rank + y_other1 + y_other2,
data = base_q_with_others,
cluster = ~ facility_id
)
summary(m2b)
3) Training effect at endline (cross-sectional)
end_q <- df_long |> filter(t == 1, q == q0)
m3 <- feols(
y ~ L + T + L:T + age + rank,
data = end_q,
cluster = ~ facility_id
)
summary(m3)
# The coefficient on L:T is the estimated training effect on the sensitive item for question q0.
4) Training effect at endline with baseline control (ANCOVA)
# Keep endline rows, merge baseline y for the same clinician and question
end_q_ancova <- df_long |> filter(t == 1, q == q0)
m4 <- feols(
y ~ L + T + L:T + y_baseline + age + rank,
data = end_q_ancova,
cluster = ~ facility_id
)
summary(m4)
5) Stacked model across questions with question fixed effects
This increases power by using all three questions and absorbing level differences with fixed effects.
# Stacked baseline stigma (all q), no controls, with question fixed effects
base_all <- df_long |> filter(t == 0)
m5 <- feols(
y ~ L | q,
data = base_all,
cluster = ~ facility_id
)
summary(m5)
# Stacked endline training interaction with question FE
end_all <- df_long |> filter(t == 1)
m6 <- feols(
y ~ L + T + L:T + age + rank | q,
data = end_all,
cluster = ~ facility_id
)
summary(m6)
# Stacked endline with baseline control (ANCOVA) and question FE
m7 <- feols(
y ~ L + T + L:T + y_baseline + age + rank | q,
data = end_all,
cluster = ~ facility_id
)
summary(m7)
Notes:
- | q specifies question fixed effects.
- cluster = ~ facility_id applies cluster-robust SEs at the facility level.
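For side-by-side reporting of the stacked models, fixest’s etable() prints a regression table with the clustered SEs, N, and \(R^2\) used at estimation (a sketch; adjust formatting arguments as needed):
# Compare the stacked specifications in one table
etable(m5, m6, m7, digits = 3)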
Optional: Base R alternative for cluster-robust SE
# For a simple model:
library(lmtest)
library(sandwich)
fit <- lm(y ~ L + T + L:T + age + rank, data = end_all)
vc_cl <- vcovCL(fit, cluster = ~ facility_id) # cluster-robust
coeftest(fit, vcov = vc_cl)
Practical guidance and reporting
- Always cluster SEs at the facility level, as in the specifications above.
- Report for each model:
- Point estimate of the key coefficient(s): \(\beta\) (stigma prevalence), \(\theta\) (training effect).
- Cluster-robust SE, 95% CI, and p-value.
- N, number of clusters, and \(R^2\) (within for FE models).
- Start with:
- Models 1 and 2 per question at baseline.
- Then Model 3 (endline) and Model 4 (endline with baseline control) per question.
- Finally, the stacked models (Models 5–7) to gain power for training effects.
- Controls:
- Use a small set of pre-treatment clinician covariates that plausibly explain variation in y (e.g., age, rank).
- When analyzing a single question, adding the clinician’s counts on the other two questions at the same time can substantially improve precision.
- Baseline vs endline:
- If the same clinicians are observed, prefer the ANCOVA specifications (Models 4 and 7) over pure endline-only models.
- Multiple testing:
- If reporting per-question results alongside stacked results, consider framing the stacked model as the primary training effect test to reduce multiplicity concerns.
- Robustness:
- Re-estimate without controls to show the design-based estimate (pure difference in means).
- Check sensitivity to including/excluding specific controls.
- Data checks (a minimal sketch follows this list):
- Verify randomization balance for L within each question/time.
- Confirm that list assignment L is independent of T (training) by design or check empirically.
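A minimal sketch of these checks, assuming the df_long schema above:
# Share assigned to the sensitive list within each question and time
df_long |>
  group_by(q, t) |>
  summarise(share_sensitive = mean(L), n = n(), .groups = "drop")
# List assignment should be unrelated to training by design; a simple empirical check
feols(L ~ T, data = df_long |> filter(t == 1), cluster = ~ facility_id)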