DATA, RESEARCH, AND SOFTWARE
ECONOMETRICS, STATISTICS, AND DATA COLLECTION
PREDICTION STABILITY
🌱 (Don't) Forget About It: Forgetting Penalized Supervised Learning
🗃 Replication Materials
When supervised models are retrained on the same task, updates can introduce model regression: examples that a previous model classified correctly become misclassified. In many deployments these reversals are costly even if aggregate accuracy improves, because they change which cases are hard, disrupt downstream workflows, and can force organizations to reconfigure human effort. We capture this with a simple economic view: a new model is attractive only when its reduction in expected downstream operating costs (e.g., error handling, staffing, oversight) outweighs the cost of changing behavior on previously well-served examples, which we illustrate with a routing-and-staffing example but which applies more broadly.
🌱 Stable CART: Lower Bootstrap Prediction Variance CART
Standard CART decision trees are unstable—small changes in training data can produce substantially different tree structures and predictions. Stable CART addresses this by trading a small amount of accuracy for lower cross-bootstrap prediction variance through techniques like honest estimation (using separate data subsets for learning tree structure vs. estimating leaf values), lookahead search (considering multiple future splits before committing rather than making greedy single-step decisions), and bootstrap-aware split selection (penalizing or filtering out splits that are unstable across resampled datasets).
Bagged FSR: Rehabilitating Forward Stepwise Regression
Forward Stepwise Regression (FSR) is hardly used today, mostly because regularization offers a better framework for variable selection. But part of the reason for its disuse is that FSR is a greedy optimization strategy with unstable paths: jigger the data a little, and the search path, the variables in the final set, and the performance of the final model can all change dramatically. The same issues afflict another greedy optimization strategy, CART. The insight that rehabilitated CART was bagging: building multiple trees on resampled rows (and, in random forests, random subspaces of columns) and averaging the results. What works for CART should in principle also work for FSR. If you are using FSR for prediction, you can build multiple FSR models using random subspaces and random samples of rows and then average the results. If you are using it for variable selection, you can pick the variables with the highest batting average (n_selected/n_tried). (LASSO will beat it on speed, but there is little reason to expect that it will beat it on results.)
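The bagged-FSR recipe is easy to sketch. Everything below is illustrative, not any package's API: `forward_stepwise`, the subspace fraction, and the batting-average bookkeeping are names and defaults of our choosing (numpy only).

```python
import numpy as np

def forward_stepwise(X, y, max_vars):
    """Greedy forward selection: repeatedly add the variable that most reduces RSS."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    for _ in range(min(max_vars, p)):
        best_j, best_rss = None, np.inf
        for j in remaining:
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

def bagged_fsr_batting_average(X, y, n_boot=50, subspace_frac=0.5,
                               max_vars=5, seed=0):
    """Batting average = n_selected / n_tried across bootstrap + subspace fits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_tried, n_selected = np.zeros(p), np.zeros(p)
    k = max(1, int(subspace_frac * p))
    for _ in range(n_boot):
        rows = rng.integers(0, n, n)                  # bootstrap rows
        cols = rng.choice(p, size=k, replace=False)   # random subspace
        n_tried[cols] += 1
        sel = forward_stepwise(X[np.ix_(rows, cols)], y[rows], max_vars)
        n_selected[cols[sel]] += 1                    # map back to original indices
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(n_tried > 0, n_selected / n_tried, 0.0)
```

On simulated data with two strong signal variables among noise, the signal variables' batting averages sit near 1 while noise variables' stay well below.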
🌱 Bootstrap Consistency Regularization for Stable Neural Network Predictions
🗃 Replication Materials
Neural networks trained on the same data with different random seeds or slightly perturbed training sets can produce substantially different predictions for individual examples, even when aggregate accuracy is similar. We add a consistency penalty to the training loss that penalizes prediction disagreement across bootstrap resamples of the training data, encouraging the network to find solutions whose individual-level predictions are stable under resampling.
🌱 Selecting for Stability: Choose the Model Closest to the Ensemble
Ensembles reduce prediction variance but are expensive to deploy. Rather than averaging all models at inference time, select the single model whose predictions are closest to the ensemble average. This gives you most of the ensemble's stability benefit at the cost of a single model, with a principled selection criterion that avoids arbitrary choices among models with similar aggregate performance.
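The selection rule is a few lines over a matrix of member predictions; this is a minimal sketch with names of our choosing, not a published implementation.

```python
import numpy as np

def closest_to_ensemble(pred_matrix):
    """pred_matrix: (n_models, n_examples) predictions from each ensemble member.
    Returns the index of the single model whose predictions are nearest
    (in L2 distance) to the ensemble average."""
    ens_mean = pred_matrix.mean(axis=0)
    dists = np.linalg.norm(pred_matrix - ens_mean, axis=1)
    return int(np.argmin(dists))
```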
CALIBRATION & WEIGHTING
🌱 Calibration Where It Counts: Cost- and Data-Informed Isotonic Regression
📦 calibre: Advanced Calibration Models
Thresholded decisions turn probability errors into utility losses. Pooling-based monotone calibrators such as isotonic regression are flexible and reliable but can collapse distinct scores into the same probability on wide plateaus. We adopt a decision-economic view: choose and tune a calibrator that reduces deployment cost by improving reliability where it affects decisions and by preserving discrimination where it matters.
📦 fairlex: leximin calibration
Standard calibration minimizes average calibration error, which can hide large errors for minority groups. Leximin calibration instead minimizes the worst-off group's calibration error first, then the second-worst, and so on—applying the Rawlsian leximin criterion to the distribution of calibration quality across groups.
🌱 Rank-Preserving Calibration for Multiclass Classification
📦 Rank Preserving Calibration of Multiclass Probabilities
Multiclass calibration methods can reorder predicted class probabilities, so the class a calibrated model ranks first may differ from the class the original model ranked first. This is problematic when downstream decisions depend on the ranking, not just the probabilities. We develop calibration methods that guarantee the within-example class ranking is preserved while still improving probability reliability.
🌱 First-Order Entropy Balancing via Dual Gradient Descent
🗃 Replication Materials
Entropy balancing finds survey weights that satisfy exact moment constraints while staying close to uniform weights in KL divergence. Standard implementations solve this via Newton's method on the dual, which requires computing and inverting a Hessian at each step. We show that first-order methods—multiplicative weight updates and dual gradient descent—converge reliably, scale to high-dimensional constraint sets, and avoid the numerical instabilities that plague second-order solvers when constraints are near-collinear.
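A minimal dual gradient descent sketch, assuming the standard entropy-balancing dual (softmax weights in the dual variable, gradient equal to the moment gap); the step size, iteration cap, and tolerance are illustrative, not the paper's settings.

```python
import numpy as np

def entropy_balance_gd(X, targets, lr=0.5, n_iter=2000, tol=1e-8):
    """Entropy balancing via gradient descent on the dual.
    X: (n, k) covariates; targets: (k,) desired weighted means.
    Returns nonnegative weights w summing to 1 with w @ X ~= targets.
    The dual solution has w_i proportional to exp(lambda . x_i), and the
    dual gradient is the gap between the weighted means and the targets."""
    n, k = X.shape
    lam = np.zeros(k)
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        z = X @ lam
        z -= z.max()              # stabilize the softmax
        w = np.exp(z)
        w /= w.sum()
        grad = w @ X - targets    # dual gradient = moment gap
        if np.max(np.abs(grad)) < tol:
            break
        lam -= lr * grad
    return w
```

No Hessian is formed or inverted: each iteration is a matrix-vector product, which is what lets the method scale to high-dimensional constraint sets.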
🌱 Streaming Calibration With MWU and SGD
📦 Python Package
Survey raking and probability calibration are traditionally batch procedures: collect all the data, then solve for weights or a calibration map. When data arrive in a stream—or the target distribution drifts—batch methods require repeated re-computation. We develop online versions of raking and calibration using multiplicative weight updates (MWU) and stochastic gradient descent (SGD), processing one observation at a time with convergence guarantees.
From Scores to Signs: Pairwise Win-Rate Estimation with Calibrated LLM Judges
LLM-as-judge systems produce numerical scores, but downstream decisions often require pairwise comparisons: which response is better? Converting scores to win rates requires calibration—the mapping from score differences to win probabilities. We develop calibrated estimators for pairwise win rates from cardinal LLM judge scores, accounting for judge miscalibration and non-transitivity in preferences.
RECORD LINKAGE
🌱 Inference With Fuzzy-Joined Data
Data products built on fuzzy joins ship a single canonical linkage, hiding the many-to-many candidate-match graph from which it was constructed. The two obvious things to do with that graph—expand it into a regression dataset or average covariates across candidates—both produce biased estimates. The expanded join attenuates; the collapsed estimator has a nonclassical errors-in-variables structure whose sign is indeterminate without ground truth. Multiple imputation over candidate assignments avoids both pathologies. Rubin's rules propagate matching uncertainty, and the between-imputation share of variance tells you whether matching uncertainty matters.
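Rubin's rules themselves are compact. A sketch, with the between-imputation variance share reported alongside the pooled estimate (function and variable names are ours):

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Rubin's rules for M point estimates (one per imputed match assignment)
    and their sampling variances. Returns the pooled estimate, total variance,
    and the between-imputation share of total variance; a large share means
    matching uncertainty dominates sampling uncertainty."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    M = len(estimates)
    qbar = estimates.mean()
    ubar = variances.mean()            # average within-imputation variance
    b = estimates.var(ddof=1)          # between-imputation variance
    total = ubar + (1 + 1 / M) * b
    return qbar, total, (1 + 1 / M) * b / total
```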
📦 setjoin: Record Linkage That Preserves Group Structure
Standard record linkage matches individuals optimally but ignores group structure. When household members should stay together, Hungarian matching might send them to different target households because it maximizes individual scores. setjoin uses two-level assignment—first assigning groups to groups, then matching within—achieving 4x better group coherence while also improving person-level accuracy in simulations with realistic ambiguity.
📦 tether: High-Precision Record Linkage
A 7-step record linkage pipeline—preprocess, deduplicate, block, score, filter, decide, inspect—with multi-pass support for progressively relaxed thresholds. Implements Hungarian, greedy, and row-sequential decision rules over pairwise string similarity scores.
📦 BloomJoin: Bloom Filter Based Joins
An R package implementing Bloom filter-based joins for improved performance with large datasets. Bloom filters provide a probabilistic test for set membership that can dramatically reduce the number of expensive exact comparisons needed during a join.
DATA COLLECTION
The Micro-Task Market for "Lemons": Collecting Data on Amazon's Mechanical Turk
With Doug Ahler and Carrie Roush. Political Science Research and Methods, 2021.
🗃 Replication Materials
While Amazon's Mechanical Turk (MTurk) has reduced the cost of collecting original data, in 2018 researchers noted the potential existence of a large number of bad actors on the platform. To evaluate data quality on MTurk, we fielded three surveys between 2018 and 2020. While we find no evidence of a "bot epidemic," significant portions of the data, between 25% and 35%, are of dubious quality. While the number of IP addresses that completed the survey multiple times or circumvented location requirements fell almost 50% over time, suspicious IP addresses are more prevalent on MTurk than on other platforms. Furthermore, many respondents appear to respond humorously or insincerely, and this behavior increased over 200% from 2018 to 2020. Importantly, these low-quality responses attenuate observed treatment effects by magnitudes ranging from approximately 10% to 30%.
Optimal Data Collection When Strata and Strata Variances Are Known
With Ken Cor.
When the population is divided into known strata with known variances, the optimal allocation of a fixed sample budget across strata depends on stratum size, variance, and sampling cost. We derive the optimal allocation and show how much efficiency is lost by common rules of thumb like proportional allocation.
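The cost-aware Neyman allocation, n_h proportional to N_h * S_h / sqrt(c_h), can be sketched directly. The comparison against proportional allocation below uses made-up strata, not results from the paper:

```python
import numpy as np

def optimal_allocation(N, S, cost, total_n):
    """Cost-aware Neyman allocation: n_h proportional to N_h * S_h / sqrt(c_h).
    N: stratum sizes, S: stratum standard deviations, cost: per-unit sampling
    costs, total_n: total sample size to allocate."""
    N, S, cost = map(np.asarray, (N, S, cost))
    score = N * S / np.sqrt(cost)
    return total_n * score / score.sum()

def stratified_var(N, S, n):
    """Variance of the stratified mean estimator under allocation n
    (ignoring finite-population corrections for brevity)."""
    W = np.asarray(N) / np.sum(N)
    return np.sum(W**2 * np.asarray(S)**2 / np.asarray(n))
```

With two equal-cost strata whose standard deviations differ by 3x, the optimal allocation puts three quarters of the sample in the noisy stratum and beats proportional allocation on variance.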
📦 Geo-sampling: Sampling Randomly From the Streets
With Suriyan Laohaprapanon.
Estimating quantities like average potholes per kilometer or pedestrian density requires randomly sampling street locations within a region. Geo-sampling addresses this by downloading street network data from OpenStreetMap for a specified administrative region, splitting each street into 0.5km segments (recording the lat/long of segment endpoints), building a database of all segments, and then drawing a random sample that can be exported as a CSV or visualized on a map for field data collection.
📦 Allocator: Optimal Itineraries For Spatially Distributed Tasks
With Suriyan Laohaprapanon.
Given a set of spatially distributed tasks (e.g., field survey locations, audit sites), Allocator computes optimal itineraries that minimize total travel time or distance, assigning tasks to enumerators and sequencing visits within each assignment.
📦 reporoulette: Randomly Sample GitHub Repositories
Randomly sample GitHub repositories, optionally filtered by language, creation date, or star count. Useful for constructing representative samples of open-source projects for empirical software engineering research.
🌱 Unbiased Regression with Costly Item Labels
📦 fewlab: fewest items to label for unbiased OLS on shares
When running OLS on per-row trait shares (e.g., fraction of items in a category), labeling every item is expensive. Random labeling wastes budget on items that barely affect the regression. fewlab identifies the items with the highest statistical leverage—those that most influence the coefficient estimates—and prioritizes them for labeling, achieving unbiased OLS with a fraction of the labeling cost.
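The leverage idea can be illustrated with the familiar hat-matrix diagonal. This sketch scores rows rather than items, so it shows the principle, not fewlab's actual item-level criterion:

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X': each row's influence
    on the OLS fit. Computed via a QR decomposition for numerical stability;
    the diagonal entries sum to the column rank of X."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def label_priority(X, k):
    """Indices of the k highest-leverage rows: label these first."""
    return np.argsort(-leverage_scores(X))[:k]
```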
CAUSAL INFERENCE
🌱 Inferring Treatment Compliance from Delivery-Window Data
In randomized experiments with imperfect compliance, the LATE requires observing treatment receipt, and the Wald estimator requires monotonicity. When receipt is unobserved but the experiment has a pre-treatment time series and a distinct delivery window, a structural break test applied to the delivery window can classify treated units into compliers, never-takers, and defiers. This yields an inferred compliance rate with closed-form bias correction, an empirical test of monotonicity via defier detection, and a characterization of the complier subpopulation that can be projected onto the control group.
🌱 Two Regressions and a Bootstrap: Regression Calibration for ML-Generated Covariates and the Nonlinear Boundary
When ML predictions are used as regressors in downstream models, prediction error is non-classical and a growing literature proposes purpose-built corrections (GMM, prediction-powered inference, IV, joint MLE). For linear downstream models, regression calibration—replacing the ML prediction with E[X | X̂, Z] estimated on a calibration sample—already eliminates the non-classical error structure under an exogeneity condition these methods also rely on. Two OLS regressions and a two-sample bootstrap give you consistent estimates with valid confidence intervals. For nonlinear downstream models (logistic, Poisson, any GLM), Jensen's inequality breaks the argument and heavier methods are genuinely needed. The linear/nonlinear boundary is the main result.
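A schematic of the two-regression procedure on simulated data; for brevity the simulated measurement error here is classical, though the note's point is that the same mechanics handle the non-classical case in linear downstream models. All names are ours.

```python
import numpy as np

def ols(A, y):
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def regression_calibration(Xhat_cal, Z_cal, X_cal, Xhat_main, Z_main, Y_main):
    """(1) On the calibration sample, fit E[X | Xhat, Z] by OLS.
    (2) On the main sample, replace Xhat with the fitted value and run
    OLS of Y on it. Returns (intercept, coef on X, coef on Z). Standard
    errors would come from a two-sample bootstrap, omitted here."""
    A_cal = np.column_stack([np.ones(len(X_cal)), Xhat_cal, Z_cal])
    g = ols(A_cal, X_cal)                                   # first OLS
    X_imp = np.column_stack([np.ones(len(Y_main)), Xhat_main, Z_main]) @ g
    A_main = np.column_stack([np.ones(len(Y_main)), X_imp, Z_main])
    return ols(A_main, Y_main)                              # second OLS
```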
🌱 Partial Credit: Diagnosing Proxy Covariates with Validation Swaps
When a regression uses a proxy instead of the true covariate, how much does it matter—and where? The validation-swap framework answers both questions using an internal validation sample where both the true value and the proxy are observed. Swap validated truths into the proxy design matrix row by row and watch the coefficient move. The swap path traces the coefficient as a function of the fraction of validated rows swapped in: flat means the proxy is fine, steep means it isn't. A Shapley-value decomposition (SIM) identifies which rows drive the distortion, and a portable risk score defines a proxy-safe domain on the full sample.
🌱 Smooth Operator: Optimal Filtering of Event Study Estimates
Event study designs estimate period-specific treatment effects with known standard errors from two-way fixed effects regressions. We treat the coefficient sequence as observations from a local linear trend state-space model and apply the Rauch–Tung–Striebel (Kalman) smoother, using the known heteroskedastic regression standard errors as observation noise. The smoother adapts to local precision—trusting the trend model when a period's estimate is noisy and trusting the data when it is tight—reducing level MSE by 80% and derivative MSE by 98% versus raw estimates, while also providing a derivative-based parallel trends test with correct size.
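A sketch of the smoother under a local linear trend model, with each period's known standard error plugged in as observation noise; the innovation variances `q_level` and `q_slope` are illustrative tuning knobs, not values from the paper.

```python
import numpy as np

def rts_smooth(est, se, q_level=1e-6, q_slope=1e-2):
    """Smooth event-study coefficients `est` with known standard errors `se`
    using a local linear trend state-space model (state = level and slope)
    and the Rauch-Tung-Striebel smoother. Returns the smoothed levels."""
    T = len(est)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # level gains last slope
    Q = np.diag([q_level, q_slope])          # state innovation variance
    H = np.array([1.0, 0.0])                 # we observe the level only
    x, P = np.array([est[0], 0.0]), np.eye(2) * 1e2
    xf, Pf, xp, Pp = [], [], [], []
    for t in range(T):
        x_pred, P_pred = F @ x, F @ P @ F.T + Q
        S = H @ P_pred @ H + se[t] ** 2      # known heteroskedastic noise
        K = (P_pred @ H) / S
        x = x_pred + K * (est[t] - H @ x_pred)
        P = P_pred - np.outer(K, H @ P_pred)
        xp.append(x_pred); Pp.append(P_pred); xf.append(x); Pf.append(P)
    xs = [None] * T
    xs[-1] = xf[-1]
    for t in range(T - 2, -1, -1):           # RTS backward pass
        G = Pf[t] @ F.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + G @ (xs[t + 1] - xp[t + 1])
    return np.array([state[0] for state in xs])
```

On a noisy, slowly accelerating trend this cuts MSE against the truth relative to the raw estimates, consistent with the mechanism described above (the specific 80%/98% figures are the paper's, not this sketch's).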
🌱 Causal Debiasing for Robust Machine Learning
Spurious associations in training data arise because models learn correlations rather than causal relations. We propose a composite loss that incorporates causal and behavioral priors: an invariance penalty for label-preserving perturbations (e.g., swapping gendered pronouns should not change predictions), a directional penalty for perturbations that should change the label in a known direction, and a falsification penalty inspired by epidemiological negative controls that discourages reliance on features with no causal relationship to the outcome. The framework unifies ideas from CheckList-style behavioral testing, causal inference, and adversarial robustness into a single training objective.
OTHER
🌱 One Concept at a Time: Subspace-Constrained Causal Inference for High-Dimensional Treatments
Social scientists increasingly wish to reason causally about high-dimensional treatments (texts, images, prompts) while isolating the effect of a single latent concept. We treat a pretrained model as a measurement device and use minimal edit pairs to estimate a concept-tangent subspace in activation space, with diagnostics that can explicitly reject the existence of a clean subspace for a given encoder and concept. When diagnostics are favorable, activation-steering maps constrained to the subspace generate approximate minimal edits, and concept coordinates serve as treatments in a double/debiased ML estimator. The resulting estimand is a local, representation-dependent linear effect—not a universal causal effect of the underlying human concept.
📦 incline: Estimate Trend at a Particular Point in Time in a Noisy Time Series
Estimating the trend (derivative) at a specific point in a noisy time series is difficult because naive approaches like computing differences between consecutive observations amplify noise rather than reveal the underlying signal. Incline addresses this by first smoothing the time series using either Savitzky-Golay filters (local polynomial fitting) or smoothing splines, then estimating the first or second derivative of the smoothed function at chosen points in time. The difference between naive and smoothed estimates can be substantial: in the provided example, the correlation between them is -0.47, making the choice of method consequential for applications like detecting sudden cost increases or identifying rapidly changing patient health trajectories.
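The local-polynomial idea behind Savitzky-Golay derivative estimation can be sketched in a few lines; this is the principle, not incline's API.

```python
import numpy as np

def local_poly_slope(y, t, window=11, degree=2):
    """Estimate dy/dt at index t by fitting a polynomial on a window
    centered at t (the idea behind Savitzky-Golay filtering), then
    evaluating the fit's derivative at the center."""
    half = window // 2
    lo, hi = max(0, t - half), min(len(y), t + half + 1)
    x = np.arange(lo, hi) - t          # center so the slope is at x = 0
    coefs = np.polyfit(x, y[lo:hi], degree)
    return np.polyval(np.polyder(coefs), 0.0)
```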
📦 Analytic-Hessian Bandwidth Selection
Bandwidth selection for Nadaraya-Watson kernel regression and kernel density estimation typically relies on cross-validation, which is computationally expensive. This package implements an analytic bandwidth selector based on the Hessian of the cross-validation objective, giving a closed-form approximation that avoids the grid search.
📦 A Lightweight ALS Solver for Iterative GLS
Generalized least squares requires estimating and inverting the error covariance matrix, which is expensive when the matrix is large or unstructured. This package uses alternating least squares with a low-rank factor-analytic decomposition of the covariance, making iterative GLS feasible for problems where the full covariance is too large to invert directly.
🗃 Optimal Classification Cutoffs for F1-score, etc.
Classifiers produce continuous scores; deployment requires a threshold. The optimal threshold depends on the metric you care about—F1, balanced accuracy, or a custom cost function—and the score distribution. This script computes the exact threshold that maximizes a given stepwise metric, avoiding the grid-search approximation common in practice.
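One standard way to compute the exact maximizer is a single sweep over the sorted scores; the identity F1 = 2TP / (2TP + FP + FN) avoids recomputing precision and recall at each cut. This sketch (names ours) ignores tied scores for brevity.

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Exact F1-maximizing threshold: sort scores descending and evaluate
    F1 at every cut, predicting positive when score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores)
    s, ylab = scores[order], labels[order]
    tp = np.cumsum(ylab)                   # true positives at each cut
    pred_pos = np.arange(1, len(s) + 1)    # TP + FP at each cut
    total_pos = ylab.sum()                 # TP + FN (constant)
    f1 = 2 * tp / (pred_pos + total_pos)   # 2TP / (2TP + FP + FN)
    best = int(np.argmax(f1))
    return s[best], f1[best]
```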
📦 pyppur: Projection Pursuit Dimension Reduction With Reconstruction Loss
pyppur is a Python package that implements projection pursuit methods for dimensionality reduction. Unlike traditional projection pursuit objectives geared toward finding 'interesting' (non-Gaussian) projections, pyppur focuses on finding non-linear projections by minimizing either reconstruction loss or distance distortion.
ONLINE SAFETY
Exposed: Shedding Blacklight On Online Privacy
With Lucas Shen
🗃 Replication Materials
To what extent are users surveilled on the web, by what technologies, and by whom? We answer these questions by combining passively observed, anonymized browsing data of a large, representative sample of Americans with domain-level data on tracking from Blacklight. We find that nearly all users (>99%) encounter at least one ad tracker or third-party cookie over the observation window. More invasive techniques like session recording, keylogging, and canvas fingerprinting are less widespread, but over half of the users visited a site employing at least one of these within the first 48 hours of the start of tracking. Linking trackers to their parent organizations reveals that a single organization, usually Google, can track over 50% of web activity of more than half the users. Demographic differences in exposure are modest and often attenuate when we account for browsing volume. However, disparities by age and race remain, suggesting that what users browse, not just how much, shapes their surveillance risk.
Pwned: How Often Are Americans' Online Accounts Breached?
With Ken Cor. ACM Web Science Conference, 2019
🗃 Replication Materials
RELATED: Bob Rudis Analyzes Exposure by Breach; I Have Been Pwned: Evidence from the Florida Voter Registration Data
News about massive data breaches is increasingly common. But what proportion of Americans are exposed in these breaches is still unknown. We combine data from a large, representative sample of American adults (n = 5,000), recruited by YouGov, with data from Have I Been Pwned to estimate a lower bound on the number of times Americans' private information has been exposed. We find that at least 82.84% of Americans have had their private information, such as account credentials, Social Security Numbers, etc., exposed. On average, Americans' private information has been exposed in at least three breaches. The better educated, the middle-aged, women, and Whites are more likely to have had their accounts breached than the complementary groups.
📦 Piedomains: Predict the Kind of Content Hosted by a Domain
With Rajashekar Chintalapati.
RELATED: Domain Knowledge: Predicting the Kind of Content Hosted by a Domain.
With Suriyan Laohaprapanon. Complex, Intelligent and Software Intensive Systems (CISIS), 2020.
The package infers the kind of content hosted by a domain using the domain name, the textual content, and the screenshot of the homepage. We use domain category labels from Shallalist and build our own training dataset by scraping and taking screenshots of the homepage.
Pass-Fail: Using a Password Generator to Improve Password Strength
With Rajashekar Chintalapati
How Often is Politicians' Data Breached? Evidence from HIBP
With Lucas Shen.
🗃 Replication Materials
Bad Domains: Exposure to Malicious Content Online
With Lucas Shen.
🗃 Replication Materials
📦 Know Your IP
With Suriyan Laohaprapanon
📦 virustotal: R Client for the VirusTotal Public API 2.0
Social Proof is in the Pudding: The (Non)-Impact of Social Proof on Software Downloads
With Lucas Shen.
🗃 Replication Materials
Open-source software is widely used in commercial applications. When choosing open-source software for a new problem, developers often use social proof as a cue. Together, these two facts raise the concern that bad actors can game social proof metrics to induce the use of malign software. We study the question using two field experiments. On the largest developer platform, GitHub, we buy ‘stars’ for a random set of GitHub repositories of new Python packages and estimate their impact on package downloads. We find no discernible impact. In another field experiment, we manipulate the number of human downloads for Python packages. Again, we find little effect.
TOOLS
📦 rmcp: R MCP Community Server
📦 StatQA: Extract Multimodal Stats Q/A from Tables With Provenance
StatQA is a modern Python framework for automatically extracting structured facts, statistical insights, and multimodal Q/A pairs from tabular datasets. It converts raw columns and values into clear, human-readable statements paired with rich visualizations, enabling rapid knowledge discovery, CLIP-style multimodal RAG corpus construction, and LLM training.
📦 Lost Years: Expected Number of Years Lost
With Suriyan Laohaprapanon.
Mortality rate is puzzling to mortals. A better number is the expected number of years lost. (A yet better number would be quality-adjusted years lost.) To make it easier to calculate the expected years lost, lost_years provides a convenient way to join to the SSA actuarial data, HLD data, and WHO life table data.
📦 repaper: convert a photo of a form to a web-based form or an editable PDF form
With Bhanu Teja.
📦 indicate: transliterate Indic languages to English
With Rajashekar Chintalapati.
LayoutLens: AI-Assisted UI Testing
📱 Adjacent — Related Repositories Recommender
📱 Advertiser: Promote Your GitHub Repositories on BlueSky
📦 🍠 tuber: Access YouTube API via R
META SCIENCE
A Benchmark For Benchmarks
Review of "Noise: A Flaw in Human Judgment"
With Andrew Gelman. Chance. 2024.
Significant Error: Citations to Research With Publicized Statistical Errors
With Ken Cor.
🗃 Replication Materials
Propagation of Error: Approving Citations to Problematic Research
With Ken Cor.
🗃 Replication Materials
📱 Get Notified When Cited Article is Retracted
Highlight Citations to Retracted Articles
Softverse: Auto-compute Citations to Software From Replication Files
🌐 softwarecite.com
user: Auto-compute Citations to Software From GitHub
RELATED: 📦 Python metrics
AutoSum: Summarize Publications Automatically and Discover Miscitations
By the Numbers: Toward More Precise Numerical Summaries of Results
With Andrew Guess. The Political Methodologist. 24(1): 2016
The Review: Production and Consumption of APSR Articles
superdf: Save Metadata with the Data in R and Python DataFrames
Not to Code: Evidence From Static Code Analysis of Replication Scripts
NAMES
Predicting Race and Ethnicity From Sequence of Characters in a Name
With Rajashekar Chintalapati and Suriyan Laohaprapanon. arXiv.org
RELATED: 📦 Python Package for implementing the method.
PRESS: InfoQ | AnacondaCON presentation (Video)
Sound Names: Classify Names Based on Sequence of Sounds
Graphic Names: Classify Names Using Google Image Search and Clarifai
📦 Naampy: Infer Sociodemographic Characteristics from Indian Names
With Rajashekar Chintalapati and Suriyan Laohaprapanon.
📦 Pranaam: Predict Religion From Name
With Rajashekar Chintalapati.
📦 naamkaran: a generative model for names
With Rajashekar Chintalapati.
📦 parsernaam: ML-assisted name parser
With Rajashekar Chintalapati.
📦 instate: predict spoken language from last name
With Rajashekar Chintalapati and Atul Dhingra.
RELATED: Instate: Predict the State of Residence from Last Name.
With Atul Dhingra.
DECISION MAKING
GROUP AFFECT
Inter-group Prejudice
Affect, Not Ideology: A Social Identity Perspective on Polarization
With Shanto Iyengar and Yphtach Lelkes. Public Opinion Quarterly. 76(3), 405–431, 2012.
🗃 Replication Materials
RELATED: Sort of Sorted But Definitely Cold, The Order of Feelings, Affectively Polarized?, Party Time
PRESS: The New York Times, The Washington Post, Mother Jones, Vox, etc.
The Parties in our Heads: Misperceptions About Party Composition and Their Consequences
With Doug Ahler. The Journal of Politics. 80(3), 964–981, 2018.
🗃 Replication Materials
PRESS: FiveThirtyEight, Vox, The Washington Post, The Washington Post (2), Christian Science Monitor, The Hill, PBS (Twin Cities)
RELATED: The Partisans in our Heads | Data and Scripts
Typecast: A Routine Mental Shortcut Causes Party Stereotyping
With Doug Ahler. Political Behavior. 2022.
🗃 Replication Materials | Appendix
PRESS: Heterodox Academy
All in the Eye of the Beholder: Partisan Affect and Ideological Accountability
With Shanto Iyengar.
In The Feeling, Thinking Citizen: Essays in Honor of Milton Lodge. 2018.
🗃 Replication Materials
RELATED: Still Close: Perceived Ideological Distance to Own and Main Opposing Party, 2012 Blog Post
PRESS: The New York Times
Coming to Dislike Your Opponents: The Polarizing Impact of Political Campaigns
With Shanto Iyengar.
PRESS: New York Times
Partisan Vision? Partisan Bias in Simple Visual Evaluations
With Carrie Roush and Alex Theodoridis.
🗃 Replication Materials
The Hostile Audience: The Effect of Access to Broadband Internet on Partisan Affect
With Yphtach Lelkes and Shanto Iyengar. American Journal of Political Science. 61(1): 5–20, 2017.
🗃 Replication Materials
PRESS: The Guardian
Holier Than Thou? No Large Partisan Gap in Consumption of Pornography Online
With Lucas Shen. Journal of Quantitative Description. 2024.
🗃 Replication Materials
Hidden Racial Prejudice? Impact of Social Desirability Pressures on Endorsement of Racial Stereotypes
With Jon Krosnick, Tobias Stark, and Floor van Maaren. Sociological Methods and Research. 51(2), 605–631, 2019.
🗃 Replication and Supplementary Materials
INFORMATION ENVIRONMENT
What and Who is on Network Television?
Working Women on Indian TV
With Asha Sood
The Face of Crime in Prime Time: Evidence from Law and Order
With Daniel Trielli.
🗃 Replication Materials
PRESS: The Washington Post
Extreme Recall: Which Politicians Come to Mind?
With Daniel Weitzel. Journal of Elections, Public Opinion and Parties. 2024.
🗃 Replication Materials
RELATED: Extreme Recall
DELIBERATION
What Would Dahl Say? An Appraisal of the Democratic Credentials of the Deliberative Polls and Other Mini-publics
With Ian O'Flynn. Deliberative Mini-Publics. 41–58, 2014. ECPR Press.
How Can You Think That?: Deliberation and the Learning of Opposing Arguments
🗃 Replication Materials
Deliberative Distortions? Homogenization, Polarization, and Domination in Small Group Deliberations
With Robert Luskin, Kyu Hahn, and James Fishkin. The British Journal of Political Science. 52(3), 1205–1225, 2022.
🗃 Replication Materials
What Future for Kirkuk? Evidence from a deliberative intervention
With Ian O'Flynn, Jalal Mistaffa, and Nahwi Saeed. Democratization. 26(7), 1299–1317, 2019.
🗃 Replication Materials | Supporting Information
OTHER
Problem Solving
Is an Uncertain Prospect Less Preferred Than Its Worst Possible Outcome? New Evidence on the Uncertainty Effect
With Doug Ahler.
🗃 Replication Materials
Mixed Signals: Movie Quality Assessments Across Platforms
Americans' Attitudes Toward The Affordable Care Act: Would Better Public Understanding Increase or Decrease Favorability?
With Wendy Gross, Tobias Stark, Jon Krosnick, Josh Pasek, Trevor Thompson, Jennifer Agiesta, and Dennis Junius.
PRESS: Forbes, Pacific Standard, The Dish, among other outlets.
Americans' Attitudes toward the Affordable Care Act: What Role Do Beliefs Play?
With Gabriel Miao Li, Josh Pasek, Jon Krosnick, Tobias H. Stark, Jennifer Agiesta, Trevor Tompson, and Wendy Gross. Annals of the American Academy of Political and Social Science. 2022.
Revisiting a Natural Experiment: Do Legislators With Daughters Vote More Liberally on Women's Issues?
With Don Green, Oliver Hyman-Metzger, and Michelle Zee. Journal of Political Economy Microeconomics. 2023.
🗃 Replication Materials | Supporting Information
PRESS: Phys.org
MISSING WOMEN
Son Bias in the US: Evidence from Business Names
With Walter Guillioli
🗃 Replication Materials
Which Women Are Missing? Adult Sex Ratio By Last Name
With Suriyan Laohaprapanon
Missing Women on the Streets
Epic Children: Sex Ratio of Children of Key Characters in Epics
Missing Daughters of Indian Politicians
NEWS
Not News: Provision of Apolitical News in the British News Media
With Suriyan Laohaprapanon.
🗃 Replication Materials
Strength in Numbers: Multiple Measures of Media Ideology
With Philip Habel.
🗃 Replication Materials
Measuring Agendas and Positions on Agendas
With Andrew Guess.
🗃 Replication Materials
📦 Notnews: Predict the Type of News Based on Story Text and URL
With Suriyan Laohaprapanon.
Unreadable News: How Readable is American News?
With Lucas Shen.
Follow Your Ideology: A Measure of Ideological Location of Media Sources
With Pablo Barberá.
The Supply of Media Slant Across Outlets and Demand for Slant Within Outlets: Evidence from US Presidential Campaign News
With Marcel Garz, Daniel Stone, and Justin Wallace. European Journal of Political Economy.
🗃 Replication Materials
Don't Expose Yourself: Discretionary Exposure to Political Information
With Yphtach Lelkes. Oxford Research Encyclopedia of Politics. 2018.
🗃 Replication Materials
RELATED: Categorizing the Content of Domains, Measuring Selective Exposure, The Fairest of All
The Good NYT: Provision of Apolitical News in the New York Times
Hard News: The Softening of Network Television News
With Daniel Weitzel.
🗃 Replication Materials
Partisan Imbalance in Politifact?
💾 Top News! URLs from News Feeds of Major National News Sites (2022-)
With Derek Willis
💾 CNN Transcripts 2000–2025
MEASURING LEARNING, KNOWLEDGE, AND MISINFORMATION
You Cannot be Serious: The Impact of Accuracy Incentives on Partisan Bias
With Markus Prior and Kabir Khanna. Quarterly Journal of Political Science. 10(4), 489–518, 2015.
🗃 Online Appendix; Replication Materials
PRESS: Washington Monthly, Pacific Standard, The New York Times
RELATED: Partisan Gaps in Retrospection are Highly Variable; Blog Post
Motivated Responding in Studies of Factual Learning
With Kabir Khanna. Political Behavior. 40(1): 79–101, 2018.
🗃 Replication Materials
RELATED: Blog Summarizing the Paper, The Innumerate American
A Gap in Our Understanding? Reconsidering the Evidence for Partisan Knowledge Gaps
With Carrie Roush. Quarterly Journal of Political Science. 18(1), 2023.
🗃 Replication Materials
RELATED: An Unclear Gap: How Vague Response Options Produce Partisan Knowledge Gaps
PRESS: Not Another Politics Podcast (U. Chicago)
The Waters of Casablanca: On Political Misinformation
With Robert Luskin.
Misinformation About Misinformation: Of Headlines and Survey Design
With Robert Luskin, Yul Min Park, and Joshua Blank.
🗃 Replication Materials
Misinformed About the Affordable Care Act? Leveraging Certainty to Assess the Prevalence of Misinformation
With Josh Pasek and Jon Krosnick. Journal of Communication. 65(4): 660–673, 2015
🗃 Supporting Information | Replication Materials
Guessing and Forgetting: A Latent Class Model for Measuring Learning
With Ken Cor. Political Analysis. 24(2): 226–242, 2016.
🗃 Replication Materials
REVIEW: '... a real contribution to the literature.' — Ed Haertel
RELATED: 📦 R Package for implementing the method.
Measuring Learning in Informative Processes
With Robert Luskin and Ariel Helfer.
A Measurement Gap? Effect of the Survey Instrument and Scoring on the Partisan Knowledge Gap
With Lucas Shen and Daniel Weitzel. Public Opinion Quarterly. 2025.
🗃 Replication Materials
Research suggests that partisan gaps in knowledge of facts with partisan implications
are wide and widespread. Using a series of experiments, we investigate the extent to
which partisan gaps in commercial surveys result from differences in beliefs rather than
motivated guessing. Knowledge items on commercial surveys often have features that
encourage guessing. We find that removing such features yields scales with greater
reliability and higher criterion validity. More substantively, partisan gaps on scales without
these “inflationary” features are roughly 40% smaller. Thus, contrary to Prior, Sood
and Khanna (2015), who find that the upward bias is explained by the knowledgeable
deliberately marking the wrong answer (partisan cheerleading), our data suggest, in
line with Bullock et al. (2015) and Graham and Yair (2023), that partisan gaps on
commercial surveys are strongly upwardly biased by motivated guessing by the ignorant.
Relatedly, we also find that partisans know less than the toplines of commercial
polls suggest.
An Unclear Gap: How Vague Response Options Produce Partisan Knowledge Gaps With Carrie Roush.
🗃 Replication Materials
Roush and Sood (2023) use a dataset of 162,083 responses to 187 items on 47 surveys
to show that partisan gaps are smaller and less frequent than commonly understood. The
average gap is a mere six and a half points, and gaps’ “signs” run counter to expectations roughly
30% of the time. However, one exception is the size of gaps on retrospection items on
the ANES, which are considerably bigger. These retrospection items use vague response
options, e.g., ‘About the same.’ Vague response options can inflate partisan gaps by offering
partisans the opportunity to interpret the same data differently. We test this conjecture
with a novel survey experiment. We present partisans with data indicating a small improvement
in economic indicators and manipulate the partisan tint of the change by manipulating who
is responsible for it. We find that significantly fewer partisans pick the option that
‘[things] got better’ when presented with an out-partisan cue than with a co-partisan cue. Our
findings suggest that vague options can induce knowledge gaps even when partisans have
the same information.
Measuring Perceptions of Numerical Strength of Salient and Stereotypical Groups
With Doug Ahler. Misinformation and Mass Audiences. 2018. University of Texas Press.
🗃 Appendix
PROVISION OF PUBLIC GOODS
StreetSense: Learning from Google Street View
With Suriyan Laohaprapanon and Kimberly Ortleb.
arXiv.org
🗃 Replication Materials
How good are public services and public infrastructure? Does their quality
vary by income? These are vital questions—they shed light on how well the government
is doing its job, the consequences of disparities in local funding, etc. But there is little
good data on many of these questions. We fill this gap by describing a scalable method
of getting data on one crucial piece of public infrastructure: roads. We assess the quality
of roads and sidewalks by exploiting data from Google Street View. We randomly sample
locations on major roads, query Google Street View images for those locations and code
the images using Amazon’s Mechanical Turk. We apply this method to assess the quality
of roads in Bangkok, Jakarta, Lagos, and Wayne County, Michigan. Jakarta’s roads have
nearly four times as many potholes as the roads of any other city. Surprisingly, the proportion of
road segments with potholes in Bangkok, Lagos, and Wayne is about the same, between
.06 and .07. Using the data, we also estimate the relation between the condition of the
roads and local income in Wayne, MI. We find that roads in more affluent census tracts
have somewhat fewer potholes.
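The sampling-and-query step of the pipeline can be sketched in a few lines. This is a minimal illustration, assuming straight-line road segments between two endpoints and the Google Street View Static API URL format; the coordinates, segment model, and `YOUR_KEY` placeholder are illustrative assumptions, not details from the paper.

```python
import random

STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"

def sample_points(segment_start, segment_end, n, seed=0):
    """Uniformly sample n (lat, lng) points along a straight road segment."""
    rng = random.Random(seed)
    (lat0, lng0), (lat1, lng1) = segment_start, segment_end
    points = []
    for _ in range(n):
        t = rng.random()  # position along the segment, in [0, 1)
        points.append((lat0 + t * (lat1 - lat0), lng0 + t * (lng1 - lng0)))
    return points

def streetview_url(lat, lng, api_key, size="640x640"):
    """Build a Street View Static API request URL for a sampled location."""
    return (f"{STREETVIEW_ENDPOINT}?size={size}"
            f"&location={lat:.6f},{lng:.6f}&key={api_key}")

# Sample five locations along one (hypothetical) Wayne County road segment.
points = sample_points((42.3314, -83.0458), (42.3400, -83.0500), n=5, seed=42)
urls = [streetview_url(lat, lng, api_key="YOUR_KEY") for lat, lng in points]
```

The fetched images would then be posted as Mechanical Turk tasks for pothole coding, a step omitted here.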
AutoSense: Automated Street Condition Assessment
Get in Line: Waiting Times at the DMV
With Noah Finberg.
CRICKET
Elo Ratings of International Cricket Teams By Format
With Derek Willis.
PRESS: The Hindu
WAR Ratings for Cricketers
With Derek Willis.
Fairly Random: The Effect of Winning the Toss on Winning the Match
With Apoorva Lal, Derek Willis, and Avidit Acharya. Journal of Sports Analytics. 2023.
🗃 Replication Materials
RELATED: Fairly Random: Impact of Winning the Toss on the Probability of Winning With Derek Willis. | arXiv.org
PRESS: ESPN: How much does the toss really matter?
RELATED: ESPN: Why replacing the toss with an auction is the fair thing to do
OTHER
Scaling ML Products at Startups: A Practitioner's Guide
With Atul Dhingra.
How do you scale a machine learning product at a startup? In particular, how do you
serve a greater volume, velocity, and variety of queries cost-effectively? We break down
costs into variable costs—the cost of serving the model and keeping it performant—and
fixed costs—the cost of developing and training new models. We propose a framework
for conceptualizing these costs, break them into finer categories, and describe ways to reduce them.
Lastly, since in our experience the most expensive fixed cost of a machine learning system is the
cost of identifying the root causes of failures and driving continuous improvement, we present a
way to conceptualize these issues and share our methodology for addressing them.
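The variable/fixed split lends itself to a back-of-the-envelope comparison between models. The function and all numbers below are hypothetical illustrations, not figures or methods from the paper.

```python
def total_cost(fixed_cost, cost_per_1k_queries, monthly_queries, months):
    """Fixed (development/training) plus variable (serving) cost over a horizon."""
    variable = cost_per_1k_queries * monthly_queries / 1000 * months
    return fixed_cost + variable

# Illustrative: a model that is more expensive to build but cheaper to serve
# (e.g., a distilled model) wins once query volume is high enough.
big = total_cost(fixed_cost=10_000, cost_per_1k_queries=2.00,
                 monthly_queries=5_000_000, months=12)
distilled = total_cost(fixed_cost=40_000, cost_per_1k_queries=0.50,
                       monthly_queries=5_000_000, months=12)
```

At this hypothetical volume the distilled model's extra $30,000 in fixed cost is more than recouped by lower serving cost over the year.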
The Effect of Gender Quotas on Some Qualities of Elites
The Limits of Electoral Gender Quotas in Rural Local Bodies
With Varun K. R.
The Older Half: Spousal Age Gap in India
With Suriyan Laohaprapanon
Unlanded: Distribution of Land in Bihar
With Lucas Shen
🗃 Replication Materials