What is commonly described as “conjoint failure” is rarely caused by respondent fatigue, minor design errors, or estimation technique. Those explanations persist because they are easy to see.
In practice, most failed conjoint studies are failures of decision support. The analysis may run, converge, and produce stable-looking outputs, yet still be incapable of supporting the pricing, portfolio, or launch decisions it is asked to inform.
The failure modes below describe how this happens — often silently, and often despite competent execution.
Failure Mode #0: Allowing the Method to Define the Decision
Recognition cue
During scoping, the decision is quietly narrowed to fit what the conjoint design can support. Certain questions are removed, trade-offs are softened, or outcomes are redefined so the model will “work.”
Root cause
Analytic governance fails upstream. Instead of defining the evidence required for the decision, the organization allows the technique to constrain the decision itself.
Decision consequence
The study can succeed technically while failing to support the original business decision. By the time results are reviewed, the most consequential uncertainties have already been designed out of scope.
Failure Mode #1: Silent Design Corruption by Automation
Recognition cue
The design is described as “random” or “balanced,” yet no one can explain why certain effects behave oddly or why results feel unstable — despite the design being generated, reviewed, and approved.
Root cause
Design-generation tools often apply post-processing steps (balancing, feasibility filters, realism constraints) that alter experimental properties without re-validating assumptions. These changes are silent.
Decision consequence
Key contrasts are no longer identified as intended. Estimation proceeds normally, but the resulting evidence cannot support the decision it was meant to inform.
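One lightweight guard is to re-audit the design after every automated step, not only after generation. The sketch below is illustrative and not tied to any particular tool: it assumes numpy and pandas, invents three attributes, applies a hypothetical "realism" filter, and then compares level balance and pairwise co-occurrence before and after. A co-occurrence cell that drops to zero marks a contrast the post-processed design can no longer estimate cleanly.

```python
# Design-audit sketch (illustrative attributes and an invented "realism" filter;
# not a reproduction of any specific design tool's behavior).
import itertools
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# A toy near-balanced design: 300 profiles, three attributes with three levels each.
design = pd.DataFrame({
    "brand": rng.integers(0, 3, 300),
    "price": rng.integers(0, 3, 300),
    "warranty": rng.integers(0, 3, 300),
})

# Hypothetical silent post-processing step: drop "unrealistic" profiles
# (here, the premium brand shown at the lowest price).
filtered = design[~((design["brand"] == 2) & (design["price"] == 0))]

def audit(d: pd.DataFrame) -> None:
    """Print level balance and pairwise co-occurrence for each attribute pair."""
    for col in d.columns:
        print(col, d[col].value_counts(normalize=True).round(2).to_dict())
    for a, b in itertools.combinations(d.columns, 2):
        print(f"{a} x {b}")
        print(pd.crosstab(d[a], d[b]))

print("--- before post-processing ---")
audit(design)
print("--- after post-processing ---")
audit(filtered)  # level balance drifts and the brand=2 / price=0 cell falls to zero
```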
Failure Mode #2: Structural Confounds in Experimental Inputs
Recognition cue
Key drivers move together in the design, yet results are interpreted as if their effects were independently identified.
Root cause
The experiment never provided independent variation for those inputs. The model cannot separate their effects, even though hierarchical estimation produces stable-looking outputs.
Decision consequence
Pricing and trade-off conclusions rest on relationships the data never isolated, producing confidence where ambiguity remains.
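A quick pre-estimation check can surface this before any conclusions are drawn. The sketch below (numpy and pandas, with illustrative attribute names and a contrived brand-price constraint) dummy-codes the design and inspects correlations between columns from different attributes; strong cross-attribute correlations mean those inputs were never varied independently, whatever the estimator later reports.

```python
# Confound check sketch: correlations among dummy-coded design columns.
# Attribute names and the brand-price constraint below are illustrative.
import numpy as np
import pandas as pd

def confound_report(design: pd.DataFrame) -> pd.DataFrame:
    """Pairwise correlations of dummy-coded attribute levels.

    Large off-diagonal correlations between columns belonging to *different*
    attributes indicate the design does not vary those inputs independently.
    """
    X = pd.get_dummies(design.astype("category"), drop_first=True).astype(float)
    return X.corr().round(2)

# Example: the top price tier only ever appears with the premium brand
# (a common "realism" constraint that becomes a structural confound).
rng = np.random.default_rng(1)
brand = rng.integers(0, 3, 200)
price = np.where(brand == 2, 2, rng.integers(0, 2, 200))
print(confound_report(pd.DataFrame({"brand": brand, "price": price})))
```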
Failure Mode #3: Executability Mistaken for Validity
Recognition cue
The model runs, converges, and produces clean utilities. Test runs with simulated responses complete without errors, which is taken as reassurance.
Root cause
Hierarchical Bayesian choice models tolerate structural defects by construction. Pooling and regularization redistribute information across respondents rather than exposing missing identification.
Decision consequence
Confidence is placed in evidence that was never capable of supporting the decision. “It runs” is mistaken for “it works.”
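Identification can be checked before any model is run. A minimal sketch follows, assuming numpy and pandas and a deliberately aliased pair of attributes: the rank and condition number of the coded design matrix reveal effects that cannot be separated, even though a hierarchical Bayesian fit on the same data would still converge and return smooth-looking posteriors for every coefficient.

```python
# "It runs" vs "it identifies": a pre-estimation check on the coded design.
# The two perfectly aliased attributes below are contrived to make the point.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
speed = rng.integers(0, 3, 400)
tier = speed.copy()  # "tier" is perfectly aliased with "speed"

X = pd.get_dummies(
    pd.DataFrame({"speed": speed, "tier": tier}).astype("category"),
    drop_first=True,
).astype(float).to_numpy()

rank = np.linalg.matrix_rank(X)
cond = np.linalg.cond(X)
print(f"columns={X.shape[1]}, rank={rank}, condition number={cond:.1e}")
# rank < columns (or an enormous condition number) means some effects are not
# separately identified, no matter how cleanly estimation later executes.
```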
Failure Mode #4: Power Miscalibration in Hierarchical Choice Designs
Recognition cue
The team can justify respondent count, but cannot explain whether the number of choice tasks was sufficient for the decision — or when task count should change.
Root cause
The analytic function lacks a working mental model of where information comes from in hierarchical choice models. Executability is treated as a proxy for power.
Decision consequence
The model appears stable, but fails when asked to support fine-grained trade-offs or to hold up under decision stress-testing. The experiment never contained the information required.
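One way to make the question answerable is an expected-precision calculation under assumed utilities. The sketch below is a deliberate simplification: numpy only, a pooled multinomial logit approximation, illustrative part-worths, and randomly coded tasks. It reports how the expected standard error of a key part-worth shrinks as tasks per respondent increase. A hierarchical model would need a fuller simulation study with respondent heterogeneity, but even this rough version forces an explicit answer to "how many tasks, and why."

```python
# Expected-precision sketch for a pooled multinomial logit (an approximation,
# not a substitute for a full hierarchical simulation study).
import numpy as np

rng = np.random.default_rng(11)

def expected_se(n_respondents: int, n_tasks: int, n_alts: int, beta: np.ndarray) -> np.ndarray:
    """Expected standard errors of beta under randomly generated choice tasks."""
    k = beta.size
    info = np.zeros((k, k))
    for _ in range(n_tasks):
        X = rng.integers(0, 2, size=(n_alts, k)).astype(float)  # coded alternatives
        v = X @ beta
        p = np.exp(v - v.max())
        p /= p.sum()
        # Fisher information contribution of one task, shown to all respondents.
        info += n_respondents * (X.T @ (np.diag(p) - np.outer(p, p)) @ X)
    return np.sqrt(np.diag(np.linalg.pinv(info)))

beta_assumed = np.array([0.8, -0.5, 0.3, 0.1])  # assumed "true" part-worths
for tasks in (6, 10, 14):
    se = expected_se(n_respondents=300, n_tasks=tasks, n_alts=3, beta=beta_assumed)
    print(f"tasks={tasks:>2}  expected SE of key part-worth ~ {se[0]:.3f}")
```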
Failure Mode #5: Substitution of “Importance” for Predictive Evidence
Recognition cue
The study explains what matters most, but cannot reliably forecast what will happen if something changes.
Root cause
Importance measures are post-estimation transforms that are not invariant under price, availability, or configuration changes, yet are treated as decision-ready evidence.
Decision consequence
Explanatory summaries are mistaken for predictive support, leading to decisions the model cannot safely defend.
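A small numeric sketch makes the distinction concrete (numpy only, with illustrative part-worths and a standard logit share rule): the range-based importance score shifts simply because a wider price range was tested, while a share-of-preference simulation answers the what-if question the decision actually poses.

```python
# Range-based "importance" vs a predictive share simulation.
# All part-worths below are illustrative.
import numpy as np

def importance(partworths: dict[str, np.ndarray]) -> dict[str, float]:
    """Standard range-based attribute importance (percent of total range)."""
    ranges = {a: pw.max() - pw.min() for a, pw in partworths.items()}
    total = sum(ranges.values())
    return {a: round(100 * r / total, 1) for a, r in ranges.items()}

base = {"brand": np.array([0.0, 0.4, 0.9]), "price": np.array([0.0, -0.6, -1.2])}
wider_price = {**base, "price": np.array([0.0, -0.6, -1.2, -2.4])}  # one more tested level

print(importance(base))         # price looks ~57% "important"
print(importance(wider_price))  # same preferences, wider tested range: price now dominates

def share(utilities: np.ndarray) -> np.ndarray:
    """Logit share-of-preference for a set of competing profiles."""
    e = np.exp(utilities - utilities.max())
    return e / e.sum()

# The predictive question: what happens if our product moves up one price level?
print(share(np.array([0.9 - 0.6, 0.4 - 0.0])))  # our profile at mid price vs. competitor at base price
print(share(np.array([0.9 - 1.2, 0.4 - 0.0])))  # same comparison after the price change
```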
Why These Failures Persist
These failures persist because they do not look like errors. The studies often comply with accepted best practices and produce clean diagnostics, stable utilities, and coherent narratives. The problem emerges only when the model is asked to carry real decision weight.
When that happens, organizations often discover that the issue is not execution, but whether the study was ever capable of supporting the decision at all — which is typically when rescue becomes the only remaining option.
