A Systematic Framework for Research Design in Quantitative Social Science (Brief Draft) - From Phenomenon to Operationalization
Introduction: Ontological and Epistemological Foundations of Research Questions
The effectiveness of quantitative social science research depends on the precision and testability of its research questions. The most common methodological deficiency in empirical research, however, stems not from technical errors in statistical inference but from conceptual ambiguity and unmeasurable constructs at the level of the question itself.
A research question that appears clear in natural language often reveals numerous implicit ontological commitments, untested causal assumptions, and underspecified measurement models once it enters the operationalization stage of empirical research.
This paper aims to systematically articulate the front-end epistemic work in research design: how to transform a phenomenological observation into an operable, measurable, and testable scientific question. This process includes four core components: question discovery, conceptual clarification, theoretical positioning, and operationalization modeling.
1. Research Question Discovery
1.1 Sources and Typology of Research Questions
Research question generation follows specific knowledge production logic. Based on the nature of question sources, research questions can be categorized as:
(1) Literature-driven questions
- Theoretical gaps: phenomenal domains not covered by existing theoretical frameworks
- Empirical gaps: theoretical propositions lacking empirical testing
- Contradictory findings: conflicting conclusions across different studies
- Methodological limitations: systematic deficiencies in measurement or inference in existing research
(2) Phenomenon-driven questions
- Social reality observations: empirical patterns in practical domains requiring explanation
- Policy evaluation needs: causal identification problems of intervention effects
- Theoretical paradoxes: anomalous phenomena that existing theories cannot explain
(3) Method-driven questions
- New data availability: research opportunities from large-scale data or new measurement technologies
- New method applicability: application expansion of computational methods or statistical models
- Replicability issues: re-examination and robustness testing of existing research
1.2 Structural Relationship Between Theory and Research Questions
A bidirectional constructive relationship exists between research questions and theory:
(1) Deductive approach
- Deriving testable specific hypotheses from theoretical propositions
- Confirmatory research design
- Emphasizing internal validity and causal inference
(2) Inductive approach
- Extracting theoretical patterns from empirical observations
- Exploratory research design
- Emphasizing external validity and concept discovery
(3) Abductive approach
- Iterative cycling between theory and empirical evidence
- Theory-driven data exploration
- Balancing explanatory and predictive power
1.3 Types of Research Contributions
Clarifying the type of research question helps position its potential contribution:
- Theoretical contribution: Proposing new concepts, mechanisms, or causal pathways
- Empirical contribution: Providing new evidence, measurements, or facts
- Methodological contribution: Developing new research designs or estimation strategies
- Integrative contribution: Synthesizing scattered research conclusions into systematic knowledge
2. Refining Research Questions: From Vague to Precise
2.1 Formal Criteria for Good Research Questions
A researchable question must satisfy the following formal conditions:
(1) Specificity
- Clearly define research objects, key variables, and relationship types
- Avoid ambiguous concepts or overly broad expressions
(2) Answerability
- Question can in principle be answered through empirical evidence
- Viable research design and data collection plan exists
(3) Boundedness
- Question has clear spatiotemporal boundaries and scope of applicability
- Does not attempt to answer all related questions in a single study
(4) Operationalizability
- Core concepts can be transformed into observable, measurable variables
- Theoretical relationships can be converted into statistical models or computational procedures
2.2 Value Judgment Criteria for Research Questions
Beyond formal criteria, research questions must meet substantive value standards:
(1) Significance
- Theoretical significance: Advancing disciplinary knowledge boundaries
- Practical significance: Solving real-world problems or guiding policy
- Methodological significance: Demonstrating new research paradigms
(2) Feasibility
- Data availability: Whether required data exists or can be collected
- Technical feasibility: Whether existing methods can support research design
- Resource constraints: Whether time, funding, sample size are sufficient
(3) Originality
- Avoiding simple repetition of existing research
- Innovation in concepts, methods, or evidence
2.3 Iterative Process of Question Refinement
The path from a preliminary interest to a clear research question typically passes through the following stages:
Stage 1: Broad interest → e.g., “How does social media affect mental health?”
Stage 2: Theoretical positioning → e.g., “Relationship between social media use and depression”
Stage 3: Mechanism specification → e.g., “Social comparison as mediating mechanism”
Stage 4: Boundary delimitation → e.g., “Effects of specific platforms in adolescent populations”
Stage 5: Operationalized question → e.g., “Does Instagram usage frequency increase adolescent depressive symptoms through upward social comparison?”
3. From Concept to Operation: The Epistemological Architecture of Operationalization
3.1 The Essence of Operationalization: Measurement Theory
Operationalization is not simply “variable selection,” but an epistemological process involving ontological commitments and measurement theory. Its core task is:
Under explicit assumptions, construct a formalized measurement-relationship model that maps conceptual entities to observable variables.
The complete definition of operationalization can be expressed as:
$$\text{Operationalization} = f_{\text{map}}: \mathcal{C} \to \mathcal{O} \mid \mathcal{A}$$
Where:
- $\mathcal{C}$ = conceptual space
- $\mathcal{O}$ = observational space
- $f_{\text{map}}$ = mapping function
- $\mathcal{A}$ = assumption set
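To make the mapping concrete, the definition above can be sketched as a small record that forces each element of $\mathcal{C}$, $\mathcal{O}$, and $\mathcal{A}$ to be written down explicitly. This is an illustrative Python sketch; the class name, fields, and example content are ours, not a standard API:

```python
from dataclasses import dataclass

@dataclass
class Operationalization:
    """A record of f_map: C -> O | A, making every commitment explicit."""
    concept: str             # element of the conceptual space C
    indicators: list[str]    # observable variables in the observational space O
    assumptions: list[str]   # the assumption set A under which the mapping is valid

# Hypothetical example: operationalizing "depressive symptoms"
dep = Operationalization(
    concept="depressive symptoms",
    indicators=["PHQ-9 total score, self-report, 12 weeks post-treatment"],
    assumptions=[
        "PHQ-9 validly measures the construct in this population",
        "reporting error is uncorrelated with treatment assignment",
    ],
)
print(dep.concept, len(dep.assumptions))
```

Writing the assumption set down as data, rather than leaving it implicit, is the point of the exercise: every entry in `assumptions` is a claim that can later be challenged.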
3.2 Case Analysis: Dissecting a Research Question
Consider the following research question:
“Can cognitive behavioral therapy reduce depressive symptoms in adolescents?”
This question is clear at the natural language level but fundamentally unresearchable at the operational level.
Each key concept conceals ambiguities that must be resolved, specifically:
(1) Who are “adolescents”?
- Age range: 10-18 years? 12-17 years?
- Population characteristics: Clinical sample? Community sample? School sample?
- Inclusion criteria: Depression diagnosis required? Comorbidities allowed? Medication allowed?
(2) What is “cognitive behavioral therapy” (CBT)?
- Specific type: Manualized CBT? Individual or group? Online or offline?
- Treatment protocol: How many sessions? Duration? Therapist qualifications?
- Fidelity: How to ensure implementation is actually CBT and not something else?
(3) What does “therapy” mean?
- Exposure definition: Attendance? Completion? Minimum dose?
- Control group: No treatment? Waitlist? Treatment as usual?
- Time dimension: Immediate effects? 3-month follow-up?
(4) How to measure “depressive symptoms”?
- Measurement tool: PHQ-9? BDI? CDI?
- Reporter: Self-report? Clinician? Parent?
- Scale nature: Continuous score? Clinical threshold?
- Time point: Post-treatment? Change score? Trajectory?
(5) What causal structure does “reduce” imply?
- Causal estimand: Average Treatment Effect (ATE)? Local effect?
- Comparison object: Relative to what?
- Confounding control: Randomization? Matching? Covariate adjustment?
3.3 From Natural Language to Operationalized Question
After operationalization, the original question might become:
“Among 13-17 year-old adolescents diagnosed with moderate depression, compared to treatment as usual, can a 12-session manualized individual CBT program reduce PHQ-9 scores at 12 weeks?”
Now:
- Population is specified
- Intervention is defined
- Outcome is measurable
- Causal contrast is set
3.4 Formalized Operationalization Process
We can formalize the operationalization process into five steps:
Step 1: Ontology specification
- Identify all entities, actions, outcomes, contexts
Step 2: Measurement model
- Construct observable variables for each concept
- Specify measurement error and validity threats
Step 3: Action encoding
- Quantify intervention, exposure, treatment
- Define control conditions
Step 4: Structural relations
- Specify hypothesized dependency paths and causal mechanisms
- Construct Directed Acyclic Graph (DAG) or Structural Equation Model (SEM)
Step 5: Estimand definition
- Define target causal or associational quantity
- Specify identification assumptions
3.5 Hierarchical System of Operationalization: From Philosophy to Technology
The operationalization process can be understood as a multi-level epistemological system:
Level 1: Ontological foundation
What exists in our research domain?
- Entity identification: populations, interventions, outcomes, contexts
- Attribute definition: intrinsic vs. relational properties
- Relation types: causal, correlational, constitutive
- Philosophical stance: realism vs. constructivism, observable vs. latent constructs
Level 2: Measurement model
How do concepts map to observations?
Classical measurement theory
- Classical Test Theory (CTT): true score + error
- Item Response Theory (IRT): latent variable-item probability curve
- Scale development: reliability, validity
Modern measurement methods
- Multimodal measurement: behavioral, physiological, self-report, observational
- Digital phenotyping: passive sensor data
- Neuroimaging: fMRI, EEG as indicators of psychological processes
Latent variable modeling
- Factor analysis: reducing multiple observed indicators to latent constructs
- Structural Equation Modeling (SEM): measurement model + structural model
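As a minimal illustration of CTT reliability, Cronbach's alpha can be computed directly from an item-score matrix. The simulated data below (four noisy indicators of one latent trait) and the function name are ours:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) score matrix (CTT reliability)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
true_score = rng.normal(size=(200, 1))                # one latent trait
items = true_score + 0.5 * rng.normal(size=(200, 4))  # four noisy indicators of it
print(round(cronbach_alpha(items), 2))                # high alpha: items cohere
```

With these noise levels the theoretical alpha is about 0.94; halving the signal-to-noise ratio of the indicators would drop it sharply, which is exactly what scale development tries to detect.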
Level 3: Action encoding
How are interventions/treatments represented?
Discrete representation
- Binary: treatment vs. control
- Categorical: different treatment types
Continuous representation
- Dose-response: continuous variation in treatment intensity
- Time dimension: exposure duration
Multidimensional representation
- Component analysis: treatment composed of multiple components
- Vector representation: $t = [\text{duration}, \text{intensity}, \text{fidelity}, \text{component}_1, \ldots, \text{component}_k]$
Temporal structure
- Single time-point treatment
- Time-varying treatment: $T(t)$ describes intervention state at each moment
- Adaptive interventions: adjusted based on intermediate outcomes
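These encodings can be sketched as arrays; all numbers below are invented for illustration:

```python
import numpy as np

# Binary encoding: treated vs. control
t_binary = np.array([1, 0, 1])

# Multidimensional encoding: one row per participant,
# columns = [duration_weeks, intensity, fidelity]
t_vector = np.array([
    [12.0, 1.0, 0.9],
    [ 0.0, 0.0, 0.0],
    [12.0, 0.5, 0.8],
])

# Time-varying encoding: T(t) as a (participants x time points) 0/1 matrix
t_path = np.array([
    [0, 1, 1, 1],  # begins treatment at t = 1
    [0, 0, 0, 0],  # never treated
    [0, 0, 1, 1],  # begins at t = 2 (e.g., adaptive timing)
])
print(t_path.sum(axis=1))  # exposure duration per participant: [3 0 2]
```

The choice among these representations is itself an operationalization decision: collapsing `t_path` to `t_binary` discards dose and timing information that some research questions require.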
Level 4: Structural relations
What is the dependency structure among variables?
Causal graphs (DAG)
- Classical confounding: T ← X → Y together with T → Y, where X is a common cause of both treatment and outcome
Structural Causal Models (SCM)
- $T := f_T(X, U_T)$, $Y := f_Y(T, X, U_Y)$: each variable is a function of its direct causes and exogenous noise
Identification assumptions
- Ignorability: $\{Y(1), Y(0)\} \perp T \mid X$ (treatment is as good as random given the covariates $X$)
- Consistency: If $T=t$ then $Y = Y(t)$
- Positivity: $0 < P(T=t|X) < 1$
- SUTVA: no interference between units, and each treatment level has a single well-defined version
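Of these assumptions, positivity is the one most directly checkable in data. A rough empirical check is to stratify on X and verify that no stratum is entirely treated or untreated; the function name and simulated data below are ours:

```python
import numpy as np

def positivity_check(x, t, n_bins=5):
    """Crude positivity check: treated share within each quantile stratum of X.
    Shares of exactly 0 or 1 flag strata where treatment never/always occurs."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    shares = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        shares.append(t[mask].mean())
    return np.array(shares)

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))  # P(T=1|X) rises with X but stays in (0, 1)
shares = positivity_check(x, t)
print(np.all((shares > 0) & (shares < 1)))  # no stratum violates positivity
```

Ignorability and SUTVA, by contrast, are not checkable from observed data alone; they must be defended substantively, which is why the assumption set belongs in the operationalization record rather than the analysis appendix.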
Level 5: Estimand definition
What exactly are we estimating?
Types of causal estimands
- ATE (Average Treatment Effect): $E[Y(1) - Y(0)]$
- ATT (Average Treatment Effect on the Treated): $E[Y(1) - Y(0) \mid T=1]$
- CATE (Conditional ATE): $E[Y(1) - Y(0) \mid X=x]$
Effect quantification methods
- Absolute difference: $\Delta = \bar{Y}_{\text{treatment}} - \bar{Y}_{\text{control}}$
- Relative difference: $RR = \bar{Y}_{\text{treatment}} / \bar{Y}_{\text{control}}$
- Standardized difference: Cohen's $d = \Delta / \sigma_{\text{pooled}}$
- NNT (Number Needed to Treat): $1/(P_{\text{response},T} - P_{\text{response},C})$
Inference target
- Sample vs. population
- Causal vs. associational
- Prediction vs. explanation
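The distinction among these estimands can be made concrete with simulated potential outcomes, where both Y(1) and Y(0) are known and the effect varies with X. All parameters below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
x = rng.normal(size=n)                     # confounder
y0 = x + rng.normal(size=n)                # potential outcome under control
y1 = y0 + 1.0 + 0.5 * x                    # CATE(x) = 1 + 0.5x, so true ATE = 1
t = rng.binomial(1, 1 / (1 + np.exp(-x)))  # treatment more likely at high X

ate = (y1 - y0).mean()                     # E[Y(1) - Y(0)]
att = (y1 - y0)[t == 1].mean()             # E[Y(1) - Y(0) | T = 1]
y_obs = np.where(t == 1, y1, y0)
naive = y_obs[t == 1].mean() - y_obs[t == 0].mean()  # confounded group contrast
print(round(ate, 2), round(att, 2), round(naive, 2))
```

Because treatment uptake rises with X and so does the effect, the ATT exceeds the ATE, and the naive observed-group contrast exceeds both: the three numbers answer three different questions, which is why the estimand must be fixed before estimation begins.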
4. After Operationalization: Choosing Research Designs
4.1 Critical Juncture: From Operationalization to Evidence Generation
After completing operationalization, researchers do not directly enter the “data analysis stage” but face the most critical decision point in research design:
How to generate credible evidence to answer the operationalized research question?
The core of this stage is not “choosing statistical methods” but selecting an evidence-generating mechanism. Different research designs represent fundamentally different:
- Data source logic
- Causal inference strategies
- Types of validity threats
- Generalization boundaries
4.2 Fundamental Divisions in Evidence Generation
First fork: Evidence agency
Active generation
- Researcher controls or manipulates key variables
- Typical: experiments, intervention studies
- Advantage: strong causal inference
- Cost: external validity, ethics, resources
Passive collection
- Utilizing already existing or naturally occurring data
- Typical: observational studies, surveys, secondary data
- Advantage: ecological validity, feasibility
- Cost: difficulty in confounding control
Second fork: Evidence temporality
Prospective
- Tracking from present to future
- Advantage: clear temporal ordering
- Typical: cohort studies, RCT
Retrospective
- Looking back from present to past
- Advantage: efficiency, low cost
- Typical: case-control, literature review
4.3 Systematic Classification of Research Designs
Design 1: Experimental and Quasi-Experimental Designs
(1) Randomized Controlled Trial (RCT)
Core features:
- Random assignment to treatment conditions
- Strongest causal inference capability
- Highest internal validity
Design variants:
- Parallel group design: participants randomly assigned to different groups
- Crossover design: participants receive different treatments sequentially
- Factorial design: testing multiple factors simultaneously (2×2, 2×3)
- Cluster randomization: randomizing groups as units
Key decisions:
How to set control conditions?
- No treatment control
- Placebo control
- Treatment as usual (TAU) control
- Waitlist control
Treatment allocation ratio?
- 1:1
- 2:1 (more participants receive treatment)
- Adaptive randomization
Blinding strategy?
- Single-blind (participants unaware)
- Double-blind (participants and researchers unaware)
- Triple-blind (plus data analysts)
Sample size and power?
- Expected effect size
- α level (typically 0.05)
- 1-β power (typically 0.80)
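Under the normal approximation, these three quantities determine the required sample size per arm for a two-sample comparison of means. A stdlib-only sketch; the function name is ours, and the formula is the standard approximation without small-sample correction:

```python
from statistics import NormalDist
import math

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per arm for a two-sample mean comparison:
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

print(n_per_group(0.5))  # medium effect at the defaults -> 63 per arm
print(n_per_group(0.2))  # small effects require far larger samples
```

Textbook tables give roughly 64 per arm for d = 0.5 at α = 0.05 and power 0.80 once a small-sample correction is added; the normal approximation yields 63, close enough for planning purposes.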
(2) Quasi-Experimental Designs
When randomization is infeasible or unethical:
Difference-in-Differences (DiD)
- Difference in changes between treatment and control groups before/after treatment
- Core assumption: parallel trends
- Application: policy evaluation, natural experiments
Regression Discontinuity (RDD)
- Treatment assignment based on cutoff value of continuous variable
- Core assumption: similarity near discontinuity point
- Application: scholarship, admission policy evaluation
Instrumental Variables (IV)
- Finding variables that affect treatment but not directly outcome
- Core assumptions: relevance, exclusion, monotonicity
- Application: returns to education, medical treatment effects
Interrupted Time Series (ITS)
- Multiple time-point measurements before/after treatment
- Core: level or trend change at treatment time
- Application: policy intervention, public health
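The DiD logic can be illustrated on simulated two-period data in which parallel trends hold by construction; all parameters below are invented:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
treated = rng.binomial(1, 0.5, size=n)

# Parallel trends by construction: both groups share the +2.0 time trend;
# the treated group additionally receives the true effect of +1.5.
y_pre = 5.0 + 1.0 * treated + rng.normal(size=n)          # baseline gap is allowed
y_post = y_pre + 2.0 + 1.5 * treated + rng.normal(size=n)

did = ((y_post[treated == 1].mean() - y_pre[treated == 1].mean())
       - (y_post[treated == 0].mean() - y_pre[treated == 0].mean()))
print(round(did, 2))  # recovers the true effect despite the baseline gap
```

Note what DiD does and does not require: the groups may differ in levels at baseline, but their counterfactual trends must be parallel, an assumption the data before treatment can probe but never fully verify.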
(3) Laboratory/Online Experiments
- Controlled experimental environment
- Online A/B testing (large-scale)
- Behavioral economics experiments
- Advantage: precise control, replicable
- Disadvantage: ecological validity issues
Design 2: Survey Designs
(1) Cross-sectional Survey
Features:
- Single time-point measurement
- Describing current state
- Exploring associational relationships
Key design elements:
Sampling strategy
- Probability sampling: simple random, stratified, cluster
- Non-probability sampling: convenience, quota, snowball
Survey mode
- Online questionnaire
- Telephone interview
- Face-to-face interview
- Mixed-mode
Questionnaire design
- Question types: open/closed
- Scale selection: validated instruments
- Order effect control
- Response bias detection
Inference limitations:
- Cannot establish causality: correlation ≠ causation
- Reverse causality: Y may affect X
- Third variables: Z affects both X and Y
(2) Longitudinal Survey
Panel study
- Repeated measurements of same individuals
- Controls for unobserved heterogeneity
- Can test change and causality
Cohort study
- Tracking specific groups (e.g., birth cohorts)
- Prospective causal inference
- Suitable for developmental questions
Repeated cross-sections
- Different samples at different times
- Describing population trends
- Cannot track individual change
Key advantages:
- Establishing temporal ordering
- Separating within/between effects
- Testing developmental trajectories
Design 3: Observational Studies
(1) Cohort Study
- Prospective tracking from exposure to outcome
- Can calculate incidence, relative risk
- Suitable for rare exposure studies
(2) Case-Control Study
- Retrospectively tracing exposure from outcome
- High efficiency, suitable for rare diseases
- Odds Ratio estimation
(3) Ecological Study
- Population as unit of analysis
- Using aggregated data
- Beware of ecological fallacy
Design 4: Literature Review and Meta-Analysis
This is an independent research design, not just “preliminary work.”
(1) Systematic Review
Core steps:
PICO framework
- Population: study population
- Intervention: intervention measures
- Comparison: control conditions
- Outcome: outcome indicators
Literature search strategy
- Database selection
- Keyword combinations
- Time range
Inclusion/exclusion criteria
- Study design types
- Sample characteristics
- Quality assessment
Data extraction
- Study characteristic coding
- Effect size extraction
- Risk of bias assessment
(2) Meta-Analysis
Quantitative synthesis of multiple studies:
Core tasks:
Effect size standardization
- Cohen’s d
- Odds ratio
- Correlation coefficient r
Heterogeneity testing
- I² statistic
- Q test
- τ² estimation
Model selection
- Fixed-effect model: assumes a single common true effect across studies
- Random-effects model: allows the true effect to vary between studies
Publication bias
- Funnel plot
- Egger’s test
- Trim-and-fill
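The fixed-effect pooling and the I² computation can be sketched directly from reported effects and variances; the five studies below are invented:

```python
import numpy as np

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I2 (in percent)."""
    effects, variances = np.asarray(effects), np.asarray(variances)
    w = 1.0 / variances                      # inverse-variance weights
    pooled = (w * effects).sum() / w.sum()
    q = (w * (effects - pooled) ** 2).sum()  # Cochran's Q statistic
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Five invented studies reporting Cohen's d and its sampling variance
pooled, q, i2 = fixed_effect_meta(
    effects=[0.30, 0.45, 0.25, 0.50, 0.40],
    variances=[0.02, 0.03, 0.025, 0.04, 0.03],
)
print(round(pooled, 3), round(i2, 1))
```

Here I² is 0: the spread of the five effects is no larger than sampling error alone would produce, so a fixed-effect model is defensible. Substantial I² would instead motivate a random-effects model and a search for moderators.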
(3) Scoping Review
- Exploratory literature mapping
- Conceptual boundary mapping
- Suitable for emerging fields
Design 5: Text and Document Analysis
(1) Content Analysis
Classical methods:
- Manual coding
- Codebook development
- Inter-coder reliability (Cohen’s κ)
Modern methods:
- Automatic text classification (NLP)
- Supervised learning: training on labeled data
- Unsupervised learning: clustering, topic models
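Inter-coder reliability via Cohen's κ can be computed by hand; a small sketch with invented codes:

```python
import numpy as np

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    codes_a, codes_b = np.asarray(codes_a), np.asarray(codes_b)
    p_obs = (codes_a == codes_b).mean()
    labels = np.union1d(codes_a, codes_b)
    # Chance agreement: probability both coders pick the same label independently
    p_chance = sum((codes_a == c).mean() * (codes_b == c).mean() for c in labels)
    return (p_obs - p_chance) / (1 - p_chance)

coder_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
coder_2 = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg"]
print(cohens_kappa(coder_1, coder_2))  # 0.5: raw agreement 0.75, chance 0.5
```

The example shows why raw percent agreement overstates reliability: with two balanced categories, two coders flipping coins would already agree half the time, and κ subtracts exactly that baseline.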
(2) Discourse Analysis
- Conversation analysis
- Critical discourse analysis
- Frame analysis
(3) Computational Text Analysis
Topic models:
- LDA (Latent Dirichlet Allocation)
- STM (Structural Topic Model)
Sentiment analysis:
- Dictionary methods
- Machine learning classification
- Deep learning (BERT, etc.)
Word embeddings:
- Word2Vec
- GloVe
- Contextual embeddings (transformers)
Text networks:
- Co-occurrence networks
- Semantic networks
Design 6: Secondary Data Analysis
(1) Existing Survey Data
- Large public datasets: GSS, ANES, NHANES
- Advantage: large sample, low cost
- Disadvantage: variable constraints, cannot control measurement
(2) Administrative Records
- Medical records (EMR/EHR)
- Educational data (grades, attendance)
- Government databases (census, tax)
(3) Big Data Sources
- Social media: Twitter, Reddit, Weibo
- Digital traces: search logs, browsing records
- Sensor data: wearables, IoT
- Transaction data: credit cards, e-commerce
Key challenges:
- Unknown data generation mechanisms
- Selection bias
- Non-standardized measurement
- Privacy and ethics
Design 7: Mixed Methods Designs
(1) Sequential Design
Exploratory sequential:
- Phase 1: Qualitative exploration (interviews, focus groups)
- Phase 2: Quantitative study based on qualitative findings
- Purpose: generate hypotheses → test hypotheses
Explanatory sequential:
- Phase 1: Quantitative survey/experiment
- Phase 2: Qualitative depth (explain quantitative results)
- Purpose: test hypotheses → understand mechanisms
(2) Concurrent Design
Convergent:
- Simultaneously collect qualitative and quantitative data
- Compare both types of results
- Purpose: triangulation
Embedded:
- Auxiliary method embedded in main method
- Example: interviews embedded in RCT
- Purpose: supplementary understanding
Design 8: Simulation and Computational Modeling
(1) Agent-Based Modeling (ABM)
- Individual rules → emergent patterns
- Suitable for: social processes, diffusion, collective behavior
(2) System Dynamics
- Feedback loops, stocks and flows
- Suitable for: policy simulation, macro processes
(3) Network Simulation
- Social network evolution
- Diffusion process simulation
(4) Monte Carlo Simulation
- Sensitivity analysis
- Uncertainty quantification
4.4 Decision Framework for Research Design Selection
Dimension 1: Research Question Type
| Question Type | Priority Design |
|---|---|
| Causal effect (X → Y?) | Experiment, quasi-experiment |
| Descriptive (distribution of X?) | Survey, observational study |
| Exploratory (what patterns exist?) | Mixed methods, text analysis |
| Synthetic (existing evidence?) | Systematic review, meta-analysis |
| Mechanistic (how does it happen?) | Mixed methods, longitudinal study |
Dimension 2: Feasibility Constraints
| Constraint Type | Viable Design |
|---|---|
| Ethics prohibit randomization | Quasi-experiment, observational study |
| Cannot collect new data | Secondary data, literature review |
| Limited sample size | Case study, in-depth interviews |
| Time pressure | Cross-sectional survey, secondary data |
| Abundant resources | RCT, large-scale longitudinal study |
Dimension 3: Validity Trade-offs
| Priority Validity | Design Choice |
|---|---|
| Internal validity | Laboratory RCT |
| External validity | Field quasi-experiment, large survey |
| Construct validity | Mixed methods, multiple measurements |
| Statistical conclusion validity | Large sample, experimental control |
Dimension 4: Inference Target
| Inference Type | Design Requirements |
|---|---|
| Causal inference | Experiment or strong identification strategy |
| Associational inference | Survey, observational study |
| Descriptive inference | Representative sampling |
| Predictive inference | Big data, machine learning |
4.5 Core Principles of Design Selection
Principle 1: Logical matching between design and RQ
Not every RQ deserves every design.
- Descriptive RQ + RCT = resource waste
- Causal RQ + cross-sectional survey = overclaiming
Principle 2: Assumption management
Every design has its core assumptions:
- Experiment: assumes ethics and control feasibility
- Survey: assumes measurement validity and extrapolation limits
- Literature: assumes quality of existing research
- Secondary data: assumes uncontrollable data generation process
Principle 3: Inferential humility
Must actively answer three questions:
- Under this design, can my RQ still be fully answered?
- Which part of the original RQ can my conclusion address?
- Which claims must I actively abandon?
Principle 4: Design cannot “remedy” poor operationalization
If operationalization has already failed (vague concepts, invalid measurement),
even the best research design cannot salvage the study.
The premise of design selection is: operationalization is complete and reasonable.
4.6 Key Takeaways
At the research design selection stage, one must understand:
Heterogeneity of evidence generation logic
- Experiments generate “what if we intervene” evidence
- Surveys generate “what is the current state” evidence
- Literature generates “what is known” evidence
No “best” design, only “best matched” design
- RCT is not always the gold standard
- Observational studies are more appropriate in certain contexts
- Design selection is a constrained optimization problem
Design selection commits to specific assumptions
- Each design has different validity threats
- Identification assumptions must be explicit
- Conclusion boundaries must be acknowledged
5. Literature Review as a Tool for Question Discovery
5.1 Functions of Systematic Literature Review
Literature review is not merely “background introduction” but a core tool for research question discovery:
(1) Identifying knowledge gaps
- Which questions remain unstudied?
- Which theoretical mechanisms remain untested?
- Which populations or contexts remain uncovered?
(2) Understanding contradictory findings
- Why do different studies reach different conclusions?
- Do contradictions stem from differences in conceptual definitions, measurement methods, or sample characteristics?
- Are there moderating variables or boundary conditions?
(3) Positioning your contribution
- How does your research advance existing knowledge?
- Avoiding simple repetition of existing work
- Clarifying your unique angle or incremental contribution
5.2 Systematic vs. Narrative Review
Systematic review
- Explicit inclusion/exclusion criteria
- Exhaustive literature search
- Structured information extraction
- Suitable for integrative research in mature fields
Narrative review
- Selective literature coverage
- Theory-oriented organization
- Critical interpretation and synthesis
- Suitable for conceptual mapping in emerging fields
5.3 Pathway from Literature to Questions
Excellent literature reviews should:
- Not only summarize “what has been done” but point out “what is missing”
- Not only list conclusions but analyze “why such conclusions”
- Not only describe current state but propose “what should be studied next”
6. Completeness Checklist for Research Questions
For any research question, researchers should be able to answer the following six questions:
Who exactly?
- Clear population definition and selection rules
What exactly is done?
- Precise intervention, exposure, or treatment definition
Compared to what?
- Clear counterfactual or control conditions
Measured how?
- Specific measurement tools, reporters, time points
Over what time?
- Time range, follow-up periods, causal timing
Under which assumptions?
- Identification assumptions, measurement assumptions, causal assumptions
Key principle:
If you cannot explain how a concept is measured,
then you don’t yet have a true research question.
7. Integration of Classical and Modern Research Paradigms
7.1 Characteristics of Classical Paradigm
Theory-driven hypothesis testing
- Deriving hypotheses from explicit theoretical frameworks
- Confirmatory research design
- Relying on existing constructs and measurement tools
- Emphasizing internal validity and causal inference
7.2 Characteristics of Modern Paradigm
Data-driven pattern discovery
- Exploring patterns from large-scale data
- Exploratory analysis and machine learning
- Computational methods enabling new questions
- Emphasizing prediction accuracy and generalization capability
7.3 Integration Pathway: Research Questions in Computational Social Science
Modern quantitative social science needs to find balance between two paradigms:
(1) Dialogue between theory and data
- Using theory to guide direction of exploratory analysis
- Using data to test and refine theoretical predictions
- Balancing explanatory and predictive power
(2) New questions from new methods
- Large-scale text data → discourse analysis and opinion dynamics
- Network data → social structure and diffusion processes
- Digital traces → behavioral patterns and decision mechanisms
- Computational simulation → mechanism exploration and counterfactual reasoning
(3) New evidence for classic questions
- Re-examining classic theories with new data
- Improving credibility of causal inference with new methods
- Expanding scale and complexity of problems with computational power
8. Role of Ethical Considerations in Question Formation
8.1 Upstream Nature of Research Ethics
Ethical considerations are not an “additional step” in research but should be integrated into the question discovery stage:
(1) Vulnerable population protection
- Does the research question involve children, patients, marginalized groups?
- How to ensure the research process causes no additional harm?
- How to design informed consent and privacy protection?
(2) Social consequences of research
- How might research results be used or misused?
- Could it reinforce stereotypes or stigma?
- What potential impact on policy and practice?
(3) Data justice
- Is data collection fair?
- Are there systematic biases in algorithms?
- Do research benefits reach the studied population?
8.2 Special Considerations in Clinical Psychology Research
In clinical psychology, question formation requires particular attention to:
- Treatment equity: Does control group design deprive participants of effective treatment opportunities?
- Pathologization risk: Does the research framework overly pathologize normal behavioral variation?
- Cultural sensitivity: Are concepts and measurements applicable across different cultural backgrounds?
- Long-term tracking: How to balance scientific value with participant burden in longitudinal research?
9. From Research Design to Analysis Strategy
9.1 Design Determines Boundaries of Analytical Possibilities
After selecting a research design, research enters the analysis design stage. But one must recognize:
Analysis methods cannot remedy fundamental design flaws.
The design has already determined:
- Which causal claims are possible
- Which confounders are controllable
- Which assumptions must be relied upon
- Which generalizations are reasonable
9.2 Mapping from Design to Analysis
Experimental design → Analysis strategy
- ITT analysis (Intention-to-Treat)
- PP analysis (Per-Protocol)
- CACE analysis (Complier Average Causal Effect)
- Subgroup analysis and heterogeneous treatment effects
Quasi-experimental design → Identification strategy
- DiD: parallel trends test, robustness checks
- RDD: bandwidth selection, continuity tests
- IV: weak instrument tests, overidentification tests
Survey design → Inference strategy
- Sampling weight adjustment
- Non-response bias handling
- Multilevel models (nested data)
- Structural equation models
Observational data → Confounding control
- Propensity score matching/weighting
- Doubly robust estimation
- Sensitivity analysis
- E-value assessment
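Inverse probability weighting can be illustrated on simulated data where the true propensity score is known; in real observational studies it must be estimated, and all parameters below are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))               # true propensity P(T=1|X)
t = rng.binomial(1, p)
y = 2.0 * t + x + rng.normal(size=n)   # true ATE = 2.0, confounded through X

naive = y[t == 1].mean() - y[t == 0].mean()
# Horvitz-Thompson inverse probability weighting
ipw = (t * y / p).mean() - ((1 - t) * y / (1 - p)).mean()
print(round(naive, 2), round(ipw, 2))  # naive is biased upward; IPW recovers ~2.0
```

The weighting only removes the confounding carried by X; any confounder omitted from the propensity model leaves bias behind, which is what sensitivity analysis and E-values then quantify.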
Text data → Validity verification
- Coding reliability
- Model robustness
- Topic consistency
- Semantic validity
9.3 Critical Decisions Before Analysis
Before actual analysis, must clarify:
(1) Specify estimation target
Not: Is CBT effective?
But:
- ATE under ITT framework?
- Effect among completers?
- CATE under different baseline severity?
(2) Missing data handling
- Missing mechanism: MCAR, MAR, MNAR
- Handling strategy: deletion, imputation, full information maximum likelihood
- Sensitivity analysis: result robustness under different assumptions
(3) Multiple comparison control
- Primary vs. secondary outcomes
- Confirmatory vs. exploratory analysis
- FDR control, Bonferroni correction
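The Benjamini-Hochberg step-up procedure behind FDR control can be sketched in a few lines; the p-values below are invented:

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: boolean rejection mask
    controlling the false discovery rate at level alpha."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= alpha * i / m
        reject[order[: k + 1]] = True     # reject every hypothesis up to that rank
    return reject

p_values = [0.001, 0.010, 0.039, 0.041, 0.27, 0.60]
print(benjamini_hochberg(p_values))
```

Here BH rejects the first two hypotheses, while Bonferroni at 0.05/6 ≈ 0.0083 would reject only the first: FDR control suits exploratory families of tests, Bonferroni a small set of confirmatory primary outcomes.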
(4) Heterogeneity handling
- Pre-specified subgroup analysis
- Exploratory heterogeneity testing
- Machine learning to identify CATE
(5) Robustness checks
- Model specification changes
- Sample restriction changes
- Measurement method changes
9.4 Bridge from Analysis to Interpretation
Analysis produces statistics, but interpretation gives them meaning:
Statistical significance ≠ Substantive importance
- Effect size: Cohen’s d, R²
- Clinical significance: MCID, NNT
- Practical relevance
Association ≠ Causation
- Are identification assumptions credible?
- Is reverse causality possible?
- How large is omitted variable bias?
Sample results ≠ Population truth
- External validity threats
- WEIRD sample problem
- Context dependency
10. Conclusion: The Epistemological Chain of Research Design
10.1 Complete Picture of Research Design
The research design framework presented in this paper is a complete epistemological chain:
Phenomenal observation → Conceptual clarification → Research question → Operationalization → Research design → Evidence generation → Analysis and conclusion
Each link in this chain is indispensable epistemic work that no later stage can replace.
10.2 Core Insights
Insight 1: Operationalization is not a technical step
Operationalization is core epistemological work that:
- Forces researchers to make implicit assumptions explicit
- Makes conceptual ambiguity visible
- Establishes bridges between theory and empirical evidence
- Determines the valid interpretation range of research conclusions
Insight 2: Design selection is not tool selection
Research designs represent fundamentally different evidence-generating logic:
- Different designs produce different types of evidence
- Different designs carry different assumption commitments
- Different designs have different validity threats
- No “best” design, only “best matched” design
Insight 3: Analysis cannot remedy design flaws
Statistical analysis sophistication cannot remedy fundamental design flaws.
- If operationalization fails, even advanced methods are futile
- If design selection is wrong, causal inference is not credible
- If measurement is invalid, results are meaningless
Insight 4: Inferential humility
Excellent researchers know:
- Which claims can be made
- Which claims are excessive
- Which boundaries must be acknowledged
- Which assumptions must be relied upon
10.3 Core Competencies of Researchers
An excellent quantitative researcher is not merely someone who masters statistical tools, but someone who can:
Transform phenomena into precise questions
- From vague observation to answerable question
- From natural language to operational definition
Transform concepts into measurable variables
- Specify ontological commitments
- Construct valid measurement models
Transform theory into testable hypotheses
- Identify causal structure
- Specify identification assumptions
Select matched evidence-generating mechanisms
- Understand design logic
- Acknowledge design limitations
Transform results into theoretical contributions
- Go beyond descriptive statistics
- Connect to broader knowledge systems
The complete path of research design can be formalized as:
$$\text{Phenomenon} \xrightarrow{\text{abstraction}} \text{Concept} \xrightarrow{\text{theory}} \text{Question} \xrightarrow{\text{operationalization}} \text{Model} \xrightarrow{\text{design}} \text{Evidence} \xrightarrow{\text{analysis}} \text{Conclusion}$$