Reflections on Partially Observable Markov Decision Processes and Decision Rationality
I am deeply interested in the Partially Observable Markov Decision Process (POMDP) model as it is used in fields such as computational neuroscience and machine learning. Applied to neuroscientific phenomena, the model describes how humans collect evidence, infer hidden states, and make decisions in an uncertain world.
Formally, the real world is assumed to be in a hidden state $s$, which evolves according to certain dynamics ($s \rightarrow s'$). From observed phenomena (i.e., evidence), together with its history of actions and its previous beliefs, an agent estimates the true state of the world and forms a belief state $b$: a probability distribution over the possible true world states. Decisions are then made on the basis of this belief state $b$, using a learnable policy that maps $b$ to an action $a$.
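The belief update described above is usually written as a Bayes filter: $b'(s') \propto P(o \mid s', a) \sum_{s} P(s' \mid s, a)\, b(s)$, where $o$ is the new observation. The sketch below implements one such step in plain Python. The function name `belief_update`, the list-based layout of the transition table `T` and observation table `O`, and the toy numbers (two hidden states and an observation that is correct 85% of the time, loosely modeled on the classic "tiger" toy problem) are illustrative assumptions, not part of the original text.

```python
from typing import List

Belief = List[float]


def belief_update(b: Belief, a: int, o: int,
                  T: List[List[List[float]]],
                  O: List[List[List[float]]]) -> Belief:
    """One Bayes-filter step mapping (belief b, action a, observation o) to b'.

    Assumed (hypothetical) layout:
      T[a][s][s2] = P(s2 | s, a)   -- hidden-state dynamics s -> s'
      O[a][s2][o] = P(o | s2, a)   -- likelihood of the evidence o in state s2
    """
    n = len(b)
    # Predict: push the current belief through the state dynamics.
    predicted = [sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    # Correct: weight each predicted state by how well it explains the evidence.
    unnormalized = [O[a][s2][o] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalized)
    if z == 0:
        raise ValueError("observation has zero probability under the current belief")
    # Normalize so the new belief is again a probability distribution.
    return [u / z for u in unnormalized]


# Toy example: two hidden states, one "listen" action that leaves the state unchanged.
LISTEN = 0
T = [[[1.0, 0.0],
      [0.0, 1.0]]]
O = [[[0.85, 0.15],   # in state 0, observation 0 is seen 85% of the time
      [0.15, 0.85]]]  # in state 1, observation 1 is seen 85% of the time

b = [0.5, 0.5]  # start maximally uncertain
b = belief_update(b, LISTEN, 0, T, O)
print(b)  # approximately [0.85, 0.15]
b = belief_update(b, LISTEN, 0, T, O)
print(b)  # approximately [0.970, 0.030]
```

Repeated observations that agree with one another sharpen the belief, which is the kind of evidence accumulation the POMDP account of decision-making relies on.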
Beyond its power to explain evidence accumulation in decision-making and its relevance to neurobiological mechanisms (see Note 1), the POMDP framework may also offer computational insight into how the subject, or self, is formed. Subjects are formed through interactions with others, a process that often requires inferring others' hidden states, such as their beliefs and intentions (commonly referred to as Theory of Mind, ToM). Because it models state inference and action selection together, the POMDP framework is a natural candidate for a computational account of how subjects interact and take shape.
In the course of my research, I have also found myself reflecting on whether free will exists. This reflection arose while I was considering whether POMDP models could be applied to predicting epileptic seizures, which raises a deeper question: can humans truly control their own brain states? These states appear to evolve according to relatively fixed transition dynamics, perhaps perturbed only by certain forms of noise.
At the same time, I have recently been studying ethics. Some perspectives on ethics and free will hold that, although the dynamics of the world's states may be unfree and governed by causal laws, the level of decision-making must be treated as free; without that assumption, ethical reasoning loses its foundation. If an agent is, in principle, incapable of determining its own actions, there is no basis on which to evaluate that agent ethically.
Within the POMDP framework, an agent constructs a “phenomenal appearance” of an unknown and unknowable “reality” in the form of a belief state. While we may not be able to determine the underlying true states or their transition mechanisms, we can still possess decision rationality, embodied in a learnable policy. Ethics and meaning may thus lie in whether one can take appropriate action under conditions of limited information.
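As a deliberately simple illustration of acting on a belief rather than on the unknown true state, the sketch below picks the action with the highest expected immediate reward under the current belief, $\arg\max_a \sum_s b(s)\, R(s, a)$. This one-step (myopic) rule is an assumption standing in for a learnable policy: an optimal POMDP policy would also weigh future information and rewards, and a learned policy would fit its parameters from experience. The names `REWARD`, `expected_reward`, and `greedy_policy` are hypothetical, and the reward values follow the classic "tiger" toy problem (listening costs 1, opening the door hiding the tiger costs 100, opening the other door gains 10) purely for illustration.

```python
from typing import Dict, List

# Hypothetical toy setup: hidden state 0 = tiger behind the left door,
# hidden state 1 = tiger behind the right door.
# REWARD[action] = [reward if state 0, reward if state 1]
REWARD: Dict[str, List[float]] = {
    "listen": [-1.0, -1.0],
    "open_left": [-100.0, 10.0],
    "open_right": [10.0, -100.0],
}


def expected_reward(belief: List[float], action: str) -> float:
    """Expected immediate reward of an action, averaged over the belief."""
    return sum(p * r for p, r in zip(belief, REWARD[action]))


def greedy_policy(belief: List[float]) -> str:
    """Myopic stand-in for a learnable policy: maximize expected immediate reward."""
    return max(REWARD, key=lambda a: expected_reward(belief, a))


print(greedy_policy([0.5, 0.5]))    # listen (committing now is too risky)
print(greedy_policy([0.97, 0.03]))  # open_right (the tiger is very likely on the left)
```

The true state is never consulted: the same rule yields "keep gathering evidence" under near-uniform uncertainty and a committed action once the belief is confident, which is one way to read rationality under limited information.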
Awareness arises with intention; recognizing illusion enables detachment; one then turns toward decision rationality.
Note 1: The medial prefrontal cortex (mPFC) may be related to state inference [1].
References
[1] Hogeveen, J., et al. The neurocomputational bases of explore–exploit decision-making. Neuron, Volume 110, Issue 11, 1869–1879.e5.