【Thought】Multimodal Coupling in Multimodal Analytics Dashboards May Matter - A Systems Theory, Category Theory, Algebraic Geometry and Topology, and Information Theory Perspective
Author Note: This post is a personal speculative reflection prompted by reading the work of Alfredo, Mejia-Domenzain, Echeverria, Rahayu, Zhao, Alajlan, Swiecki, Käser, Gašević, and Martinez-Maldonado (2025). The authors of TeamTeachingViz have produced a careful, empirically grounded, and genuinely valuable contribution to the learning analytics community. Their paper is so substantive and thoughtfully designed that it invites the kind of deeper structural questioning attempted below. The mathematical frameworks introduced in this note (category theory, sheaf theory, information geometry, and dynamical systems theory) are exploratory lenses offered in a spirit of intellectual curiosity and humble conjecture. The authors of TeamTeachingViz are not responsible for any of the speculative claims made here, and any errors or overreaches in the mathematical reasoning that follows are entirely my own.
Reference Paper: Alfredo, R., Mejia-Domenzain, P., Echeverria, V., Rahayu, D., Zhao, L., Alajlan, H., Swiecki, Z., Käser, T., Gašević, D., & Martinez-Maldonado, R. (2025). TeamTeachingViz: Benefits, Challenges, and Ethical Considerations of Using a Multimodal Analytics Dashboard to Support Team Teaching Reflection. In Proceedings of the 15th International Learning Analytics and Knowledge Conference (LAK 2025). ACM. https://doi.org/10.1145/3706468.3706475
1. Starting Point: What “Multimodal Matters” Actually Claims
The paper TeamTeachingViz presents a dashboard integrating three data streams to support team teaching reflection in higher education classrooms: indoor positioning data (x-y coordinates of each educator at ~1 Hz via UWB sensors), voice activity detection (timestamped speaking/silent segments from individual microphones), and spatial pedagogy observation codes (human-coded behavioural categories such as Lecturing, One-to-one consultation, or Monitoring). The implicit design argument is clear and defensible: position alone is ambiguous, audio alone is ambiguous, but their combination together with theoretically grounded observation codes gives educators enough interpretive purchase to reflect meaningfully on what happened in a session.
Empirical feedback from educators partially confirms this. The dashboard did provoke genuine reflective dissonance, with one educator surprised to find their educator-to-educator interaction time nearly equalling their educator-to-student time, prompting a concrete reconsideration of classroom priorities. Yet educators consistently requested richer context, especially student-side data and speech content, suggesting that multimodality, as currently implemented, is necessary but not sufficient. The obvious interpretation is that more modalities would help. But there is a more interesting and structurally deeper interpretation: the limitation may lie less in the number of modalities present and more in how their coupling is handled.
The current dashboard treats integration as juxtaposition: a hexagonal heatmap (position + voice), a bar chart (observation codes), and a text panel (co-teaching strategy summaries) are displayed side by side. What this presents is essentially three marginal distributions made visually readable simultaneously. The information that lives between modalities, the joint structure, the dependencies, the cross-modal transitions, the contradictions between channels, is almost entirely invisible. This essay argues that this missing coupling structure is where the genuinely interesting pedagogical information resides, and that systems theory, category theory, algebraic topology, and information geometry together offer a rigorous and productive language for describing it.
2. Each Modality as a Geometry on a Different Manifold
The first step toward a formal treatment is to regard modalities as geometrically structured objects inhabiting different spaces, rather than treating them as interchangeable “data types.”
The positioning data of a single educator is a time-indexed trajectory in a two-dimensional Euclidean space. Considered over a session, it traces a curve in $\mathbb{R}^2 \times \mathbb{R}^+$, where the third coordinate is time, and the ~1 Hz samples are discrete observations of that curve. The local geometry is flat and metrically well-behaved: distances between positions are Euclidean, and, idealising the educator's movement as smooth, the trajectory has a natural tangent vector at every point describing velocity and direction of movement.
The voice activity data, when considered not as a raw binary signal but as a distribution of speaking events over the classroom floor, defines a measure on the same spatial domain. More precisely, at any location $x$ on the floor plan, one can ask: what is the probability density of the educator speaking at that location? The resulting object is a probability distribution over the classroom, and probability distributions on a fixed sample space form a statistical manifold whose natural geometry is the Fisher information metric, a Riemannian structure quite different from the Euclidean one. Geodesics on this manifold are curved; curvature is non-trivial; and the notion of “closeness” between two voice-activity profiles is fundamentally Riemannian.
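To make the notion of Riemannian "closeness" concrete, here is a minimal sketch assuming the voice-activity profile has been discretised into a handful of floor cells, so that each profile is a categorical distribution. For categorical distributions the Fisher-Rao geodesic distance has the closed form $2\arccos\left(\sum_i \sqrt{p_i q_i}\right)$. The cell layout and the two example profiles are hypothetical and are not taken from the paper.

```python
import math

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between two categorical distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of cells")
    # Bhattacharyya coefficient: inner product of the square-root vectors.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point overshoot before acos.
    bc = min(1.0, max(0.0, bc))
    return 2.0 * math.acos(bc)

# Hypothetical speaking-probability profiles over four floor cells.
front_heavy = [0.7, 0.1, 0.1, 0.1]
roaming = [0.25, 0.25, 0.25, 0.25]

d_same = fisher_rao_distance(front_heavy, front_heavy)
d_diff = fisher_rao_distance(front_heavy, roaming)
d_disjoint = fisher_rao_distance([1.0, 0.0], [0.0, 1.0])
```

Two profiles with disjoint support sit at the maximal distance $\pi$, whatever the Euclidean distance between the cells themselves, which is one way of seeing that this geometry differs from that of the floor plan.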
The spatial pedagogy observation codes are categorical and discrete. They live in a combinatorial space with no natural metric unless one is explicitly constructed. Their temporal evolution can be modelled as a directed graph or a category whose objects are the behavioural states and whose morphisms are the allowed or observed transitions (e.g., the transition from Lecturing to One-to-one consultation, or from Monitoring to Surveillance). This is a genuinely different kind of mathematical object from both the trajectory manifold and the statistical manifold.
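As a rough illustration of the directed-graph view, the sketch below counts observed transitions between consecutive codes. The code sequence is invented for illustration, and the choice to drop self-transitions (staying in the same code) is an assumption of this sketch rather than anything prescribed by the paper.

```python
from collections import Counter

def transition_counts(codes):
    """Count observed transitions between consecutive, distinct codes."""
    pairs = zip(codes, codes[1:])
    return Counter((a, b) for a, b in pairs if a != b)

# Hypothetical coded sequence, one code per observation window.
codes = [
    "Lecturing", "Lecturing", "Monitoring", "One-to-one consultation",
    "Monitoring", "Lecturing", "Monitoring",
]

# Keys are directed edges (source code, target code); values are edge weights.
edges = transition_counts(codes)
```

The keys of `edges` are the generating morphisms of the behavioural category, and the counts are weights that a dashboard could render as arrow thickness.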
These three spaces, the trajectory manifold, the statistical manifold of voice distributions, and the categorical graph of behavioural transitions, are objects with genuinely distinct intrinsic dimensionalities, distinct notions of distance, distinct curvature structures, and distinct transformation groups. Superimposing their visualisations in a dashboard is, geometrically speaking, like drawing a Riemannian surface, a probability simplex, and a directed graph on the same piece of paper and calling the overlay “integration.” The question that matters is what the maps between these spaces look like, whether those maps preserve structure, and what their failure to do so tells us.
3. Cross-Category Mappings: Functors as the Language of Multimodal Coupling
This is precisely where category theory offers a productive entry point. Category theory is a branch of mathematics concerned with how mathematical structures of different kinds relate to one another. Its central objects are categories (collections of objects and the structure-preserving arrows, called morphisms, between them), and functors (maps between entire categories that respect their internal structure).
To appreciate why this language is useful here, it helps to first see how each modality can itself be described as a category. A morphism within a single category is a structure-preserving arrow between two objects inside that same category. Within the category $\mathcal{P}$ of spatial states, a morphism is a directed movement from one classroom position to another. Within the category $\mathcal{A}$ of audio states, a morphism is a transition in voice activity: the onset of speech, a cessation, or a shift in intensity. Within the category $\mathcal{S}$ of spatial pedagogy codes, a morphism is an observed behavioural transition, such as the shift from Lecturing to One-to-one consultation. Each of these morphisms lives entirely within its own category and respects only the structure that its own category tracks.
A functor $F: \mathcal{C} \to \mathcal{D}$ operates at a higher level: it maps between two different categories, sending objects of $\mathcal{C}$ to objects of $\mathcal{D}$ and morphisms of $\mathcal{C}$ to morphisms of $\mathcal{D}$, while strictly preserving identity and composition. The key structural demand is that the functor respects the relational fabric of both categories simultaneously: states must map to states, and transitions between states must map to corresponding transitions coherently.
Now consider what a mapping from $\mathcal{P}$ to $\mathcal{A}$ would require. The two categories have different objects, different morphisms, and different compositional structures. A mapping between them therefore takes the form of a functor: if an educator moves from location $A$ to location $B$ and then from $B$ to location $C$, the functor must map the composed path $A \to C$ to the composed audio transition corresponding to those two movements in sequence, preserving the compositional logic of both sides.
What would such a functor mean pedagogically? It would encode a systematic structural relationship between movement patterns and speech patterns: every time an educator transitions from Authoritative space toward Interactional space (a morphism in $\mathcal{P}$), there is a corresponding and systematic shift in voice activity (the image morphism in $\mathcal{A}$). A well-defined functor encodes something like a teaching style as a structural coupling between movement and speech, grounded in the compositional relationships between transitions rather than in marginal statistics. Different educators would define different functors. Comparing educators, or tracking an educator’s development over time, would then be a problem about the structure of those functors rather than about distribution means or percentage summaries.
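The composition requirement can be checked mechanically in a toy setting. The sketch below assumes, as a strong simplification, that morphisms are paths (tuples of states) composed end to start, and that the functor is induced by a map sending each zone to a single audio state. The zone names and the map itself are hypothetical.

```python
def compose(first, second):
    """Compose two paths, given as tuples of states, end to start."""
    if first[-1] != second[0]:
        raise ValueError("paths are not composable")
    return first + second[1:]

def apply_functor(obj_map, path):
    """Image of a path under the functor induced by a map on objects."""
    return tuple(obj_map[state] for state in path)

# Hypothetical object map from floor zones to audio states.
zone_to_audio = {"A": "speaking", "B": "silent", "C": "speaking"}

f = ("A", "B")  # movement A -> B
g = ("B", "C")  # movement B -> C

# Functoriality: mapping the composite equals composing the mapped pieces.
lhs = apply_functor(zone_to_audio, compose(f, g))
rhs = compose(apply_functor(zone_to_audio, f), apply_functor(zone_to_audio, g))
preserves_composition = lhs == rhs

# Identities are one-state paths and must map to one-state paths.
preserves_identity = apply_functor(zone_to_audio, ("A",)) == ("speaking",)
```

In real data a zone will rarely determine a single audio state, so the empirically interesting quantity is how far an educator's observed coupling departs from any such well-defined map.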
4. Natural Transformations as Pedagogical Style Comparison
The category-theoretic framework becomes even more powerful when considering multiple educators simultaneously, which is precisely the team teaching context of this paper.
If educator $T_i$ defines a functor $F_i: \mathcal{P} \to \mathcal{S}$ mapping their movement transitions to their spatial pedagogy behavioural transitions, and educator $T_j$ defines $F_j: \mathcal{P} \to \mathcal{S}$ for the same domain, then a natural transformation $\eta: F_i \Rightarrow F_j$ is a collection of morphisms in $\mathcal{S}$, one for each object $X$ in $\mathcal{P}$, given by $\eta_X: F_i(X) \to F_j(X)$, subject to the naturality condition: for every morphism $f: X \to Y$ in $\mathcal{P}$, the square
$$F_j(f) \circ \eta_X = \eta_Y \circ F_i(f)$$
must commute. This naturality condition is not a technicality; it is the mathematical statement that the comparison between $T_i$ and $T_j$ is consistent across all spatial transitions, not just at individual positions. A natural transformation between educator-style functors is therefore a rigorously defined notion of pedagogical analogy: a systematic, structure-respecting way to translate one educator’s dynamical teaching pattern into another’s.
This allows one to say something far stronger than “educator $T_i$ and educator $T_j$ have similar distributions of spatial pedagogy codes.” It allows one to say that their dynamical patterns of teaching, the way their spatial transitions couple to their behavioural transitions, are related by a coherent transformation that is consistent across all moments in the session. A natural transformation is, in a precise sense, the mathematical object corresponding to the pedagogical question: in what systematic sense does one educator’s approach correspond to another’s?
Furthermore, the collection of all such functors and natural transformations between them forms a functor category, and the structural properties of this category, which functors are isomorphic (i.e., which teaching styles are essentially the same), which are not, and how they compose, constitute a rigorous taxonomy of teaching configurations in a team teaching context.
5. The Coupling Structure: Sheaves over the Classroom
Sheaf theory provides a precise language for one of the central questions raised by multimodal coupling: when can local data observed on different patches of a space be consistently assembled into a coherent global picture?
Formally, consider the classroom-time continuum as a topological space $X$ (a product of the floor plan and the session time interval). Each modality defines a sheaf $\mathcal{F}$ over $X$: for each open set $U \subseteq X$ (think: a spatial region during a specific time window), the sheaf assigns a set of data sections, the observations of that modality within $U$. The positioning sheaf $\mathcal{F}_{\mathcal{P}}$ assigns position samples; the audio sheaf $\mathcal{F}_{\mathcal{A}}$ assigns voice activity segments; the observation sheaf $\mathcal{F}_{\mathcal{S}}$ assigns behavioural codes.
The gluing axiom of sheaf theory states that if sections on overlapping open sets agree on their overlap, they can be uniquely glued into a section on the union. In the multimodal context, a joint sheaf $\mathcal{F}_{\mathcal{P}} \times_X \mathcal{F}_{\mathcal{A}} \times_X \mathcal{F}_{\mathcal{S}}$ formalises what it means for all three modalities to be mutually consistent across the classroom at every moment. The global sections of this joint sheaf are precisely the multimodal observations that cohere across all data streams simultaneously, which is exactly the coupling structure that current dashboards fail to make explicit.
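A discrete caricature of the gluing condition, assuming local sections are stored as dictionaries from timestamps to observed values: two sections glue exactly when they agree wherever both are defined. The windows and codes below are invented for illustration.

```python
def glue(section_u, section_v):
    """Glue two local sections (dicts: time -> value) if they agree on the overlap.

    Returns (glued_section, []) on success, or (None, conflicting_times) on failure.
    """
    overlap = section_u.keys() & section_v.keys()
    conflicts = sorted(t for t in overlap if section_u[t] != section_v[t])
    if conflicts:
        return None, conflicts
    return {**section_u, **section_v}, []

# Hypothetical behavioural codes on overlapping time windows (seconds).
window_u = {0: "Lecturing", 1: "Lecturing", 2: "Monitoring"}
window_v = {2: "Monitoring", 3: "Monitoring"}
window_w = {2: "Lecturing", 3: "Monitoring"}

glued, no_conflicts = glue(window_u, window_v)   # agree at t=2, so they glue
failed, conflicts = glue(window_u, window_w)     # disagree at t=2, so they do not
```

The list of conflicting timestamps returned on failure is the practically useful output: it localises where the assembled picture stops being coherent.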
Crucially, when sections fail to glue, the failure is itself informative. A non-trivial obstruction to gluing, which for a sheaf of abelian groups built from the data would be measured by the Čech cohomology $\check{H}^1(X, \mathcal{F})$, signals a genuine structural inconsistency between modalities in some region of the classroom-time space. For example: the observation code says the educator is Monitoring (supervisory, non-verbal), but the voice activity data shows sustained speech in that region during that interval. This contradiction is invisible when modalities are displayed in separate panels, but it surfaces as a non-trivial cohomology class in the joint sheaf. Such contradictions are pedagogically meaningful: they may indicate observer coding errors, but they may also indicate moments where an educator’s behaviour does not fit neatly into any single category, precisely the moments that are most interesting for reflection.
A fiber bundle perspective complements this. Taking the classroom floor plan as the base space $B$ and attaching to each point the joint multimodal state as the fiber, the total space is a bundle $E \to B$ whose structure group encodes the rules by which the multimodal state transforms as the educator moves. Different educators, or the same educator across different sessions or semesters, define different bundle structures over the same base. Comparing these bundle structures is a problem in differential topology and is strictly richer than comparing summary statistics.
6. Information Geometry and the Geometry of the Joint Distribution
From an information-theoretic perspective, the coupling structure under consideration is formalised by the joint distribution $p(x, a, s)$ of position $x$, audio state $a$, and spatial pedagogy code $s$. The marginal distributions $p(x)$, $p(a)$, $p(s)$, together with the pairwise conditionals $p(a|x)$, $p(s|x)$, $p(s|a)$, and the full joint, form a hierarchy of increasingly rich descriptions of the multimodal state. The dashboard in the paper effectively presents only the marginals. The pairwise conditionals correspond to what an analyst might informally call “what does an educator’s voice activity look like given their position,” which is already more informative, but the full coupling information resides in the joint distribution and cannot be decomposed without loss.
The scalar mutual information $I(X; A) = D_{\mathrm{KL}}\left(p(x,a) \,\|\, p(x)p(a)\right)$ quantifies how far the joint is from independence, yet collapses the geometry into a single number. A richer treatment uses information geometry, in which distributions live on a statistical manifold equipped with the Fisher information metric $g_{ij} = \mathbb{E}\left[\frac{\partial \log p}{\partial \theta^i} \frac{\partial \log p}{\partial \theta^j}\right]$. The joint distribution $p(x, a, s; \theta)$ parameterised by some vector $\theta$ lives on a curved statistical manifold, and its geometry encodes the coupling structure through the curvature tensor.
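For discretised data the scalar quantity is straightforward to estimate. The following is a minimal plug-in estimator, assuming position has been binned into zones and audio into speaking/silent. The sample lists are hypothetical, and the plug-in estimate is known to be biased upward for small samples.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(X; A) in bits from a list of (x, a) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_a = Counter(a for _, a in pairs)
    mi = 0.0
    for (x, a), count in joint.items():
        # p(x, a) / (p(x) p(a)) simplifies to count * n / (count_x * count_a).
        mi += (count / n) * math.log2(count * n / (count_x[x] * count_a[a]))
    return mi

# Hypothetical samples: zone fully determines audio state.
coupled = [("front", "speaking"), ("front", "speaking"),
           ("back", "silent"), ("back", "silent")]
# Hypothetical samples: zone tells us nothing about audio state.
independent = [("front", "speaking"), ("front", "silent"),
               ("back", "speaking"), ("back", "silent")]

mi_coupled = mutual_information(coupled)
mi_independent = mutual_information(independent)
```

Both sample sets have identical marginals (half front, half speaking), so a dashboard showing only marginals could not tell them apart, yet their mutual information differs by a full bit.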
The geodesic distance on this joint manifold between two teaching episodes gives a notion of pedagogical dissimilarity that respects the full coupling structure. Comparing educators, or comparing the same educator’s sessions across a semester, would then be a problem of geodesic interpolation on a curved statistical manifold, a framework considerably richer than comparing bar-chart percentages of spatial pedagogy code distributions. The curvature of the manifold near a particular teaching configuration tells you how “rigid” that style is, whether small perturbations in one modality necessarily propagate to others, and whether the joint structure is locally product-like (modalities are locally independent) or strongly coupled (modalities are tightly entangled).
The information bottleneck framework extends this further. If the goal of the dashboard is to surface information about team teaching dynamics that is relevant for pedagogical reflection, one can formalise this as: find a compressed representation $T$ of the joint multimodal state $(X, A, S)$ that maximises the mutual information with the pedagogically relevant target variable (e.g., student engagement proxy, co-teaching strategy label) while minimising the mutual information with irrelevant variation (e.g., idiosyncratic movement patterns unrelated to pedagogy). The coupling structure between modalities is essential to solving this optimisation, because the relevant signal may only exist in the joint and not in any marginal.
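In the standard information bottleneck formulation this trade-off is written as a single Lagrangian. As a sketch, writing $Y$ for the pedagogically relevant target variable and $\beta > 0$ for a trade-off weight (both symbols are introduced here for illustration and do not appear in the paper):

$$\min_{p(t \mid x, a, s)} \; I(T; X, A, S) - \beta \, I(T; Y)$$

Note that the standard form penalises all information that $T$ retains about the input $(X, A, S)$, rather than singling out "irrelevant" variation. Irrelevance is defined implicitly as whatever does not help predict $Y$.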
7. Systems Theory: Emergence from Coupling
A systems-theoretic framing unifies the above perspectives and connects them back to the empirical findings of the paper. In dynamical systems terms, each modality can be modelled as a subsystem with its own state space and evolution equations. The full multimodal teaching system is a coupled dynamical system in which the evolution of the positioning subsystem influences and is influenced by the audio subsystem, and both are coupled to the behavioural observation subsystem.
The key concept here is emergent structure: properties of the coupled system that cannot be inferred from the subsystems in isolation. A vivid example from the paper is the co-teaching strategy identification. The algorithm maps combinations of spatial pedagogy behaviours from pairs of educators to one of six co-teaching strategies. The strategy is not a property of any single educator’s trajectory or any single educator’s behavioural codes; it is a property of the joint configuration of two or more educators simultaneously. It emerges from the coupling. The current rule-based approach approximates this emergence coarsely; a systems-theoretic treatment would model the joint dynamical system and identify emergent configurations through the structure of its attractor landscape.
Educators’ feedback in the paper that the Co-Teaching Strategies Panel felt inconsistent with their mental model of team teaching is, from this perspective, a symptom of exactly this problem. The panel describes co-teaching as pairwise (two educators at a time) when the reality is a three-body coupled system whose emergent configurations are not reducible to pairwise descriptions without significant information loss. This is precisely the mathematical problem of characterising the attractors of a system with three coupled subsystems, which generally has different and richer structure than the union of its three pairwise subsystems.
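A standard toy example shows how a three-variable system can carry structure that no pairwise description sees. In the sketch below, three binary variables are related by XOR. This is a textbook construction chosen for illustration and is not a model of actual educator behaviour.

```python
import math
from collections import Counter
from itertools import product

def entropy(samples):
    """Shannon entropy in bits of the empirical distribution of samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def pair_mi(samples, i, j):
    """I(V_i; V_j) in bits between two coordinates of the sample tuples."""
    vi = [s[i] for s in samples]
    vj = [s[j] for s in samples]
    return entropy(vi) + entropy(vj) - entropy(list(zip(vi, vj)))

# Toy joint: the third variable is the XOR of the first two, all cases equally likely.
samples = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]

# Every pair of variables looks independent...
pairwise = [pair_mi(samples, i, j) for i, j in [(0, 1), (0, 2), (1, 2)]]

# ...yet the three together share one bit (sum of marginal entropies minus joint entropy).
total_correlation = sum(entropy([s[k] for s in samples]) for k in range(3)) - entropy(samples)
```

All three pairwise mutual informations vanish while the total correlation is one bit, so any analysis restricted to pairs would report no coupling at all.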
8. Toward a Coupling-Aware Dashboard: What Would It Look Like?
The foregoing mathematical framework is not purely abstract. It has concrete design implications for next-generation multimodal teaching analytics dashboards.
Rather than visualising modalities side by side, a coupling-aware dashboard would surface the functorial relationships between them. Concretely: does an educator’s movement from Authoritative toward Interactional space systematically predict an increase in their voice activity? Does the transition from One-to-one consultation back to the front of the room reliably co-occur with a shift in the co-teaching strategy? These are questions about the conditional distributions $p(a | \Delta x)$ and $p(s_{\text{pair}} | s_{\text{individual}})$, and answering them requires computing and visualising the coupling structure, not the marginals.
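A first approximation to $p(a \mid \Delta x)$ can be tabulated directly from aligned sequences. The sketch below assumes positions have already been mapped to named zones and that zone and audio samples share timestamps. The zone labels echo the essay's terminology, but the sequences themselves are hypothetical.

```python
from collections import Counter, defaultdict

def conditional_distribution(zones, audio):
    """Estimate p(audio at t+1 | zone transition from t to t+1) from aligned sequences."""
    if len(zones) != len(audio):
        raise ValueError("sequences must be aligned sample by sample")
    counts = defaultdict(Counter)
    for t in range(len(zones) - 1):
        move = (zones[t], zones[t + 1])
        counts[move][audio[t + 1]] += 1
    # Normalise each row of counts into a conditional distribution.
    return {
        move: {a: c / sum(ctr.values()) for a, c in ctr.items()}
        for move, ctr in counts.items()
    }

# Hypothetical aligned samples for one educator.
zones = ["authoritative", "interactional", "interactional",
         "authoritative", "interactional"]
audio = ["silent", "speaking", "speaking", "silent", "speaking"]

p = conditional_distribution(zones, audio)
```

Each row of `p` is a candidate visual element: for a given movement, how likely is the educator to be speaking on arrival?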
The sheaf-theoretic perspective suggests a new kind of dashboard indicator: a cross-modal consistency score for each spatial-temporal region of the session. Regions where the positioning data, voice data, and observation codes are jointly consistent (global sections of the joint sheaf exist) would be displayed normally. Regions where consistency fails would be flagged as anomalies deserving reflective attention, because such inconsistencies are precisely the moments that carry the most information about the limits of the current coding framework and about the complexity of the teaching behaviour occurring there.
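A very coarse version of such an indicator could be rule-based. The sketch below flags time windows in which the observation code is expected to be non-verbal but the voice channel shows mostly speech. The set of non-verbal codes, the window format, and the 0.5 threshold are all assumptions made for illustration.

```python
# Assumption: these codes are expected to be mostly non-verbal.
NON_VERBAL_CODES = {"Monitoring"}

def consistency_flags(windows, speech_threshold=0.5):
    """Mark each window as consistent unless a non-verbal code co-occurs with high speech."""
    flags = []
    for w in windows:
        contradictory = (
            w["code"] in NON_VERBAL_CODES
            and w["speech_fraction"] > speech_threshold
        )
        flags.append({"start": w["start"], "consistent": not contradictory})
    return flags

# Hypothetical one-minute windows: start time (s), observer code, fraction of time speaking.
windows = [
    {"start": 0, "code": "Lecturing", "speech_fraction": 0.9},
    {"start": 60, "code": "Monitoring", "speech_fraction": 0.1},
    {"start": 120, "code": "Monitoring", "speech_fraction": 0.8},
]

flags = consistency_flags(windows)
flagged_starts = [f["start"] for f in flags if not f["consistent"]]
```

Here only the window starting at 120 s is flagged. A fuller treatment would replace the hand-written rule with a learned or theory-derived compatibility relation between codes and audio states.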
The statistical manifold perspective suggests that session comparison should be presented not as side-by-side bar charts but as distances on a curved space: how far apart are two sessions on the joint statistical manifold, and along what direction does the difference primarily lie? A geodesic connecting two session-points on the manifold represents the most efficient path of pedagogical change, and its tangent vector at the starting point indicates in what direction teaching behaviour should shift to move toward the other session’s profile, while respecting the full coupling structure.
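For categorical summaries the geodesic has an explicit form: taking square roots maps the probability simplex onto part of a sphere, where Fisher-Rao geodesics are great-circle arcs. The sketch below interpolates between two session profiles on that basis. It assumes each session has been flattened into one categorical distribution over joint cells, and the two profiles are hypothetical.

```python
import math

def fisher_rao_geodesic(p, q, t):
    """Point at fraction t along the Fisher-Rao geodesic from p to q."""
    u = [math.sqrt(x) for x in p]
    v = [math.sqrt(x) for x in q]
    cos_angle = min(1.0, max(-1.0, sum(a * b for a, b in zip(u, v))))
    angle = math.acos(cos_angle)
    if angle < 1e-12:
        return list(p)  # the two profiles coincide
    # Spherical linear interpolation between the square-root vectors.
    w0 = math.sin((1 - t) * angle) / math.sin(angle)
    w1 = math.sin(t * angle) / math.sin(angle)
    # Squaring maps the interpolated point back to the probability simplex.
    return [(w0 * a + w1 * b) ** 2 for a, b in zip(u, v)]

# Hypothetical session profiles over three joint cells.
session_1 = [0.6, 0.3, 0.1]
session_2 = [0.2, 0.3, 0.5]

midpoint = fisher_rao_geodesic(session_1, session_2, 0.5)
```

The midpoint is a valid distribution but is generally not the arithmetic average of the two profiles, which is the practical difference between interpolating on the curved manifold and averaging bar charts.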
The natural transformation perspective suggests that team comparison should be presented as a coherent transformation, not as two separate profiles. Given two co-teaching educators, the natural transformation between their style functors tells an administrator or a professional development coordinator not merely that educator $T_i$ lectures more than $T_j$, but that $T_i$’s entire dynamical pattern of transitions between spatial zones and behavioural states is systematically related to $T_j$’s in a structure-preserving way, and where that relationship breaks down.
9. A Note on Tractability and the Path Forward
It would be dishonest not to acknowledge that the mathematical framework sketched here is considerably more demanding than the current state of the art in learning analytics dashboards. Computing geodesics on statistical manifolds requires parameterising the joint distribution, which is high-dimensional and partially observed. Checking the gluing conditions of a sheaf requires formalising the overlap structure of the data, which is non-trivial for irregularly sampled sensor streams. Identifying natural transformations between educator-style functors requires a formal definition of the functor categories involved, which is currently absent from the teaching analytics literature.
However, these are tractable research problems rather than fundamental obstructions. Information-geometric methods are increasingly computationally accessible. Sheaf-theoretic data integration has been applied in sensor fusion and topological data analysis. Category-theoretic models of compositional structure have been developed in cognitive science, linguistics, and quantum mechanics, and the formal groundwork for applied category theory in learning analytics is beginning to be laid.
The deeper point is conceptual. The current generation of multimodal analytics dashboards, including the otherwise carefully designed TeamTeachingViz, treats “multimodal” primarily as an adjective modifying the quantity of data: more streams, more sensors, more panels. The framework proposed here treats “multimodal” as a relational and structural claim, locating the interesting information in a multimodal system within the functorial mappings between its categories, the global sections of its joint sheaf, the geodesic structure of its joint statistical manifold, and the emergent attractors of its coupled dynamics. Whether or not educators can ever directly interact with these mathematical objects, designing dashboards that make coupling structure visible is a research direction worth pursuing seriously.
10. Conclusion
The empirical finding that educators found TeamTeachingViz beneficial but insufficient, wanting more context, more modalities, richer student data, is typically interpreted as a call for more data. This essay has argued that it is better interpreted as a call for better coupling. What educators are reaching for when they say the data lacks context is, in structural terms, the joint information that lives between modalities rather than within each one. Their intuition that position without speech is uninterpretable, and speech without position is uninterpretable, but together they begin to make sense, is a lay expression of the mathematical fact that the joint distribution is not recoverable from the marginals.
Giving that intuition a rigorous home in systems theory (coupled dynamical systems and emergence), category theory (functors between modality categories and natural transformations between educator style functors), algebraic topology (sheaves over the classroom-time space and the cohomological obstruction to cross-modal consistency), and information geometry (geodesic distances on the joint statistical manifold) does not merely satisfy a taste for mathematical elegance. It produces a genuinely different and richer set of questions to ask of multimodal teaching data, and a correspondingly richer design space for the dashboards that present it.
The journey from TeamTeachingViz as it currently stands to a dashboard that operationalises these ideas is long. Even so, the conceptual reorientation proposed here, treating modalities as coupled geometries whose mapping structure is the primary object of interest rather than as parallel streams to be displayed side by side, seems like a meaningful step if multimodal analytics is to move beyond the additive paradigm and toward something that deserves the name in full.
This note was developed in dialogue with a close reading of Alfredo et al. (2025). The mathematical frameworks invoked draw on differential geometry, category theory (Mac Lane, 1978), sheaf theory (Kashiwara & Schapira, 1990), information geometry (Amari, 2016), and dynamical systems theory. Their application to learning analytics remains largely an open research programme.