Learning From Choices:
- During post-step reflection, Keymaker evaluates the effectiveness of each Q&A using three metrics:
  - Shift in Node Confidence Scores: Measures whether an answer improved clarity in the MoM, as reflected in changed node confidence scores.
  - Vault Validation Alignment: Assesses whether the answer aligns with verified knowledge in the Vault.
  - Contradiction Reduction: Tracks how far conflicting interpretations were resolved, measured by the count or the severity of remaining inconsistencies.
- Each Q&A is retroactively assigned a utility value (High, Medium, or Low), which guides subsequent learning.
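
The mapping from the three metrics to a utility value is not specified above. The following is a minimal sketch of one way it could work, assuming each metric is normalized to a fixed range, the metrics are weighted equally, and fixed thresholds separate the three levels. The names `ReflectionMetrics` and `utility_label`, the value ranges, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReflectionMetrics:
    """Post-step metrics for one Q&A (field names and ranges are assumptions)."""
    confidence_shift: float         # change in node confidence, assumed in [-1, 1]
    vault_alignment: float          # agreement with the Vault, assumed in [0, 1]
    contradiction_reduction: float  # share of contradictions resolved, assumed in [0, 1]


def utility_label(m: ReflectionMetrics, high: float = 0.6, medium: float = 0.3) -> str:
    """Bucket a Q&A into High, Medium, or Low utility.

    Assumption: the three metrics are averaged with equal weight, and a
    negative confidence shift counts as zero rather than as a penalty.
    """
    score = (max(m.confidence_shift, 0.0)
             + m.vault_alignment
             + m.contradiction_reduction) / 3.0
    if score >= high:
        return "High"
    if score >= medium:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    print(utility_label(ReflectionMetrics(0.9, 0.8, 0.7)))
    print(utility_label(ReflectionMetrics(-0.4, 0.1, 0.0)))
```

Equal weighting keeps the sketch simple; a real system might weight Vault alignment more heavily or treat a negative confidence shift as a penalty.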
Reinforcement Adjustment:
- Applies reinforcement learning updates in the style of Q-learning:
  - Dynamically adjusts the feature weightings in the scoring function based on each selection's downstream impact.
  - Gradually decays the weights associated with poorly performing selections, reducing their future influence.
- The meta-learning module aggregates performance across steps and updates its strategy at each MoM (Matrix of Meaning) completion, which improves global optimization over time.
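
The reinforcement adjustment described above can be sketched as a Q-learning-style update with linear function approximation. This is an illustration under stated assumptions, not the system's actual update rule: it assumes the scoring function is a weighted sum of features, that utility values map to numeric rewards, and that decay applies only to features active in a Low-utility selection. The names `UTILITY_REWARD`, `score`, and `update_weights`, and the values of `alpha`, `gamma`, and `decay`, are hypothetical.

```python
# Assumed mapping from utility label to numeric reward.
UTILITY_REWARD = {"High": 1.0, "Medium": 0.5, "Low": 0.0}


def score(weights, features):
    """Linear scoring function: weighted sum of feature values (assumed form)."""
    return sum(w * x for w, x in zip(weights, features))


def update_weights(weights, features, utility, next_best_score=0.0,
                   alpha=0.1, gamma=0.9, decay=0.05):
    """Return new weights after one Q-learning-style step.

    td_error = reward + gamma * next_best_score - current score, as in
    Q-learning with a linear approximator. For Low-utility selections,
    the weights of active features are additionally shrunk by `decay`.
    """
    reward = UTILITY_REWARD[utility]
    td_error = reward + gamma * next_best_score - score(weights, features)
    # Move each weight along its feature in proportion to the TD error.
    updated = [w + alpha * td_error * x for w, x in zip(weights, features)]
    if utility == "Low":
        # Gradual decay: only features that took part in the poor selection.
        updated = [w * (1.0 - decay) if x > 0 else w
                   for w, x in zip(updated, features)]
    return updated


if __name__ == "__main__":
    w = [0.5, 0.5]
    print(update_weights(w, [1.0, 0.0], "High"))
    print(update_weights(w, [1.0, 0.0], "Low"))
```

In this sketch a High outcome raises the weight of the feature that drove the selection, while a Low outcome lowers it through both the temporal-difference error and the explicit decay. Weights of inactive features are left unchanged.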