德普斯特-夏弗理论与集合推理

德普斯特-夏弗理论与集合推理
Dempster-shafer and reasoning about sets

原始链接: https://emiruz.com/post/2025-10-30-epistemics/

该方案处理了关于二元变量(X)的概率推理，信息以量化逻辑公式的形式表达。挑战在于估计X中子集的特定状态的概率，其中更新不是状态的直接概率，而是*给定*逻辑条件的概率。传统的贝叶斯方法由于陈述的抽象性和缺乏直接条件点而难以应对。 Dempster-Shafer (DS)理论提供了一种优雅的解决方案。DS为可能状态的*子集*分配概率（质量），允许重叠的信念并有效地处理不确定性。更新使用Dempster规则融合，结合来自多个来源的证据。信念和合理性函数随后为任何给定状态的概率提供上限和下限。一个在GNU SETL中实现的验证概念证明了该方法。给定关于变量(a, b, c, d, e)之间关系的一些信息更新，代码计算诸如“b为真的概率”之类的查询的信念和合理性。DS方法成功地提供了概率估计，类似于其在传感器融合应用中的使用。存在一种频率主义替代方案，但它不太优雅且可能计算成本较高。

黑客新闻新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录Dempster-shafer 和集合推理 (emiruz.com)5 分，usgroup 2 小时前 | 隐藏 | 过去 | 收藏 | 1 条评论 esafak 4 分钟前 [–] DS 理论从没流行起来。即使贝叶斯主义在 LLM 出现后也退居二线。DS 理论计算复杂度更高，因此更不受欢迎。回复考虑申请 YC 的 2026 年冬季批次！申请截止日期为 11 月 10 日指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系搜索：

原文

Here is an interesting scheme I encountered in the wild, generalised and made abstract for you, my intrepid reader. Let \(X\) be a set of binary variables. We are given information about subsets of \(X\), where each update is a probability ranging over a concrete set, the state of which is described by an arbitrary quantified logic formula. For example, \[P\bigg\{A \subset X \mid \exists_{x_i, x_j \in A} \big(x_o \ne x_j))\bigg\} = p\] The above assigns a probability \(p\) to some concrete subset A, with the additional information that at least 1 pair of its members do not have the same value. Note that due to the existential quantification, the probability really ranges over the possible states of \(A\), since there could be many configurations which satisfy the form above.

At any point in time, using all information available, the task is to estimate a probability for some set of possible states also specified as an arbitrary quantified logic formula over a subset. For example,

\[ P\bigg\{ Q \subset X \mid \forall_{x\in Q} (x=1) \bigg\} \] which requires the probability for some specific concrete subset \(Q \subset X\) such that all its members have the value \(1\).

Part of the fascination for me is that it is hard to get a Bayesian solution off the ground because (1) the statements are not about the atoms of the domain, and (2) there are no true statements to condition on. There’s a Frequentist solution in principle at the end of this post, which could be made ostensibly Bayesian by arbitrarily adding priors, but I see little added value.

A Dempster-Shafer solution

Dempster-Shafer (DS) theory offers an elegant solution. DS is a theory of evidence from the 1960s in which probabilities are assigned directly to subsets of events. Formally, there is some set of events \(X\), and beliefs are assigned to the powerset with the requirement that \(\sum_{A \in 2^X} m(A) = 1\), where \(m: 2^X \to [0,1]\) (known as a mass assignment). Surprisingly, sets which have non-zero mass assignments are allowed to overlap. Further, it is required that \(m(\emptyset)=0\) (there is no assignment to the empty set) and that any probability not assigned to a subset is assigned to \(m(X)\) – known as a vacuous assignment – to make the powerset sum to \(1\). The theory defines \(bel(A) = \sum_{B\mid B \subseteq A} m(B)\) and \(pl(A) = \sum_{B\mid A \cap B \ne \emptyset} m(B)\): belief and plausibility. These roughly translate to mass directly supporting \(A\) and mass which does not contradict it: an upper and lower bound.

In the original theory, mass assignments from multiple sources are combined using Dempster’s rule: \[ m_{1,2}(A) = \frac{1}{1-K}\sum_{B \cap C=A \ne\emptyset}m_1(B)m_2(C) \] where \(K = \sum_{B\cap C = \emptyset} m_1(B)m_2(C)\). \(K\) is interpreted as a measure of conflict since it sums all mass from the two sources, the intersection of which is the empty-set. Dempster’s rule has no optimality guarantees or assurances against absurd results in the way that Bayes’ rule does, so it (or one of its many variants) has to be used with consideration of possible outcomes by design. Wikipedia gives an easy to read and more thorough introduction to the basics here.

Applied to the problem at hand, given that arbitrary logical formula can be used to specify information, our working set \(\bar{X}\) is all possible assignments to \(X\) and our mass assignment will therefore range over \(2^\bar{X}\) instead. Each new information update acts as a separate source with a mass assigned to a particular subset of \(\bar{X}\), and the rest assigned to the vacuous set. We can proceed by fusing the \(T\) updates using Dempster’s rule (\(\oplus\)) such that \(m_{\le T}=m_1 \oplus \dots \oplus m_T\). We can then use \(m_{\le T}\) to calculate belief and plausibility for any member of \(2^\bar{X}\) which ranges over all arbitrary logical formulae over a subset of \(\bar{X}\).

A proof of concept in GNU SETL

I recently picked up GNU SETL: a wonderful implementation of a set based language from the 1960s. It makes for a convenient language with which to prototype a small example of the scheme above. SETL reads a lot like formal set theory, so it should still be possible to follow along even if it is unfamiliar to you.

The whole code can be found here.

Problem statement: you are given the following information.

$ Global variables.
var 
  dom := {false,true},
  X   := {[a,b,c,d,e]: a in dom, b in dom, c in dom,
          d in dom, e in dom};

$ Information.
up1 := {[{[a,b,c,d,e] in X| a or b}, 0.7],
        [X, 0.3]};
up2 := {[{[a,b,c,d,e] in X| b or c}, 0.5],
        [X, 0.5]};
up3 := {[{[a,b,c,d,e] in X| b and d and e}, 0.8],
        [X, 0.2]};
up4 := {[{[a,b,c,d,e] in X| c and e}, 0.9],
        [X, 0.1]};

The task is to calculate belief and plausibility for the following sets.

$ Queries.
q1 := {[a,b,c,d,e] in X| b};
q2 := {[a,b,c,d,e] in X| c};
q3 := {[a,b,c,d,e] in X| d or e};
qs := [q1,q2,q3];

Lets define a Dempster’s rule operator, and belief and plausibility calculation procedures.

$ Procedures and operators.

proc fuse(a, m1, m2);
  k := +/[p1*p2: [b,p1] in m1,[c,p2] in m2| b*c={}] ? 0;
  m := +/[p1*p2: [b,p1] in m1,[c,p2] in m2| b*c=a] ? 0;
  return [a, m/(1-k)];
end;

op dempster(m1,m2);
  X_ := {fuse(b*c,m1,m2):
         [b,p1] in m1, [c,p2] in m2};
  return {[a,p] in X_| p>0};
end;

proc belief(q, m);
  return +/ [p: [a,p] in m| a subset q] ? 0;
end;

proc plause(q, m);
  return +/ [p: [a,p] in m| a*q/={}] ? 0;
end;

Lets combine the information using Dempster’s rule.

m_star := dempster/ [up1,up2,up3,up4];

Lets calculate the belief and plausibility for the queries.

(for i in [1..3])
  print('q', i, ', Belief', belief(qs(i), m_star),
        ', Plausibility', plause(qs(i), m_star));
end;

> q 1 , Belief 0.8 , Plausibility 1
> q 2 , Belief 0.9 , Plausibility 1
> q 3 , Belief 0.98 , Plausibility 1

An elegant solution; as required.

In the wild, DS is used for “sensor fusion”; a kind of task in which many sensors report overlapping beliefs about the same state of affairs, and the task is to combine them into an overall belief. I had not realised that DS theory would make for such a portable metaphor, so I am pleased to have discovered it.

A Frequentist solution

For completeness, here is a Frequentist solution in principle. Start with a probability on each member \(P(x_i=1) = p_0\). When given information, change the probabilities on the members implicated by the information to be compatible with it. This is non-identifiable in the general case because many assignments may be compatible with the information, so we can simulate, aiming to generate a representative random sample from the space of valid probability assignments. Each concrete assignment allows us to calculate \(P(Q \subset X \mid \dots)\), therefore at any given \(t\), we are able to produce \(\mathbb{E}_t\big\{ P(Q \subset X\mid \dots )\big\}\) – and confidence intervals – having used all available information as simulation constraints.

The main issue with this method for me is that it lacks elegance, but secondly, for non trivial state spaces, it will likely be computationally expensive if not intractable.