The other Markov's inequality

Original link: https://www.ethanepperly.com/index.php/2026/01/16/the-other-markovs-inequality/

## Markov's inequality: how much can a polynomial wiggle?

Markov's inequality answers a basic question: given a polynomial trapped in a box, how large can its derivative be? The inequality states that a degree-n polynomial mapping [-1,1] into [-1,1] has derivative bounded in magnitude by n^2. Simple power functions fall far short of this bound, but "wigglier" polynomials such as the Chebyshev polynomials do attain it, showing that the bound is tight. A generalization, the Markov brothers' inequality, extends it to higher-order derivatives. The inequality has practical applications, notably in proving polynomial inapproximability: lower-bounding the degree a polynomial must have to approximate a function to a specified accuracy. For example, approximating \mathrm{e}^{-t} to within error 0.1 on a long interval [0,b] requires degree at least 0.5\sqrt{b}. The same technique applies to other functions such as 1/x and \sqrt{x}. Interestingly, Markov's inequality yields a degree threshold for any approximation that depends not on the function's smoothness but on how much the function changes over a small interval.


Original article

If a polynomial function is trapped in a box, how much can it wiggle? This question is answered by Markov’s inequality, which states that for a degree-n polynomial p that maps [-1,1] into [-1,1], it holds that

(1)   \[\max_{x \in [-1,1]} |p'(x)| \le n^2. \]

That is, if a polynomial p is trapped within a square box [-1,1] \times [-1,1], the fastest it can wiggle—as measured by its first derivative—is the square of its degree.

How tight is this inequality? Do polynomials we know and love come close to saturating it, or is this bound very loose for them? A first polynomial which is natural to investigate is the power p(x) = x^n. This function maps [-1,1] into [-1,1], and the maximum value of its derivative is

    \[\max_{x \in [-1,1]} |p'(x)| = \max_{x \in [-1,1]} |nx^{n-1}| = n\ll n^2.\]

Markov’s inequality is quite loose for the power function.
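
As a quick sanity check (my own addition, not part of the original post), the following snippet evaluates |p'(x)| = |n x^{n-1}| on a fine grid for n = 7 and compares it with the Markov bound n^2; NumPy is simply my choice of tool here.

```python
import numpy as np

# Sketch of a numerical check: for p(x) = x^7, the derivative 7*x^6 is largest
# at x = +/-1, where it equals 7 -- far below the Markov bound 7**2 = 49.
n = 7
x = np.linspace(-1.0, 1.0, 100_001)
print(np.abs(n * x**(n - 1)).max(), n**2)  # prints roughly 7.0 and 49
```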

To saturate the inequality, we need something wigglier. The heavyweight champions for polynomial wiggliness are the Chebyshev polynomials T_n, which are motivated and described at length in this previous post. For our purposes, what’s important is that Chebyshev polynomials really know how to wiggle. Just look at how much more rapidly the degree-7 Chebyshev polynomial (blue solid line) moves around than the degree-7 power x^7 (orange dashed line).

In addition to seeming a lot more wiggly to the eye, the degree-n Chebyshev polynomials have much larger derivatives, saturating Markov’s inequality:

    \[\max_{x \in [-1,1]} |T_n'(x)| = n^2.\]

These two examples illustrate a good rule of thumb. The derivative of a polynomial which is “simple” or “power-like” will be of size about n, whereas the derivative of a “clever” or “Chebyshev-like” polynomial will be much higher at about n^2.
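
As a companion check (again mine, not from the post), the same kind of computation with NumPy's Chebyshev class shows the degree-7 Chebyshev polynomial attaining the bound.

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Sketch of a numerical check: max |T_7'(x)| over [-1, 1] equals 7**2 = 49,
# attained at the endpoints x = +/-1.
n = 7
dT = Chebyshev.basis(n).deriv()        # derivative of the degree-7 Chebyshev polynomial
x = np.linspace(-1.0, 1.0, 100_001)
print(np.abs(dT(x)).max(), n**2)       # prints roughly 49.0 and 49
```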

The inequality (1) was proven by Andrey Markov. It is much less well-known than Andrey Markov’s famous inequality in probability theory. A generalization of Markov’s inequality (1) was proven by Andrey’s brother Vladimir Markov, who proved a bound on the kth derivative of any polynomial p mapping [-1,1] into [-1,1]:

(2)   \[\max_{x \in [-1,1]} |p^{(k)}(x)| \le \max_{x \in [-1,1]} |T_n^{(k)}(x)| = \frac{n^2(n^2-1^2)\cdots(n^2-(k-1)^2)}{1\times 3 \times \cdots \times (2k-1)}. \]

The inequality (1) is a special case of (2) with k=1. The inequality (2) is often called the Markov brothers’ inequality, and this name is sometimes also attached to the special case (1)—to help distinguish it from the probabilistic Markov inequality.
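
The higher-derivative bound (2) can be checked the same way (my own illustration, not from the post; the value 784 below is just the right-hand side of (2) evaluated at n = 7, k = 2).

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Sketch of a numerical check of (2): the k-th derivative of T_n attains the bound.
n, k = 7, 2
dTk = Chebyshev.basis(n).deriv(k)      # k-th derivative of T_n
x = np.linspace(-1.0, 1.0, 200_001)
lhs = np.abs(dTk(x)).max()
rhs = np.prod([n**2 - j**2 for j in range(k)]) / np.prod([2 * j + 1 for j in range(k)])
print(lhs, rhs)                        # both equal 784 for n = 7, k = 2
```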

For the rest of this post, we will focus on the basic Markov inequality (1). This inequality is easily extended to polynomials trapped in a box [a,b] \times [c,d] of general sidelengths. Indeed, any polynomial p mapping [a,b]\to[c,d] can be transmuted to a polynomial \tilde{p} mapping [-1,1] to [-1,1] by an affine change of variables:

    \[\tilde{p}(x) = -1 + 2\cdot \frac{p(\ell(x)) - c}{d-c} \quad \text{for } \ell(x) = a + (b-a) \cdot \frac{x+1}{2}.\]

The precise form of this change of variables is not important. What’s important is that \tilde p maps [-1,1] to [-1,1] and, by the chain rule, the maximum value of its derivative is

    \[\max_{t \in [a,b]} |p'(t)| = \frac{d-c}{b-a} \cdot \max_{x \in [-1,1]} |\tilde p'(x)|.\]
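
To spell out the chain-rule step (my own filling-in; not in the original post): differentiating the definition of \tilde{p} gives

    \[\tilde{p}'(x) = \frac{2}{d-c} \cdot p'(\ell(x)) \cdot \ell'(x) = \frac{2}{d-c} \cdot \frac{b-a}{2} \cdot p'(\ell(x)) = \frac{b-a}{d-c} \cdot p'(\ell(x)),\]

and maximizing over x \in [-1,1] (equivalently, over t = \ell(x) \in [a,b]) and rearranging yields the identity just displayed.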

Therefore, we obtain a general version of Markov’s inequality (1).

Markov’s inequality (general domain/codomain). Let p be a degree-n polynomial that maps [a,b] to [c,d]. Then

(3)   \[\max_{t \in [a,b]} |p'(t)| \le \frac{d-c}{b-a} \cdot n^2. \]
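
Here is a small numerical illustration of (3) (my own sketch, not from the post): rescale T_7 so that it maps [0,4] onto [2,10] and compare its largest derivative with the bound. The specific interval endpoints are arbitrary choices for the example.

```python
import numpy as np
from numpy.polynomial import Chebyshev

n = 7
a, b, c, d = 0.0, 4.0, 2.0, 10.0
T = Chebyshev.basis(n)                    # degree-7 Chebyshev polynomial on [-1, 1]

def p(t):
    x = -1 + 2 * (t - a) / (b - a)        # affine map [a, b] -> [-1, 1]
    return c + (d - c) * (T(x) + 1) / 2   # affine map of the values [-1, 1] -> [c, d]

t = np.linspace(a, b, 200_001)
dp = np.gradient(p(t), t)                 # numerical derivative of the rescaled polynomial
print(np.abs(dp).max(), (d - c) / (b - a) * n**2)   # roughly 98 vs. the bound 98
```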

What’s Markov’s inequality good for? A lot, actually. In this post, we’ll see one application of this inequality: proving polynomial inapproximability. There are lots of times in computational math where it’s valuable to approximate some function like \mathrm{e}^{-t}, \sqrt{t}, or t^{-1} by a polynomial. There are lots of techniques for producing polynomial approximations and understanding the rate of convergence of the best-possible polynomial approximation. But sometimes we just want a quick estimate of the form, say, “a polynomial needs to be of degree at least n to approximate that function up to additive error 0.1”. That is, we seek lower bounds on the degree of the polynomial needed to approximate a given function to a specified level of accuracy.

The general Markov’s inequality (3) provides a direct way of doing this. Our treatment follows Chapter 5 of Faster Algorithms via Approximation Theory by Sushant Sachdeva and Nisheeth K. Vishnoi. The argument consists of two steps. First, we use the approximation guarantee to trap the polynomial in a box. Then, we show that it wiggles a lot (i.e., there is a point at which its derivative is large). Therefore, by Markov’s inequality, the degree of the polynomial must be sufficiently large.

Let’s start with the function \mathrm{e}^{-t} and ask the question:

What polynomial degree n do we need to approximate this function to error 0.1 on the interval [0,b]?

We are interested in the case where the interval is large, so we assume b\ge 2. To address this question, suppose that p is a polynomial that satisfies

    \[|p(t) - \mathrm{e}^{-t}| \le 0.1 \quad \text{for all } t \in [0,b].\]

We will use this information to trap p in a box and show it wiggles.

- Trapped in a box: Since 0 < \mathrm{e}^{-t} \le 1 for all t \ge 0, the approximation guarantee forces -0.1 \le p(t) \le 1.1 for every t \in [0,b]. That is, p maps [0,b] into [-0.1,1.1].
- Wiggles: We have p(0) \ge \mathrm{e}^0 - 0.1 = 0.9 and p(2) \le \mathrm{e}^{-2} + 0.1 < 0.24. By the mean value theorem, there is a point t^* \in [0,2] with |p'(t^*)| \ge (0.9 - 0.24)/2 = 0.33.

We are ready for the conclusion. Apply the bullet points above and Markov’s inequality (3) to obtain

    \[0.33 \le |p'(t^*)| \le \max_{t \in [0,b]} |p'(t)| \le \frac{1.1-(-0.1)}{b-0}  \cdot n^2.\]

Rearrange to obtain

    \[n \ge \sqrt{\frac{0.33}{1.2}} \cdot \sqrt{b} > 0.5 \sqrt{b}.\]

We conclude that we need a polynomial of degree at least 0.5\sqrt{b} to approximate \mathrm{e}^{-t} on [0,b]. This rough estimate actually turns out to be pretty sharp. By using an appropriate polynomial approximation technique (say, interpolation at the Chebyshev points), a polynomial of degree \mathcal{O}(\sqrt{b}) also suffices to approximate \mathrm{e}^{-t} to this level of accuracy on [0,b].
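
If you want to see the \sqrt{b} scaling empirically, here is a quick experiment of my own (not from the post): for several values of b, find the smallest degree at which interpolation of \mathrm{e}^{-t} at Chebyshev points on [0,b] reaches uniform error 0.1, and compare with the lower bound 0.5\sqrt{b}. The helper min_degree and the tested values of b are my own choices.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def min_degree(f, b, tol=0.1, n_max=400):
    """Smallest degree n such that Chebyshev interpolation of f on [0, b]
    has uniform error at most tol (checked on a fine grid)."""
    t_test = np.linspace(0.0, b, 20_001)
    for n in range(1, n_max + 1):
        p = Chebyshev.interpolate(f, n, domain=[0, b])
        if np.abs(p(t_test) - f(t_test)).max() <= tol:
            return n
    return None

for b in [4, 16, 64, 256]:
    # empirical degree needed vs. the Markov lower bound 0.5 * sqrt(b)
    print(b, min_degree(lambda t: np.exp(-t), b), 0.5 * np.sqrt(b))
```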

We’ve illustrated with just a single example, but this same technique can also be used to give (sharp) inapproximability results for other functions we know and love like 1/x and \sqrt{x}. For a bit of fun, see if you can get results for approximating a power x^s on [-1,1] by a polynomial p of degree n \ll s. You may be surprised by what you find!

I find the Markov inequality technique for proving polynomial inapproximability to be pretty cool. As we saw in a previous post, we usually understand the difficulty of approximating a function in terms of the rate of convergence of the best (polynomial) approximation, which is tied to fine properties of the function and its smoothness. The Markov inequality approach answers a different question and uses entirely different information about the function. Rather than asking about the asymptotic rate of convergence, the Markov inequality approach tells you at what degree a polynomial starts to approximate the function at all. And rather than using information about smoothness, the Markov approach shows that polynomial approximation is hard for any function that changes a lot over a small interval. As an exercise, you can show that the same argument for inapproximability of \mathrm{e}^{-t} also shows a polynomial of degree \Omega(\sqrt{b}) is necessary for approximating the ramp function

    \[f_{\rm ramp}(t) = \begin{cases} 1-t/2, & 0 \le t \le 2, \\ 0, & 2 < t \le b.\end{cases}\]

From the perspective of rate of convergence, these two functions could not be more different from one another, as polynomial approximations to \mathrm{e}^{-t} converge at an exponential rate, whereas polynomial approximations to f_{\rm ramp} converge at the rate \mathcal{O}(1/n). But in terms of the polynomial degree to start approximating these functions, both functions require the same degree \Theta(\sqrt{b}). Pretty neat, I think.
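
To see this numerically (my own illustration, not from the post), the same experiment as above applied to the ramp function also produces degrees that appear to grow roughly like \sqrt{b}, even though the two functions have very different convergence rates; the constants differ from the \mathrm{e}^{-t} case.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def ramp(t):
    # the ramp function from the post: 1 - t/2 on [0, 2], zero on (2, b]
    return np.where(t <= 2, 1 - t / 2, 0.0)

def min_degree(f, b, tol=0.1, n_max=1000):
    """Smallest degree n such that Chebyshev interpolation of f on [0, b]
    has uniform error at most tol (checked on a fine grid)."""
    t_test = np.linspace(0.0, b, 20_001)
    for n in range(1, n_max + 1):
        p = Chebyshev.interpolate(f, n, domain=[0, b])
        if np.abs(p(t_test) - f(t_test)).max() <= tol:
            return n
    return None

for b in [4, 16, 64, 256]:
    # empirical degree needed vs. sqrt(b)
    print(b, min_degree(ramp, b), np.sqrt(b))
```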

Reference. This blog post is my take on an argument presented in Chapter 5 of Faster Algorithms via Approximation Theory by Sushant Sachdeva and Nisheeth K. Vishnoi. It’s a very nice monograph, and I highly recommend you check it out!
