Inverse Function Theorem

> [!NOTE] Inverse Function Theorem \[MA263\]
> Let $U \subset \mathbb{R}^n$ be [[Open Sets in Metric Space|open]] and assume $f: U \rightarrow \mathbb{R}^n$ is [[Continuous Differentiablilty|continuously differentiable]] on $U$.
>
> For $a \in U$ assume further that $\operatorname{det}(D f(a)) \neq 0$ (alternatively $\operatorname{det}(\partial f(a)) \neq 0$). Then there is an open neighbourhood $V \subset U$ of $a$ and an open neighbourhood $W \subset \mathbb{R}^n$ of $f(a)$ such that $f: V \rightarrow W$ has a continuous inverse $f^{-1}: W \rightarrow V$ which is continuously differentiable and
>
> $$
> D\left(f^{-1}\right)(y)=\left(D f\left(f^{-1}(y)\right)\right)^{-1} \tag{*}
> $$
>
> for all $y \in W$.

###### Proof

Define $\lambda := Df(a) \in L\left(\mathbb{R}^n\right)$. Since $\operatorname{det}(D f(a))=\operatorname{det}(\lambda) \neq 0$, we see that $\lambda$ is invertible. Note that the Inverse Function Theorem holding for $f$ is equivalent to it holding for $\lambda^{-1} \circ f$ (let $\tilde{f}=\lambda^{-1} \circ f$, then $f^{-1}=\tilde{f}^{-1}\circ \lambda^{-1}$ and $\tilde{f}^{-1}=f^{-1}\circ\lambda$). But by the chain rule, and since the linear map $\lambda^{-1}$ is its own derivative,

$$
D\left(\lambda^{-1} \circ f\right)(a)=D\left(\lambda^{-1}\right)(f(a)) \circ D f(a)=\lambda^{-1} \circ D f(a)=I_n,
$$

where $I_n$ is the identity map on $\mathbb{R}^n$. Thus we can WLOG assume that $D f(a)=\lambda=I_n$.

**Step 1:** $f$ is injective in a neighbourhood of $a$.

Since $f$ is continuously differentiable on $U$, we can choose $\varepsilon>0$ (sufficiently small) such that on $B(a, \varepsilon) \subset U$ it holds that

$$
\begin{align}
\operatorname{det}(D f(x)) &\neq 0 \quad \forall\, x \in B(a, \varepsilon), \tag{1} \\
\left|\partial_jf_i(x)-\delta_{ji}\right|=\left|\partial_jf_i(x)-\partial_jf_i(a)\right| &< \frac{1}{2n^2} \quad \forall\, 1\leq i,j\leq n \text{ and } x\in B(a,\varepsilon). \tag{2}
\end{align}
$$

Consider $g(x)=f(x)-x$ on $B(a,\varepsilon)$ (*a measure of how $f$ deviates from its affine approximation*).
By $(2)$,

$$
\left|\partial_jg_i(x)\right|<\frac1{2n^2}\quad\forall\, 1\leq i,j\leq n \text{ and } x\in B(a,\varepsilon).
$$

We can then show (by taking a telescoping sum and using the mean value theorem, for example) that

$$
\|f(x)-x-(f(y)-y)\|=\|g(x)-g(y)\|\leq\frac12\|x-y\|\quad\forall\, x,y\in B(a,\varepsilon).
$$

By the triangle inequality, $\|x-y\|\leq\|f(x)-f(y)\|+\|g(x)-g(y)\|\leq\|f(x)-f(y)\|+\frac12\|x-y\|$, and thus

$$
\|x-y\|\leq2\|f(x)-f(y)\|\quad\forall\, x,y\in B(a,\varepsilon), \tag{3}
$$

which yields the desired injectivity ($f(x)=f(y)\implies x=y$). Note that by continuity, $(3)$ extends to all $x,y\in\overline{B}(a,\varepsilon)$ (shrinking $\varepsilon$ if necessary so that $\overline{B}(a,\varepsilon)\subset U$).

**Step 2:** $f$ is surjective in a neighbourhood of $a$.

Note that $(3)$ implies that for all $x\in\partial B(a,\varepsilon)$ (the boundary of the open ball, i.e. $\lVert x - a \rVert=\varepsilon$)

$$
\|f(x)-f(a)\|\geq\frac\varepsilon2=:d.
$$

Now let $W:=B(f(a),d/2)$. Thus if $w\in W$ and $x\in\partial B(a,\varepsilon)$ we have

$$
\|w-f(a)\|<\frac d2\leq\|w-f(x)\|, \tag{4}
$$

since $f(x)\notin B(f(a),d)$ and hence $\|w-f(x)\|\geq\|f(x)-f(a)\|-\|w-f(a)\|>d-\frac d2$.

**Claim**: for any $w\in W$, there exists a unique $x\in B(a,\varepsilon)$ such that $f(x)=w$.

Consider

$$
h:\overline{B}(a,\varepsilon)\to\mathbb{R},\quad h(x):=\|w-f(x)\|^2=\sum_{j=1}^n(w_j-f_j(x))^2.
$$

Note that $h$ is continuous and thus attains its minimum on the compact set $\overline{B}(a,\varepsilon)$. If $x\in\partial B(a,\varepsilon)$ then, by $(4)$, we have $h(a)<h(x)$, and thus the minimum is not attained on $\partial B(a,\varepsilon)$. But $h$ is differentiable on $B(a,\varepsilon)$, so the fact that partial derivatives vanish at interior local minima yields a point $x\in B(a,\varepsilon)$ (where $h$ attains its minimum) such that $\partial_ih(x)=0$ for all $i=1,\ldots,n$, i.e.

$$
-2\sum_{j=1}^n(w_j-f_j(x))\,\partial_if_j(x)=0.
$$

Let $p$ denote the column vector (i.e. the $n\times1$ matrix) with entries $w_j-f_j(x)$. Then we can write this equality as

$$
-2(\partial f(x))^Tp=0.
$$

But since by $(1)$, $\det((\partial f(x))^T)=\det(\partial f(x))\neq0$, this implies $p=0$ and thus $w=f(x)$. Uniqueness follows from $(3)$.

**Step 3:** Continuity and differentiability of $f^{-1}$.

Let $V:=B(a,\varepsilon)\cap f^{-1}(W)$ (which is open, since $f$ is continuous). We have shown that $f:V\to W$ has an inverse $f^{-1}:W\to V$. We can rewrite $(3)$ as

$$
\|f^{-1}(v)-f^{-1}(w)\|\leq2\|v-w\|\quad\forall\, v,w\in W, \tag{5}
$$

and thus $f^{-1}$ is (Lipschitz) continuous. It remains to show that $f^{-1}$ is differentiable. Let $x\in V$ and $\mu:=Df(x)$, which lies in $GL(\mathbb{R}^n)$ by $(1)$.

**Claim**: $f^{-1}$ is differentiable at $w=f(x)$ and $D(f^{-1})(w)=\mu^{-1}$.

We first note that the claim $D(f^{-1})(w)=\mu^{-1}=(Df(x))^{-1}$ gives $(*)$. Note that since $f$ is continuously differentiable and $f^{-1}$ is continuous, $(*)$ implies that $f^{-1}$ is continuously differentiable (recall Proposition 2.4.8).
Furthermore, by the definition of differentiability of $f$ at $x$,

$$
f(y)=f(x)+\mu(y-x)+R(x,y-x),
$$

where

$$
\lim_{y\to x}\frac{\|R(x,y-x)\|}{\|y-x\|}=0, \tag{6}
$$

and thus

$$
\mu^{-1}(f(y)-f(x))=y-x+\mu^{-1}(R(x,y-x)).
$$

Since $x=f^{-1}(w)$ and, for each $y\in V$, the point $v:=f(y)$ is the unique $v\in W$ such that $y=f^{-1}(v)$, we can write this as

$$
f^{-1}(v)=f^{-1}(w)+\mu^{-1}(v-w)-\mu^{-1}\left(R\left(f^{-1}(w),f^{-1}(v)-f^{-1}(w)\right)\right).
$$

So we need to show that

$$
\lim_{v\to w}\frac{\left\|\mu^{-1}R\left(f^{-1}(w),f^{-1}(v)-f^{-1}(w)\right)\right\|}{\|v-w\|}=0. \tag{7}
$$

Note that for $v\neq w$ we have $f^{-1}(v)\neq f^{-1}(w)$ (as $f^{-1}$ is injective), so we can estimate

$$
\begin{aligned}
\frac{\left\|\mu^{-1}R\left(f^{-1}(w),f^{-1}(v)-f^{-1}(w)\right)\right\|}{\|v-w\|}&\leq\|\mu^{-1}\|_{\mathrm{op}}\,\frac{\left\|R\left(f^{-1}(w),f^{-1}(v)-f^{-1}(w)\right)\right\|}{\|f^{-1}(v)-f^{-1}(w)\|}\\
&\quad\cdot\frac{\|f^{-1}(v)-f^{-1}(w)\|}{\|v-w\|}.
\end{aligned}
\tag{8}
$$

Note that since $f^{-1}$ is continuous we have $f^{-1}(v)\to f^{-1}(w)$ as $v\to w$, and so $(6)$ yields

$$
\lim_{v\to w}\frac{\left\|R\left(f^{-1}(w),f^{-1}(v)-f^{-1}(w)\right)\right\|}{\|f^{-1}(v)-f^{-1}(w)\|}=0.
$$

But the Lipschitz bound $\|f^{-1}(v)-f^{-1}(w)\|\leq2\|v-w\|$ established at the start of Step 3 implies that the second factor on the RHS of $(8)$ is bounded from above by $2$, so we obtain $(7)$, which proves the claim.
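
As a sanity check on the formula $D(f^{-1})(y)=(Df(f^{-1}(y)))^{-1}$, the following minimal sketch compares both sides numerically for one map on $\mathbb{R}^2$. Everything in it is an assumption made for illustration and is not part of the notes: the example map `f`, the base point `a`, the use of Newton's method to evaluate the local inverse, and the finite-difference step `h`.

```python
import math

def f(p):
    """Hypothetical example map f: R^2 -> R^2 (chosen for illustration only)."""
    x, y = p
    return (x + 0.25 * math.sin(y), y + 0.25 * x * x)

def jacobian(p):
    """Analytic Jacobian Df(p); row i holds the partial derivatives of f_i."""
    x, y = p
    return ((1.0, 0.25 * math.cos(y)),
            (0.5 * x, 1.0))

def inv2(m):
    """Inverse of a 2x2 matrix; fails if det = 0 (the theorem's hypothesis)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("singular Jacobian")
    return ((d / det, -b / det), (-c / det, a / det))

def matvec(m, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def f_inverse(w, guess, tol=1e-14, max_iter=50):
    """Evaluate the local inverse by solving f(x) = w with Newton's method."""
    x = guess
    for _ in range(max_iter):
        fx = f(x)
        r = (w[0] - fx[0], w[1] - fx[1])  # residual w - f(x)
        if math.hypot(r[0], r[1]) < tol:
            break
        step = matvec(inv2(jacobian(x)), r)
        x = (x[0] + step[0], x[1] + step[1])
    return x

def fd_jacobian_of_inverse(w, guess, h=1e-6):
    """Central finite differences for D(f^{-1})(w), built column by column."""
    cols = []
    for k in range(2):
        wp, wm = list(w), list(w)
        wp[k] += h
        wm[k] -= h
        xp = f_inverse(tuple(wp), guess)
        xm = f_inverse(tuple(wm), guess)
        cols.append(((xp[0] - xm[0]) / (2 * h), (xp[1] - xm[1]) / (2 * h)))
    # cols[k] is the k-th column, so transpose into rows.
    return ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))

a = (0.3, 0.2)                          # base point with det Df(a) != 0
w = f(a)
lhs = fd_jacobian_of_inverse(w, a)      # numerical D(f^{-1})(w)
rhs = inv2(jacobian(f_inverse(w, a)))   # (Df(f^{-1}(w)))^{-1}
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_err)
```

The printed discrepancy should be at the level of finite-difference error. This only illustrates the statement at one point; it is not a substitute for the proof, and Newton's method here is merely a convenient way to evaluate $f^{-1}$ near $f(a)$.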