Basics of detection theory
Estimation and Detection Theory based on a single observation, scalar or vector (Estimation of probabilities on a random sample of fixed size).
Decision Process
Every statistical decision problem has the same structure: a source $M$ generates m events $M_i$, $i=0,1,\dots,m-1$. We can assign in a unique way a signal $S_i \in S$ to each of the m events and send these signals to an observer. The transmitted signal is altered by interference, so the observer receives a corrupted version of it. Looking at the received signal, the decision maker has to infer what its source was, i.e. which message was originally transmitted, using all the available information (e.g. the waveform of the generated signal, the statistical behaviour of the channel, etc.). The process described above is called a Decision Process.
If $M$ is the set of possible events, $S$ is the signal space containing the possible waveforms that can be transmitted to the observer. Normally there is a one-to-one (1:1) correspondence between the elements of $S$ and $M$. We also indicate with $\Omega$ the observation space, that is, the space of all possible received signals. The elements of $\Omega$ are the possible realizations of the random vector $\vec{X}$, whose probability density conditioned on the transmitted messages is known.
Due to the continuous nature of the noise, the dimensionality of $\Omega$ can be taken as infinite even though $M$ and $S$ are finite.
Finally, we define the space of possible decisions $D$. Note that $D$ and $M$ must have the same cardinality, since the goal of the decision process is to extract the information contained in $M$.
Because the noise is a random process, the law governing the transition from the event space to the observation space has a probabilistic behaviour. In particular, if $\vec{X}$ is the vector of the received observables, we define:
\begin{equation} P(\vec{X} \mid M_i) , i=0,1,...,m-1 \end{equation}
as the probability density of the vector $\vec{X}$ when the event $M_i$ occurs (i.e. the corresponding signal $S_i$ is transmitted). Introducing probability distributions to describe the transition from the event space to the observation space allows us to define decision procedures of two types: parametric and non-parametric. We speak of parametric decision procedures when the transition probabilities $p(\vec{X} \mid M_i)$, $i=0,\dots,m-1$ differ only in the values of one or more parameters; in this case the decision process amounts to finding the unknown parameters. We speak of non-parametric decisions when the different hypotheses imply structurally different transition probabilities $p(\vec{X} \mid M_i)$, $i=0,\dots,m-1$. An example of a parametric decision between the events $M_0$ and $M_1$ is:
\begin{equation} P(x \mid M_0)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2} \end{equation}
\begin{equation} P(x \mid M_1)=\frac{1}{\sqrt{2\pi}}e^{-(x-3)^2/2} \end{equation}
Note that these densities differ only in the mean of the Gaussian, which is non-zero (and equal to 3) when the event $M_1$ occurs (the example refers to a case where x is scalar).
An example of a non-parametric decision is when $P(\vec{X} \mid M_0)=f_0(\vec{X})$ and $P(\vec{X} \mid M_1)=f_1(\vec{X})$, where $f_0$ and $f_1$ are two different probability densities; we will not consider this case and will focus on parametric decisions.
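As a small numerical sketch of the parametric example above (the function names and the test value are illustrative, not part of the original text), the two conditional densities can be evaluated as follows:

<code python>
import numpy as np

def p_x_given_M0(x):
    # Gaussian density with zero mean and unit variance (event M0).
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def p_x_given_M1(x):
    # Same Gaussian shifted to mean 3 (event M1): only the mean parameter differs.
    return np.exp(-(x - 3)**2 / 2) / np.sqrt(2 * np.pi)

x = 1.2  # an example observation
print(p_x_given_M0(x), p_x_given_M1(x))
</code>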
Generally the vector of observables is made of m real and/or complex quantities, measurable at reception. In the radar case, these quantities are often the signals received in the dwell time and associated with the target. If the radar is non-coherent, the received pulses are associated with amplitudes (voltage values of the envelope); if the radar is coherent, the pulses are associated with amplitudes and phases.
Once we have the vector of observables $\vec{X}$ we have to make a decision by interpreting it. The decision rules map the observation space $\Omega$ to the decision space $D$. If the event space is continuous, the decision problem becomes an estimation problem: for example, if $M$ is the space of the possible values of a parameter, we can estimate this value using decision-theory methodologies. The definition of the decision rules determines a partition of the observation space. This partition is realized by rules that operate in the following way. Let us consider the case where the event space $M$ has cardinality equal to two.
The above scheme (figure 2) considers the particular case of a binary decision on a single observation, that is, the case where the event space $M$ is made of only two elements $M_0$ and $M_1$; in the figure we omitted, for simplicity, the signal space $S$ because it is supposed to be in a one-to-one (1:1) correspondence with the event space.
Suppose we have received the vector $\vec{X}$; the decision rule is:
$\text{if }\vec{X} \in \Omega_0 \Rightarrow M_0$ is true
$\text{if }\vec{X} \in \Omega_1 \Rightarrow M_1$ is true
As seen, decision theory tries to find rules for partitioning the observation space $\Omega$; note that $\Omega_0$ and $\Omega_1$ form a partition of the space such that:
$\Omega_0 \cap \Omega_1 = \emptyset$ and $\Omega_0 \cup \Omega_1 = \Omega$
It can be observed that the transitions from $\Omega_0$ to $D_0$ and from $\Omega_1$ to $D_1$ are deterministic (these are the decision rules), while the transitions from $M_0$ to $\Omega_0$ and from $M_1$ to $\Omega_1$ are stochastic and characterized by the conditional probability densities:
$P(\vec{X} \mid M_0)$ , $P(\vec{X} \mid M_1)$
Radar detection procedures are particular cases of decision problems and we are going to analyze them. The scheme in figure 3 represents a radar detection scenario. From what we have said so far, we have to decide between two events:
$M_0$: the signal is generated by the noise.
$M_1$: the signal is generated by a target.
The hypothesis space or the space of decisions will be characterized by only two elements:
$H_0$ : the received signal contains only noise $\Leftrightarrow$ Target not present
$H_1$ : the received signal is the sum of an echo generated by a target and noise $\Leftrightarrow$ Target is present
Note that:
Since the transition from the event space $M$ to the sample space $\Omega$ has a probabilistic nature, while the transition from the observation space to the decision space has a deterministic nature, there is always some probability of making the wrong decision. There are two types of errors that can occur:
1) The target is not present but the opposite decision is taken (false alarm);
2) The target is present but the opposite decision is taken (missed detection);
The probability of making an error of type 1) is called the Probability of False Alarm $P_{fa}$ and is defined as:
\begin{equation} p_{01} = P_\text{fa} = \alpha = \int_{\Omega_1}^{} P(\vec{X} \mid M_0)\, d\vec{X} \end{equation}
The probability of making an error of type 2) is called the Probability of Missed Detection and is defined as:
\begin{equation} p_{10} = \beta = 1 - P_d = \int_{\Omega_0}^{} P(\vec{X} \mid M_1)\, d\vec{X} \end{equation}
where $P_d$ is the probability of detection. The probabilities of making the correct decision are:
\begin{equation} p_{00} = \int_{\Omega_0}^{} P(\vec{X} \mid M_0)\, d\vec{X} \end{equation}
\begin{equation} p_{11} = P_d = \int_{\Omega_1}^{} P(\vec{X} \mid M_1)\, d\vec{X} \end{equation}
In these formulas we used the following convention: $p_{ij}$ is the probability of deciding for the event $j$ when the event $i$ occurs.
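As a numerical illustration of these definitions (the threshold value and the partition $\Omega_1 = \{x > x_T\}$ are assumptions, chosen only to make the integrals concrete for the scalar Gaussian example above):

<code python>
from scipy.stats import norm

x_T = 1.5  # assumed threshold defining Omega_1 = {x > x_T}

# p01 = Pfa: probability of deciding M1 (x in Omega_1) when M0 (N(0,1)) holds.
p01 = norm.sf(x_T, loc=0, scale=1)
# p10 = missed detection: deciding M0 (x in Omega_0) when M1 (N(3,1)) holds.
p10 = norm.cdf(x_T, loc=3, scale=1)
p00, p11 = 1 - p01, 1 - p10  # probabilities of correct decision

print(f"Pfa = {p01:.4f}, Pd = {p11:.4f}")
</code>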
In some cases the a priori probabilities of occurrence of the events may be available:
$p_0 = p(M_0)$ , $p_1 = p(M_1)$
Based on the above definitions we must have: $p_0 + p_1 = 1$: probability of the cause $M_0$ or $M_1$.
$p_{00} + p_{01} = 1$: probability of a decision in the $M_0$ hypothesis.
$p_{10} + p_{11} = 1$: probability of a decision in the $M_1$ hypothesis.
Detection consists of a method to divide the space of possible observations $\Omega$ in an optimal way, by defining a Decision Criterion.
We can classify criteria in two categories: Bayesian and non-Bayesian. The Bayesian criteria assume the knowledge of the a priori probabilities $p_0$ and $p_1$, while the non-Bayesian ones do not assume such knowledge and therefore require “less information”. In any case, it is necessary to know $P(\vec{X} \mid M_0)$ and $P(\vec{X} \mid M_1)$.
Bayes Criterion
These criteria use all the available stochastic information, i.e. conditional and a priori probabilities. To create the partition of the observation space we use the following criterion: with every decision a cost $L_{ij}$ is associated, indicating the loss incurred when the event $i$ occurs and the event $j$ is decided. Note that we can also associate a cost with the correct decisions.
Since we operate under uncertainty we have to refer to the average cost $E(L)$ (the mean of the cost, also called the risk), which in the binary case is defined as:
\begin{equation} E(L) = \sum_{i=0}^1\sum_{j=0}^1 L_{ij}p_{ij}p_i = L_{00}p_{00}p_0 + L_{10}p_{10}p_1 + L_{01}p_{01}p_0 + L_{11}p_{11}p_1 \end{equation}
The partition of the observation space must be realized in such a way that the average cost is minimized. From the definition and from the equalities $p_{00}=1-p_{01}$ and $p_{10}=1-p_{11}$ we have
\begin{equation} E(L) = L_{00}p_0 + L_{10}p_1 + p_0(L_{01} - L_{00})p_{01} + p_1(L_{11} - L_{10})p_{11} \end{equation}
where $p_0$, $p_1$ and the costs $L_{ij}$ are fixed and known, while $p_{01}$ and $p_{11}$ depend on the chosen partition.
If we substitute the definitions of $p_{01}$ and $p_{11}$ we have:
\begin{equation} E(L) = L_{00}p_0 + L_{10}p_1 + \int_{\Omega_1}^{} p_0(L_{01} - L_{00})P(\vec{X} \mid M_0)\, d\vec{X} + \int_{\Omega_1}^{} p_1(L_{11} - L_{10})P(\vec{X} \mid M_1)\, d\vec{X} \end{equation}
Using previous relationships:
\begin{equation} E(L) = \text{Quantity independent of } \Omega_1 + \\ + \int_{\Omega_1}^{} (p_0(L_{01} - L_{00})p(\vec{X} \mid M_0)\ - p_1(L_{10} - L_{11})p(\vec{X} \mid M_1))\, d\vec{X} \end{equation}
Recall that to establish the decision rule we have to find $\Omega_1$, knowing the cost values $L_{ij}$ and the functions $p(\vec{X} \mid M_i)$.
The region $\Omega_1$ is defined such that the result of (11) is minimum. For simplicity, we analyze this problem in the one-dimensional case. Let us consider the generic function $f(x)$ in figure 5.
We want to define the integration domain $\Omega_1$ of $f(x)$ such that
\begin{equation} \int_{\Omega_1}^{} f(x)\, dx \end{equation}
is minimum. From the above figure we can observe that the integral is minimum (considering value and sign) if we take as integration domain $\Omega_1$ the intervals where $f(x)$ takes negative values.
The region $\Omega_1$ is composed of the elements $\vec{X} \in \Omega$ such that:
\begin{equation} \Omega_1 = \{\vec{X} :[ p_0(L_{01} - L_{00})p(\vec{X} \mid M_0)\ - p_1(L_{10} - L_{11})p(\vec{X} \mid M_1) ] < 0\} \end{equation}
An equivalent expression is:
\begin{equation} \Omega_1 = \left\{\vec{X} : \frac{P(\vec{X} \mid M_1)}{P(\vec{X} \mid M_0)} > \frac{p_0(L_{01} - L_{00})}{p_1(L_{10} - L_{11})} \right\} \end{equation}
For example, suppose the one-dimensional case (x real and scalar), and that the functions $p(x \mid M_0)$ and $p(x \mid M_1)$ are Gaussian (like in the figure below); once the value of
\begin{equation} \frac{p_0(L_{01} - L_{00})}{p_1(L_{10} - L_{11})} \end{equation}
is fixed, we can define the two decision regions $\Omega_1$ and $\Omega_0$, because the ratio between the two densities is monotone. If $x < x_a$ we decide that $M_0$ is true, while if $x > x_a$ we decide that $M_1$ is true.
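A minimal sketch of this computation (the costs and priors below are assumed values, used only to show how the threshold on $l(x)$ maps to the point $x_a$ for the Gaussian pair $N(0,1)$, $N(\mu,1)$):

<code python>
import numpy as np

L00, L11 = 0.0, 0.0   # assumed costs of correct decisions
L01, L10 = 1.0, 1.0   # assumed costs of errors
p0, p1 = 0.8, 0.2     # assumed a priori probabilities
mu = 3.0              # mean under M1

# Bayes threshold on the ratio of densities, eq. (14).
eta = p0 * (L01 - L00) / (p1 * (L10 - L11))

# For N(0,1) vs N(mu,1) the ratio l(x) = exp(mu*x - mu^2/2) is monotone in x,
# so l(x) > eta is equivalent to x > x_a with:
x_a = (np.log(eta) + mu**2 / 2) / mu
print(f"decide M1 when x > {x_a:.3f}")
</code>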
Definition:
the ratio
\begin{equation} l(\vec{X}) = \frac{P(\vec{X} \mid M_1)}{P(\vec{X} \mid M_0)} \end{equation}
is called Likelihood Ratio.
Using this notation, the decision rule in the binary case [eq. (13)] is:
\begin{equation} l(\vec{X}) \geq \eta \Rightarrow M_1 \text{ hypothesis is true} \end{equation}
where $\eta$ is a threshold value.
The decision space is divided in two regions $\Omega_1$ and $\Omega_0$ defined by:
\begin{equation} \Omega_1 = \{ \vec{X} : l(\vec{X}) \geq \eta \} \end{equation}
\begin{equation} \Omega_0 = \Omega - \Omega_1 \end{equation}
Because the main difficulty in applying the Bayes method lies in defining the cost values $L_{ij}$ and in knowing the a priori probabilities $p_0$ and $p_1$, in general this method is not used for radar detection and other methods are preferred.
Maximum a Posteriori Probability (MAP)
Using the error probability criterion, the regions $\Omega_0$ and $\Omega_1$ are defined such that the error probability $p_E$
\begin{equation} p_E = p_{01}p_0 + p_{10}p_1 = P_{fa}p_0 + (1-P_d)p_1 \end{equation}
is minimized.
This is the particular case of equations (10) and (11) when $L_{01} - L_{00} = L_{10} - L_{11}$ or when $L_{00} = L_{11} = 0$ and $L_{01} = L_{10}$.
If we expand the equation (20), using the previous relations we can write that
\begin{equation} p_E = p_{0}p_{01} + p_1 \int_{\Omega_0}^{} P(\vec{X} \mid M_1)\, d\vec{X} \end{equation}
Using the relation $p_{01}=1-p_{00}$ we can express the error probability of (21) in terms of only one decision region:
\begin{equation} p_E = p_{0} + \int_{\Omega_0}^{} [-p_0 p(\vec{X} \mid M_0) + p_1 p(\vec{X} \mid M_1) ]\, d\vec{X} \end{equation}
Reasoning like in the Bayes Criterion, the region $\Omega_0$ is defined as:
\begin{equation} \Omega_0 = \left\{\vec{X} : \frac{P(\vec{X} \mid M_0)}{P(\vec{X} \mid M_1)} > \frac{p_1}{p_0} \right\} \end{equation}
Conversely, the region $\Omega_1$ is defined as:
\begin{equation} \Omega_1 = \left\{\vec{X} : \frac{P(\vec{X} \mid M_1)}{P(\vec{X} \mid M_0)} > \frac{p_0}{p_1} \right\} \end{equation}
Using (23) we have
\begin{equation} \Omega_1 = \left\{\vec{X} : \frac{P(\vec{X} \mid M_0)}{P(\vec{X} \mid M_1)} \leq \frac{p_1}{p_0} \right\} \end{equation}
and we can obtain (24). The error probability criterion consists in comparing the likelihood ratio with the threshold $p_0 / p_1$.
Alternatively, we can derive (24) from (14) by imposing $L_{01} - L_{00} = L_{10} - L_{11}$, which means that the error probability criterion is a particular case of the Bayes criterion.
This criterion is also known as Maximum A Posteriori Probability (MAP) and we can observe the following: if $p(\vec{X})$ is the marginal density of the observable, we have $p(\vec{X}) = p(\vec{X} \mid M_0)p_0 + p(\vec{X} \mid M_1)p_1$ (note that $p_0 = p(M_0)$, $p_1 = p(M_1)$) and, by Bayes' rule, we can write:
\begin{equation} p_0 \, p(\vec{X} \mid M_0) = p(\vec{X})\, p(M_0 \mid \vec{X}) \quad , \quad p_1 \, p(\vec{X} \mid M_1) = p(\vec{X})\, p(M_1 \mid \vec{X}) \end{equation}
and so we can express the decision criterion of (23) as:
\begin{equation} \Omega_0 = \left\{\vec{X} : \frac{p(M_0 \mid \vec{X})}{p(M_1 \mid \vec{X})} > 1 \right\} \end{equation}
The functions $p(M_0 \mid \vec{X})$, $p(M_1 \mid \vec{X})$ are called a posteriori probability densities. We can say that the MAP criterion is based on the comparison of the probabilities of the causes conditioned on the effect (a posteriori probabilities). The MAP criterion can be extended to the m-ary case, with m > 2. To do so we have to evaluate the Bayes cost in the particular case of:
\[L_{ij} = 0 \text{ if } i = j \]
\[L_{ij} = 1 \text{ if } i \ne j\]
We obtain that the cost is minimum if the decision region for each hypothesis $M_k$, $k=1,2,...,m$ is:
\begin{equation} p(\vec{X} \mid M_k)p_k > p(\vec{X} \mid M_j)p_j \quad \text{for each } j \ne k \end{equation}
Therefore, the MAP decision rule is: compute the likelihood function $p(\vec{X} \mid M_k)$ for each hypothesis, multiply it by the a priori probability of that hypothesis, and finally choose the maximum.
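A sketch of this m-ary MAP rule (the Gaussian hypotheses and the priors are hypothetical, introduced only for illustration):

<code python>
import numpy as np
from scipy.stats import norm

means = [0.0, 2.0, 4.0]    # assumed mean under each hypothesis M_k
priors = [0.5, 0.3, 0.2]   # assumed a priori probabilities p_k

def map_decision(x):
    # Choose k maximizing the likelihood times the prior: p(x | M_k) * p_k.
    scores = [norm.pdf(x, loc=m) * p for m, p in zip(means, priors)]
    return int(np.argmax(scores))

print(map_decision(1.2))   # index of the decided hypothesis
</code>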
Since in general we do not have the a priori probabilities $p_0$ and $p_1$, the MAP method is not used for radar detection.
Maximum Likelihood Criterion (ML) and Neyman-Pearson Criterion
Analysis of the Neyman-Pearson criterion, which maximizes the probability of correct detection Pd for a fixed probability of false alarm Pfa.
We can use the same procedure as before for these criteria. Knowing the probability densities $p(\vec{X} \mid M_0)$ and $p(\vec{X} \mid M_1)$ we can calculate the likelihood ratio
\begin{equation} l(\vec{X}) = \frac{p(\vec{X} \mid M_1)}{p(\vec{X} \mid M_0)} \end{equation}
and we assume that $H_1$ is true if
\begin{equation} l(\vec{X}) \geq \eta \end{equation}
where $\eta$ is a parameter that depends on the costs and on the a priori probabilities. Since these values are often unknown, under the hypothesis of maximum uncertainty we can assume $\eta = 1$ (Maximum Likelihood (ML) criterion), and the decision rule (29) becomes:
\begin{equation} p(\vec{X} \mid M_1) \geq p(\vec{X} \mid M_0) \Rightarrow M_1 \text{ is chosen} \end{equation}
From the radar detection point of view the ML criterion is not sufficient because it does not allow control of the two error probabilities (false alarm and missed detection). To control these two values we can use the parameter $\eta$. We fix $\eta$ such that the probability of false alarm $P_{fa}$ satisfies:
\begin{equation} P_\text{fa} = \int_{\Omega_1}^{} p(\vec{X} \mid M_0)\, d\vec{X} \leq \alpha_0 \end{equation}
where $\alpha_0$ is a prefixed value.
Once we have the value of $\eta$ (which is a function of $\alpha_0$; in general, for a radar application, we assume $\alpha_0 = P_{fa}$), it is necessary to minimize the second-type error probability $\beta = 1-P_d$ (or, equivalently, to maximize the probability of correct detection $P_d$). This procedure is widely used in the radar detection field and is called the Neyman-Pearson criterion. We now describe the Neyman-Pearson procedure;
we define:
\begin{equation} \int_{\Omega_1}^{} p(\vec{X} \mid M_0)\, d\vec{X} = \alpha_0 \end{equation}
The probability of correct detection is:
\begin{equation} P_d = 1 - \beta = \int_{\Omega_1}^{} p(\vec{X} \mid M_1)\, d\vec{X} \end{equation}
To maximize the probability of correct detection $P_d$ for a fixed $P_{fa}$, using the theory of Lagrange multipliers we have to determine the region $\Omega_1$ of the observation space that maximizes the quantity
\begin{equation} Q = P_d - \lambda(P_{fa} - \alpha_0) \end{equation}
or equally
\begin{equation} Q = \int_{\Omega_1}^{} p(\vec{X} \mid M_1)\, d\vec{X} - \lambda \int_{\Omega_1}^{} p(\vec{X} \mid M_0)\, d\vec{X} + \lambda \alpha_0 = \\ = \int_{\Omega_1}^{} [p(\vec{X} \mid M_1) - \lambda p(\vec{X} \mid M_0)] \, d\vec{X} + \text{const} \end{equation}
Reasoning as for the Bayes and MAP criteria, the region $\Omega_1$ is defined as:
\begin{equation} \Omega_1 = \{\vec{X} : [p(\vec{X} \mid M_1) - \lambda p(\vec{X} \mid M_0)] > 0\} = \{\vec{X} : l(\vec{X}) \geq \lambda\} \end{equation}
and therefore
\begin{equation} \Omega_0 = \{\vec{X} : l(\vec{X}) < \lambda\} \end{equation}
The value of the Lagrange multiplier $\lambda$ is obtained by solving the equation
\begin{equation} p(\vec{X} \in \Omega_1 \mid M_0) = \alpha_0 \end{equation}
The Bayes, MAP and Neyman-Pearson criteria are equivalent from the algorithmic point of view: all of them compare the likelihood ratio with a suitably chosen threshold; the difference between them is how this threshold value is defined.
In general the most used criterion is the Neyman-Pearson's.
Due to the structure of the final element of the radar chain, it is necessary to define $P_{fa}$ at design time. Once $\alpha_0$ is fixed we can calculate the threshold value $\lambda$. As an example, consider the one-dimensional case. If $N(\mu,\sigma)$ denotes a Gaussian random variable with mean $\mu$ and standard deviation $\sigma$, let
\begin{equation} p(x \mid M_0)=N(0,1) \ , \ p(x \mid M_1)=N(\mu,1) \end{equation}
This problem is a typical case of a deterministic signal, as we suppose that $\mu$ is constant. We need to identify the regions $\Omega_0$ and $\Omega_1$; the problem is to find the threshold value $X_T$ that delimits the two decision regions.
If we adopt the maximum likelihood criterion (for which the threshold is 1), the threshold value $X_T$ is the point $X_T=\mu/2$, the intersection of the two curves defined by $p(x \mid M_0)$ and $p(x \mid M_1)$.
If we use the Neyman-Pearson criterion, $X_T$ must be calculated based on the fixed value of $P_{fa}$. If we fix $P_{fa}$ we have:
\begin{equation} X_T = \Phi^{-1} (1-P_{fa}) \end{equation}
where the function $\Phi(x)$ is $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^2/2} \,dt$.
If we use the Bayes criterion we will obtain that
\begin{equation} X_T = \frac{\mu}{2} + \frac{ln(k)}{\mu} \end{equation}
where the constant $k$ contains the values of the costs and the a priori probabilities.
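The three thresholds can be compared numerically; a minimal sketch, assuming example values for $\mu$, $P_{fa}$ and the Bayes constant $k$:

<code python>
import numpy as np
from scipy.stats import norm

mu = 3.0     # mean under M1 (N(0,1) under M0)
Pfa = 1e-3   # assumed design false-alarm probability
k = 4.0      # assumed Bayes constant (costs and priors lumped together)

x_T_ml = mu / 2                       # maximum likelihood (threshold 1 on l(x))
x_T_np = norm.ppf(1 - Pfa)            # Neyman-Pearson: Phi^{-1}(1 - Pfa)
x_T_bayes = mu / 2 + np.log(k) / mu   # Bayes criterion

print(x_T_ml, x_T_np, x_T_bayes)
</code>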
In conclusion, the method of calculating the threshold changes as a function of the adopted criterion. Sometimes it is more convenient to express the likelihood ratio using the natural logarithm. Since the logarithm is an increasing function, the criterion becomes:
\begin{equation} \Lambda = ln(l(\vec{X})) \ \underset{M_0}{\overset{M_1}{\gtrless}} \ ln(\lambda) \end{equation}
This formulation simplifies the calculations when the density functions are exponential, normal, Rayleigh or similar.
Neyman-Pearson (single pulse + fixed target)
Application of Neyman-Pearson's criterion for detection of a fixed target using a single pulse.
Suppose we have a target that produces an echo with constant amplitude $A$ (the received signal is a sinusoid with amplitude $A$) and that the RMS voltage of the Gaussian noise is $\sigma$. We decide on a single pulse received at the output of an envelope detector. Using the theory illustrated in the previous section we can say that:
a) If only noise is present ($M_0$ hypothesis), the probability density function of the noise envelope is a Rayleigh:
\begin{equation} p(x \mid M_0) = \frac{x}{\sigma^2}e^{\frac{-x^2}{2\sigma^2}}u(x) \end{equation}
where $u(\cdot)$ is the step function.
b) If the target is present ($M_1$ hypothesis), the probability density associated with the envelope of the received signal is of Rice type:
\begin{equation} p(x \mid M_1) = \frac{x}{\sigma^2}e^{\frac{-(x^2 + A^2)}{2\sigma^2}}I_0(\frac{Ax}{\sigma^2})u(x) \end{equation}
The likelihood ratio is:
\begin{equation} l(x)=\frac{p(x \mid M_1)}{p(x \mid M_0)} = e^{\frac{-A^2}{2\sigma^2}}I_0(\frac{Ax}{\sigma^2}) \end{equation}
The detector decides for the hypothesis that the target is present (event $M_1$) if:
\begin{equation} l(x) \ge \eta \Leftrightarrow I_0(\frac{Ax}{\sigma^2}) \ge \eta e^{\frac{A^2}{2\sigma^2}} \end{equation}
where $\eta$ is a threshold that must be properly calculated. In the Neyman-Pearson criterion the threshold $\eta$ is calculated from the desired $P_{fa}$. Since the amplitude $A$ of the target and the RMS noise voltage $\sigma$ are known, the condition (46) can also be expressed as:
\begin{equation} I_0 \left(\frac{Ax}{\sigma^2} \right) \ge \text{constant} \end{equation}
Since the Bessel function is monotonic, it is possible to apply its inverse to both sides of (47), obtaining an equivalent decision rule:
\begin{equation} x \ge T \end{equation}
where $T$ is a threshold voltage.
The value $T$ can be calculated once the decision criterion is chosen. If the decision criterion is Neyman-Pearson's, $T$ must be calculated such that
\begin{equation} P_{fa} = \alpha_0 = \int_{T}^{\infty} \frac{x}{\sigma^2}e^{\frac{-x^2}{2\sigma^2}} \, dx = e^{\frac{-T^2}{2\sigma^2}} \end{equation}
and we obtain
\begin{equation} T = \sigma \sqrt{-2ln(P_{fa})} \end{equation}
Note that the calculation indicated in the previous equations is optimal in the Neyman-Pearson sense, i.e. it gives the maximum $P_d$. Since the value of $T$ does not depend on $A$, the procedure is optimal even if $A$ is unknown. So, in the case of a fixed target with unknown SNR (remember that $SNR = \frac{A^2}{2\sigma^2}$), the Neyman-Pearson procedure remains optimal provided that the RMS noise voltage $\sigma$ is known.
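A numerical sketch of this single-pulse case (the amplitude and noise values are assumptions; scipy's rice with shape $b = A/\sigma$ and scale $\sigma$ matches the Rice density above):

<code python>
import numpy as np
from scipy.stats import rayleigh, rice

sigma = 1.0   # RMS noise voltage
A = 3.0       # assumed echo amplitude, SNR = A**2 / (2 * sigma**2)
Pfa = 1e-4    # assumed design false-alarm probability

T = sigma * np.sqrt(-2 * np.log(Pfa))       # threshold from the Rayleigh tail

print(rayleigh.sf(T, scale=sigma))          # check: equals Pfa
print(rice.sf(T, A / sigma, scale=sigma))   # resulting probability of detection Pd
</code>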
Neyman-Pearson (single pulse + SW2 target)
Application of Neyman-Pearson's criterion for detection of a fluctuating target modeled as Swerling 2 using a single pulse.
The Neyman-Pearson criterion can be successfully applied also in the case of a fluctuating target (Swerling 2 type), operating on a single pulse with non-coherent detection. We assume $x(t)$ is the amplitude of the received voltage, sum of the useful signal $s(t)$ and the noise $n(t)$; $\sigma^2$ is the power of the noise process and $\sigma^2_s$ the power of the signal.
\begin{equation} x(t) = s(t) + n(t) \end{equation}
The power $s^2$ of the signal $x(t)$ is equal to
\begin{equation} s^2 = \sigma^2_s + \sigma^2 \end{equation}
The probability densities of the possible causes (only noise and interesting signal + noise) are:
\begin{equation} p(x \mid M_0) = \frac{x}{\sigma^2}e^{\frac{-x^2}{2\sigma^2}}u(x) \end{equation}
\begin{equation} p(x \mid M_1) = \frac{x}{s^2}e^{\frac{-x^2}{2s^2}}u(x) \end{equation}
The likelihood ratio is: \begin{equation} l(x) = \frac{\sigma^2}{s^2}e^{\frac{x^2}{2}(\frac{1}{\sigma^2} - \frac{1}{s^2})} \end{equation}
Consequently the decision rule is:
\begin{equation} e^{\frac{x^2}{2}\left(\frac{1}{\sigma^2} - \frac{1}{s^2}\right)} \ \underset{H_0}{\overset{H_1}{\gtrless}} \ \eta \frac{s^2}{\sigma^2} \end{equation}
Since $\frac{1}{\sigma^2} - \frac{1}{s^2} > 0$ and the logarithm is monotonically increasing, relation (56) can be expressed by taking the natural log; we thus obtain the quadratic detector:
\begin{equation} x^2 \ \underset{H_0}{\overset{H_1}{\gtrless}} \ T \end{equation}
(Obviously it is possible to take the square root of the last relation and use a linear detector, if we decide on the single pulse.)
The decision threshold $T$ must be chosen such that:
\begin{equation} P_{fa} = \int_{v_T}^{\infty} p(x \mid M_0) \, dx = \alpha_0 \end{equation}
where $\alpha_0$ is the desired value of $P_{fa}$, so
\begin{equation} v_T = \sigma \sqrt{-2ln(\alpha_0)} \end{equation}
finally we can write:
\begin{equation} T = v_T^2 = -2\sigma^2 ln (\alpha_0) \end{equation}
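A short sketch for the SW2 single-pulse case (the signal and noise powers are assumed values); note that under $M_1$ the envelope is again Rayleigh with power $s^2$, so $P_d$ has a closed form:

<code python>
import numpy as np

sigma2 = 1.0      # noise power sigma^2
sigma2_s = 10.0   # assumed signal power sigma_s^2
alpha0 = 1e-4     # desired Pfa

s2 = sigma2_s + sigma2             # total power under M1
T = -2 * sigma2 * np.log(alpha0)   # quadratic-detector threshold, T = v_T^2

Pd = np.exp(-T / (2 * s2))         # equals alpha0 ** (sigma2 / s2)
print(T, Pd)
</code>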
N-Pulses Detection
Detection of a target using n pulses via the n-dimensional Gaussian.
To apply the Neyman-Pearson criterion to the case of n pulses (fixed target, SW1, SW2, etc.) we have to introduce some mathematical concepts. In particular, to analyze the case of a fixed target and coherent detection we need the multivariate Gaussian, real and complex. We start from the bivariate case. The Gaussian probability density function with mean $\eta$ and variance $\sigma^2$ is: \begin{equation} f_x(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\eta)^2}{2\sigma^2}} \end{equation}
The characteristic function associated to this equation is:
\begin{equation} \Phi_x(\omega) = E[e^{j \omega x}] = \int_{-\infty}^{\infty} f_x(x) e^{j \omega x} \,dx = exp(j \omega \eta)exp(-\sigma^2 \omega^2 / 2) \end{equation}
We consider the pair of random variables $(y_1,y_2)$ and suppose that they are independent and Gaussian, with zero mean and unit variance, so we can write the joint density of $y_1$ and $y_2$:
\begin{equation} f_{y_1y_2}(y_1,y_2) = \frac{1}{2\pi}exp[-(y_1^2+y_2^2)/2] \end{equation}
Applying the linear transformation
\begin{equation} x_1 = \frac{\sigma}{\sqrt{2}}(\alpha y_1 + \beta y_2) \ ,\ x_2=\frac{\sigma}{\sqrt{2}}(-\alpha y_1 + \beta y_2) \end{equation}
that is, multiplying the vector $\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ by the matrix $P = \frac{\sigma}{\sqrt{2}} \begin{bmatrix} \alpha & \beta \\ -\alpha & \beta \end{bmatrix}$, where $\alpha = \sqrt{1-\rho}$, $\beta=\sqrt{1+\rho}$, it is easy to verify that the pair of Gaussian random variables $(x_1,x_2)$ has correlation coefficient $\rho$ and that each variable has variance $\sigma^2$. Indeed we have
\begin{equation} E[x_1 \cdot x_2] = \frac{\sigma^2}{2}(-\alpha^2 + \beta^2) = \rho \sigma^2 \end{equation}
\begin{equation} E[x_1^2] = E[x_2^2] = \frac{\sigma^2}{2}(\alpha^2 + \beta^2) = \sigma^2 \end{equation}
The joint density of $(x_1,x_2)$ is:
\begin{equation} f_{x_1x_2}(x_1,x_2)=\frac{1}{\left| J(x_1,x_2) \right|} f_{y_1y_2}\left(y_1(x_1,x_2),y_2(x_1,x_2)\right) \end{equation}
where
\begin{equation} \left| J(x_1,x_2) \right| = \left| \begin{matrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{matrix} \right| = \left| \begin{matrix} \frac{\sigma}{\sqrt{2}}\alpha & \frac{\sigma}{\sqrt{2}}\beta \\ -\frac{\sigma}{\sqrt{2}}\alpha & \frac{\sigma}{\sqrt{2}}\beta \end{matrix} \right| = \sigma^2 \alpha \beta = \sigma^2 \sqrt{1- \rho^2} \end{equation}
and:
\begin{equation} y_1(x_1,x_2)=\frac{x_1-x_2}{\sqrt{2} \sigma \sqrt{1 - \rho}} \ ,\ y_2(x_1,x_2)=\frac{x_1 + x_2}{\sqrt{2} \sigma \sqrt{1 + \rho}} \end{equation}
so:
\begin{equation} f_{x_1x_2}(x_1,x_2) = \frac{1}{2 \pi \sigma^2 \sqrt{1 - \rho^2}} exp\left(-\frac{x_1^2 - 2 \rho x_1 x_2 + x_2^2}{2 \sigma^2 (1 - \rho^2)}\right) \end{equation}
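The construction above is easy to verify by simulation; a minimal sketch (the values of $\sigma$ and $\rho$ are arbitrary):

<code python>
import numpy as np

rng = np.random.default_rng(0)
sigma, rho = 1.5, 0.7
alpha, beta = np.sqrt(1 - rho), np.sqrt(1 + rho)

# Transformation matrix P from the text, applied to independent N(0,1) pairs.
P = (sigma / np.sqrt(2)) * np.array([[alpha, beta],
                                     [-alpha, beta]])

Y = rng.standard_normal((2, 100_000))  # independent unit-variance pairs
X = P @ Y                              # correlated pairs (x1, x2)

# Should approach [[sigma^2, rho*sigma^2], [rho*sigma^2, sigma^2]].
print(np.cov(X))
</code>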
In the general case of non-zero means we can write the Gaussian bivariate as:
\begin{equation} f_{x_1x_2}(x_1,x_2) = A exp \left( -\frac{1}{2(1-\rho^2)}\left[ \frac{(x_1 - \eta_1)^2}{\sigma_1^2} - \frac{2\rho (x_1-\eta_1)(x_2-\eta_2)}{\sigma_1 \sigma_2} + \\ + \frac{(x_2-\eta_2)^2}{\sigma_2^2} \right] \right) \end{equation}
Since
\begin{equation} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{x_1x_2}(x_1,x_2) \,dx_1dx_2 = 1 \end{equation}
The normalization constant A is:
\begin{equation} A = \frac{1}{2 \pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \end{equation}
Once we know the means $\eta_1$ and $\eta_2$, the standard deviations $\sigma_1$ and $\sigma_2$ and the cross-correlation coefficient $\rho$ associated with the random variables $x_1$ and $x_2$, the joint probability density is completely specified. The probability density of one of the two random variables is obtained by integrating equation (72) over the other random variable:
\begin{equation} f_{x_1}(x_1) = \int_{-\infty}^{\infty} f_{x_1x_2}(x_1,x_2) \,dx_2 \end{equation}
In this way we can show that $f(x_i)$, $i=1,2$ is Gaussian. Moreover, from (72), if the cross-correlation coefficient $\rho$ between the two random variables $x_1$ and $x_2$ is null, then
\begin{equation} f_{x_1x_2}(x_1,x_2) = f_{x_1}(x_1)f_{x_2}(x_2) \end{equation}
this means that the two random variables are independent. For Gaussian random variables, independence and uncorrelatedness coincide.
The cross-correlation coefficient $\rho$ gives an idea of the dependence between the two variables. In particular, if $x_1$ and $x_2$ represent, for example, a measurement process, when $\left| \rho \right| \cong 1$ the pairs $(x_1,x_2)$ tend to cluster along a straight line, as indicated in figure 9a. If instead $\rho=0$ we can have the situation shown in figure 9b.
The characteristic function associated with the Gaussian bivariate is:
\begin{equation} \Phi(\omega_1,\omega_2) = exp(j(\omega_1 \eta_1 + \omega_2 \eta_2))exp \left(-\frac{\omega_1^2 \sigma_1^2 +2 \rho \sigma_1 \sigma_2 \omega_1 \omega_2 + \omega_2^2 \sigma_2^2}{2} \right) \end{equation}
Let's consider a random vector $\vec{X} = [x_1,x_2,...,x_n]^T$ where $x_i$ are random variables. The vector $\vec{X}$ is described by the joint density function
\begin{equation} f_{\vec{X}}(x_1,...,x_n) = f_{\vec{X}}(\vec{X}) \end{equation}
The characteristic function associated to the random vector $\vec{X}$ is defined by
\begin{equation} \Phi(\vec{\Omega}) = E \left[exp(j \vec{\Omega}^T \vec{X}) \right] \end{equation}
where $\vec{\Omega}=[\omega_1,\omega_2,...,\omega_n]^T$
We write $\vec{X} = \boldsymbol{\vec{P}} \vec{Y}$ where:
$\vec{Y}$ is a vector with independent Gaussian components with zero mean and unit variance, which means that its density is a negative exponential (up to the normalization constant) of the quadratic form $\vec{Y}^T \vec{Y} = \sum_{i=1}^{n} y_i^2$;
$\boldsymbol{\vec{P}}$ is a coefficient matrix. Knowing that $\vec{Y} = \boldsymbol{\vec{P}}^{-1} \vec{X}$, the quadratic form becomes
$\vec{Y}^T \vec{Y} = \vec{X}^T \boldsymbol{\vec{P}^{-T}} \boldsymbol{\vec{P}^{-1}} \vec{X} = \vec{X}^T \boldsymbol{\vec{Q}} \vec{X} $
We can repeat the procedure used to obtain equations (70) and (71); the resulting density function is, up to a normalization constant, the exponential of the quadratic form $\vec{X}^T \boldsymbol{\vec{Q}} \vec{X}$, where $\boldsymbol{\vec{Q}} = \boldsymbol{\vec{P}^{-T}} \boldsymbol{\vec{P}^{-1}}$.
The covariance matrix of $\vec{X}$ (remember that the components of $\vec{Y}$ are uncorrelated, so $\boldsymbol{\vec{M}_y} = \boldsymbol{\vec{I}}$) is: \begin{equation} \boldsymbol{\vec{M}_{\vec{x}}} = E \left[\vec{X} \vec{X}^T \right] = \boldsymbol{\vec{P}} \boldsymbol{\vec{M}_y} \boldsymbol{\vec{P}^T} = \boldsymbol{\vec{P}} \boldsymbol{\vec{P}^T} \end{equation}
and so
\begin{equation} \boldsymbol{\vec{Q}} = \boldsymbol{\vec{M}_x^{-1}} \end{equation}
We can give the following definition: the random vector $\vec{X}$ is a Gaussian multivariate if any linear combination $v$ of the random variables $x_i$
\begin{equation} v = a_1 x_1 + a_2 x_2 + ... + a_n x_n = \vec{A}^T \vec{X} \end{equation}
is gaussian.
From that definition it follows that the characteristic function of a Gaussian random vector is expressed (if the mean vector $\left< \vec{X} \right>$ is null) as:
\begin{equation} \Phi(\vec{\Omega}) = exp \left(-\frac{\vec{\Omega}^T \boldsymbol{\vec{M}} \vec{\Omega}}{2} \right) \end{equation}
where $\boldsymbol{\vec{M}}$ is covariance matrix:
\begin{equation} \boldsymbol{\vec{M}} = E \left[ \left( \vec{X} - \left< \vec{X} \right> \right) \left( \vec{X} - \left< \vec{X} \right> \right)^T \right] \end{equation}
The matrix $\boldsymbol{\vec{M}}$ is symmetric and positive semidefinite.
Instead, if $\left< \vec{X} \right> \neq 0$
\begin{equation} \Phi(\vec{\Omega}) = exp \left( -\frac{\vec{\Omega}^T \boldsymbol{\vec{M}} \vec{\Omega}}{2} \right) exp \left( j \vec{\Omega}^T \left< \vec{X} \right> \right) \end{equation}
Taking the inverse Fourier transform of equation (84), we obtain the density function of the Gaussian multivariate:
\begin{equation} f_X(X) = \frac{1}{( 2\pi)^{\frac{N}{2}} \Delta^{\frac{1}{2}}} exp \left( -\frac{ \left( \vec{X} - \left< \vec{X} \right> \right)^T \ \boldsymbol{\vec{M}^{-1}} \ \left(\vec{X} - \left< \vec{X} \right> \right)}{2} \right) \end{equation}
where $\Delta = det\left( \boldsymbol{\vec{M}} \right)$
We can say that a gaussian random vector $\vec{X}$ is completely defined by the matrix $\boldsymbol{\vec{M}}$ and by the vector of the means $\left< \vec{X} \right>$ .
The matrix $\boldsymbol{\vec{M}}$ can be estimated, for example, from experimental data. The typical form of the estimator of $\boldsymbol{\vec{M}}$ for $\left< \vec{X} \right> = 0$ (the sample covariance matrix) is:
\begin{equation} \boldsymbol{\vec{\hat{M}}} = \frac{1}{v} \sum_{i=1}^{v} \vec{X_i} \vec{X_i}^T \end{equation}
where $v$ is the number of observations of the vector $\vec{X}$.
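A minimal sketch of this estimator (the dimensions and the test data are arbitrary):

<code python>
import numpy as np

def sample_covariance(X):
    # X has shape (v, n): v zero-mean observations of an n-dimensional vector.
    v = X.shape[0]
    return (X.T @ X) / v

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))  # v = 500 observations, n = 4
print(sample_covariance(X))        # should approach the identity matrix
</code>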
The Gaussian multivariate can also be obtained by generalizing to the n-dimensional case the procedure that allowed us to define the Gaussian bivariate.
We can start from $n$ independent variables $y_i$, $i=1,...,n$ (vector $\vec{Y}$) with density function
\begin{equation} f_{\vec{Y}}(\vec{Y}) = \frac{1}{ \left( 2 \pi \right)^{\frac{n}{2}} \left( det \vec{\Lambda} \right)^{\frac{1}{2}} } exp \left[ \frac{ -\left( \vec{Y} - \left< \vec{Y} \right> \right)^T \vec{\Lambda}^{-1} \left( \vec{Y} - \left< \vec{Y} \right> \right) }{2} \right] \end{equation}
where $\vec{\Lambda}$ is a diagonal matrix whose elements are the variances of the random variables $y_i$, i.e. $\lambda_{ii}=\text{var}(y_i)$, $i=1,...,n$.
If we apply the linear transformation
\begin{equation} \left( \vec{X} - \left< \vec{X} \right> \right) = \boldsymbol{\vec{U}} \left( \vec{Y} - \left< \vec{Y} \right> \right) \end{equation}
where $\boldsymbol{\vec{U}}$ is a unitary (orthonormal) matrix, then, using the fundamental theorem for the transformation of random variables and supposing zero means, we have:
\begin{equation} f_X(\vec{X}) = f_Y(\vec{Y} = \boldsymbol{\vec{U}^T} \vec{X}) = \\ = A exp \left \{ -\frac{1}{2} \left( \boldsymbol{\vec{U}^T} \vec{X} \right)^T \boldsymbol{\vec{\Lambda}^{-1}} \left( \boldsymbol{\vec{U}^T} \vec{X} \right) \right \} = \\ = A exp \left\{ -\frac{1}{2} \vec{X}^T \ \boldsymbol{\vec{U}} \ \boldsymbol{\vec{\Lambda}^{-1}} \ \boldsymbol{\vec{U}^T} \ \vec{X} \right\} = \\ = A exp \left\{ -\frac{1}{2} \vec{X}^T \boldsymbol{\vec{M}}^{-1} \vec{X} \right\} \end{equation}
where $\boldsymbol{\vec{M}^{-1}}$ is the inverse of the covariance matrix $\boldsymbol{\vec{M}}$ of $\vec{X}$, which admits the spectral decomposition $\boldsymbol{\vec{M}} = \boldsymbol{\vec{U}} \ \boldsymbol{\vec{\Lambda}} \ \boldsymbol{\vec{U}^T} $.
The probability density of the vector $\vec{X}$ is:
\begin{equation} f_{\vec{X}}(\vec{X}) = \frac{1}{(2 \pi)^{\frac{n}{2}} (det \Lambda)^{\frac{1}{2}} } exp \left( -\frac{1}{2} \left( \vec{X} - \left< \vec{X} \right> \right)^T \ \boldsymbol{\vec{U}} \ \boldsymbol{\vec{\Lambda}^{-1} \ \boldsymbol{\vec{U}^T}} \ \left( \vec{X} - \left< \vec{X} \right> \right) \right) \end{equation}
To obtain equation (90) we used the fact that, since $\boldsymbol{\vec{U}^{-1}} = \boldsymbol{\vec{U}^T}$, we can write
\begin{equation} \left( \vec{Y} - \left< \vec{Y} \right> \right) = \boldsymbol{\vec{U}^T} \left( \vec{X} - \left< \vec{X} \right> \right) \end{equation}
and also $\left| \text{det} \boldsymbol{\vec{U}} \right| = 1$.
If we calculate the mean of the product $\left( \vec{X} - \left< \vec{X} \right> \right)\left( \vec{X} - \left< \vec{X} \right> \right)^T$ we obtain that the covariance matrix of $\vec{X}$ is equal to
\begin{equation} \boldsymbol{\vec{M_x}} = E \left\{ \vec{U} \left( \vec{Y} - \left< \vec{Y} \right> \right) \left( \vec{Y} - \left< \vec{Y} \right> \right)^T \vec{U^T} \right\} = \boldsymbol{\vec{U}} \ \boldsymbol{\vec{\Lambda}} \ \boldsymbol{\vec{U}^T} \end{equation}
So, the gaussian multivariate distribution of $\vec{X}$ is expressed as
\begin{equation} f_X(\vec{X}) = \frac{1}{ \left( 2 \pi \right)^{\frac{n}{2}} \left( det \boldsymbol{\vec{M_x}} \right)^{\frac{1}{2}} } exp \left( -\frac{1}{2} \left( \vec{X} - \left< \vec{X} \right> \right)^T \ \boldsymbol{\vec{M_x}^{-1}} \left( \vec{X} - \left< \vec{X} \right> \right) \right) \end{equation}
All we have done is necessary to extend the Neyman-Pearson criterion to the case where we have multiple pulses (or samples) associated with the target. Since normally we process the in-phase and quadrature components $I$ and $Q$, we have to work with complex numbers, and consequently it is necessary to define complex multivariate Gaussian vectors.
We indicate with $\vec{Z}$ the vector whose $n$ components are samples of the complex envelope of the received signal. In radar applications this signal can be modelled as a stationary narrowband process.
A complex random vector $\vec{Z}$ can obviously be written as:
\begin{equation} \vec{Z} = \vec{X} + j \vec{Y} \end{equation}
where $\vec{X}$ and $\vec{Y}$ are two real Gaussian vectors; instead of $\vec{Z}$ we can consider the real vector of double dimension:
\begin{equation} \vec{Z^{'}} = \left[ \begin{matrix} \vec{X} & \vec{Y} \end{matrix} \right]_{2n \ \times \ 1}^T \end{equation}
The covariance matrix of $\vec{Z}$, if the hypothesis of a zero-mean narrowband stationary process (see above) is valid, is equal to
\begin{equation} \boldsymbol{\vec{M}} = E \left[ \begin{bmatrix} \vec{X} \\ \vec{Y} \end{bmatrix} \begin{bmatrix} \vec{X^T} & \vec{Y^T} \end{bmatrix} \right] = \begin{bmatrix} \boldsymbol{\vec{V}} & - \boldsymbol{\vec{W}} \\ \boldsymbol{\vec{W}} & \boldsymbol{\vec{V}} \end{bmatrix}_{2n \ \times \ 2n} \end{equation}
The matrix $\boldsymbol{\vec{M}}$ contains blocks that are pairwise equal, so the information on the $\vec{Z}$ process requires the knowledge of only $2n^2$ parameters and not $4n^2$, because only the matrices $\boldsymbol{\vec{V}}$ and $\boldsymbol{\vec{W}}$ are needed. $\boldsymbol{\vec{V}}$ is the covariance matrix of $\vec{X}$ and $\boldsymbol{\vec{W}}$ is the cross-covariance matrix of $\vec{X}$ and $\vec{Y}$.
Note that the relations
\begin{equation} \boldsymbol{\vec{V^T}} = \boldsymbol{\vec{V}} \ \ \ \text{and} \ \ \ \boldsymbol{\vec{W^T}} = -\boldsymbol{\vec{W}} \end{equation}
are valid.
The covariance matrix of a complex vector can be written in the conventional way as:
\begin{equation} \boldsymbol{\vec{M}} = \boldsymbol{\vec{V}} + j \boldsymbol{\vec{W}} \end{equation}
and by definition:
\begin{equation} \boldsymbol{\vec{M}} = E \left[ \left( \vec{Z} - \left< \vec{Z} \right> \right)^{*} \left( \vec{Z} - \left< \vec{Z} \right> \right)^T \right] \end{equation}
where $\left< \vec{Z} \right>$ is the vector of the mean values.
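A short simulation sketch of this definition (the circular complex samples are an assumption, made only to produce test data):

<code python>
import numpy as np

rng = np.random.default_rng(2)
n, v = 3, 100_000

# Zero-mean complex samples Z = X + jY with independent components.
Z = (rng.standard_normal((v, n)) + 1j * rng.standard_normal((v, n))) / np.sqrt(2)

# Sample version of M = E[(Z - <Z>)* (Z - <Z>)^T], one row of Z per observation.
M = (Z.conj().T @ Z) / v

print(np.allclose(M, M.conj().T))  # Hermitian: M* = M^T
</code>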
Equation (99) extends the concept of the Gaussian multivariate to the case of complex random vectors.
In the case of a complex process the matrix $\boldsymbol{\vec{M}}$ is Hermitian, so:
\begin{equation} \boldsymbol{\vec{M^{*}}} = \boldsymbol{\vec{M}^{T}} \end{equation}
and it must also be positive semidefinite, so its (real) eigenvalues are positive or null.
If the eigenvalues of $\boldsymbol{\vec{M}}$ are known, it is possible to perform the spectral decomposition of $\boldsymbol{\vec{M}}$
\begin{equation} \boldsymbol{\vec{M}} = \boldsymbol{\vec{U}} \ \boldsymbol{\vec{\Lambda}} \ \boldsymbol{\vec{U}^H} \end{equation}
where $\boldsymbol{\vec{U}}$ is the matrix of eigenvectors and $\boldsymbol{\vec{\Lambda}}$ is the diagonal matrix containing the eigenvalues of the matrix $\boldsymbol{\vec{M}}$. The superscript H indicates transposition and conjugation.
The density function of the complex gaussian random vector is:
\begin{equation} f_{\vec{Z}}(\vec{Z}) = \frac{1}{(\pi)^N \ \Delta} exp \left( -\left( \vec{Z} - \left< \vec{Z} \right> \right)^T \ \boldsymbol{\vec{M^{-1}}} \ \left( \vec{Z} - \left< \vec{Z} \right> \right)^{*} \right) \end{equation}
where $\Delta = \text{det} \boldsymbol{\vec{M}}$.
Note that (102) must be read as the density function of the random vector $\vec{Z} = \vec{X} + j \vec{Y}$.
Sometimes the covariance matrix $\boldsymbol{\vec{M}}$ is defined as:
\begin{equation} \boldsymbol{\vec{M}} = \frac{1}{2} E \left[ \left( \vec{Z} - \left< \vec{Z} \right> \right)^{*} \ \left( \vec{Z} - \left< \vec{Z} \right> \right)^{T} \right] \end{equation}
and so we have:
\begin{equation} f_{\vec{Z}}(\vec{Z}) = \frac{1}{ (2 \pi)^N \Delta } exp \left( -\frac{1}{2} \left( \vec{Z} - \left< \vec{Z} \right> \right)^{T} \ \boldsymbol{\vec{M}^{-1}} \ \left( \vec{Z} - \left< \vec{Z} \right> \right)^{*} \right) \end{equation}
We now list some properties of Gaussian processes:
1) Linear transformations of gaussian processes are gaussian processes too. If $\vec{Z_0}$ is a gaussian random vector with zero mean, the vector
\begin{equation} \vec{Z} = \boldsymbol{\vec{A}} \vec{Z_0} \end{equation}
where $\boldsymbol{\vec{A}}$ is a generic matrix, is still a Gaussian vector with zero mean and covariance matrix
\begin{equation} \boldsymbol{\vec{M}} = \boldsymbol{\vec{A^{*}}} \boldsymbol{\vec{M_0}} \boldsymbol{\vec{A}^T} \end{equation}
2) If $\vec{Z}$ is a random vector with zero mean, after a linear transformation like
\begin{equation} x = \vec{w}^T \vec{Z} \end{equation}
where $\vec{w}$ is a vector of weights, the power of the scalar value $x$ is:
\begin{equation} E \left[ \left| x \right|^2 \right] = \vec{w}^H \ \boldsymbol{\vec{M}} \ \vec{w} \end{equation}
where $\boldsymbol{\vec{M}}$ is the covariance matrix of the process $\vec{Z}$.
This formula is useful when we have to work with energies (for example to evaluate the Signal/Noise ratio after filtering).
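Property 2) can be checked numerically; a minimal sketch with arbitrary test data:

<code python>
import numpy as np

rng = np.random.default_rng(3)
n, v = 4, 200_000

Z = rng.standard_normal((v, n)) + 1j * rng.standard_normal((v, n))
M = (Z.conj().T @ Z) / v                 # covariance of the zero-mean process
w = rng.standard_normal(n) + 1j * rng.standard_normal(n)

x = Z @ w                                # x = w^T Z for each observation
print(np.mean(np.abs(x) ** 2))           # empirical E[|x|^2]
print((w.conj() @ M @ w).real)           # w^H M w, should match
</code>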
Proof of (82) (in case of a random vector with zero mean):
We set $\vec{A} = \vec{\Omega}$, so $v = \vec{\Omega}^T \vec{X}$ where $\vec{X}$ is a zero-mean vector. Using (108) we have:
$\sigma_v^2 = E \left[ v^2 \right] = \vec{\Omega}^T \vec{M} \vec{\Omega} $
where $\vec{M} = E \left[ \vec{X} \vec{X}^T \right]$.
Since $v$ is a Gaussian random variable with zero mean and variance $\sigma_v^2$, its characteristic function evaluated at $\omega = 1$ is:
$\Phi_v \left( \omega \right) |_{\omega = 1} = exp \left( - \frac{\sigma_v^2 \omega^2}{2} \right) |_{\omega = 1} = exp \left( - \frac{\sigma_v^2}{2} \right)$
and we obtain:
$\Phi_{X}\left(\vec{\Omega}\right) = E \left[ e^{j \vec{\Omega^T} \vec{X}} \right] = exp \left( - \frac{1}{2} \vec{\Omega}^T \vec{M} \vec{\Omega} \right)$
that is equal to (82).
Coherent detector and Discrete-Time optimal processor
Application of the Neyman-Pearson criterion to a coherent radar detector.
Suppose we have $N > 1$ pulses associated with the target, and that the signal $s(t)$ associated with the target echo can be written as
\begin{equation} s(t) = A(t) cos (2 \pi (f_0 + f_d)t + \Phi(t)) \end{equation}
where $f_d$ is the Doppler frequency, $A(t)$ is the amplitude and $\Phi(t)$ is the phase. Let us suppose that the target is deterministic: $A(t)=A_0 = \text{const}$ and $\Phi(t) = \Phi_0 = \text{const}$. If we use the complex representation of $s(t)$ we can write
\begin{equation} s(t) = Re \left[ A(t) e^{ j (2 \pi f_d t + \Phi(t)) } e^{ j 2 \pi f_0 t } \right] \end{equation}
and the useful baseband signal $s'(t)$ (complex envelope) is
\begin{equation} s'(t) = A e^{ j ( 2 \pi f_d t + \Phi(t) ) } \end{equation}
The coherent detector of the radar (see figure 10) extracts exactly $s'(t)$;
so if we use a coherent radar we have I and Q samples taken at multiples of the PRT. These samples can be arranged in a complex vector
\begin{equation} \vec{Z} = \vec{I} + j \vec{Q} = (x_i + j y_i) |_{i=1}^N \end{equation}
where $N$ is the number of samples (or pulses in the dwell time). The samples contained in the vector $\vec{Z}$ have a useful-signal component and a noise component. Suppose the useful signal is represented by (111) and that the noise, produced for example by external factors (clutter), is additive Gaussian with zero mean and covariance matrix $\boldsymbol{\vec{M}}$.
Note that we do not make the hypothesis that the noise is white: in the radar field there is not only the contribution of thermal (white) noise, but there can also be disturbances caused by clutter echoes (clutter: echoes caused by ground, sea, rain, animals/insects, chaff, atmospheric turbulence, etc.) that are correlated with one another; this correlation is expressed in the covariance matrix $\boldsymbol{\vec{M}}$ by a non-diagonal structure. Anyway, in the case of clutter disturbance it is reasonable to assume that the distribution is Gaussian, at least for some kinds of clutter. The vector of observables $\vec{Z}$ is
\begin{equation} \vec{Z} = \vec{s} + \vec{n} \end{equation}
where $\vec{s}$ is the vector containing the samples associated with the useful signal and $\vec{n}$ is the vector containing the noise samples. The noise is such that $E(\vec{n}) = 0$, so $E(\vec{Z}) = \left< \vec{Z} \right> = \vec{s}$. We can write the covariance matrix $\boldsymbol{\vec{M}}$ as
\begin{equation} \boldsymbol{\vec{M}} = E \left[ \left( \vec{Z} - \left< \vec{Z} \right> \right)^{*} \left( \vec{Z} - \left< \vec{Z} \right> \right)^{T} \right] = E \left[ \vec{n}^{*} \vec{n}^{T} \right] \end{equation}
where the random vector $\vec{Z}$ is the sum of the clutter noise and the useful signal. If the useful signal is statistically independent of the additive noise, and the noise is a zero-mean process, the probability density functions of the vector $\vec{Z}$, in the absence and in the presence of the target, are n-dimensional Gaussian.
a) In the absence of target we have:
\begin{equation} p(\vec{Z} \mid M_0 ) = \frac{1}{\pi^n \ det \boldsymbol{\vec{M}}} exp \left( -\vec{Z}^T \ \boldsymbol{\vec{M}^{-1}} \ \vec{Z}^{*} \right) \end{equation}
where we suppose that the mean vector of the noise is null.
b) When the target is deterministic (i.e., in addition to the carrier $f_0$, the values of $f_d$ and $A$ are known), it is possible to obtain the analytical expression of the useful signal $\vec{s}$. Since the noise process has zero mean, $\vec{s}$ can also be seen as the vector of the mean values of the n-dimensional Gaussian that statistically describes the vector $\vec{Z}$. So we can write
\begin{equation} p(\vec{Z} \mid M_1 ) = \frac{1}{\pi^n \ det \boldsymbol{\vec{M}}} exp \left( - \left( \vec{Z} - \vec{s} \right)^T \ \boldsymbol{\vec{M}^{-1}} \ \left( \vec{Z} - \vec{s} \right)^{*} \right) \end{equation}
Once we know the functions $p(\vec{Z} \mid M_0)$ and $p(\vec{Z} \mid M_1)$ we can apply the decision criterion of Neyman-Pearson. The likelihood ratio is equal to:
\begin{equation} l(\vec{Z}) = \frac{ p(\vec{Z} \mid M_1) }{ p(\vec{Z} \mid M_0) } = \frac{ exp \left( - \left( \vec{Z} - \vec{s} \right)^T \ \boldsymbol{\vec{M}^{-1}} \ \left( \vec{Z} - \vec{s} \right)^{*} \right) } { exp \left( -\vec{Z}^T \ \boldsymbol{\vec{M}^{-1}} \ \vec{Z}^{*} \right) } \end{equation}
If we utilize the logarithmic likelihood ratio we can write that
\begin{equation} \Lambda(\vec{Z}) = ln[l(\vec{Z})] = -\vec{Z}^T \ \boldsymbol{\vec{M}^{-1}} \ \vec{Z}^{*} \ - \ \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} + \\ + \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{Z}^{*} + \vec{Z}^T \ \boldsymbol{\vec{M}^{-1}} \ \vec{s}^{*} + \vec{Z}^T \ \boldsymbol{\vec{M}^{-1}} \ \vec{Z}^{*} \end{equation}
and simplifying we obtain
\begin{equation} \Lambda(\vec{Z}) = - \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} + \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{Z}^{*} + \vec{Z}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} \end{equation}
\begin{equation} \Lambda(\vec{Z}) = - \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} + \sum_{ij}{m_{ij} s_i {z_j}^{*}} + \sum_{ij}{m_{ij} z_i {s_j}^{*}} \end{equation}
where $m_{ij}$ are the elements of the matrix $\boldsymbol{\vec{M}^{-1}}$. Since $\boldsymbol{\vec{M}^{-1}}$ is Hermitian (because $\boldsymbol{\vec{M}}$ is Hermitian) we have $m_{ij} = m_{ji}^{*}$, and $\vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{Z}^{*}$ is the complex conjugate of $\vec{Z}^T \ \boldsymbol{\vec{M}^{-1}} \ \vec{s}^{*}$; consequently (119) becomes:
\begin{equation} \Lambda(\vec{Z}) = - \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} + 2 Re \left[ \vec{Z}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} \right] \end{equation}
According to the Neyman-Pearson criterion, the log-likelihood ratio must be compared with a threshold $\lambda$ directly related to the desired value of $P_{fa}$. The term $- \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*}$ does not depend on the vector of observables $\vec{Z}$ and can be incorporated into the threshold. Since the target is deterministic, we know the vector $\vec{s}$; consequently, knowing the matrix $\boldsymbol{\vec{M}}$, if we define
\begin{equation} \vec{k} = \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} \end{equation}
we can rewrite (121) as
\begin{equation} \Lambda(\vec{Z}) = - \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} + 2 Re \left[ \vec{Z}^T \vec{k} \right] \end{equation}
and finally, the decision rule can be written as
\begin{equation} Re \left[ \vec{Z}^T \vec{k} \right] \ \underset{M_0}{\overset{M_1}{\gtrless}} \ \lambda + \frac{1}{2} \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} \end{equation}
Equation (124) describes the structure of the optimal Neyman-Pearson receiver when coherent samples are available. Note that the optimal decision procedure according to the Neyman-Pearson criterion requires computing the scalar
\begin{equation} w = \vec{Z}^T \vec{k} \end{equation}
where $\vec{k} = \boldsymbol{\vec{M}^{-1}} \vec{s}^{*}$, and to compare the result with the threshold.
Note that $\vec{s}$ is the vector associated with the received signal in the absence of noise. In the case of white noise, $\vec{k} = \vec{s}^{*}$ (up to a constant factor).
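A minimal sketch of this detector (the steering vector, the Doppler value and the white-noise covariance are illustrative assumptions):

<code python>
import numpy as np

rng = np.random.default_rng(4)
N = 8

fd = 0.1                                     # assumed normalized Doppler frequency
s = np.exp(2j * np.pi * fd * np.arange(N))   # deterministic target signal samples
M = np.eye(N, dtype=complex)                 # white noise, for illustration

k = np.linalg.solve(M, s.conj())             # k = M^{-1} s*

noise = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
Z0 = noise                                   # hypothesis M0: noise only
Z1 = s + noise                               # hypothesis M1: signal + noise

# The statistic Re[Z^T k] is compared with the threshold of eq. (124).
print(np.real(Z0 @ k), np.real(Z1 @ k))
</code>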
If the vector $\vec{s}^{*}$ is considered as the impulse response of a discrete-time FIR filter, the computation of $w$ is a filtering operation that provides an output value at a precise instant. The filter described by the vector $\vec{s}^{*}$ is the discrete-time filter matched (in the case of white noise) to the signal waveform coming from the target. In the continuous case, if $s(t)$ is the waveform coming from the target, the filter matched to this waveform (remember that this filter yields the maximum signal/noise ratio at the output) is expressed as
\begin{equation} h(t) = s^{*}(-t) \end{equation}
Note that it is necessary to know the target parameters $A$, $f_d$, $\Phi$. If the density functions are not Gaussian, the likelihood ratio may not be written so easily.
Maximization of the Signal / Noise ratio
Finding the maximum Signal / Noise ratio using optimal linear processor for a known deterministic signal. Application of the optimal filtering for an unfluctuating moving target. Optimal linear processor for a deterministic signal with unknown phase.
Optimal processor for a known deterministic signal
In the general case, we prefer to operate with a suboptimal decision criterion based on the search for the maximum signal/noise ratio. In the case of Gaussian random variables we will see that we arrive at the same result as Neyman-Pearson's.
Suppose we have $n$ samples of the received signal. These samples are processed linearly (by a FIR filter) in such a way that, at a certain instant (the decision instant), the signal/noise ratio at the output is maximized.
Let $\vec{h}$ be the vector of coefficients of the FIR filter and $v$ the filter output at the decision instant, which (when the sequence of n received samples is aligned with the sequence of coefficients) is equal to
\begin{equation} v = \vec{h}^T \vec{Z} \end{equation}
where $\vec{Z}$ is the vector containing the observations. Obviously $\vec{Z} = \vec{n}$ when the target is not present and $\vec{Z} = \vec{s} + \vec{n}$ when the target is present. The variable $v$ can be seen as the sum of the useful-signal contribution $v_s$ and the noise contribution $v_n$:
\begin{equation} v = v_s + v_n \end{equation}
The power $N$ associated to the noise is defined as:
\begin{equation} N = E[v_n^{*} v_n] = E \left[ \left( \vec{h}^T \vec{n} \right)^{*} \left( \vec{h}^T \vec{n} \right)^T \right] = \vec{h}^H \boldsymbol{\vec{M}} \vec{h} \end{equation}
where $\boldsymbol{\vec{M}}$ is the covariance matrix of the zero-mean noise. The noise power at the output of the filter defined by the vector $\vec{h}$ can thus be expressed as a function of the filter coefficients. Suppose the covariance matrix of the noise $\boldsymbol{\vec{M}}$ is known (or estimated). Consider for now the case of a deterministic signal $\vec{s}$; the power $S$ of the useful signal at the output of the processor is, by definition, equal to
\begin{equation} S = \left| v_s \right|^2 = \left| \vec{h}^T \vec{s} \right|^2 \end{equation}
where $\vec{s}$ is the vector of the samples of the useful signal. The indicated quantities are obtained by sampling in time (and thus in range) the radar signal at the point where we expect the (point) target echo to be maximum. Since the useful signal is deterministic, the signal/noise ratio (SNR) at the output of the linear processor is expressed as:
\begin{equation} SNR = \left( \frac{S}{N} \right) = \frac{ \left| \vec{h}^T \vec{s} \right|^2 }{ \vec{h}^H \boldsymbol{\vec{M}} \vec{h} } \end{equation}
note that, since $N$ is real, we can equivalently write $N = E(v_n v_n^{*}) = \vec{h}^T \boldsymbol{\vec{M}^{*}} \vec{h}^{*}$.
Note:
In general, the useful signal can be a random process. In this case, it can be described by the vector of expected values and by the covariance matrix $\boldsymbol{\vec{M_s}}$. If the vector of expected values is null, the signal/noise ratio at the filter output is equal to
\begin{equation} \left( \frac{S}{N} \right) = \frac{ \vec{h}^H \boldsymbol{\vec{M_s}} \vec{h} }{ \vec{h}^H \boldsymbol{\vec{M}} \vec{h} } \end{equation}
If the useful signal is modeled as a Gaussian random process with zero mean and covariance matrix $\boldsymbol{\vec{M_s}}$, optimal filtering corresponds to the minimization of the ratio (a Rayleigh quotient):
\begin{equation} \frac{ \vec{h}^H \boldsymbol{\vec{M}} \vec{h} }{ \vec{h}^H \boldsymbol{\vec{M_s}} \vec{h} } \end{equation}
Using Lagrange multipliers, this problem is solved by finding the eigenvector $\vec{h_0}$ corresponding to the minimum eigenvalue $\lambda_m$ of the generalized eigenvalue problem for the pair $(\boldsymbol{\vec{M}}, \boldsymbol{\vec{M_s}})$:
\begin{equation} \boldsymbol{\vec{M}}\vec{h_0} = \lambda_m \boldsymbol{\vec{M_s}} \vec{h_0} \end{equation}
If the useful signal is "white" (this corresponds to the standard definition of the improvement factor), i.e. $\boldsymbol{\vec{M_s}} = \boldsymbol{\vec{I}}$, the optimal filtering problem reduces to finding the eigenvalues of the covariance matrix $\boldsymbol{\vec{M}}$ of the disturbance (clutter + thermal noise); in particular, the vector of optimal coefficients $\vec{h_0}$ is the eigenvector of $\boldsymbol{\vec{M}}$ associated with the minimum eigenvalue.
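As a minimal sketch of this generalized eigenproblem (assuming NumPy and SciPy are available; the matrices below are illustrative placeholders, not values from the text), `scipy.linalg.eigh` solves $\boldsymbol{\vec{M}}\vec{h} = \lambda \boldsymbol{\vec{M_s}}\vec{h}$ and returns the eigenvalues in ascending order, so the first eigenvector is the sought $\vec{h_0}$:

```python
import numpy as np
from scipy.linalg import eigh

# Illustrative disturbance covariance (clutter + thermal noise)
# and a "white" useful signal, M_s = I (placeholder values).
M = np.array([[1.0, 0.7, 0.3],
              [0.7, 1.0, 0.7],
              [0.3, 0.7, 1.0]])
Ms = np.eye(3)

# Solve M h = lambda Ms h; eigenvalues are returned in ascending order.
lam, H = eigh(M, Ms)
h0 = H[:, 0]   # eigenvector associated with the minimum eigenvalue lambda_m
```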
Knowing this, we can state the following theorem:
Theorem: the vector of coefficients $\vec{h}$ that maximizes the signal/noise ratio at the output of the linear combiner, in the case of a known useful signal $\vec{s}$, is:
\begin{equation} \vec{h} = \mu \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} = \mu \vec{k} \end{equation}
where $\mu$ is a constant that does not affect the decision.
Note that (135) is the same as (122), a result already obtained by applying the Neyman-Pearson criterion in the case of Gaussian statistics.
Now, if we apply the threshold decision criterion to the observable at the output of the linear system, the probability of correct detection is not always maximized (this is true only if the signals can be modeled as $n$-dimensional Gaussians); the result of the theorem is therefore general, but it does not guarantee that $P_d$ is maximized.
For example, if we compare the detection probability $P_d$ obtained with the Neyman-Pearson criterion and with the maximization of the SNR in the case of non-Gaussian statistics, we observe that the value of $P_d$ obtained by maximizing the SNR is lower.
Proof:
Since the matrix $\boldsymbol{\vec{M}}$ is Hermitian, using the spectral decomposition it can be written as
\begin{equation} \boldsymbol{\vec{M}} = \sum_{i=1}^{N}{\lambda_i \vec{U_i} \vec{U_i}^H} = \boldsymbol{\vec{U}} \ \boldsymbol{\vec{\Lambda}} \ \boldsymbol{\vec{U}}^H \end{equation}
where $\lambda_i$ are the eigenvalues and $\vec{U_i}$ the eigenvectors of $\boldsymbol{\vec{M}}$, $\boldsymbol{\vec{U}}$ is the (unitary) matrix of the eigenvectors and $\boldsymbol{\vec{\Lambda}}$ is the diagonal matrix of the eigenvalues. We can therefore factorize $\boldsymbol{\vec{M}}$ as
\begin{equation} \boldsymbol{\vec{M}} = \boldsymbol{\vec{A}} \boldsymbol{\vec{A}}^H \end{equation}
with
\begin{equation} \boldsymbol{\vec{A}} = \boldsymbol{\vec{U}} \boldsymbol{\vec{\Lambda}}^{\frac{1}{2}} \end{equation}
Using the rule $\left( \boldsymbol{\vec{A}} \boldsymbol{\vec{B}} \right)^{-1} = \boldsymbol{\vec{B}}^{-1} \boldsymbol{\vec{A}}^{-1}$, $\boldsymbol{\vec{M}^{-1}}$ can be factorized as:
\begin{equation} \boldsymbol{\vec{M}^{-1}} = \left( \boldsymbol{\vec{A}^{-1}} \right)^H \boldsymbol{\vec{A}}^{-1} \end{equation}
Substituting these relations in the expression $S=\left|\vec{h}^{T} \vec{s} \right|^2$, the numerator of (131), we have
\begin{equation} \left|{\vec{h}}^{T} \vec{s} \right|^2 = \left| {\vec{h}}^T \boldsymbol{{\vec{A}}^{*}} \left( \boldsymbol{{\vec{A}}^{-1}} \right)^{*} \vec{s} \right|^2 = \left| \left( \boldsymbol{{\vec{A}}^{H}} \vec{h} \right)^T \left( \boldsymbol{{\vec{A}}^{-1}} {\vec{s}}^{*} \right)^{*} \right|^2 = \left| {\vec{v_1}}^T \cdot {\vec{v_2}}^{*} \right|^2 \end{equation}
where
\begin{equation} \vec{v_1} = \boldsymbol{{\vec{A}}^{H}} \vec{h} \ \ , \ \ \vec{v_2} = \boldsymbol{{\vec{A}}^{-1}} {\vec{s}}^{*} \end{equation}
Using the Schwarz inequality we can write:
\begin{equation} \left| {\vec{v_1}}^T \cdot {\vec{v_2}}^{*} \right|^2 \le \left\| \vec{v_1} \right\|^2 \left\| \vec{v_2} \right\|^2 = \left( \boldsymbol{{\vec{A}}^{H}} \vec{h} \right)^H \left( \boldsymbol{{\vec{A}}^{H}} \vec{h} \right) \left( \boldsymbol{{\vec{A}}^{-1}} {\vec{s}}^{*} \right)^H \left( \boldsymbol{{\vec{A}}^{-1}} {\vec{s}}^{*} \right) = \\ = {\vec{h}}^H \boldsymbol{\vec{A}} \boldsymbol{{\vec{A}}^{H}} \vec{h} \ {\vec{s}}^T \boldsymbol{{\vec{A}}^{-H}} \boldsymbol{{\vec{A}}^{-1}} {\vec{s}}^{*} \end{equation}
This means that the upper bound for the power $S$ of the signal at the filter output is
\begin{equation} \left|{\vec{h}}^{T} \vec{s} \right|^2 \le {\vec{h}}^H \boldsymbol{\vec{M}} \vec{h} {\vec{s}}^T \boldsymbol{{\vec{M}}^{-1}} {\vec{s}}^{*} \end{equation}
So we have this upper bound for the signal/noise ratio:
\begin{equation} SNR = \frac{ \left|{\vec{h}}^{T} \vec{s} \right|^2 }{ {\vec{h}}^H \ \boldsymbol{\vec{M}} \ \vec{h} } \le {\vec{s}}^T \ \boldsymbol{\vec{M}^{-1}} \ {\vec{s}}^{*} = SNR_0 \end{equation}
We have found that the SNR in (131) admits a maximum, equal to $\vec{s}^T \ \boldsymbol{\vec{M}^{-1}} \ \vec{s}^{*}$. The Schwarz inequality holds with equality only when $\vec{v_1}$ is proportional to $\vec{v_2}$, i.e. $\boldsymbol{\vec{A}}^H \vec{h} = \mu \boldsymbol{\vec{A}}^{-1} \vec{s}^{*}$, which (apart from the constant $\mu$) gives
\begin{equation} \vec{h} = \boldsymbol{{\vec{M}}^{-1}} \vec{s}^{*} \end{equation}
Q.E.D.
In the case of Gaussian statistics, the optimal processor that maximizes the signal/noise ratio (the MMSE, Minimum Mean Square Error, processor) coincides with the Neyman-Pearson one. We will use the symbols $\vec{h}$ and $\vec{k}$ interchangeably to indicate the vector of coefficients.
Note that if the noise statistics are not Gaussian, the Neyman-Pearson criterion requires complex, usually nonlinear, processors that can be quite complicated.
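The theorem can also be checked numerically. The sketch below (with illustrative data of our own) verifies that $\vec{h} = \boldsymbol{\vec{M}^{-1}} \vec{s}^{*}$ attains the bound $SNR_0 = \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*}$, while an arbitrary filter stays below it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative complex signal and Hermitian positive-definite noise covariance.
s = np.array([1.0 + 0j, 1j, -1.0])
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
M = A @ A.conj().T + 3.0 * np.eye(3)

def snr(h):
    """Output SNR |h^T s|^2 / (h^H M h) of the linear combiner."""
    return np.abs(h @ s) ** 2 / np.real(h.conj() @ M @ h)

h_opt = np.linalg.solve(M, s.conj())                # h = M^{-1} s*
snr_0 = np.real(s @ np.linalg.solve(M, s.conj()))   # s^T M^{-1} s*

assert np.isclose(snr(h_opt), snr_0)                # the bound is attained
h_rand = rng.standard_normal(3) + 1j * rng.standard_normal(3)
assert snr(h_rand) <= snr_0                         # any other filter does worse
```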
Applications of optimal filtering: moving, non-fluctuating target
If the noise is white and stationary, the matrix $\boldsymbol{\vec{M}^{-1}}$ is diagonal with elements equal to $\frac{1}{\sigma^2}$, so the signal/noise ratio at the output of the optimal filter is
\begin{equation} \left( \frac{S}{N} \right)_{OUT} = \frac{ \vec{s}^T \vec{s}^{*} }{\sigma^2} = \frac{ \sum_{i = 1}^{n} \left| s_i \right|^2 }{\sigma^2} = n \frac{ \frac{1}{n} \sum_{i = 1}^{n} \left| s_i \right|^2 }{\sigma^2} \end{equation}
where $\sigma$ is the RMS value of the noise voltage. The term $\frac{1}{n} \sum_{i = 1}^{n} \left| s_i \right|^2$ represents the average, over the observation time, of the power of each pulse.
Hence the term $\frac{ \frac{1}{n} \sum_{i = 1}^{n} \left| s_i \right|^2 }{\sigma^2}$ is the input signal/noise ratio.
When the noise is white, the gain obtained using the optimal processor is equal to $n$ (the number of pulses on the target). What we have said so far holds if we know the analytic form and the parameters of the useful signal. If we suppose the target is moving, so that it has a Doppler frequency $f_D \ne 0$, we again obtain a pulse-integration gain equal to the number $n$ of pulses, provided the noise is white.
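A short check of the white-noise case, with illustrative parameters of our own: for a unit-amplitude target with an arbitrary Doppler phase progression, the output SNR $\vec{s}^T \vec{s}^{*} / \sigma^2$ is exactly $n$ times the per-pulse SNR.

```python
import numpy as np

n, sigma2 = 8, 2.0
fd_prt = 0.15                                    # normalized Doppler f_D*PRT (assumed)
s = np.exp(2j * np.pi * fd_prt * np.arange(n))   # unit-amplitude moving target

snr_in = np.mean(np.abs(s) ** 2) / sigma2        # per-pulse S/N at the input
snr_out = np.real(s @ s.conj()) / sigma2         # s^T s* / sigma^2 at the output
print(snr_out / snr_in)                          # -> 8.0, the integration gain n
```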
Let us consider, for $n=2$, a covariance matrix that can refer either to the clutter alone, or to the sum of white noise (with covariance matrix $\boldsymbol{\vec{M_{\text{noise}}}} = \sigma_{\text{noise}}^2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$) and clutter (with covariance matrix $\boldsymbol{\vec{M_{c}}} = {\sigma_{c}}^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$);
we have:
\begin{equation} \boldsymbol{\vec{M}} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \sigma^2 \end{equation}
where $\rho$ is the correlation coefficient and $\sigma^2$ represents the power of the noise.
By definition, $\rho = \rho(\tau)$ represents the correlation coefficient between two consecutive pulses, one observed at time $t=0$ and the other at time $t=T=\text{PRT}$ (see figure 11).
Suppose that the normalized power spectrum of the disturbance (normalized meaning that $\int_{-\infty}^{\infty} S(f) \,df = 1$, so that if $R(\tau) = \int_{-\infty}^{\infty} S(f)e^{+j 2 \pi f \tau} \,df$, then $R(0) = 1$) is
\begin{equation} S(f) = \frac{1}{\sqrt{2 \pi} \sigma_f} e^{-\frac{f^2}{2 {\sigma_f^2}}} \end{equation}
Equation (148) is generally used in meteorological radars to represent the spectrum of rain echoes. Taking the inverse transform of (148) we obtain the expression of the correlation coefficient $\frac{R(\tau)}{R(0)}$, which we will denote by $\rho(\tau)$:
\begin{equation} \rho(\tau) = exp(-2 \pi^2 {\sigma_f}^2 \tau^2) \end{equation}
and so
\begin{equation} 2 \pi^2 {\sigma_f}^2 = \frac{1}{2 {\sigma_t}^2} \ \ \ \ \ \ \ \ \ \ \ \ \sigma_t = \frac{1}{2 \pi \sigma_f} \end{equation}
\begin{equation} \rho(PRT) = exp(-2 \pi^2 \ {\sigma_f}^2 \ \text{PRT}^2) \end{equation}
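For concreteness, this last relation can be evaluated directly; the spectral width and PRT below are illustrative values of ours, not taken from the text:

```python
import numpy as np

sigma_f = 40.0   # clutter spectral width in Hz (illustrative)
PRT = 1e-3       # pulse repetition time in s (illustrative)

rho = np.exp(-2.0 * np.pi**2 * sigma_f**2 * PRT**2)
print(rho)       # ~0.969: consecutive pulses are strongly correlated
```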
Supposing that the disturbance is characterized by a null average Doppler frequency, the spectral situation can be represented as in figure 12.
In figure 12 the Dirac pulse represents the useful signal; its analytic baseband representation (assuming Doppler frequency $f_D = 0$) is
\begin{equation} s(t) = A e^{j \ \Phi(t)} \end{equation}
Since the white thermal noise contribution is uncorrelated with the clutter contribution (produced for example by atmospheric agents), the powers add. Consequently, the covariance matrix $\boldsymbol{\vec{M}}$ of the overall disturbance ($\sigma_n^2$ for the noise, $\sigma_c^2$ for the clutter and $\rho_c$ the clutter correlation coefficient) is:
\begin{equation} \boldsymbol{\vec{M}} = {\sigma_n}^2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + {\sigma_c}^2 \begin{bmatrix} 1 & \rho_c \\ \rho_c & 1 \end{bmatrix} = {\sigma_{tot}}^2 \begin{bmatrix} 1 & r \, \rho_c \\ r \, \rho_c & 1 \end{bmatrix} \end{equation}
where
\begin{equation} {\sigma_{\text{tot}}}^2 = {\sigma_c}^2 + {\sigma_n}^2 = C + N \end{equation}
with
\begin{equation} r = \frac{{\sigma_c}^2}{{\sigma_c}^2 + {\sigma_n}^2} = \frac{C/N}{(C/N)+1} \end{equation}
so the overall covariance matrix, with ${\sigma}^2 = {\sigma_{tot}}^2$ and $\rho = r \, \rho_c$, can be written as:
\begin{equation} \boldsymbol{\vec{M}} = \boldsymbol{\vec{M_c}} + \boldsymbol{\vec{M_{noise}}} = {\sigma}^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \end{equation}
The inverse of this matrix is equal to:
\begin{equation} \boldsymbol{\vec{M}^{-1}} = \frac{1}{{\sigma}^2 (1 - \rho^2)} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix} \end{equation}
Assuming that the useful signal has a Doppler frequency equal to zero, with $A = 1$ and $\Phi = 0$, the vector of the signal samples taken at temporal spacing $T = \text{PRT}$ is
\begin{equation} \vec{s}^T = \begin{bmatrix} 1 \ & \ 1 \end{bmatrix} \end{equation}
Using (135), the optimal filter coefficients are
\begin{equation} \vec{h_{opt}} = \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} = \frac{1}{{\sigma}^2 (1 - \rho^2)} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{{\sigma}^2 (1 + \rho)} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{equation}
Knowing the vector of the observed samples $\vec{z}^T = \begin{bmatrix} z_1 \ & \ z_2 \end{bmatrix}$, the output of the filter is
\begin{equation} w = \vec{h_{opt}}^T \vec{z} = \frac{1}{{\sigma}^2 (1 + \rho)}(z_1 + z_2) \end{equation}
and so the filter is a simple adder.
To see the gain introduced by the optimal linear filter, we observe that before the processing the signal/noise ratio on a single pulse, $(S/N)_1$, is equal to
\begin{equation} \left ( \frac{S}{N} \right)_{1} = \frac{1}{\sigma^2} \end{equation}
and at the output we have:
\begin{equation} \left ( \frac{S}{N} \right)_{OUT} = \vec{s}^T \ \boldsymbol{\vec{M}^{-1}} \ \vec{s}^{*} = \frac{2}{{\sigma}^2 (1 + \rho)} \end{equation}
We observe that the improvement obtained is lower than in the case of white noise alone; this is caused by the colored-noise contribution. The smaller the coefficient $\rho$, the larger the improvement: if $\rho = 0$ the improvement equals 2 (in general, $n$). If $\rho \rightarrow 1$, the spectrum of the noise and the spectrum of the useful signal tend to overlap and the improvement disappears (the gain over a single pulse tends to 1, i.e. 0 dB). Consider now the spectral situation of figure 13, where we suppose that the useful signal has a Doppler frequency $f_D$ such that its spectral line does not overlap with the spectrum of the noise. In this case the gain of the optimal linear processor can be very high, precisely because the spectra do not overlap. For example, consider the case where the Doppler frequency of the useful signal is equal to half the PRF (see figure 13).
In this case we have
\begin{equation} \vec{s}^T = \begin{bmatrix} 1 \ & \ -1 \end{bmatrix} \end{equation}
so
\begin{equation} \vec{h_{opt}} = \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} = \frac{1}{{\sigma}^2 (1 - \rho^2)} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \frac{1}{{\sigma}^2 (1 - \rho)} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \end{equation}
and consequently the output of the optimal linear filter is
\begin{equation} w = \vec{h_{opt}}^T \vec{z} = \frac{1}{{\sigma}^2 (1 - \rho)}(z_1 - z_2) \end{equation}
If we denote by $\left( \frac{S}{N} \right)_1$ the signal/noise ratio of a single pulse, defined as
\begin{equation} \left( \frac{S}{N} \right)_1= \frac{1}{\sigma^2} \end{equation}
then after the processor we will have
\begin{equation} \left( \frac{S}{N} \right)_{OUT} = \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} = \\ = \frac{1}{{\sigma}^2 (1 - \rho^2)} \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \frac{2}{1 - \rho} \left( \frac{S}{N} \right)_1 \end{equation}
Looking at (167), we observe that if $\rho \rightarrow 0$ (white noise only) then $(S/N)_{OUT} \rightarrow 2(S/N)_1$, so we obtain an improvement equal to $n=2$ as in the previous case. If $\rho \rightarrow 1$ then $(S/N)_{OUT} \rightarrow \infty$, because the noise is completely canceled by the filter.
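The two situations can be compared side by side. The following sketch (a minimal check, with values of $\rho$ chosen by us) evaluates the improvement $\vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*}$ relative to $(S/N)_1 = 1/\sigma^2$ for both steering vectors:

```python
import numpy as np

def improvement(s, rho):
    """s^T M^{-1} s* with sigma^2 factored out, i.e. the gain over (S/N)_1."""
    M = np.array([[1.0, rho], [rho, 1.0]])
    return np.real(s @ np.linalg.solve(M, s.conj()))

s_dc = np.array([1.0, 1.0])      # target at f_D = 0
s_half = np.array([1.0, -1.0])   # target at f_D = PRF/2

for rho in (0.0, 0.9, 0.99):
    print(rho, improvement(s_dc, rho), improvement(s_half, rho))
# f_D = 0:     2/(1+rho), tends to 1 (no gain) as rho -> 1
# f_D = PRF/2: 2/(1-rho), grows without bound as rho -> 1
```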
If the useful signal is completely known and the noise is additive, the optimal linear processor (according to the criterion of maximizing the signal/noise ratio) has the scheme of figure 14.
In this scheme we perform the scalar product (filtering) of the received vector $\vec{z}$ with the vector $\vec{k} = \boldsymbol{\vec{M}^{-1}} \vec{s}^{*}$ (see equation (135)) and then take its real part $w_x$. To decide whether the target is present or not, $w_x$ is compared with a threshold $\Lambda_x$, chosen according to the desired $P_{fa}$ value:
\begin{equation} w_x = Re(w) \ {}_{<_{H_0}}^{>^{H_1}} \ \Lambda_x \end{equation}
To perform the test it is necessary to compute $w$, and hence to know the initial phase $\Phi_0$ associated with the target signal. If the baseband representation of the useful signal is
\begin{equation} s(t) = A e^{j(2 \pi \ f_d \ t \ + \ \Phi_0 )} \end{equation}
then the vector of the signal samples (taken at temporal spacing PRT, i.e. at instants $k \cdot \text{PRT}$) is
\begin{equation} \vec{s}^T = e^{j \Phi_0} [a_0, a_1 e^{j \phi_1}, a_2 e^{j \phi_2}, ..., a_{n-1} e^{j \phi_{n-1}} ] = \vec{s_0}^T e^{j \Phi_0} \end{equation}
where $\phi_i$ is the phase difference between sample $i$ and the first sample ($i=0$), due to the Doppler effect. In particular:
\begin{equation} \phi_i = 2 \pi \ f_d \cdot PRT \cdot i \end{equation}
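Building this sample vector is straightforward; the helper below is our own illustrative construction:

```python
import numpy as np

def signal_vector(n, fd, prt, phi0=0.0, amplitudes=None):
    """Samples s_i = a_i * exp(j*(2*pi*fd*PRT*i + phi0)) taken at spacing PRT."""
    a = np.ones(n) if amplitudes is None else np.asarray(amplitudes)
    phi = 2.0 * np.pi * fd * prt * np.arange(n)   # Doppler phase progression
    return a * np.exp(1j * (phi + phi0))

s0 = signal_vector(4, fd=500.0, prt=1e-3)       # phi0 = 0 gives the vector s_0
s = signal_vector(4, fd=500.0, prt=1e-3, phi0=0.7)
assert np.allclose(s, s0 * np.exp(1j * 0.7))    # s = s_0 * exp(j*phi0)
```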
Optimal processor for deterministic signal with $\Phi_0$ unknown
The phase $\Phi_0$, which does not depend on the Doppler frequency (see (170)), is not known a priori.
We now analyze the optimal test to perform when the initial phase $\Phi_0$ is unknown.
Note that the term $w$ ($w=\vec{k}^T \vec{z}$) can be written as:
\begin{equation} w = w_x + j w_y = \rho e^{j \Phi} \end{equation}
So, the likelihood ratio can be written as
\begin{equation} l(w) = \frac{ p(\rho, \Phi \mid M_1) }{ p(\rho, \Phi \mid M_0) } \end{equation}
which depends on $\Phi_0$, because $\vec{k}$ depends on $\Phi_0$.
Note that the joint probability density of $\rho$ and $\Phi$ given the event $M_0$ (which represents the absence of the useful signal) can be written as
\begin{equation} p(\rho, \Phi \mid M_0) = p_2(\Phi \mid M_0)p_1(\rho \mid M_0) = \frac{1}{2 \pi} p_1(\rho \mid M_0) \end{equation}
assuming that $\Phi$ is a random variable uniformly distributed in $[0, 2 \pi)$ and independent of $\rho$. We can now remove the dependence of the likelihood ratio on $\Phi_0$ by averaging over $\Phi_0$.
Starting from (173) we can write
\begin{equation} E_{\Phi_0} [l(w)] = \int_{0}^{2 \pi} l(w) \ p_0(\Phi_0) \, d\Phi_0 = \frac{1}{2 \pi} \int_{0}^{2 \pi} l(w) \,d\Phi_0 = \\ = \frac{1}{p_1(\rho \mid M_0)} \int_{0}^{2 \pi} p(\rho, \Phi \mid M_1) \,d\Phi_0 \end{equation}
where $\rho$ is the modulus of the received voltage (due to the noise alone under $M_0$). Since the density function $p_1(\rho \mid M_0)$ does not depend on $\Phi_0$, we have this important result:
\begin{equation} E_{\Phi_0} [l(w)] = \frac{p_1(\rho \mid M_1)}{p_1(\rho \mid M_0)} \end{equation}
so, starting from (123), the expression of the likelihood ratio is
\begin{equation} l(z, \Phi_0) = exp\left(- \vec{s}^T \boldsymbol{\vec{M}^{-1}} \vec{s}^{*}\right) exp\left(2 Re \left( \vec{z}^T \ \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} \right) \right) \end{equation}
If we set (see (170)):
\begin{equation} \vec{s} = \vec{s_0} e^{j \Phi_0} \ \ \ , \ \ \ \vec{k} = \vec{k_0} e^{-j \Phi_0} \end{equation}
where
\begin{equation} \vec{k} = \boldsymbol{\vec{M}^{-1}} \vec{s}^{*} \ \ \ , \ \ \ \vec{k_0} = \boldsymbol{\vec{M}^{-1}} \vec{s_0}^{*} \end{equation}
so we have
\begin{equation} l(z , \Phi_0) = exp\left(-\vec{s_0}^T \vec{k_0}\right)exp\left(2 Re\left[ \vec{z}^T \vec{k_0} e^{-j \Phi_0} \right] \right) \end{equation}
and, if we set
\begin{equation} \vec{z}^T \vec{k_0} = \rho e^{j \alpha} = w \end{equation}
the term $\text{exp} \left( 2 \text{Re} \left( {\vec{z}}^{T} \ \vec{k_0} \ {\text{e}}^{-j \Phi_0} \right ) \right)$ becomes $\text{exp}(2 \rho \text{cos}(\alpha - \Phi_0))$, so the averaged likelihood ratio can be expressed as:
\begin{equation} E_{\Phi_0}(l(z,\Phi_0)) = exp(-\vec{s_0}^{T} \vec{k_0}) \frac{1}{2 \pi} \int_{0}^{2 \pi} exp(2 \rho cos(\alpha - \Phi_0)) \,d\Phi_0 \end{equation}
Since
\begin{equation} I_0(x) = \frac{1}{2 \pi} \int_{0}^{2 \pi} exp(x cos(\Phi)) \, d\Phi \end{equation}
and, recalling that $\Phi_0$ is uniformly distributed in $[0, 2 \pi)$ (so that the integral over a full period does not depend on $\alpha$), we have:
\begin{equation} E_{\Phi_0}(l(z,\Phi_0)) = exp\left( -\vec{s_0}^T \vec{k_0} \right) \ I_0 \ \left(2 \left|\vec{z}^T \vec{k_0} \right| \right) \end{equation}
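That the average over $\Phi_0$ collapses to the Bessel function is easy to verify numerically (the values of $\rho$ and $\alpha$ below are arbitrary test values of ours):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import i0

rho, alpha = 1.3, 0.4   # arbitrary test values

# (1/2pi) * integral over [0, 2pi) of exp(2*rho*cos(alpha - phi0)) d(phi0)
avg, _ = quad(lambda p: np.exp(2 * rho * np.cos(alpha - p)) / (2 * np.pi),
              0.0, 2.0 * np.pi)
assert np.isclose(avg, i0(2 * rho))   # equals I_0(2*rho), independent of alpha
```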
The test based on the averaged likelihood ratio is therefore:
\begin{equation} I_0 \ \left( 2 \left|\vec{z}^T \vec{k_0} \right| \right) {}_{<_{H_0}}^{>^{H_1}} \ \eta \ exp\left\{ \vec{s_0}^T \vec{k_0} \right\} \end{equation}
Since $I_0(\cdot)$ is a monotonically increasing function, the test is equivalent to
\begin{equation} \left|\vec{z}^T \vec{k_0} \right| {}_{<_{H_0}}^{>^{H_1}} \ \lambda \end{equation}
where the threshold $\lambda$ is calculated according to the desired $P_{fa}$. The corresponding block scheme is shown in figure 15.
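A minimal sketch of the processing of figure 15 (our own illustrative implementation; the threshold is assumed to be given, having been set from the desired $P_{fa}$):

```python
import numpy as np

def noncoherent_test(z, s0, M, threshold):
    """Non-coherent detector: compare |z^T k_0| with the threshold lambda.

    z: received samples, s0: signal replica with phi0 = 0,
    M: disturbance covariance.  Returns True when H_1 (target) is decided.
    """
    k0 = np.linalg.solve(M, s0.conj())   # k_0 = M^{-1} s_0*
    envelope = np.abs(z @ k0)            # modulus of the filter output
    return envelope > threshold
```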
When the phase information is not available, the optimum test consists of the linear processing that maximizes the S/N ratio (coherent integration), followed by envelope detection and threshold comparison. This type of detection is less efficient than fully coherent detection, where the initial phase is assumed known (see figure 14).
Not knowing the phase factor $\Phi_0$ results in a loss of sensitivity of the receiver, which can be quantified by calculating the probability of correct detection in the two cases ($\Phi_0$ known and unknown).
Figure 16 shows the probability of detection in the two cases: $\Phi_0$ known (continuous curve) and $\Phi_0$ unknown (dotted curve). For the same SNR, if $\Phi_0$ is unknown the probability of detection is lower. The difference in signal/noise ratio, $\Delta SNR$, between the two cases for the same probability of detection (and the same $P_{fa}$) quantifies the loss of sensitivity of the receiver.
