Recall that the likelihood of a model is the probability of the data set given the model, $P(D|\theta)$.
The deviance of a model is defined by
$$D(\theta, D) = 2\left(\log(P(D|\theta_s)) - \log(P(D|\theta))\right)$$
where $\theta_s$ is the saturated model, so named because it fits the data perfectly.
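In the case of normally distributed errors, the likelihood for a single prediction ($\mu_i$) and data point ($y_i$) is given by
$$P(y_i|\mu_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2\right)$$
and the log-likelihood by
$$\log(P(y_i|\mu_i)) = -\log(\sigma) - \frac{1}{2}\log(2\pi) - \frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2$$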
The log-likelihood for the saturated model, which is when $\mu_i = y_i$, is therefore simply
$$\log(P(y_i|\mu_{s_i})) = -\log(\sigma) - \frac{1}{2}\log(2\pi)$$
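It follows that the unit deviance is
$$d_i = 2\left(\log(P(y_i|\mu_{s_i})) - \log(P(y_i|\mu_i))\right)$$
$$d_i = 2\left(\frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2\right)$$
$$d_i = \left(\frac{y_i - \mu_i}{\sigma}\right)^2$$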
As the deviance residual is the signed square root of the unit deviance,
$$r_i = \text{sign}(y_i - \mu_i)\sqrt{d_i}$$
in the case of normally distributed errors we arrive at
$$r_i = \frac{y_i - \mu_i}{\sigma}$$
which is the Pearson residual.
To confirm this, consider a normal distribution with $\hat{\mu} = 2$ and $\sigma = 0.5$, and an observed value of 1.
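The check can be sketched numerically. The following is a minimal Python illustration, not the original author's code: the function names `normal_log_lik`, `deviance_residual` and `pearson_residual` are hypothetical helpers introduced here, and only the standard library is assumed. It computes the deviance residual from the two log-likelihoods, following the definition of the unit deviance, and compares it with the Pearson residual.

```python
import math


def normal_log_lik(y, mu, sigma):
    """Log-likelihood of a single observation y under Normal(mu, sigma)."""
    return (
        -math.log(sigma)
        - 0.5 * math.log(2 * math.pi)
        - 0.5 * ((y - mu) / sigma) ** 2
    )


def deviance_residual(y, mu, sigma):
    """Signed square root of the unit deviance."""
    # Saturated model: the prediction equals the observation (mu_s = y).
    unit_dev = 2 * (normal_log_lik(y, y, sigma) - normal_log_lik(y, mu, sigma))
    # Guard against a tiny negative value caused by floating-point rounding.
    unit_dev = max(unit_dev, 0.0)
    # copysign attaches the sign of (y - mu) to the square root.
    return math.copysign(math.sqrt(unit_dev), y - mu)


def pearson_residual(y, mu, sigma):
    """Standardized difference between the observation and the prediction."""
    return (y - mu) / sigma


# Values from the text: mu-hat = 2, sigma = 0.5, observed value = 1.
y, mu_hat, sigma = 1.0, 2.0, 0.5
dev_res = deviance_residual(y, mu_hat, sigma)
pear_res = pearson_residual(y, mu_hat, sigma)

print("deviance residual:", dev_res)
print("Pearson residual: ", pear_res)
print("agree:", math.isclose(dev_res, pear_res))
```

With these values the Pearson residual is $(1 - 2)/0.5 = -2$, and the deviance residual should agree with it up to floating-point rounding, which is why the comparison uses `math.isclose` rather than exact equality.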