
Parameter Estimation

For each population, the dependent variable follows a multinomial distribution with $ J$ levels. Thus, the joint probability density function is:

$\displaystyle f(\boldsymbol{y}\vert\boldsymbol{\beta}) = \prod_{i=1}^N \biggl[ \frac{n_i!}{\prod_{j=1}^J y_{ij}!} \cdot \prod_{j=1}^J \pi_{ij}^{y_{ij}} \biggr]$ (26)

When $ J=2$, this reduces to Eq. 2. The likelihood function is algebraically equivalent to Eq. 26, the only difference being that the likelihood treats the unknown $ \boldsymbol{\beta}$ as variable and the observed $ \boldsymbol{y}$ as fixed constants. Since we want to maximize Eq. 26 with respect to $ \boldsymbol{\beta}$, the factorial terms that do not contain any of the $ \pi_{ij}$ terms can be treated as constants. Thus, the kernel of the likelihood function for multinomial logistic regression models is:

$\displaystyle L(\boldsymbol{\beta}\vert\boldsymbol{y}) \simeq \prod_{i=1}^N \prod_{j=1}^J \pi_{ij}^{y_{ij}}$ (27)

Replacing the $ J^{th}$ terms, using the fact that $ y_{iJ} = n_i - \sum_{j=1}^{J-1} y_{ij}$, Eq. 27 becomes:


$\displaystyle \prod_{i=1}^N \prod_{j=1}^{J-1} \pi_{ij}^{y_{ij}} \cdot \pi_{iJ}^{n_i - \sum_{j=1}^{J-1} y_{ij}}$
$\displaystyle = \prod_{i=1}^N \prod_{j=1}^{J-1} \pi_{ij}^{y_{ij}} \cdot \frac{\pi_{iJ}^{n_i}}{\pi_{iJ}^{\sum_{j=1}^{J-1} y_{ij}}}$
$\displaystyle = \prod_{i=1}^N \prod_{j=1}^{J-1} \pi_{ij}^{y_{ij}} \cdot \frac{\pi_{iJ}^{n_i}}{\prod_{j=1}^{J-1} \pi_{iJ}^{y_{ij}}}$ (28)

Since $ a^{x+y}=a^xa^y$, the sum in the exponent of the denominator of the last term becomes a product over the first $ J-1$ values of $ j$. Continue by grouping together the terms that are raised to the $ y_{ij}$ power for each $ j$ up to $ J-1$:

$\displaystyle \prod_{i=1}^N \prod_{j=1}^{J-1} \biggl( \frac{\pi_{ij}}{\pi_{iJ}} \biggr)^{y_{ij}} \cdot \pi_{iJ}^{n_i}$ (29)

Now, substitute for $ \pi_{ij}$ and $ \pi_{iJ}$ using Eq. 24 and Eq. 25:


$\displaystyle \prod_{i=1}^N \prod_{j=1}^{J-1} \Bigl(e^{\sum_{k=0}^{K} x_{ik}\beta_{kj}}\Bigr)^{y_{ij}} \cdot \biggl( \frac{1}{1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}} \biggr)^{n_i}$
$\displaystyle = \prod_{i=1}^N \prod_{j=1}^{J-1} e^{y_{ij}\sum_{k=0}^K x_{ik}\beta_{kj}} \cdot \biggl(1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}\biggr)^{-n_i}$ (30)

Taking the natural log of Eq. 30 gives us the log likelihood function for the multinomial logistic regression model:

$\displaystyle l(\boldsymbol{\beta}) = \sum_{i=1}^N \Biggl[ \sum_{j=1}^{J-1} \biggl( y_{ij} \sum_{k=0}^K x_{ik}\beta_{kj} \biggr) - n_i \log \biggl(1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}\biggr) \Biggr]$ (31)
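To make Eq. 31 concrete, the following is a minimal numerical sketch in Python with NumPy. The array shapes and the function name are assumptions made here for illustration only: X holds the $ N \times (K+1)$ design matrix with a leading column of ones (so that $ x_{i0}=1$), beta is a $ (K+1) \times (J-1)$ coefficient matrix with one column per non-reference category, and Y is an $ N \times J$ matrix of observed counts whose last column is the reference category $ J$, so that $ n_i$ is the row sum of Y.

import numpy as np

def log_likelihood(beta, X, Y):
    # Log likelihood of Eq. 31 (the factorial constants have already been dropped).
    n = Y.sum(axis=1)                   # n_i, total count for population i
    eta = X @ beta                      # N x (J-1): eta[i, j] = sum_k x_ik * beta_kj
    first = np.sum(Y[:, :-1] * eta)     # sum_i sum_j y_ij * sum_k x_ik * beta_kj
    second = np.sum(n * np.log1p(np.exp(eta).sum(axis=1)))   # sum_i n_i * log(1 + sum_j e^eta_ij)
    return first - second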

As with the binomial model, we want to find the values of $ \boldsymbol\beta$ which maximize Eq. 31. We will do this using the Newton-Raphson method, which involves calculating the first and second derivatives of the log likelihood function. We can take the first derivatives using steps similar to those in Eq. 11:


$\displaystyle \frac{\partial l(\beta)}{\partial \beta_{kj}} = \sum_{i=1}^N y_{ij} x_{ik} - n_i \cdot \frac{1}{1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}} \cdot \frac{\partial}{\partial \beta_{kj}} \biggl(1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}} \biggr)$
$\displaystyle = \sum_{i=1}^N y_{ij} x_{ik} - n_i \cdot \frac{1}{1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}} \cdot e^{\sum_{k=0}^{K} x_{ik}\beta_{kj}} \cdot \frac{\partial}{\partial \beta_{kj}} \sum_{k=0}^{K} x_{ik}\beta_{kj}$
$\displaystyle = \sum_{i=1}^N y_{ij} x_{ik} - n_i \cdot \frac{1}{1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}} \cdot e^{\sum_{k=0}^{K} x_{ik}\beta_{kj}} \cdot x_{ik}$
$\displaystyle = \sum_{i=1}^N y_{ij} x_{ik} - n_i \pi_{ij} x_{ik}$ (32)
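Under the same assumed shapes as in the earlier sketch, the first derivatives in Eq. 32 reduce to a single matrix product whose $ (k, j)$ entry is $ \sum_{i=1}^N (y_{ij} - n_i \pi_{ij}) x_{ik}$; again, this is only an illustrative sketch, not part of the original derivation.

import numpy as np

def gradient(beta, X, Y):
    # First derivatives of Eq. 32, returned as a (K+1) x (J-1) array indexed by (k, j).
    n = Y.sum(axis=1)                                                  # n_i
    eta = X @ beta                                                     # N x (J-1)
    pi = np.exp(eta) / (1.0 + np.exp(eta).sum(axis=1, keepdims=True))  # pi_ij of Eq. 24
    return X.T @ (Y[:, :-1] - n[:, None] * pi)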

Note that there are $ (J-1) \cdot (K+1)$ equations in Eq. 32, which we want to set equal to zero and solve for each $ \beta_{kj}$. Although technically a matrix, we may consider $ \boldsymbol\beta$ to be a column vector, formed by appending each of the additional columns below the first (a small sketch of this stacking convention follows Eq. 33). In this way, we can form the matrix of second partial derivatives as a square matrix of order $ (J-1) \cdot (K+1)$. For each $ \beta_{kj}$, we need to differentiate Eq. 32 with respect to every $ \beta_{k^\prime j^\prime}$. We can express the general form of this matrix as:


$\displaystyle \frac{\partial^2 l(\beta)}{\partial \beta_{kj} \, \partial \beta_{k^\prime j^\prime}} = \frac{\partial}{\partial\beta_{k^\prime j^\prime}} \sum_{i=1}^N \bigl( y_{ij} x_{ik} - n_i \pi_{ij} x_{ik} \bigr)$
$\displaystyle = \frac{\partial}{\partial\beta_{k^\prime j^\prime}} \sum_{i=1}^N - n_i x_{ik} \pi_{ij}$
$\displaystyle = - \sum_{i=1}^N n_i x_{ik} \frac{\partial}{\partial\beta_{k^\prime j^\prime}} \biggl( \frac{e^{\sum_{k=0}^K x_{ik}\beta_{kj}}}{1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}} \biggr)$ (33)
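The column-stacking convention mentioned above, appending each column of $ \boldsymbol\beta$ below the first, can be made concrete with a short sketch; the sizes here are hypothetical and chosen only for illustration. Position $ p = j \cdot (K+1) + k$ of the stacked vector holds $ \beta_{kj}$, with $ j$ and $ k$ counted from zero as in the code.

import numpy as np

K, J = 3, 4                                   # hypothetical: K covariates, J response categories
beta = np.arange((K + 1) * (J - 1), dtype=float).reshape(K + 1, J - 1)
beta_vec = beta.reshape(-1, order="F")        # stack the columns one below the other
p = 5                                         # an arbitrary position in the stacked vector
k, j = p % (K + 1), p // (K + 1)
assert beta_vec[p] == beta[k, j]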

Applying the quotient rule of Eq. 14, note that the derivatives of the numerator and denominator differ depending on whether or not $ j^\prime = j$:


$\displaystyle f^\prime(a) = g^\prime(a) = e^{\sum_{k=0}^K x_{ik}\beta_{kj}} \cdot x_{ik^\prime} \qquad j^\prime = j$
$\displaystyle f^\prime(a) = 0, \qquad g^\prime(a) = e^{\sum_{k=0}^K x_{ik}\beta_{kj^\prime}} \cdot x_{ik^\prime} \qquad j^\prime \neq j$ (34)

Thus, when $ j^\prime = j$, the partial derivative in Eq. 33 becomes:


$\displaystyle \frac{\bigl(1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}\bigr) \, e^{\sum_{k=0}^K x_{ik}\beta_{kj}} \, x_{ik^\prime} - e^{\sum_{k=0}^K x_{ik}\beta_{kj}} \, e^{\sum_{k=0}^K x_{ik}\beta_{kj}} \, x_{ik^\prime}}{\bigl(1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}\bigr)^2}$
$\displaystyle = \frac{e^{\sum_{k=0}^K x_{ik}\beta_{kj}} \, x_{ik^\prime} \bigl(1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}} - e^{\sum_{k=0}^K x_{ik}\beta_{kj}}\bigr)}{\bigl(1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}\bigr)^2}$
$\displaystyle = \pi_{ij} x_{ik^\prime} (1-\pi_{ij})$ (35)

and when $ j^\prime \neq j$, it becomes:


$\displaystyle \frac{0 - e^{\sum_{k=0}^K x_{ik}\beta_{kj}} \cdot e^{\sum_{k=0}^K x_{ik}\beta_{kj^\prime}} \cdot x_{ik^\prime}}{\bigl(1 + \sum_{j=1}^{J-1}e^{\sum_{k=0}^K x_{ik}\beta_{kj}}\bigr)^2}$
$\displaystyle = - \pi_{ij} x_{ik^\prime} \pi_{ij^\prime}$ (36)

We can now express the matrix of second partial derivatives for the multinomial logistic regression model as:


$\displaystyle \frac{\partial^2 l(\beta)}{\partial \beta_{kj} \, \partial \beta_{k^\prime j^\prime}} = - \sum_{i=1}^N n_i x_{ik} \pi_{ij} (1-\pi_{ij}) x_{ik^\prime} \qquad j^\prime = j$
$\displaystyle \frac{\partial^2 l(\beta)}{\partial \beta_{kj} \, \partial \beta_{k^\prime j^\prime}} = \phantom{-} \sum_{i=1}^N n_i x_{ik} \pi_{ij} \pi_{ij^\prime} x_{ik^\prime} \qquad\qquad j^\prime \neq j$ (37)
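Combining Eq. 37 with the column-stacking convention gives the complete matrix of second partial derivatives. The sketch below, under the same assumed array shapes and hypothetical function names as before, fills the $ (j, j^\prime)$ block of the square matrix of order $ (J-1) \cdot (K+1)$ with $ -\sum_i n_i \pi_{ij}(1-\pi_{ij}) x_{ik} x_{ik^\prime}$ on the diagonal blocks and $ \sum_i n_i \pi_{ij}\pi_{ij^\prime} x_{ik} x_{ik^\prime}$ elsewhere.

import numpy as np

def hessian(beta, X, Y):
    # Matrix of second partial derivatives, Eq. 37, of order (J-1)*(K+1).
    n = Y.sum(axis=1)                                                  # n_i
    eta = X @ beta                                                     # N x (J-1)
    pi = np.exp(eta) / (1.0 + np.exp(eta).sum(axis=1, keepdims=True))  # pi_ij of Eq. 24
    Jm1, Kp1 = pi.shape[1], X.shape[1]
    H = np.empty((Jm1 * Kp1, Jm1 * Kp1))
    for j in range(Jm1):
        for jp in range(Jm1):
            if jp == j:
                w = -n * pi[:, j] * (1.0 - pi[:, j])    # j' = j
            else:
                w = n * pi[:, j] * pi[:, jp]            # j' != j
            # block[k, k'] = sum_i w_i * x_ik * x_ik'
            H[j * Kp1:(j + 1) * Kp1, jp * Kp1:(jp + 1) * Kp1] = (X * w[:, None]).T @ X
    return H

Because the matrix is symmetric, only the blocks on or above the diagonal actually need to be computed; the double loop above favors clarity over efficiency.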

