
The Model

Consider a random variable $Z$ that can take on one of two possible values. Given a dataset with a total sample size of $M$, where each observation is independent, $\boldsymbol{Z}$ can be considered as a column vector of $M$ binomial random variables $Z_i$, each with a single trial. By convention, a value of 1 is used to indicate ``success'' and a value of either 0 or 2 (but not both) is used to signify ``failure.'' To simplify the computational details of estimation, it is convenient to aggregate the data so that each row represents one distinct combination of values of the independent variables. These rows are often referred to as ``populations.'' Let $N$ represent the total number of populations and let $\boldsymbol{n}$ be a column vector with elements $n_i$ representing the number of observations in population $i$, for $i = 1, \ldots, N$, where $\sum_{i=1}^{N} n_i = M$, the total sample size.

Now, let $\boldsymbol{Y}$ be a column vector of length $N$ where each element $Y_i$ is a random variable representing the number of successes of $Z$ for population $i$. Let the column vector $\boldsymbol{y}$ contain elements $y_i$ representing the observed counts of the number of successes for each population. Let $\boldsymbol{\pi}$ be a column vector, also of length $N$, with elements $\pi_i = P(Z = 1 \mid i)$, i.e., the probability of success for any given observation in the $i^{th}$ population.
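The aggregation into populations can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the arrays `x_raw` and `z_raw` and the helper `aggregate` are hypothetical names, the data are made up, and failure is assumed to be coded as 0.

```python
import numpy as np

# Hypothetical raw data: M = 8 observations of one independent variable,
# with the binary outcome coded 1 = success, 0 = failure (an assumption).
x_raw = np.array([0, 0, 0, 1, 1, 2, 2, 2])
z_raw = np.array([1, 0, 0, 1, 1, 0, 1, 1])

def aggregate(x_raw, z_raw):
    """Collapse M observations into N populations.

    Returns the distinct covariate rows, the vector n of observations per
    population, and the vector y of observed successes per population.
    """
    x_raw = np.asarray(x_raw)
    if x_raw.ndim == 1:
        x_raw = x_raw.reshape(-1, 1)
    # Each distinct row of covariate values defines one population.
    levels, inverse = np.unique(x_raw, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)
    n = np.bincount(inverse, minlength=len(levels))
    successes = (np.asarray(z_raw) == 1).astype(float)
    y = np.bincount(inverse, weights=successes, minlength=len(levels)).astype(int)
    return levels, n, y

levels, n, y = aggregate(x_raw, z_raw)
print(n, y, n.sum())
```

Here the elements of `n` sum to the total sample size $M$, and each element of `y` can be no larger than the corresponding element of `n`.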

The linear component of the model contains the design matrix and the vector of parameters to be estimated. The design matrix of independent variables, $\boldsymbol{X}$, is composed of $N$ rows and $K+1$ columns, where $K$ is the number of independent variables specified in the model. For each row of the design matrix, the first element is $x_{i0} = 1$. This column corresponds to the intercept, or the ``alpha.'' The parameter vector, $\boldsymbol{\beta}$, is a column vector of length $K+1$. There is one parameter corresponding to each of the $K$ columns of independent variable settings in $\boldsymbol{X}$, plus one, $\beta_0$, for the intercept.
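As a rough illustration of these dimensions, the following Python sketch builds a design matrix with a leading column of ones and evaluates the linear component. The helper `design_matrix`, the covariate values, and the parameter values are all hypothetical, chosen only to make the shapes concrete.

```python
import numpy as np

def design_matrix(levels):
    """Prepend the intercept column x_{i0} = 1 to an N x K array of settings."""
    levels = np.asarray(levels, dtype=float)
    if levels.ndim == 1:
        levels = levels.reshape(-1, 1)
    N = levels.shape[0]
    return np.column_stack([np.ones(N), levels])

# Hypothetical settings for N = 3 populations and K = 2 independent variables.
levels = np.array([[0.0, 1.0],
                   [1.0, 0.0],
                   [2.0, 1.0]])
X = design_matrix(levels)            # shape (N, K + 1)

# Hypothetical parameter vector of length K + 1; beta[0] is the intercept.
beta = np.array([-1.0, 0.5, 0.25])

# Linear component: one value per population, sum over k of x_ik * beta_k.
eta = X @ beta
print(X.shape, eta)
```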

The logistic regression model equates the logit transform of $\pi_i$, i.e., the log-odds of a success, to the linear component:

$\displaystyle \log\biggl(\frac{\pi_i}{1-\pi_i}\biggr) = \sum_{k=0}^{K} x_{ik}\beta_k \qquad i = 1, 2, \ldots, N$ (1)
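Solving Eq. 1 for $\pi_i$ gives $\pi_i = 1 / (1 + e^{-\eta_i})$, where $\eta_i = \sum_{k=0}^{K} x_{ik}\beta_k$ is shorthand introduced here for the linear component. The Python sketch below checks this relationship numerically; the function names `logit` and `inv_logit` and the values of `X` and `beta` are illustrative assumptions, not part of the model specification.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability p, the left-hand side of Eq. 1."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of the logit: maps the linear component back to a probability."""
    eta = np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical design matrix (intercept column plus one variable) and parameters.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
beta = np.array([-1.0, 1.0])

eta = X @ beta            # linear component for each population
pi = inv_logit(eta)       # implied success probabilities

# Applying the logit to pi recovers the linear component, as Eq. 1 states.
print(np.allclose(logit(pi), eta))
```

Whatever the value of the linear component, the implied probability lies strictly between 0 and 1, which is one reason the logit is a convenient link between $\boldsymbol{\pi}$ and $\boldsymbol{X}\boldsymbol{\beta}$.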



Scott Czepiel
http://czep.net/contact.html