Naive Bayes
Suppose our training set consists of $n$ data samples $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$, where each input $\mathbf{x}^{(i)} = (x_1^{(i)}, \ldots, x_d^{(i)})$ has $d$ features, and $(\mathbf{x}^{(i)}, y^{(i)})$ are realizations of a random sample that follows an unknown joint distribution $p(\mathbf{x}, y)$.
Assumptions:
Features are conditionally independent given the class (Naive Bayes assumption): $p(x_1, \ldots, x_d \mid y) = \prod_{j=1}^{d} p(x_j \mid y)$.
MLE assumption: the random sample is independent and identically distributed (i.i.d.).
Positional independence: the position of a feature does not matter (used in the Multinomial case).
By applying Bayes' rule (stated on the distributions to keep things general), we have:

$$p(y \mid x_1, \ldots, x_d) = \frac{p(x_1, \ldots, x_d \mid y)\, p(y)}{p(x_1, \ldots, x_d)}$$

By substituting the conditional independence assumption:

$$p(y \mid x_1, \ldots, x_d) = \frac{p(y) \prod_{j=1}^{d} p(x_j \mid y)}{p(x_1, \ldots, x_d)}$$

Since the denominator $p(x_1, \ldots, x_d)$ is constant for any given $\mathbf{x}$, we can drop it from the equation, because it only changes the posterior by a constant of proportionality:

$$p(y \mid x_1, \ldots, x_d) \propto p(y) \prod_{j=1}^{d} p(x_j \mid y)$$

Our goal is to find the class $\hat{y}$ that maximizes this probability given input $\mathbf{x}$:

$$\hat{y} = \arg\max_{y} \; p(y) \prod_{j=1}^{d} p(x_j \mid y)$$
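In practice this argmax is computed in log space to avoid floating-point underflow when multiplying many small probabilities. A minimal sketch of the decision rule, assuming the log prior and the per-feature log likelihoods have already been estimated by one of the models below (the function and argument names are illustrative):

import numpy as np

def nb_predict(x, log_prior, log_likelihood):
    """Generic Naive Bayes decision rule in log space.

    log_prior[k]              -- log p(y = k)
    log_likelihood(j, x_j, k) -- log p(x_j | y = k), supplied by the chosen model
    """
    scores = np.array([
        log_prior[k] + sum(log_likelihood(j, x_j, k) for j, x_j in enumerate(x))
        for k in range(len(log_prior))
    ])
    return int(np.argmax(scores))  # index of the class with the highest posterior score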
Bernoulli Naive Bayes
In order to solve for $\hat{y}$, we need to first find $p(x_j \mid y)$ and $p(y)$. Assume that our features are binary, i.e. each $x_j$ takes values in $\{0, 1\}$, and that we have $K$ classes, $y \in \{1, \ldots, K\}$. Then it is natural to also assume that the conditional distribution of one feature given $y = k$ follows a Bernoulli distribution with unknown parameter $\theta_{jk}$. That is, for the $i$th input $\mathbf{x}^{(i)}$, the pmf of the $j$th feature given $y^{(i)} = k$ can be written as:

$$p(x_j^{(i)} \mid y^{(i)} = k) = \theta_{jk}^{x_j^{(i)}} \left(1 - \theta_{jk}\right)^{1 - x_j^{(i)}}$$

Besides the parameters $\theta_{jk}$, we have parameters $\pi_k$, each defined as the prior probability of class $k$ (this is valid because the prior is part of the joint distribution):

$$\pi_k = p(y = k), \qquad k = 1, \ldots, K$$
Then, the likelihood function can be written as:

$$L(\theta, \pi) = \prod_{i=1}^{n} p(\mathbf{x}^{(i)}, y^{(i)}) = \prod_{i=1}^{n} \prod_{k=1}^{K} \left[ \pi_k \prod_{j=1}^{d} \theta_{jk}^{x_j^{(i)}} \left(1 - \theta_{jk}\right)^{1 - x_j^{(i)}} \right]^{\mathbb{1}\{y^{(i)} = k\}}$$

subject to the constraint $\sum_{k=1}^{K} \pi_k = 1$.

Then the log-likelihood function:

$$\ell(\theta, \pi) = \sum_{i=1}^{n} \sum_{k=1}^{K} \mathbb{1}\{y^{(i)} = k\} \left[ \log \pi_k + \sum_{j=1}^{d} \left( x_j^{(i)} \log \theta_{jk} + \left(1 - x_j^{(i)}\right) \log\left(1 - \theta_{jk}\right) \right) \right]$$

subject to the same constraint $\sum_{k=1}^{K} \pi_k = 1$.
Taking the partial derivative w.r.t. $\theta_{jk}$ and setting it to zero:

$$\frac{\partial \ell}{\partial \theta_{jk}} = \sum_{i:\, y^{(i)} = k} \left( \frac{x_j^{(i)}}{\theta_{jk}} - \frac{1 - x_j^{(i)}}{1 - \theta_{jk}} \right) = 0 \quad\Longrightarrow\quad \hat{\theta}_{jk} = \frac{\sum_{i:\, y^{(i)} = k} x_j^{(i)}}{N_k}$$

where $N_k = \sum_{i=1}^{n} \mathbb{1}\{y^{(i)} = k\}$ is the number of samples in class $k$.

Taking the partial derivative w.r.t. $\pi_k$ of the Lagrangian formed from the log-likelihood and the constraint (Lagrange multiplier $\lambda$):

$$\frac{\partial}{\partial \pi_k} \left[ \ell(\theta, \pi) + \lambda \Big( 1 - \sum_{k=1}^{K} \pi_k \Big) \right] = \frac{N_k}{\pi_k} - \lambda = 0 \quad\Longrightarrow\quad \pi_k = \frac{N_k}{\lambda}$$

By substituting back into the constraint $\sum_{k=1}^{K} \pi_k = 1$, we get $\lambda = \sum_{k=1}^{K} N_k = n$, so:

$$\hat{\pi}_k = \frac{N_k}{n}$$
The result for $\hat{\pi}_k$ is the same if we instead assume $y$ follows a multinomial distribution with the constraint $\sum_{k=1}^{K} \pi_k = 1$ on $\pi_1, \ldots, \pi_K$.
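These closed-form estimates translate directly into a few lines of NumPy. A minimal sketch, assuming a binary feature matrix X of shape (n, d) and a label vector y (the function and variable names are illustrative):

import numpy as np

def fit_bernoulli_nb(X, y):
    """MLE for Bernoulli Naive Bayes: theta[k, j] = p(x_j = 1 | y = k), pi[k] = p(y = k)."""
    classes = np.unique(y)
    n, d = X.shape
    theta = np.zeros((classes.size, d))
    pi = np.zeros(classes.size)
    for k, c in enumerate(classes):
        X_c = X[y == c]                  # the N_k samples belonging to class c
        theta[k] = X_c.mean(axis=0)      # fraction of ones per feature = sum_i x_j / N_k
        pi[k] = X_c.shape[0] / n         # class prior = N_k / n
    return classes, theta, pi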
Multinomial Naive Bayes
In the multinomial case, our features are counts, $x_j^{(i)} \in \{0, 1, 2, \ldots\}$ (for example, word counts in a document).
The derivation of Multinomial NB is similar to the Bernoulli case; the only difference is that, in the multinomial model, we assume the conditional distribution of $\mathbf{x}^{(i)}$ given $y^{(i)} = k$ follows a multinomial distribution. That is, the conditional pmf for the $i$th input is defined as:

$$p(\mathbf{x}^{(i)} \mid y^{(i)} = k) = \frac{S_i!}{\prod_{j=1}^{d} x_j^{(i)}!} \prod_{j=1}^{d} \theta_{jk}^{x_j^{(i)}}$$

where $S_i = \sum_{j=1}^{d} x_j^{(i)}$ represents the total count for sample $i$, and $\theta_{jk}$ is the probability that a single count falls into feature $j$ under class $k$.
By positional independence, the order in which the individual counts occur does not matter, and the multinomial coefficient does not depend on the class, so for classification it suffices to keep:

$$p(\mathbf{x}^{(i)} \mid y^{(i)} = k) \propto \prod_{j=1}^{d} \theta_{jk}^{x_j^{(i)}}$$
Then, we can write the likelihood function (dropping the multinomial coefficient, which does not affect the maximizer) as:

$$L(\theta, \pi) = \prod_{i=1}^{n} \prod_{k=1}^{K} \left[ \pi_k \prod_{j=1}^{d} \theta_{jk}^{x_j^{(i)}} \right]^{\mathbb{1}\{y^{(i)} = k\}}$$

subject to the constraints:

$$\sum_{j=1}^{d} \theta_{jk} = 1 \;\text{ for each } k, \qquad \sum_{k=1}^{K} \pi_k = 1$$

By solving the constrained maximization problem (again with Lagrange multipliers), we obtain the estimates:

$$\hat{\theta}_{jk} = \frac{N_{jk}}{N_k}, \qquad \hat{\pi}_k = \frac{\left|\{i : y^{(i)} = k\}\right|}{n}$$

where $N_{jk} = \sum_{i:\, y^{(i)} = k} x_j^{(i)}$ is the total count of feature $j$ in class $k$ and $N_k = \sum_{j=1}^{d} N_{jk}$.
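These estimates again reduce to a few array operations. A sketch, assuming a count matrix X of shape (n, d) and labels y; the alpha argument is the additive (Lidstone) smoothing used by the implementation at the end of this post, and alpha = 0 recovers the plain MLE derived above (names are illustrative):

import numpy as np

def fit_multinomial_nb(X, y, alpha=0.0):
    """theta[k, j] = (N_jk + alpha) / (N_k + alpha * d), pi[k] = fraction of samples in class k."""
    classes = np.unique(y)
    n, d = X.shape
    theta = np.zeros((classes.size, d))
    pi = np.zeros(classes.size)
    for k, c in enumerate(classes):
        X_c = X[y == c]
        N_jk = X_c.sum(axis=0)                          # total count of each feature in class c
        N_k = N_jk.sum()                                # total count over all features in class c
        theta[k] = (N_jk + alpha) / (N_k + alpha * d)
        pi[k] = X_c.shape[0] / n
    return classes, theta, pi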
Categorical Naive Bayes
In some cases, features are nominal and their numerical values have no explicit meaning. In this case, we can model the pmf of the $j$th feature of the $i$th example given $y^{(i)} = k$ by a categorical pmf. Suppose feature $j$ takes values in $\{1, \ldots, m_j\}$; then:

$$p(x_j^{(i)} = t \mid y^{(i)} = k) = \theta_{jkt}, \qquad t = 1, \ldots, m_j$$

The likelihood function is therefore:

$$L(\theta, \pi) = \prod_{i=1}^{n} \prod_{k=1}^{K} \left[ \pi_k \prod_{j=1}^{d} \prod_{t=1}^{m_j} \theta_{jkt}^{\mathbb{1}\{x_j^{(i)} = t\}} \right]^{\mathbb{1}\{y^{(i)} = k\}}$$

subject to the constraints:

$$\sum_{t=1}^{m_j} \theta_{jkt} = 1 \;\text{ for each } j, k, \qquad \sum_{k=1}^{K} \pi_k = 1$$

The estimates are:

$$\hat{\theta}_{jkt} = \frac{N_{jkt}}{N_k}, \qquad \hat{\pi}_k = \frac{N_k}{n}$$

where $N_{jkt} = \left|\{i : x_j^{(i)} = t,\, y^{(i)} = k\}\right|$ and $N_k = \left|\{i : y^{(i)} = k\}\right|$.
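A sketch of these counting estimates, assuming X holds integer-encoded nominal features of shape (n, d) and y holds the labels (names are illustrative):

import numpy as np

def fit_categorical_nb(X, y):
    """theta[j][k, t] = p(x_j = t | y = k), estimated by relative frequencies within each class."""
    classes = np.unique(y)
    n, d = X.shape
    pi = np.array([(y == c).sum() / n for c in classes])    # class priors N_k / n
    theta = []
    for j in range(d):
        values = np.unique(X[:, j])                         # the m_j observed categories of feature j
        table = np.zeros((classes.size, values.size))
        for k, c in enumerate(classes):
            col = X[y == c, j]
            table[k] = [(col == t).mean() for t in values]  # N_jkt / N_k
        theta.append(table)
    return classes, pi, theta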
Gaussian Naive Bayes
Suppose the features are continuously distributed, $x_j \in \mathbb{R}$. Then we can use a Gaussian distribution to model the conditional distribution of each feature:

$$p(x_j \mid y = k) = \frac{1}{\sqrt{2\pi \sigma_{jk}^2}} \exp\!\left( -\frac{(x_j - \mu_{jk})^2}{2\sigma_{jk}^2} \right)$$

Then, by following a similar procedure, we have the ML estimates:

$$\hat{\mu}_{jk} = \frac{1}{N_k} \sum_{i:\, y^{(i)} = k} x_j^{(i)}, \qquad \hat{\sigma}_{jk}^2 = \frac{1}{N_k} \sum_{i:\, y^{(i)} = k} \left( x_j^{(i)} - \hat{\mu}_{jk} \right)^2, \qquad \hat{\pi}_k = \frac{N_k}{n}$$
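A sketch of the Gaussian estimates, assuming a real-valued matrix X of shape (n, d); the small variance floor is an assumption added for numerical stability and is not part of the derivation above (names are illustrative):

import numpy as np

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per-class, per-feature mean and variance, plus class priors."""
    classes = np.unique(y)
    n, d = X.shape
    mu = np.zeros((classes.size, d))
    var = np.zeros((classes.size, d))
    pi = np.zeros(classes.size)
    for k, c in enumerate(classes):
        X_c = X[y == c]
        mu[k] = X_c.mean(axis=0)                 # mu_hat_{jk}
        var[k] = X_c.var(axis=0) + var_floor     # sigma_hat^2_{jk} (MLE, divides by N_k)
        pi[k] = X_c.shape[0] / n                 # pi_hat_k
    return classes, mu, var, pi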
Putting everything together, here is a NumPy implementation of Multinomial Naive Bayes with additive smoothing, wrapped in a small class so that fit and predict can share the estimated parameters:

import numpy as np
import pandas as pd


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with additive (Lidstone) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing parameter

    def fit(self, X, y):
        if isinstance(X, pd.DataFrame):
            X = X.values
        if isinstance(y, pd.Series):
            y = y.values
        c_ = np.unique(y)
        n_c = c_.size
        NT, N_d = X.shape
        self.theta = np.zeros((n_c, N_d))  # log theta_{jk}, one row per class
        self.class_prior = np.array(np.unique(y, return_counts=True), dtype=np.float64).T
        self.class_prior[:, 1] = self.class_prior[:, 1] / NT  # pi_k = N_k / n
        self.class_map = {}
        for index, c in enumerate(c_):
            N_c = X[y == c].sum()  # total count over all features in class c
            self.class_map[index] = c
            for i in range(N_d):
                N_ci = X[y == c, i].sum()  # total count of feature i in class c
                self.theta[index, i] = np.log((N_ci + self.alpha) / (N_c + self.alpha * N_d))
        return self

    def predict(self, X):
        if isinstance(X, pd.DataFrame):
            X = X.values
        pred_results = np.zeros(X.shape[0])
        class_size = len(self.class_map)
        for index, i in enumerate(X):
            temp_lst = np.zeros(class_size)
            for c in range(class_size):
                temp_lst[c] = np.log(self.class_prior[:, 1][c])  # log prior
                for d in range(X.shape[1]):
                    temp_lst[c] = temp_lst[c] + self.theta[c][d] * i[d]  # + x_j * log theta_{jk}
            pred_results[index] = self.class_map[np.argmax(temp_lst)]  # class with highest score
        return pred_results
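A quick usage sketch on made-up toy count data (purely for illustration):

import numpy as np

X = np.array([[2, 1, 0],
              [3, 0, 1],
              [0, 2, 4],
              [1, 0, 5]])
y = np.array([0, 0, 1, 1])

model = MultinomialNaiveBayes(alpha=1.0).fit(X, y)
print(model.predict(np.array([[2, 1, 1], [0, 1, 6]])))  # -> [0. 1.]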