Analysis of Text Patterns Using Kernel Methods
The first step of the kernel approach is to embed the data items (e.g., documents) into a Euclidean space where the pattens can be represented by a linear relation.
This step reduces many complex problems to a class of linear problems, and algorithms used to solve them are efficient and well understood.
The second step is to detect relations within the embedded data set, using a robust and efficient pattern analysis algorithm.
The main idea is quite similar to SVM (Support Vector Machine).
The first step of SVM is to apply a kernel function and transfer a non-linear problem to linear problem.
Then do classification or regression, and either of them is linear.
Linear algorithms are preferred because of their efficiency and indeed they are well understood, both from a statistical and computational perspective.
Using efficient kernels, we can look for linear relations in very high dimensional spaces at a very low computational cost.
If it is necessary to consider a non-linear map, we are still provided with an efficient way to discover non-linear relations in the data, by using a linear algorithms in a different space.
It is important to highlight that the feature space is not uniquely determined by the kernel function.