Latent Dirichlet allocation (LDA) is a generative probabilistic model of a corpus in which sets of observations are explained by unobserved (latent) groups that account for why some parts of the data are similar. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. LDA assumes that words are generated by topics and that those topics are infinitely exchangeable within a document, i.e., the exchangeability assumption: the order of words in a document does not matter (the bag-of-words view).
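As a minimal sketch of this generative story, the code below samples a toy corpus using only Python's standard library. It assumes a fixed document length (the original model draws the length from a Poisson distribution) and a hand-written topic-word matrix `beta`; the function names, the toy vocabulary and the topic labels are hypothetical, chosen only for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing independent Gamma(a_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(n_docs, doc_len, beta, alpha, seed=0):
    """Sample documents from the LDA generative process (hypothetical helper).

    beta[k] is topic k's distribution over the vocabulary (K x V matrix).
    alpha is the Dirichlet parameter for per-document topic proportions.
    Assumption: every document has the same length doc_len.
    Returns a list of (theta, topic_assignments, word_ids), one per document.
    """
    rng = random.Random(seed)
    corpus = []
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, rng)      # topic mixture for this document
        topics, words = [], []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)    # pick a topic for this word position
            w = sample_categorical(beta[z], rng)  # pick a word from that topic
            topics.append(z)
            words.append(w)
        corpus.append((theta, topics, words))
    return corpus

if __name__ == "__main__":
    vocab = ["ball", "goal", "team", "vote", "party", "law"]  # toy vocabulary
    beta = [
        [0.30, 0.30, 0.30, 0.03, 0.03, 0.04],  # hypothetical "sports" topic
        [0.03, 0.03, 0.04, 0.30, 0.30, 0.30],  # hypothetical "politics" topic
    ]
    for theta, topics, words in generate_corpus(2, 8, beta, alpha=[0.5, 0.5]):
        print([round(t, 2) for t in theta], [vocab[w] for w in words])
```

Because each word first picks a topic and then a word from that topic, a document whose `theta` leans toward one topic will mostly contain that topic's words, which is exactly the "mixture over topics" picture described above.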
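The original formulation uses variational EM to estimate the model parameters. It also introduces a variant called smoothed LDA, which places a Dirichlet prior on the topic-word distributions so that words unseen in training do not receive zero probability (the "zero-frequency problem"). Exact posterior inference is intractable for LDA, but a wide range of approximate inference algorithms, such as variational inference and Gibbs sampling, can be used for inference and parameter estimation within the LDA framework. LDA can also be viewed as a dimensionality reduction technique: each document is summarized by its K topic proportions instead of its full vector of word counts.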
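To make the variational step concrete, here is a minimal sketch of the per-document E-step of variational EM, assuming the topic-word matrix `beta` and the Dirichlet parameter `alpha` are already fixed (the M-step that re-estimates them is omitted). To stay within the standard library, the digamma function is approximated by a hypothetical `digamma` helper using the recurrence psi(x) = psi(x+1) - 1/x and an asymptotic series. The function names are illustrative, not from any library. The normalized `gamma` is the document's K-dimensional topic representation, which is the dimensionality-reduction view mentioned above.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0: shift x upward, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def variational_e_step(words, beta, alpha, n_iter=100, tol=1e-8):
    """Fit variational parameters (gamma, phi) for one document (hypothetical helper).

    words: list of word ids; beta: K x V topic-word probabilities; alpha: length-K prior.
    Returns gamma (Dirichlet over topic proportions) and phi (N x K word-topic responsibilities).
    """
    K, N = len(alpha), len(words)
    phi = [[1.0 / K] * K for _ in range(N)]
    gamma = [a + N / K for a in alpha]
    for _ in range(n_iter):
        exp_psi = [math.exp(digamma(g)) for g in gamma]
        for n, w in enumerate(words):
            # phi_nk is proportional to beta[k][w] * exp(psi(gamma_k))
            unnorm = [beta[k][w] * exp_psi[k] for k in range(K)]
            total = sum(unnorm)
            if total == 0.0:
                # Word has zero probability under every topic: the zero-frequency problem
                raise ValueError(f"word id {w} has zero probability under all topics")
            phi[n] = [u / total for u in unnorm]
        new_gamma = [alpha[k] + sum(phi[n][k] for n in range(N)) for k in range(K)]
        converged = max(abs(a - b) for a, b in zip(new_gamma, gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, phi

def topic_proportions(gamma):
    """Expected topic proportions E_q[theta] = gamma / sum(gamma): a K-dim document vector."""
    total = sum(gamma)
    return [g / total for g in gamma]
```

With an unsmoothed `beta` estimated from training counts, any word that never appeared in training makes `total` zero and the update fails; the Dirichlet prior in smoothed LDA is one way to avoid this.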