Research #83

Text Probability Model from Corpus

Added by Kuiyu Chang almost 3 years ago. Updated over 1 year ago.

Status: Rejected    Start date: 2009-02-09
Priority: Normal    Due date:
Assignee: Kuiyu Chang    % Done: 0%
Category: -    Spent time: -
Target version: -    Estimated time: 20.00 hours

Description

Although we do not have labeled data, we do have an enormous number of review snippets. We should use them to build various incremental text probability models, which can later be used for classification (spam and sentiment).

Models to generate include:

n-gram (n=1,2,3)
matrix (hops=1,2,3)
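
As a rough illustration of the n-gram item, an incremental model could be kept as running counts that are updated one snippet at a time. This is only a minimal sketch, not a proposed implementation: the class name IncrementalNgramModel, the whitespace tokenization, the "<s>" and "</s>" padding tokens, and the unsmoothed maximum-likelihood estimate are all assumptions made for the example. The matrix (hops) model is not covered here.

```python
from collections import Counter


class IncrementalNgramModel:
    """Hypothetical count-based n-gram model that can be updated snippet by snippet."""

    def __init__(self, n):
        self.n = n
        self.ngram_counts = Counter()    # counts of full n-grams
        self.context_counts = Counter()  # counts of the (n-1)-token contexts

    def update(self, tokens):
        """Add one tokenized snippet to the running counts."""
        # Assumed padding scheme: n-1 start markers and a single end marker.
        padded = ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]
        for i in range(len(padded) - self.n + 1):
            ngram = tuple(padded[i:i + self.n])
            self.ngram_counts[ngram] += 1
            self.context_counts[ngram[:-1]] += 1

    def prob(self, word, context=()):
        """Unsmoothed maximum-likelihood estimate of P(word | context)."""
        context = tuple(context)
        seen = self.context_counts[context]
        if seen == 0:
            return 0.0
        return self.ngram_counts[context + (word,)] / seen


# Usage: build a bigram model from two toy snippets.
bigram = IncrementalNgramModel(2)
for snippet in ["good food", "good service"]:
    bigram.update(snippet.split())
print(bigram.prob("food", ("good",)))  # 0.5
```

Because the model is just a pair of counters, new snippets can be folded in at any time without retraining. A real classifier would likely need smoothing for unseen n-grams, which this sketch leaves out.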

Classical IR did not have the luxury of such a large amount of text, so most earlier work was based on unigram models.

History

Updated by Kuiyu Chang over 1 year ago

  • Status changed from New to Rejected
