Research #83
Text Probability Model from Corpus
| Status: | Rejected | Start date: | 2009-02-09 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | - | Spent time: | - |
| Target version: | - | Estimated time: | 20.00 hours |
Description
Although we do not have labeled data, we have an enormous number of review snippets, which we should use to generate various incremental text probability models. These models can later be used in classification (spam and sentiment).
Models to generate include:
n-gram (n=1,2,3)
matrix (hops=1,2,3)
Classical IR does not have the luxury of such a large volume of text, so most previous work was based on unigram models.
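To make the n-gram item above concrete, here is a minimal sketch of a count-based n-gram model that can be updated one review snippet at a time. It is an illustration only, not code from this project: the class name `IncrementalNGramModel`, the whitespace tokenizer, the `<s>`/`</s>` boundary markers, and the add-alpha smoothing are all assumptions. The "matrix (hops=1,2,3)" model is not covered because the ticket does not define it.

```python
from collections import Counter, defaultdict


class IncrementalNGramModel:
    """Count-based n-gram model, updatable one snippet at a time (hypothetical sketch)."""

    def __init__(self, n):
        self.n = n
        # Maps a context tuple of (n - 1) tokens to a Counter of following tokens.
        self.counts = defaultdict(Counter)
        # Tokens that can appear in the predicted position.
        self.vocab = set()

    def _tokens(self, text):
        # Assumption: naive lowercase whitespace tokenization with boundary markers.
        return ["<s>"] * (self.n - 1) + text.lower().split() + ["</s>"]

    def update(self, text):
        """Fold one unlabeled snippet into the counts."""
        tokens = self._tokens(text)
        self.vocab.update(tokens[self.n - 1:])
        for i in range(self.n - 1, len(tokens)):
            context = tuple(tokens[i - self.n + 1:i])
            self.counts[context][tokens[i]] += 1

    def prob(self, context, word, alpha=1.0):
        """Add-alpha smoothed P(word | context); call update() at least once first."""
        seen = self.counts.get(tuple(context), Counter())
        total = sum(seen.values())
        return (seen[word] + alpha) / (total + alpha * len(self.vocab))


# Example usage with made-up snippets.
bigram = IncrementalNGramModel(2)
bigram.update("great food")
bigram.update("great service")
p_food = bigram.prob(("great",), "food")
```

Because the model stores only counts, new snippets can be added at any time without retraining, which is one way to read "incremental" in the description. A classifier for spam or sentiment could later use these conditional probabilities as features.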
History
Updated by Kuiyu Chang over 1 year ago
- Status changed from New to Rejected