Research #121

TFKL

Added by Kuiyu Chang over 2 years ago. Updated over 1 year ago.

Status: Closed    Start date: 2009-07-01
Priority: Normal    Due date: 2009-12-18
Assignee: Thanh Tam Nguyen    % Done: 100%
Category: -    Spent time: -
Target version: -    Estimated time: 50.00 hours

Description

I am opening up this issue to track the progress of the TFKL paper.

As discussed, we need to

  • redo the 2-class problems for TFRF, using 2 classifiers
  • replot the curve for TF.KL using absolute value
  • rewrite the intro, shifting the focus from sentiment analysis to supervised term weighting approaches, which actually moves the paper towards a feature-selection problem. You should also include some unsupervised term weighting approaches, like "A Comparative Study on Feature Selection in Text Categorization" by Yiming Yang and Jan O. Pedersen.
  • read and include some more related papers like:
    1. "An Improved Term Weighting Scheme for Vector Space Model" by Yue-Heng Sun, Pi-Lian He, and Zhi-Gang Chen
    2. search for "term absence" and text classification in Google Scholar
  • emphasize the key advantages (selling point) of TF.KL
    1. it considers both term presence and term absence from the class distribution, and
    2. in the case of 2-class problems, only 1 classifier needs to be trained, unlike TF.RF, which requires you to train C classifiers for C classes even when C=2.
  • do more experiments on C>2
  • try out TFKL on the (C choose 2) summation formula you showed me for C>2
  • in the literature, compare with other supervised term weighting approaches
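
To make the presence/absence point and the (C choose 2) summation concrete, here is a minimal Python sketch. This issue does not state the TF.KL formula, so everything below is an assumption for illustration: the KL factor is taken to be the symmetric KL divergence between the Bernoulli presence/absence distributions of a term in two classes, summed over all class pairs when C>2. The function names, the add-alpha smoothing, and the choice of symmetric KL are hypothetical and may differ from what the paper uses.

```python
import math
from itertools import combinations


def bernoulli_kl(p, q):
    """KL(Bern(p) || Bern(q)) in nats; p and q must lie strictly in (0, 1).

    The first term covers term presence, the second covers term absence.
    """
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def presence_rate(docs_with_term, docs_in_class, alpha=1.0):
    """Smoothed fraction of a class's documents that contain the term.

    Add-alpha smoothing (an assumption) keeps the rate away from 0 and 1,
    so the logarithms in bernoulli_kl stay finite.
    """
    return (docs_with_term + alpha) / (docs_in_class + 2 * alpha)


def kl_factor(rates):
    """Sum the symmetric KL divergence over all (C choose 2) class pairs.

    `rates` holds one presence rate per class for a single term.
    """
    return sum(
        bernoulli_kl(p, q) + bernoulli_kl(q, p)
        for p, q in combinations(rates, 2)
    )


def tf_kl(tf, rates):
    """Hypothetical TF.KL weight: term frequency times the class-level KL factor."""
    return tf * kl_factor(rates)
```

Under these assumptions, a term that is equally likely to appear in every class gets weight 0, while a term concentrated in one class gets a positive weight. Because the factor depends only on the class distributions and not on which class is treated as positive, a single weighted representation serves a 2-class problem.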

History

Updated by Thanh Tam Nguyen over 2 years ago

  • % Done changed from 0 to 20

Updated by Kuiyu Chang about 2 years ago

Any updates?

Updated by Kuiyu Chang about 2 years ago

  • Due date changed from 2009-07-15 to 2009-12-18

Updated by Kuiyu Chang over 1 year ago

  • Status changed from New to Closed
  • % Done changed from 20 to 100
