Feature #1
Prepare review datasets
| Status: | Closed | Start date: | 2008-09-03 |
|---|---|---|---|
| Priority: | Normal | Due date: | 2008-11-05 |
| Assignee: | | % Done: | 0% |
| Category: | - | Spent time: | - |
| Target version: | - | Estimated time: | 20.00 hours |
Description
Download some existing datasets, e.g. Bin Liu has a simple review dataset on mp3 players, etc.
Create or find a dataset containing review and non-review webpages for a few products.
Use Google or another search engine to download and save a number of webpages for the search key "mobile phones", then come up with a few models, e.g. iPhone, Nokia.
History
Updated by Kuiyu Chang over 3 years ago
- Assignee set to Thanh Tam Nguyen
Updated by Kuiyu Chang over 3 years ago
- Due date changed from 2008-08-27 to 2008-09-10
- Start date changed from 2008-08-20 to 2008-09-03
See if you can get the same dataset from Wiebe 2002, and achieve 93.9%.
Updated by Kuiyu Chang over 3 years ago
Use a search engine to search for a few models of a particular product, e.g. iPod or Nokia phones, or for names such as Obama or Osama bin Laden, then manually label the subjective docs.
Use GATE, Eric Brill's POS tagger, or other tools for user annotation.
Updated by Kuiyu Chang over 3 years ago
- Due date changed from 2008-09-10 to 2008-10-30
Using Yahoo! BOSS, save 10-20 sets of product query hit results (100 for each product) onto your hard drive.
Explore the dataset to come up with summary stats:
e.g. how many products? how many docs altogether?
We need at least 10 products x 100 results = 1000 pages.
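The summary stats asked for above could be gathered with a short script. This is only a sketch: it assumes the saved hits are laid out as one sub-directory per product query with one file per saved page, which this issue does not specify, and the names `summarize` and `report` are hypothetical.

```python
from pathlib import Path


def summarize(root):
    """Count saved pages per product.

    Assumes (not stated in the issue) one sub-directory per product
    query under `root`, with one file per saved search hit.
    """
    counts = {}
    for product_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[product_dir.name] = sum(1 for f in product_dir.iterdir() if f.is_file())
    return counts


def report(counts, min_products=10, min_pages=1000):
    """Summarize the counts and check them against the 10 x 100 target."""
    total = sum(counts.values())
    return {
        "products": len(counts),
        "pages": total,
        "meets_target": len(counts) >= min_products and total >= min_pages,
    }
```

Calling `report(summarize("hits"))` on a hypothetical `hits` directory answers both questions (how many products, how many docs altogether) and flags whether the 1000-page minimum has been reached.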
Updated by Thanh Tam Nguyen over 3 years ago
I have been using Y!BOSS to search for 12 products (more products will be saved). For each query, Y!BOSS returns 100 search results.
- There are some dead links, so the number of search results actually saved per query is around 50-100.
- The search results may be in different languages.
I also tried running OpinionFinder on this raw dataset. The accuracy is not good.
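For reference, the dead-link handling described above can be sketched as follows. This is not the script that was actually used: it takes a plain list of result URLs (however they were obtained from the search engine), and the function names, the SHA-1 based file names, and the `.html` extension are all assumptions made for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch(url, timeout=10):
    """Return the raw bytes of a page; network failures raise OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_hits(urls, out_dir, fetch=fetch):
    """Save each reachable URL to out_dir and report which ones were dead.

    The file name is the SHA-1 of the URL (an arbitrary choice) so that
    the same hit always maps to the same file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved, dead = [], []
    for url in urls:
        try:
            body = fetch(url)
        except (OSError, ValueError):  # unreachable host, HTTP error, bad URL
            dead.append(url)
            continue
        name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
        (out_dir / name).write_bytes(body)
        saved.append(url)
    return saved, dead
```

Keeping the list of dead URLs makes it easy to report how many of the 100 results per query were actually saved.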
Updated by Kuiyu Chang over 3 years ago
- Due date changed from 2008-10-30 to 2008-11-05
As discussed, please annotate your data with your custom XML tags for just one query (i.e., approx. 50-100 files):
a) XML-annotate all query words in each file, e.g. N95
<ntu query="N95">N 95</query>
b) XML-annotate all true opinion sentences / fragments
<ntu opinion polarity="negative">I hate N95</opinion>
Then we run OpinionFinder on this dataset (see other issue).
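A rough sketch of how such annotations could be produced and read back is given below. The tag examples above mix element names between the opening and closing tags, so this sketch assumes one possible well-formed variant, `<query term="...">` and `<opinion polarity="...">`; that schema and the function names are assumptions, not an agreed format. Opinion spans are expected to be labelled by hand; only the query-word tagging is automated here.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def tag_query(text, query):
    """Wrap every occurrence of `query` in an assumed <query term="..."> tag.

    Matching ignores case and allows one optional whitespace character
    between the characters of the query, so "N95" also matches "N 95".
    """
    pattern = re.compile(
        r"\s?".join(re.escape(escape(ch)) for ch in query), re.IGNORECASE
    )
    return pattern.sub(
        lambda m: "<query term=%s>%s</query>" % (quoteattr(query), m.group(0)),
        escape(text),  # escape &, <, > so the result stays parseable
    )


def opinions(annotated):
    """Return (polarity, text) pairs for each assumed <opinion> element."""
    root = ET.fromstring("<doc>%s</doc>" % annotated)
    return [(el.get("polarity"), "".join(el.itertext())) for el in root.iter("opinion")]
```

For example, `tag_query("I hate N 95", "N95")` wraps `N 95` in a `query` element, and `opinions(...)` on a hand-labelled fragment returns the polarity together with the plain sentence text, with any nested query tags stripped.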
Updated by Kuiyu Chang about 3 years ago
Any updates on the creation of your dataset?
Updated by Thanh Tam Nguyen about 3 years ago
Dataset has been collected. Still working on annotation using GATE.
Updated by Kuiyu Chang over 2 years ago
- Status changed from New to Closed
Switched focus. Closing issue.