AI-Assisted Dataset Simplifies Vietnamese Review Analysis

A new dataset, called ViMRHP, aims to boost research on predicting the helpfulness of Vietnamese online reviews. This matters because most existing review-helpfulness datasets cover English or Indonesian, leaving a gap for languages like Vietnamese. Researchers from the University of Information Technology, Ho Chi Minh City, Vietnam, created the dataset, which covers four product categories and contains 46,000 reviews of 2,000 products.

One of the key innovations is the use of AI to help label the data, which significantly reduces annotation time and cost. The process follows a two-step approach: an AI first generates annotations, and human experts then verify and refine them. This cuts annotation time from 90-120 seconds per review to just 20-40 seconds, saving around 65% in costs.
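To make the two-step workflow concrete, here is a minimal sketch in Python. Nothing below comes from the paper: `Annotation`, `draft_annotation`, and `verify` are hypothetical names, and the AI step is stubbed out where a real pipeline would prompt an LLM with the review text and images.

```python
from dataclasses import dataclass

CRITERIA = ("key_aspects", "decision_advice", "image_helpfulness")

@dataclass
class Annotation:
    scores: dict          # criterion name -> score on the 1-5 scale
    verified: bool = False

def draft_annotation(review_text: str) -> Annotation:
    """Step 1 (AI): stand-in for an LLM that proposes a score per criterion."""
    # A real pipeline would prompt a model with the review text (and images)
    # and parse its response; here we just return neutral placeholder scores.
    return Annotation(scores={c: 3 for c in CRITERIA})

def verify(draft: Annotation, corrections: dict) -> Annotation:
    """Step 2 (human): accept the draft, overriding any disputed scores."""
    for criterion, score in corrections.items():
        assert criterion in CRITERIA and 1 <= score <= 5
        draft.scores[criterion] = score
    draft.verified = True
    return draft

review = "The coffee maker brews quickly and is easy to clean."
final = verify(draft_annotation(review), {"key_aspects": 4})
print(final)
```

The split mirrors the time savings described above: the expensive human step shrinks to checking and correcting a draft rather than scoring from scratch.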

To understand the data, imagine a review for a Vietnamese coffee maker. The review text might say, “The coffee maker brews quickly and is easy to clean. The design is also very sleek!” Accompanying images might show the coffee maker in action and highlight its user-friendly features. The ViMRHP dataset annotates such a review on three key criteria, illustrated by the example record after this list:

  • Key Aspects: Does the review mention important features of the product (e.g., brewing speed, ease of cleaning, design)? The review receives a score from 1 (no key aspects mentioned) to 5 (four or more key aspects mentioned).

  • Decision-Making Advice: Does the review offer advice on whether to buy the product (e.g., recommending it, cautioning against it, or specifying who it’s suitable for)? Scores again range from 1 (no advice) to 5 (a clear recommendation).

  • Image Helpfulness: How helpful are the images accompanying the review? Factors like relevance, clarity, illustrative value, and engagement are considered. Like the other criteria, the scores range from 1 to 5.
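Putting the three criteria together, an annotated ViMRHP example might look like the record below. This is a hypothetical shape based only on the criteria described above; the field names, values, and the exact schema of the released dataset are assumptions.

```python
# Hypothetical shape of one annotated example; the released dataset's
# actual field names and schema may differ.
example = {
    "product_id": "coffee-maker-001",        # assumed identifier format
    "review_text": "The coffee maker brews quickly and is easy to clean. "
                   "The design is also very sleek!",
    "images": ["img_01.jpg", "img_02.jpg"],  # paths are placeholders
    "labels": {
        "key_aspects": 4,        # three aspects mentioned (speed, cleaning, design)
        "decision_advice": 2,    # implicit praise, no explicit buy/avoid advice
        "image_helpfulness": 4,  # clear, relevant product shots (assumed)
    },
}

# All three scores sit on the same 1-5 scale described above.
assert all(1 <= s <= 5 for s in example["labels"].values())
```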

The dataset’s authors found that while AI assistance greatly speeds up the process, human verification remains crucial, because AI can struggle with complex annotation tasks and nuanced contextual understanding. Comparing models trained on AI-generated labels with models trained on human-verified labels, the team found that the human-verified labels yielded better review helpfulness prediction: the models’ outputs aligned more closely with the perceived quality of a given review.
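As a sketch of how such a comparison might be set up (not the authors’ actual experiment), the snippet below trains the same simple text classifier once on AI-generated labels and once on human-verified labels, then scores both against human-verified test labels. The tiny inline data and the TF-IDF plus logistic-regression model are placeholder assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

train_texts = [
    "brews fast and easy to clean, highly recommend",
    "arrived broken, avoid this seller",
    "nice design, works as described",
    "no details, just ok",
]
ai_labels    = [5, 1, 4, 3]   # drafts from the AI annotator (illustrative)
human_labels = [5, 2, 4, 2]   # after human verification (illustrative)

test_texts  = ["great value, would buy again", "too vague to be useful"]
test_labels = [4, 1]          # human-verified ground truth (illustrative)

def fit_and_score(labels):
    """Train one model on the given label set; evaluate on the test set."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(train_texts, labels)
    return accuracy_score(test_labels, model.predict(test_texts))

print("AI-label model:   ", fit_and_score(ai_labels))
print("human-label model:", fit_and_score(human_labels))
```

Holding the model fixed and swapping only the training labels isolates label quality as the variable, which is the comparison the paragraph above describes.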

The study highlights the advantages and limitations of both large language models (LLMs) and human annotators in building datasets for natural language processing tasks. The publicly available ViMRHP dataset should enable further work on review helpfulness prediction and, ultimately, better-informed consumer decisions on Vietnamese e-commerce platforms.