Human DCG vs Engagement DCG: Evaluating the Wisdom of the Crowd
Posted: 16 Nov 2021
Date Written: September 21, 2021
Human subject matter expert (SME) relevance ratings have long been used to derive search relevance evaluations such as precision and recall. Another source of relevance data is derived from end-customer result selection, i.e., the wisdom of the crowd. Conveniently, both methods can be compared directly using conventional search precision metrics such as Discounted Cumulative Gain (DCG), Precision@k, Mean Average Precision (MAP), and Mean Relevance Rank.
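The metrics named above can be computed from graded relevance ratings in ranked order. As a minimal sketch, here is one common formulation of DCG@k and Precision@k; the exact gain and discount functions used in the study are not specified in this abstract, so the exponential-gain form below is an illustrative assumption:

```python
import math

def dcg_at_k(ratings, k):
    """Discounted Cumulative Gain at rank k.

    ratings: graded relevance scores (e.g. 0-3) in ranked order.
    Uses the common exponential-gain form (2^rel - 1) / log2(rank + 1);
    other gain/discount choices exist -- this one is illustrative.
    """
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(ratings[:k]))

def precision_at_k(ratings, k, threshold=1):
    """Fraction of the top-k documents rated at or above `threshold`."""
    return sum(1 for rel in ratings[:k] if rel >= threshold) / k
```

Because both SME ratings and engagement-derived ratings can be expressed on the same graded scale, the same functions apply to either source.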
Human relevance testing (HRT) is performed by human subject matter experts rating query-document pairs in blind studies. Rating results are then tallied and analyzed both at the query-document pair level and across the entire dataset of pairs. Wisdom-of-the-crowd metrics assume that customer engagement with a particular document is proportional to that document's relevance. To measure engagement, a model is defined that uses the type and amount of activity the user performs with the document to produce an equivalent relevance rating for the query-document pair. This engagement-based rating is then used to calculate the same relevance precision metrics as are calculated in HRT studies.
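The engagement model described above could be sketched as a mapping from observed engagement events to the same graded scale used by SME raters. The event types and weights below are purely hypothetical assumptions for illustration, not the model used in the study:

```python
# Hypothetical engagement-to-rating model: these event types and
# weights are illustrative assumptions, not the study's actual model.
ENGAGEMENT_WEIGHTS = {
    "click": 1,       # user opened the document from the result list
    "dwell": 1,       # user spent meaningful time reading it
    "download": 1,    # user saved, printed, or downloaded it
}

def engagement_rating(events, max_rating=3):
    """Map the engagement events for one query-document pair onto the
    same 0..max_rating scale used by human SME ratings, so the same
    precision metrics (e.g. DCG) can be computed from either source."""
    score = sum(ENGAGEMENT_WEIGHTS.get(e, 0) for e in set(events))
    return min(score, max_rating)
```

Once each query-document pair has such a rating, the engagement-based (eDCG) and human-rated (hDCG) evaluations proceed identically.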
Over time, it has been assumed that human-rated DCG (hDCG) and engagement-based DCG (eDCG) may be correlated via some additive offset or multiplier. Unfortunately, this has been shown not to hold consistently. Until now, there has been no known study comparing hDCG- and eDCG-based search precision results. Using the Search Test Framework (STF), developed at LexisNexis, the technical challenges involved in carrying out such a study have been addressed and resolved. A process for directly comparing eDCG and hDCG results has been developed and applied to actual data. This presentation will discuss the issues involved, the metrics employed, the data used, and an analysis of the study's results. In conclusion, we will review the advantages and disadvantages of both methods and make suggestions for creating overall metrics that consider both types of results.
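The correlation assumption mentioned above is testable: if eDCG were related to hDCG by a consistent additive offset or multiplier, per-query (hDCG, eDCG) pairs would show a Pearson correlation near 1. A minimal, self-contained sketch of that check (the data shown would come from the study's per-query results):

```python
def pearson(xs, ys):
    """Pearson correlation between paired per-query scores, e.g.
    hDCG values in xs and eDCG values in ys. A consistent linear
    relation (offset and/or multiplier) yields a value near +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```

A correlation well below 1 across queries would be evidence against the simple offset-or-multiplier assumption, consistent with the finding reported here.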
Keywords: Search Relevance Precision, Discounted Cumulative Gain, Subject Matter Experts, Wisdom of the Crowd, Search Precision