By Judith Aquino (Source: AdExchanger.com)
Data science is the backbone of numerous ad-tech firms competing in an increasingly technology-driven environment. Explaining what data scientists do is difficult, however, given that their work is often described as a mix of art and science and varies with each company.
AdExchanger asked data scientists from Decide.com, a price research firm acquired by eBay, and location analytics company Placed to describe a problem they’ve solved in a snapshot look at the roles data scientists play. Read on for their (lightly edited) responses.
Who: David Hsu, Principal Engineer at eBay
The challenge: Identifying variants in similar products
“At Decide.com, I led a team of data scientists that worked on a broad range of projects, including a prediction algorithm for how product prices will change in the future, a smart product-rating system, a system for finding and associating product news from the Web with relevant products, and internal systems (such as product categorization) for organizing our product catalog.
One of the more subtle problems that the data-science team resolved [was] determining which products are variants of each other. In the product catalog of most ecommerce sites, many items can be considered variants of each other, such as different-colored versions of baby seats or laptops that vary in the amount of memory installed or the size of the hard drive.
Detecting these variants can improve the shopping experience in multiple ways. For example, within the search results page, variants of the same product can be collapsed together to reduce clutter and increase the diversity of products shown. In addition, product reviews for different variants may be pooled together so that variants receive the same overall rating. Two refrigerators that only differ in color, for example, should not have different review ratings.
Thinking through the product aspects of this problem, two things became apparent that influenced how we tackled the problem:
a. There are no hard and fast rules as to what constitutes a variant. This differs by category and can change over time.
b. The cost of false positives (saying two products are variants of each other when they aren’t) is much worse than the cost of false negatives (missing two potential variants).
Consequently, we decided to pursue an interactive machine-learning approach [along with manually verifying the algorithms] to solve the problem. This approach had the advantage of both accuracy and scalability across categories. Building the system required both an automated algorithm for the first step [the clustering] and an interactive dashboard that data specialists could use to perform the second step [the manual verification] effectively.
I won’t go into the details of the learning algorithm that we used for the automated clustering, but we ended up using an approach known as hierarchical agglomerative clustering, which can find clusters of similar products as long as there is a way to generate a numeric score for how similar two products are. Our similarity score was generated by a machine-learned classifier that was trained on pairs of known variants and known dissimilar products.
The variant relationships our system produced ended up improving the experience of shopping for products on the Decide.com website in a few ways. First, we used these generated variant relationships to ensure that product ratings that Decide.com generated would be common across all variants of the same product. Second, we altered the product search results page to collapse variants of the same products into a single search result entry.
Before, if someone searched our site for the ‘Canon DSLR camera,’ we might return hundreds of nearly identical cameras that differed only in which additional accessories were included. Afterwards, we would return one entry for each major product line offered by Canon, along with the option of drilling down on the different variants of each product line.
Finally, the variant relationships that the system discovered powered the ‘other variants’ section of the product detail pages for the Decide.com website. For example, if a customer was looking at our product page for a Canon EOS Rebel T3i SLR camera, they could navigate to additional camera bundles based on the Rebel T3i from the variant pop-up on the product page.”
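For readers curious about the clustering step Hsu mentions, here is a minimal Python sketch of hierarchical agglomerative clustering driven by a pairwise similarity score. It is an illustration under stated assumptions, not Decide.com's implementation: the token-overlap function `jaccard_similarity` is a hypothetical stand-in for the machine-learned classifier, and the complete-linkage rule, the 0.5 threshold, and the sample titles are all invented for the example. Complete linkage was chosen here because it merges two groups only when every cross pair is similar enough, which leans toward avoiding false positives.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Hypothetical stand-in for a learned classifier: token overlap of two titles."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def agglomerative_clusters(items, similarity, threshold):
    """Merge the most similar pair of clusters until no pair reaches the threshold."""
    clusters = [[item] for item in items]  # start with one cluster per product

    def linkage(c1, c2):
        # Complete linkage: a pair of clusters is only as similar as its
        # least similar cross pair, which keeps merges conservative.
        return min(similarity(x, y) for x in c1 for y in c2)

    while len(clusters) > 1:
        best_score, (i, j) = max(
            (linkage(clusters[i], clusters[j]), (i, j))
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best_score < threshold:
            break  # nothing left that is similar enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Made-up product titles, for illustration only.
titles = [
    "Canon EOS Rebel T3i camera body",
    "Canon EOS Rebel T3i camera kit",
    "Graco baby seat red",
    "Graco baby seat blue",
    "Dell laptop 8GB",
]
clusters = agglomerative_clusters(titles, jaccard_similarity, threshold=0.5)
for cluster in clusters:
    print(cluster)
```

Raising the threshold makes the sketch more reluctant to group products, trading missed variants for fewer wrong merges. In the workflow Hsu describes, the resulting clusters would then go to data specialists for manual verification.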
Who: Weilie Yi, Principal Scientist at Placed, Inc.
The challenge: Bringing Web-like consumer measurements to the offline world
“With today’s advancements in mobile location technology, understanding the places people visit in the physical world should seem like an easy feat. But that’s far from the case. In fact, …