Recently I decided to try my hand at the Extraction of product attribute values competition hosted on CrowdAnalytix, a website that allows companies to outsource data science problems to people with the skills to solve them. I usually work with image or video data, so working with text data was a refreshing exercise. The challenge was to extract the Manufacturer Part Number (MPN) from provided product titles and descriptions of varying length, a standard RegEx problem. After a cursory look at the data, I saw that there were ~54,000 training examples, so I decided to give Deep Learning a chance. Here I describe my solution, which landed me a 4th place position on the public leaderboard.

Disclaimer

Because this was a winning submission, I cannot share code as per CrowdAnalytix’s Solver’s Agreement. Permission is, however, given to share the approach to the solution.

The Problem

From the competition website: “The objective of this contest is to extract the MPN for a given product from its Title/Description using regex patterns.” Now, I didn’t know what RegEx patterns were, but I could understand the problem of extracting a piece of text from a larger text. For my purposes, given that I wanted to learn representations, it was enough for me to understand that if I had the following:

EVGA NVIDIA GeForce GTX 1080 Founders Edition 8gb Gddr5x 08GP46180KR

then I just wanted to extract the MPN “08GP46180KR” using representations that had learned to distinguish MPNs from the other text making up the product title and description.

Here’s the basic gist of approaching this problem using RegEx: you hard-code some rules for the patterns that you are interested in finding. A common example is a pattern for finding e-mail addresses: such a RegEx looks for pre-defined characters in the fields surrounding the “@” and “.” characters. The power of Deep Learning is that, provided enough training examples, we can learn these RegEx patterns from the data directly instead of hard-coding them.

The training data consisted of ~54,000 examples with four entries each, among them the product title, the description, and the MPN; the test data was the same except for the omission of the MPN field. Upon inspection of the data, I found that the MPN was, in almost all cases, present in either the product title or the description, if not both. It also became evident to me that this was a hard problem, as there were many “distractors” that looked very similar to MPNs but were not marked as the target (for example, in the Graphics Processing Unit product above, “Gddr5x” looks a lot like other MPNs that existed in the training set).
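To make the hard-coded RegEx approach described in this post a little more concrete, here is a minimal Python sketch. It is not the competition solution (which cannot be shared); the two patterns and the function names `find_emails` and `find_mpn_candidates` are illustrative assumptions of mine. The first pattern is a deliberately simplified e-mail matcher built around the “@” and “.” characters, and the second is a hypothetical hand-written rule for “MPN-like” tokens, applied to the GPU title from the post to show how a distractor slips through.

```python
import re

# Simplified e-mail pattern (an assumption for illustration, not a full
# RFC-compliant matcher): allowed characters, "@", a domain, ".", a TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical hand-written rule for "MPN-like" tokens: a whole token of
# five or more letters/digits that contains at least one letter and one digit.
MPN_LIKE_RE = re.compile(
    r"\b(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*\d)[A-Za-z0-9]{5,}\b"
)


def find_emails(text):
    """Return every substring of text that matches the e-mail pattern."""
    return EMAIL_RE.findall(text)


def find_mpn_candidates(text):
    """Return every token of text that matches the MPN-like rule."""
    return MPN_LIKE_RE.findall(text)


title = "EVGA NVIDIA GeForce GTX 1080 Founders Edition 8gb Gddr5x 08GP46180KR"

print(find_emails("Contact sales@example.com or support@shop.example.org today."))
# ['sales@example.com', 'support@shop.example.org']

print(find_mpn_candidates(title))
# ['Gddr5x', '08GP46180KR']
```

The second result is the interesting one: the hand-written rule finds the true MPN “08GP46180KR” but also returns the distractor “Gddr5x”. Tightening the rule to exclude it would likely break on other products, which is the kind of brittleness that makes learning the patterns from ~54,000 labelled examples appealing.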