Data, not algorithms, is key to machine learning success
By boris, January 06, 2016
There has been an explosion in machine learning activity, and Shivon Zilis recently mapped out the current machine intelligence ecosystem as we enter 2016. This is one of the key areas that we’ll be following this year.
While the opportunities here are tremendous, the exuberance surrounding machine learning distracts startups from a key hurdle: it’s data, not algorithms, that will dictate who wins in this space. Algorithms have largely been commoditized by now, so a machine learning company built around publicly accessible data isn’t defensible.
Access to unique data isn’t a problem for incumbents like Google, Facebook, and Amazon. But, startups face a serious chicken and egg problem: they have to convince people to give them data, but the machine intelligence service won’t be useful until people (and a lot of people) are actually using the service and sharing their data. Matt Turck recently wrote about this “cold start” problem in a great post, “The Power of Data Network Effects.”
I’ve seen recommendation services struggle with this challenge. It takes too long to train the bots and users drop off the platform before the service gets valuable.
Does this mean that startups can’t successfully play in the machine learning space? Not necessarily. First of all, there’s plenty of data out there. Likewise, there are plenty of opportunities to leverage machine intelligence to solve a business problem or improve day to day life. The key is to build a strong enough use case that compels people to give up their data before the benefits of machine learning and data network effects really kick in.
Turck talks about the “data trap” strategy where startups build fun and free side apps to start gathering data. A great example is Clarifai, a deep learning and image recognition company. Clarifai’s free consumer app, Forevery, offers instant value for its users, by making it easier to organize and find photos. With each new Forevery user, Clarifai has a bigger data pool to refine its image recognition technology.
Mint.com is another example where users gave access to very valuable and unique data, even though Mint didn’t do much on the machine learning front in the beginning.
The first step is to develop some kind of data acquisition strategy. The next step is to effectively communicate the value proposition. The focus has to be on tangible benefits for the end users, rather than the coolness of the technology platform. In Zilis’ summary of today’s machine learning ecosystem, she noted that “many machine intelligence companies have figured out that they need to speak the language of solving a business problem.” That’s a great sign.
The bottom line is that if startups want to succeed in machine learning, their top priority should be building proprietary data sets. Creating a strong use case is the most effective strategy to get that data set. It’s not easy, but the good news is that it’s incredibly hard to unseat a service once data network effects kick in.
I have been getting a lot of feed-back on the post: while the thesis of data over algorithms is true for most machine learning applications, there are two scenarios in which algorithms are very important: a) situations with limited data in which you try to learn the most in minimum time with minimum data and b) very complex data (e.g. computer vision applications). In both scenarios, deep learning algorithms can make a significant difference.