Once again we made it down to San Jose, ready for some learning, some networking, and, yes, some bragging (in case you haven't heard, Amazon Web Services picked none other than Blaze to license for its own Kinesis Analytics, and we're unabashedly proud of that).
We've been talking for a while about how disappointing the early Big Data solutions were, mixing concepts, features, and functionalities in an attempt to pin down the overall potential of growing data. Now that everyone has a clearer understanding of the separation between Fast Data and Big Data, and of their paradoxical interdependence (streaming vs. stored, unstructured vs. structured), we expected to see the Fast + Big Data mix finally treated as fundamentally an application problem. The supporting ecosystem has begun to catch up: infrastructure, knowledge, and applications have finally started to address real needs, with real effects, and the result is a shift away from store-then-query toward query-then-(maybe)-store.
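To make the query-then-(maybe)-store idea concrete, here is what it looks like in streaming-SQL terms: the query runs continuously against the live stream, and only the rows worth keeping ever reach storage. The stream and table names below are made up for illustration, and the exact keywords vary by engine:

```sql
-- Query first: a continuous query computes a rolling per-device average
-- as readings arrive (names like sensor_readings and anomalies are
-- illustrative, not from any particular deployment)
INSERT INTO anomalies
SELECT STREAM
    device_id,
    temperature,
    AVG(temperature) OVER (PARTITION BY device_id
                           RANGE INTERVAL '1' MINUTE PRECEDING) AS avg_temp
FROM sensor_readings
WHERE temperature > 100;  -- (maybe) store: only anomalous rows are persisted
```

The inversion is the point: in store-then-query, every reading would land on disk before anyone asked a question of it; here the question runs first, continuously, and storage becomes an output of the query rather than its prerequisite.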
Here are a few visible trends spotted this year at Strata that have evolved with, or are affected by, this transition:
- Kafka et al. are becoming co-responsible for the value in data: in an ever-expanding ecosystem, there is no central point other than the fact that data is valuable for the insight it contains. Extracting that insight has traditionally been assigned to analytics and visualization functions, but things are changing. For a person or machine to use the insight in the best way at the best time, it needs to be exhaustive (use all the data), available (run seamlessly and continuously), and timely (perform in true real time), and that diffuses the responsibility not only across all applications, but onto all participating systems. Many talks and solutions presented at Strata centered on using the right technologies for the right jobs to make all of this possible. From streaming ingestion and mining dark data to machine learning and OI applications, much was in the spirit of giving Caesar what belongs to Caesar: use the right technology for the right job (and while we're at it, here's how Blaze works with Kafka).
- The Cloud is where it's at: although talks, with few exceptions, centered on pure cloud deployments, it is clear that streaming applications at the edge, in the cloud, and everywhere in between are on the rise, driven by the evolving needs of the supporting infrastructures. As data volumes go up, so do storage costs and processing latency, so exporting part of the load to a public or private cloud is exactly the kind of efficiency update needed to attain the continuous, real-time, actionable insight mentioned above. As a result, many talks and company proposals revolved around what we call the 3Cs: conditions, concerns, and costs. From at-the-edge solutions to Amazon's Kinesis presentation (which highlighted us, because, again, this) to building multi-cloud architectures, it was all about finding the right deployment formula that is safe, seamless, and scalable.
- Application monitoring is as big as application building: as we move into the realm of self-service, mission-critical, real-time-driven operations, failure, in all its meanings, is not an option. Being able to diagnose issues, and doing so while applications are up and running, is the only way to maintain business relevance, and many talks this year focused on operationalizing data in a continuous context. Our own platform is built from the ground up to build, run, and manage streaming applications concurrently, continuously, and in real time, so we were very happy to see the market finally moving in our direction (for more details on why this matters, check out this Streaming Analytics Wave report from Forrester, where we made the leaders' circle).
- Specialization: as streaming becomes mainstream, many of the good-at-everything solutions are starting to focus on specific areas. Watson, for example, was initially developed to help with everything from food to cancer. This year it is clear that everyone, from simple solutions to complex analytics to machine learning platforms, is becoming increasingly specialized in addressing particular pain points. The Fast + Big Data picture is relatively complete, with decisive vertical players and ever-expanding functions; life sciences, energy, and federal are the most frequently referenced verticals nowadays (paired with analytics, cloud, and IoT, respectively).
Apart from these, there were a few quieter threads as well (let's call them the understated Ss):
- SQL: There was never a time when SQL was not popular, but after an image hiatus in the shadow of cooler proprietary languages, it is now coming back into the spotlight as the top choice for streaming—which we wholeheartedly support (if it sounds familiar, it's because we've been talking about it for quite a while).
- Security: Surprisingly, security still gets less focus from participants than one would expect, although it is a relatively well-represented topic among specialists at this event and online. Even so, most discussions revolve around just one of the three aspects (data, transaction, and system security), which is baffling given the attention poured onto the likes of IoT.
- Self-everything: a big driver in the democratization of data, access to operational insights (measuring performance, troubleshooting, preventing application failures, etc.) has long been notoriously difficult for the business decision-making side of things, and that is changing. So far, quite a few solutions have opened up access to analytics results, but besides our StreamLab, we haven't seen many that offer business users the option to design and deploy their own applications.
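On the SQL point above: part of why streaming SQL wins is that a continuous query reads almost exactly like the batch SQL every analyst already knows. A sketch, with made-up stream and column names (keywords such as STREAM and the window frame syntax differ slightly between engines):

```sql
-- A continuous query: rows are emitted as trades arrive, each carrying
-- a rolling one-hour total per ticker (trades/ticker/volume are
-- illustrative names, not a real schema)
SELECT STREAM
    ticker,
    SUM(volume) OVER (PARTITION BY ticker
                      RANGE INTERVAL '1' HOUR PRECEDING) AS hourly_volume
FROM trades;
```

No new language to learn, no bespoke API: the only conceptual shift is that the result set never ends, which is exactly what makes SQL such a natural fit for streaming.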