Or as Matt Aslett (@maslett) put it on Twitter, “No it hasn’t. Claims that the death of SQL has been predicted have been greatly exaggerated”. This comes out of a recent Forbes article, “As Big Data Booms, SQL Makes a Comeback”. Apologies were extended to Mark Twain. However, perhaps the article’s wording should change from “SQL is emerging” to “SQL has emerged” as the data processing language for Big Data sets. Cloudera’s investment in Impala is a case in point.
SQL does not mean relational
SQL does not imply, or indeed require, a relational model. It never did; relational databases simply happened to be what was available. SQL is a recognized standard, perhaps the only true standard, so queries are (to a large extent) reusable and transferable across platforms. More importantly, SQL is a mathematically tractable language that lends itself readily to automatic distributed query optimization. Because a query states what result is wanted rather than how to compute it, the optimizer is free to decide how to split the work across servers. That eliminates the ultimately fruitless manual tweaking of bespoke Java-based and proprietary data processing architectures.
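To make the optimization argument concrete: a declarative query such as `SELECT sensor_id, AVG(value) FROM readings GROUP BY sensor_id` says nothing about where the rows live, so an optimizer can rewrite it into a partial aggregate computed locally on each server plus a final merge. The sketch below illustrates that rewrite in plain Python. The table `readings`, its columns and the helper functions `partial_avg` and `merge_partials` are hypothetical, and the inner lists merely stand in for servers; this is not any particular engine's implementation.

```python
from collections import defaultdict

# Hypothetical rows of a `readings` table: (sensor_id, value).
# Each inner list stands in for the slice of the table held on one server.
partitions = [
    [("s1", 10.0), ("s2", 4.0), ("s1", 20.0)],
    [("s2", 6.0), ("s1", 30.0)],
    [("s3", 1.0)],
]

def partial_avg(rows):
    """Local step, run where the data lives: per-key (sum, count)."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return {k: (s, c) for k, (s, c) in acc.items()}

def merge_partials(partials):
    """Global step: add up the partial states, then finish AVG = sum / count."""
    totals = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            totals[key][0] += s
            totals[key][1] += c
    return {k: s / c for k, (s, c) in totals.items()}

result = merge_partials(partial_avg(p) for p in partitions)
print(dict(sorted(result.items())))  # {'s1': 20.0, 's2': 5.0, 's3': 1.0}
```

The partial state is (sum, count) rather than a local average because averages of averages are wrong when partitions differ in size. The merged answer matches what a single machine would compute over all rows, which is the property that lets an optimizer apply this rewrite automatically instead of a developer hand-coding it.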
SQL for unstructured data stream processing
At SQLstream we use standards-compliant SQL queries to process data streams. It’s the same SQL that runs in Oracle and DB2, for example (SQL:2011). Streaming SQL queries, however, execute continuously: they never terminate, and they process data over time windows as it arrives. Most streaming data is semi-structured or unstructured, and SQL works great for processing log files, sensor feeds, JSON, XML, CSV and so on. Why did we choose SQL? Partly because it is a standard, but primarily because it can be automatically optimized for distributed queries across multiple servers, with the added advantages of lower cost of ownership, platform stability and management, and faster development times. Using SQL, Big Data and stream processing applications can be built in days rather than the months of development effort required with frameworks such as MapReduce and streaming frameworks such as Storm.
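The continuous execution model is easier to see in miniature. A streaming query that, say, counts ERROR records per host over a sliding 60-second window never finishes; it emits an updated result each time a record arrives. The Python generator below imitates that behaviour for JSON log lines. It is an illustrative sketch, not SQLstream’s engine or query syntax: the field names `ts`, `host` and `level`, the function `errors_per_host` and the 60-second window are all hypothetical, and it assumes records arrive in timestamp order.

```python
import json
from collections import Counter, deque

def errors_per_host(lines, window_seconds=60):
    """Yield (ts, host, errors_in_window) for every arriving JSON log line.

    Hypothetical sketch of a continuous query that counts ERROR events per
    host over a sliding window (ts - window_seconds, ts]. Assumes `ts`
    values arrive in non-decreasing order; `lines` may be unbounded.
    """
    window = deque()   # (ts, host) of ERROR events still inside the window
    counts = Counter() # host -> number of ERROR events in the window
    for line in lines:
        event = json.loads(line)  # semi-structured input: one JSON object per line
        ts, host = event["ts"], event["host"]
        # Evict ERROR events that have slid out of the window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_host = window.popleft()
            counts[old_host] -= 1
        if event.get("level") == "ERROR":
            window.append((ts, host))
            counts[host] += 1
        # Emit a result per arriving row, as a streaming query would.
        yield ts, host, counts[host]

# A finite feed for demonstration; in practice this could be a tailed log file.
feed = [
    '{"ts": 0, "host": "a", "level": "ERROR"}',
    '{"ts": 30, "host": "a", "level": "INFO"}',
    '{"ts": 45, "host": "a", "level": "ERROR"}',
    '{"ts": 70, "host": "a", "level": "ERROR"}',
]
for row in errors_per_host(feed):
    print(row)
```

Run on the sample feed, this prints `(0, 'a', 1)`, `(30, 'a', 1)`, `(45, 'a', 2)` and `(70, 'a', 2)`: by t=70 the error at t=0 has left the window. Because the function is a generator, it works the same way whether `lines` is a short list or a never-ending iterator.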
SQL for the Enterprise
And that is the core argument in the article. It transpires that few developers have the skills to make effective use of MapReduce, and indeed, why should they, now that the focus is on SQL query capability over Hadoop? This is an important consideration for enterprise-class data management systems, as it maps directly to cost of ownership. Hadoop server creep is also a consideration for the enterprise: space and cooling are often at a premium in downtown data centers. Hadoop will grow to be enterprise-class through Cloudera and others, and SQL will very likely be to the fore.