The Anatomy of Machine Data

The definition of machine data covers, not surprisingly, all data generated by machines – servers, applications, sensors, web feeds, networks and service platforms. It covers everything from data centers, telecommunications networks and services to machine-to-machine and the Internet of Things in a device-connected world.

The value of machine data is immense. Machine data contains a wealth of information on customer and consumer behavior and location, consumer quality of experience, financial transactions, security and compliance breaches, as well as the state of industrial processes, transportation networks and vehicle health.

Understanding Machine Data in Real-time

Log files are the most common source of machine data today across all industries, but sensor networks and applications are catching up fast with the growth of machine-to-machine communication and the Internet of Things. Telecommunications has always been a generator of Big Data, with Call Detail Records (CDRs), network equipment performance measurements, and subscriber and handset location data.

The following list provides an overview of the different types of machine data that SQLstream can collect and transform into real-time operational intelligence. The list is not exhaustive and is growing every day.

Application Logs
SQLstream’s log adapter turns any log file into a stream of real-time updates. Data from any log file type and format can be collected and analyzed. Common examples include server and cloud infrastructure performance logs, Apache error logs, syslog, firewall security logs and data generated by any application. Complete log files or just updates can be collected, and log data joined, transformed and aggregated as needed.
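The idea of collecting "just updates" from a log file can be illustrated with a minimal Python sketch. This is not SQLstream's log adapter; the function name `read_updates` and the offset-based approach are assumptions made for illustration. It yields only the complete lines that appear after a remembered offset, so a caller can poll the same file repeatedly without re-reading old data.

```python
import io


def read_updates(stream, offset=0):
    """Yield (new_offset, line) for each complete line found after `offset`.

    `stream` is any seekable text stream (an open log file, or io.StringIO).
    A trailing line without a newline is treated as still being written
    and is left for the next call.
    """
    stream.seek(offset)
    while True:
        line = stream.readline()
        if not line.endswith("\n"):
            break  # nothing new, or a partial line still being written
        offset = stream.tell()
        yield offset, line.rstrip("\n")


# Usage: remember the last offset and pass it back in on the next poll.
log = io.StringIO("server started\nlistening on 8080\npart")
updates = list(read_updates(log))
last_offset = updates[-1][0]
```

A production collector would also need to handle log rotation and truncation, which this sketch ignores.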

Application Server Logs
Many application servers, including JBoss, WebSphere and WebLogic, generate log files using standard logging frameworks such as log4j. The data contains critical insights into application and application server operation and performance, and the transaction information it carries also offers insights into business transactions, in particular fraud and security problems.
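As a rough illustration of how such log lines become structured records, the Python sketch below parses one line. log4j layouts are configurable, so the layout assumed here (timestamp, level, thread in brackets, logger name, a dash, then the message) and the sample line are assumptions, not a format taken from any particular server.

```python
import re
from datetime import datetime

# Assumed layout: "2024-01-15 10:23:45,123 ERROR [main] com.example.Pay - text"
LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) +"
    r"(?P<level>[A-Z]+) +\[(?P<thread>[^\]]+)\] +"
    r"(?P<logger>\S+) +- +(?P<message>.*)"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = LINE.match(line)
    if match is None:
        return None  # e.g. a stack-trace continuation line
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return record


sample = "2024-01-15 10:23:45,123 ERROR [main] com.example.PaymentService - Card declined"
record = parse_line(sample)
```

Once parsed, fields such as the level and logger name can be filtered or aggregated, for example to count errors per component over time.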

Call Detail Records for Telecom Services
Call Detail Records (CDRs) and IP Data Records (IPDRs) are generated by telecoms network equipment for every call and session, and contain the information necessary to produce billing records. They also contain information that can be used to determine service quality and customer experience issues, particularly when joined in real-time with GPS and location data.

Clickstream Data
Clickstream data captures users’ activity on websites. It contains valuable information on visitor activity that can be used, for example, to alert on customer experience issues, drive real-time ad placement and detect shopping cart abandonment.
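Shopping cart abandonment detection can be sketched in a few lines of Python. The event shape `(timestamp, session_id, action)`, the action names `add_to_cart` and `purchase`, and the 30-minute timeout are all hypothetical choices for illustration; real clickstream schemas differ.

```python
def abandoned_carts(events, now, timeout=1800):
    """Return session ids with a cart addition but no later purchase.

    events:  iterable of (timestamp_seconds, session_id, action), time-ordered.
    now:     current time in seconds, on the same clock as the events.
    timeout: seconds of inactivity after which a cart counts as abandoned.
    """
    last_add = {}
    for ts, session, action in events:
        if action == "add_to_cart":
            last_add[session] = ts          # remember the most recent addition
        elif action == "purchase":
            last_add.pop(session, None)     # a purchase clears the pending cart
    return sorted(s for s, ts in last_add.items() if now - ts >= timeout)


events = [
    (0, "a", "add_to_cart"),
    (10, "b", "add_to_cart"),
    (20, "b", "purchase"),
    (100, "c", "add_to_cart"),
]
stale = abandoned_carts(events, now=1800)
```

This version scans a finished batch of events; a streaming implementation would keep the same per-session state but evaluate the timeout continuously as time advances.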

Deep Packet Inspection (DPI) Probe Data
Deep Packet Inspection (DPI) tools use network probes to extract detailed information on connections, services and sessions. DPI probes generate a vast amount of real-time data that must be captured, correlated and combined into end-to-end connection and service information. The data contains key insights into Quality of Service, capacity management and network performance issues, as well as indicators of security attacks and breaches.

GPS Data
GPS data records the exact position of a device at a specific moment in time. GPS events can be transformed easily into position and movement information, for example for vehicles on a road network, or mobile subscribers with smartphones. Telecommunications, transportation, logistics and telematics rely on the accurate and sophisticated processing of GPS information.
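To show how GPS events turn into movement information, the Python sketch below computes the great-circle distance between two fixes with the haversine formula and derives an average speed. The fix shape `(timestamp_seconds, latitude, longitude)` and the function names are assumptions for illustration, and the spherical Earth model is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_mps(fix_a, fix_b):
    """Average speed in metres per second between two (t, lat, lon) fixes."""
    (t1, lat1, lon1), (t2, lat2, lon2) = fix_a, fix_b
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    return haversine_m(lat1, lon1, lat2, lon2) / dt
```

Applied to consecutive fixes from the same device, `speed_mps` gives a simple movement signal; real deployments typically also filter out noisy fixes before computing speed.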

IP Router Syslog
IP network equipment from all the major vendors uses the syslog format to capture connection status, capacity information, routing information, failure alerts, security alerts and performance data. When processed in real-time, and when data from all sources can be joined and queried simultaneously, this data provides a unique insight into the operation of the network and offers a platform for predictive analytics and forecasting.
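A first step in processing syslog data is decoding the priority value at the start of each message, which encodes facility and severity as facility × 8 + severity. The Python sketch below does only that; the function name is an assumption, and parsing the rest of the message (timestamp, host, tag) is left out because it varies between devices.

```python
import re

# Severity names indexed by the syslog severity code (0 = most severe).
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

PRI = re.compile(r"<(\d{1,3})>(.*)", re.DOTALL)


def decode_pri(message):
    """Split a syslog message into (facility_code, severity_name, remainder)."""
    match = PRI.match(message)
    if match is None:
        raise ValueError("message has no <PRI> prefix")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError("priority value out of range")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity], match.group(2)


facility, severity, rest = decode_pri("<34>Oct 11 22:14:15 mymachine su: failed login")
```

With severity decoded, alerts from many routers can be filtered to the critical ones before being joined with other sources.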

Sensor Data
The availability of low-cost, intelligent sensors, coupled with the latest 3G and 4G wireless technology, has driven a dramatic increase in the volume of sensor data, and with it the need to extract operational intelligence from the data in real-time. Examples include industrial automation plants, smart metering, environmental monitoring and the oil and natural gas industry.

SCADA Data for Industrial Control Systems
Supervisory Control and Data Acquisition (SCADA) is the data management infrastructure for industrial control systems. SCADA systems produce an immense volume of measurement data, status information and failure alerts, and are widely deployed for remote equipment process monitoring across the smart grid, oil and natural gas, transportation and utilities sectors.