By Jim DeBardi, CIO, and Justin Long, Sr Systems Administrator, NetCentrics
Federal government Security Operations Center (SOC) and Network Operations Center (NOC) teams are overwhelmed with tools. It is not uncommon for a team to run dozens, even hundreds, of tools designed to monitor and alert on various systems, applications, behaviors and other factors of the IT enterprise environment. This commonly leads to one of two scenarios: 1. being overwhelmed with false positives, which desensitize security staff to legitimate alerts (as in the famous Target Stores breach), or 2. not getting alerts on legitimate concerns/breaches. The sheer number of tools also adds a complex learning curve and tedious upkeep of the latest software, sensors, and integration requirements. To address this, AIOps is emerging as a key asset in federal IT teams’ arsenal.
The challenge with existing tools is that they often fail to “talk” to each other to share key data in the interest of improved prediction, correlation, and resolution of events such as cyber threats and service disruptions. When they do “talk”, they do not correlate data fast enough, meaning critical security issues may be discovered too late. Consequently, agencies employ scores of SOC/NOC specialists who “stay within their silos,” focused strictly on their own individual monitoring solutions, with no cross-correlation or analysis of the data produced by the other tools. These specialists often develop a sense of ownership over their tools, which can turn into possessiveness and discourage sharing data with other systems. This “legacy” security operations model can greatly benefit from processes that incorporate automation, machine learning and analytics to maximize the predictive value of the tools as a collective whole, thus gaining enterprise-wide IT visibility.
Fortunately, Artificial Intelligence for IT Operations (AIOps), first known as Algorithmic IT Operations, can help agencies address these issues through benefits such as automation, machine learning, and analytics. Gartner originally coined the term AIOps, defining it as a platform that utilizes big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight. AIOps platforms enable the concurrent use of multiple data sources, data collection methods, analytical (real-time and deep) technologies and presentation technologies.
Given the vast range of potentially positive outcomes, AIOps platforms are expected to account for an $11 billion global market by 2023, up from $2.5 billion last year, according to a forecast from MarketsandMarkets.
AIOps is all about enterprise performance management (i.e., monitoring, analyzing and instantly acting on data via end-to-end situational awareness and absolute command and control of network resources). It establishes a “single pane of glass” view of your entire infrastructure, so data from every tool is ingested, correlated and analyzed to generate quantitative outputs that tell us how to improve. It applies advanced automation, machine learning, and analytics to inform proactive event management and reduce response times, protecting networks, systems, and devices while ensuring optimal user experiences. It allows teams to acquire a true understanding of possible cyber-attacks, help desk ticket spikes and other SOC/NOC events. It can also leverage the power of elastic, auto-scaling cloud computing to process massive amounts of data in a fraction of the time required by a traditional on-premises data center.
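To make the idea of correlating data from several tools concrete, here is a minimal sketch in Python. It is not how any particular AIOps product works; the `Alert` record, the tool names, and the five-minute window are all assumptions chosen for illustration. It groups alerts that hit the same host close together in time and keeps only the groups that more than one tool reported.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical normalized alert record; field names are assumptions."""
    tool: str         # which monitoring tool raised the alert
    host: str         # affected machine
    timestamp: float  # seconds since epoch
    message: str


def correlate(alerts, window_seconds=300.0):
    """Group alerts per host, chaining alerts that arrive within
    window_seconds of the previous one, and return only the groups
    that were reported by at least two different tools."""
    by_host = defaultdict(list)
    for alert in alerts:
        by_host[alert.host].append(alert)

    incidents = []
    for host_alerts in by_host.values():
        host_alerts.sort(key=lambda a: a.timestamp)
        group = [host_alerts[0]]
        for alert in host_alerts[1:]:
            if alert.timestamp - group[-1].timestamp <= window_seconds:
                group.append(alert)
            else:
                if len({a.tool for a in group}) >= 2:
                    incidents.append(group)
                group = [alert]
        # Flush the last open group for this host.
        if len({a.tool for a in group}) >= 2:
            incidents.append(group)
    return incidents
```

In this sketch, an intrusion-detection alert and a network-flow alert on the same server two minutes apart would surface as a single incident, while an isolated alert from one tool would not. A real platform would correlate on far more than host and time, but the principle of joining tool outputs on shared keys is the same.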
With this, teams and machines do more than just identify root causes; they resolve events proactively with AIOps “self-healing” automation, orchestration and deep learning functionality. This also eliminates the traditional binary “if-this-then-that” ruleset. AIOps can truly learn, to the most granular levels, the behavior and patterns of your organization and dynamically adjust its alerts and sensors accordingly. The result is a level of insight and security that was previously unattainable without causing significant frustration for end users.
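The difference between a fixed “if-this-then-that” rule and an alert that adjusts to observed behavior can be sketched in a few lines of Python. This is a deliberately simple illustration, assuming a rolling z-score over recent readings; the class name, window size, and limits are hypothetical, and production AIOps platforms use far richer models.

```python
from collections import deque
from statistics import mean, stdev


class AdaptiveThreshold:
    """Flags a reading as anomalous when it deviates strongly from the
    recent baseline, instead of comparing it with a fixed limit."""

    def __init__(self, window=20, z_limit=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # most recent readings only
        self.z_limit = z_limit               # allowed standard deviations
        self.min_samples = min_samples       # readings needed before judging

    def observe(self, value):
        """Return True if value is anomalous, then add it to the baseline."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.z_limit
            else:
                # A perfectly flat baseline: any change stands out.
                anomalous = value != mu
        # Every reading is recorded, so a sustained shift in behavior
        # gradually becomes the new normal.
        self.history.append(value)
        return anomalous
```

Fed a machine's hourly traffic in megabytes, such a detector stays quiet while readings hover around their usual level and fires on a sudden surge, whatever that machine's usual level happens to be. A static rule such as “alert above 200 MB” would need to be hand-tuned per machine and revisited whenever usage changed.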
The machine element cannot be overstated. As AI innovation takes hold throughout organizations worldwide, dramatically expanding capabilities to accurately and swiftly detect and then respond to incidents, agencies cannot be left behind. Ultimately, AIOps elevates monitoring and data correlation/analytics to a level at which all events are treated as one and the same. Whether there is an influx of service desk tickets, an isolated incident, a service disruption affecting the enterprise, a critical business application that appears degraded, or even an unusual surge of traffic from one specific machine after hours, AIOps drives toward the core, using root-cause analysis and actionable intelligence that tells teams what action to take based upon lessons learned, mature processes and the recommendations of the AIOps platform as a whole.
To take this concept even further – by leveraging machine learning and automation to the maximum extent practicable – we are able to address an event without involving human interaction and resolve potential events before they become actual events. AIOps services and solutions will increasingly enable machines to make these decisions and take appropriate action, reducing IT staffing costs for agencies while increasing the performance and uptime of the service. AIOps is already heavily utilized to maintain the largest of computing environments: Azure, AWS, Google Cloud, Oracle Cloud, etc.
AIOps is readily available to government customers via a number of contract vehicles, including DISA ENCORE III, FAA eFAST, GSA Schedule 70, SeaPort-e, and the C5 Consortium Other Transaction Agreement (OTA). To position an agency for success here, we recommend these critical components/steps:
- Control and management of AIOps solutions and services in a multi-tenant environment with an integrated array of best-fit commercial off-the-shelf (COTS) solutions
- Integrated capabilities across development, deployment, management, monitoring and collaboration platforms on-premises, off-premises and in the cloud
- Detail-driven project management in which every activity is initiated, planned and controlled to meet overall objectives within agreed-upon time and budget constraints
- System integration that is tested and proven before new capabilities are implemented
- Compliance with all security requirements and regulations
- Instruction sets and training plans for new features
The pursuit of enterprise-wide IT visibility has emerged as quite a quest for organizations in general, including government agencies. The accumulation of multiple tools to oversee a growing number of tech functions/areas adds to the complexities and challenges. Yet, as AI continues to advance in terms of its capabilities and impact, we can leverage AIOps solutions and services to drive toward a consolidated, cohesive and completely integrated ecosystem, capturing it all within that long-sought “single pane of glass” view for proactive, effective responses to events.
As a result, agencies are well-positioned to address not only what’s needed now, but also what will be needed in the immediate or even longer-term future. That’s what happens when man and machine work together to best benefit the enterprise. Most importantly, eliminating repetitive, manual “labor” by IT SMEs may be seen by some as a threat, but it is quite the opposite. With the day-to-day routine “Ops” monitoring and alerting tasks offloaded, your organization’s IT experts are free to work on the next evolution of services and technologies for your organization.
About the Authors
Jim brings 40 years’ experience in the Research & Development, Operations Excellence, and IT fields in contributing to NetCentrics’ technical leadership and its sustained growth. Jim has extensive experience supporting DoD and Federal Government customers. Since the late 1990s, Jim has primarily supported Dept. of Defense clients within the Pentagon reservation and DHS/USCG customers in their move to enterprise-wide security and monitoring systems. As CIO, Jim leads NetCentrics’ IT support services for its internal infrastructure and operations, where he evaluates emerging technology for incorporation into NetCentrics’ infrastructure.
Justin Long has been in the Information Technology industry for 12 years. He is currently the lead Security Operations Manager at NetCentrics Corporation. His responsibilities include managing and operating the corporate environment, as well as providing instruction on new and emerging technologies in support of NetCentrics’ customers. His passion for technology can be seen through the various projects and staff he supports.