How DataGravity for Virtualization Works
Gain Data Security and Actionable Insights from Your Virtual Environment
DataGravity for Virtualization
DataGravity for Virtualization (DGfV) was designed to visualize, secure, and protect your data in a virtual environment. Unlike physical storage or NAS storage, data in a virtual environment is encapsulated in virtual disks attached to a virtual machine (VM).
DGfV monitors, analyzes, and takes action to help you secure and protect your data, based on three key pieces of information within your virtual environment:
People - who is interacting with data?
Content - what is in the data?
Activities - how are people interacting with the data?
With this information and the time element, DGfV is able to surface valuable information about the data in your virtual environment and proactively protect it.
DataGravity for Virtualization Components
DataGravity for Virtualization Virtual Appliance (DGfV)
A software appliance, delivered as an Open Virtual Appliance (OVA), runs within your existing virtual infrastructure. The appliance is self-contained, so there’s no need to create and install additional servers or to configure additional databases to run it. Just deploy the OVA, answer a few basic environmental questions, and you are running in about 10 minutes.
DataGravity Discovery Tools (DGDT)
The DataGravity Discovery Tools install a file system minifilter driver (dgfilter) that tracks user activities such as create, delete, rename, read, and write, and attributes each one to a specific local user or Active Directory user. The dgfilter driver also resolves NTFS volume names to Windows drive letters (e.g., C:\ or H:\), making it easy to map file paths to drive letters. The Discovery Tools are installed in the guest OS of each VM that will be added to the DataGravity inventory.
The Initial Discovery Process
The DGfV discovery process enumerates all the VMs in your virtual environment. DGfV collaborates with the VMware vCenter/ESX server infrastructure and any supported NFS and iSCSI datastores to assemble the list of applicable VMs. The discovery process runs at setup time and then periodically, to track VMs being added to or removed from the virtual environment.
From this information, DGfV discovers all VMs and their associated virtual disks in your environment.
Once all the VMs are discovered, DGfV allows you to choose all VMs, or a subset, to be added to the inventory of VMs to be analyzed, monitored, and protected. VMs can be added or removed from the inventory at any time.
DGfV can reside on the same ESX host as the VMs being analyzed or on a different ESX host - an architecture designed to support both primary and disaster recovery deployments.
Analysis of a VM – Understanding the Content
Upon adding the VMs to the inventory, DGfV analyzes each VM, its content and its activities over time. DGfV understands the internal structures of each VM's guest OS and file systems, so it can efficiently index all of the relevant file metadata residing within the VM. The platform uses a variety of lightweight techniques to process and store the collected information while maintaining a small storage footprint.
All VM-specific data analysis takes place within the DGfV VM, using a read-only snapshot of the VM being analyzed, not the VM itself. VMs that are powered off can be analyzed just as easily as those that are running.
During the initial analysis, DGfV examines all files in the targeted VMs. Subsequent passes examine only the files that have changed since the previous pass, which DGfV identifies by comparing the file system’s internal metadata between the two points in time. The platform takes an adaptive approach when reading content to minimize its impact on VMs, and the frequency at which VMs are analyzed is configurable. The VM does not need to be powered on for the analysis to occur.
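The incremental pass described above can be illustrated with a minimal sketch. It assumes each pass produces a map from file path to a small metadata record; the `FileMeta` fields and the `diff_metadata` helper are hypothetical names chosen for illustration, not DGfV's internal structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata record; field names are illustrative."""
    size: int
    mtime: float


def diff_metadata(previous, current):
    """Compare two {path: FileMeta} maps and report what changed between them."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


previous_pass = {
    "C:\\docs\\plan.docx": FileMeta(size=1200, mtime=100.0),
    "C:\\docs\\old.txt": FileMeta(size=50, mtime=90.0),
}
current_pass = {
    "C:\\docs\\plan.docx": FileMeta(size=1350, mtime=180.0),
    "C:\\docs\\new.xlsx": FileMeta(size=800, mtime=175.0),
}

changes = diff_metadata(previous_pass, current_pass)
# Only added and modified files would need their content re-examined.
to_reanalyze = changes["added"] + changes["modified"]
```

Comparing metadata rather than file contents keeps the comparison cheap: unchanged files are skipped without being read.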
DGfV stores and processes the metadata and content index for easy correlation, combination, and search to answer multifaceted questions about the data.
Correlating People and Activities
DGfV integrates with Microsoft Active Directory to translate each user’s security identifier (SID) into a human-readable username.
Dgfilter, which runs within the VM, continuously monitors the VM, forwarding activities to the DGfV virtual appliance to be stored and correlated with content and file information.
The lightweight filter acts like a flight recorder, tracking all file activities (e.g., read, write, rename, delete, and set ACL) within the VM. The activity is “session-ized,” which reduces the overhead of the activity stream and the burden induced by a chatty application. For example, a single user opening a file may result in 20 reads by that user in less than a second. DGfV reduces these to a single read in the activity stream.
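One way to picture this collapsing of bursts is the sketch below. It is an assumption-laden illustration, not dgfilter's actual logic: the `Activity` record, the `sessionize` function, and the one-second `window` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical raw event emitted by a file-activity filter."""
    timestamp: float  # seconds
    user: str
    path: str
    action: str  # e.g. "read" or "write"


def sessionize(events, window=1.0):
    """Collapse repeats of the same (user, path, action) into one event.

    The first event of each burst is kept. A repeat that arrives within
    `window` seconds of the previous repeat extends the burst instead of
    starting a new one.
    """
    last_seen = {}
    collapsed = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.user, event.path, event.action)
        previous = last_seen.get(key)
        if previous is None or event.timestamp - previous > window:
            collapsed.append(event)
        last_seen[key] = event.timestamp
    return collapsed


# 20 reads by one user in under a second, followed later by a write.
raw = [Activity(10.0 + i * 0.04, "alice", "C:\\report.docx", "read") for i in range(20)]
raw.append(Activity(12.5, "alice", "C:\\report.docx", "write"))

stream = sessionize(raw)
```

Here the 21 raw events reduce to two entries in `stream`: one read and one write.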
The activity stream is processed by DGfV and used to trigger alerts or proactive actions when anomalies in user or system behavior are detected. Alerts are customer-configurable and can be triggered for a variety of events. An analysis engine running within DGfV evaluates data change activity rates; any time the change rate of a VM exceeds a configurable threshold (such as 3 percent), DGfV triggers a snapshot of the virtual machine.
Snapshots are a common storage feature, whether the storage is software-defined, a hardware appliance, or embedded in the hypervisor. By analyzing snapshots of the VMs in its inventory, DGfV provides advanced features. For example:
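A threshold check of this kind might look like the following sketch. The text does not say how the change rate is measured, so this example assumes it is the fraction of a VM's files that changed during one evaluation interval; the function names are hypothetical, and the default of 0.03 mirrors the 3 percent example above.

```python
def change_rate(changed_files, total_files):
    """Fraction of a VM's files that changed in the evaluation interval."""
    if total_files <= 0:
        return 0.0
    return changed_files / total_files


def should_snapshot(changed_files, total_files, threshold=0.03):
    """Return True when the change rate is strictly greater than the threshold."""
    return change_rate(changed_files, total_files) > threshold


# A VM holding 10,000 files, observed over two intervals.
quiet_interval = should_snapshot(changed_files=120, total_files=10_000)  # 1.2 percent
busy_interval = should_snapshot(changed_files=450, total_files=10_000)   # 4.5 percent
```

With these inputs only the busy interval would request a snapshot. Because the comparison is strict, a rate exactly at the threshold does not trigger one.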
Anomalies in user behavior
Anomalies in user behavior can often be detected by abnormally high file change activity, yet storage can’t tell the difference between normal and malicious behavior (encryption by ransomware, for example, just looks like a high I/O workload). Traditional snapshots are taken on a fixed schedule, but malware doesn’t corrupt your data according to your backup schedule. When DGfV detects unusual activity, it proactively takes a snapshot, providing a frozen-in-time data copy for recovery before more data is damaged.
Detection of changed files and fine-grained recovery
As part of its analysis process, DGfV creates catalogs that provide a fine-grained list of exactly which files changed in each snapshot, by whom, and when. Coupled with built-in file restore capabilities, this allows DGfV to recover individual files from a snapshot into a running VM.
Data forensics and search
DGfV uses each VM’s most current snapshot as the source for full-text indexing of all the files (more than 600 file types) within the VM, providing search and pattern identification across the content. Combined with its other information sources, DGfV makes it possible to know what’s changed in the data, by whom, when, where, and how.
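Full-text search and pattern identification can be sketched with a simple inverted index and a regular expression scan. This is a generic illustration under stated assumptions, not DGfV's indexing engine: the function names, the sample documents, and the SSN-like pattern are all hypothetical, and the sketch assumes file content has already been extracted as plain text.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of paths whose text contains it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the paths containing every word in the query (AND semantics)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


def find_pattern(documents, pattern):
    """Return the paths whose text matches a regular expression."""
    compiled = re.compile(pattern)
    return sorted(p for p, text in documents.items() if compiled.search(text))


docs = {
    "C:\\hr\\payroll.txt": "Employee SSN 123-45-6789 salary review",
    "C:\\eng\\notes.txt": "Salary bands are under review by HR",
}
index = build_index(docs)

both = search(index, "salary review")
only_hr = search(index, "ssn")
ssn_like = find_pattern(docs, r"\b\d{3}-\d{2}-\d{4}\b")
```

The word index answers keyword queries without rescanning files, while the pattern scan flags content that looks like a sensitive identifier.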
Orchestration – Putting it all together
Internally, the DGfV software employs a parallel-processing data analysis engine to derive complex analytics and insights. The analysis engine begins with a set of inputs about people, content, and activities. As part of analytics processing, a broad set of metadata is constructed and organized, combining different facets of the data to answer a wide scope of questions. Predefined rules pre-compute answers for common data security and management questions, allowing complex questions to be answered in seconds, not hours or days.
User filesystem activities are collected through time, resulting in a dynamically updated information store that reflects the current state of the virtual infrastructure as well as its history, allowing insights that are not possible from a static view of filesystem data. Extracting intelligence and insights is performed through a powerful query engine and by applying policies against the information store. These policies are made up of a set of rules, defined by a set of queries against the data and the actions to be executed.
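The relationship between policies, rules, queries, and actions can be sketched as follows. The `Rule` class, the `apply_policy` function, and the record fields are hypothetical; the sketch only assumes, as described above, that a rule pairs a query against the information store with an action to execute on matches.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical rule: a query predicate plus the action to run on a match."""
    name: str
    query: Callable[[dict], bool]
    action: Callable[[dict], str]


def apply_policy(rules, records):
    """Run every rule against every record and collect the action results."""
    results = []
    for record in records:
        for rule in rules:
            if rule.query(record):
                results.append(rule.action(record))
    return results


# Illustrative activity records, already correlated with content tags.
records = [
    {"user": "alice", "action": "read", "path": "C:\\hr\\payroll.txt", "tags": {"ssn"}},
    {"user": "bob", "action": "delete", "path": "C:\\tmp\\scratch.txt", "tags": set()},
]

policy = [
    Rule(
        name="sensitive-access",
        query=lambda r: "ssn" in r["tags"],
        action=lambda r: f"ALERT: {r['user']} {r['action']} {r['path']}",
    ),
]

outcome = apply_policy(policy, records)
```

Keeping the query and the action as separate callables lets the same query drive different responses, such as an alert in one policy and a snapshot request in another.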
Insights are surfaced in a variety of ways including pre-configured dashboards, user-defined policy-based alerts, interactive and saved search queries, filtering of historic activity streams, as well as click and drill preview of files, their details, and prior versions. Recovery of prior versions may be achieved by selecting from among multiple file versions or perusing time-based catalogs of changed and deleted files.
Impact on the Virtual Environment
The amount of data, and the speed at which it can be analyzed and protected, is dependent on the amount of memory, compute, and disk resources allocated to the DGfV virtual appliance. Resource consumption is also impacted by the demographics of the data and the amount of historical information retained.
When data is heavily text-based, with tens of millions of files and billions of activities, DGfV requires more memory and disk space than it does for a data set of millions of files with a mix of application executables, video, audio, and rich text files.
With adequate resources, the impact on existing virtual workloads is minimal. The initial VM analysis is the most compute-intensive. Techniques are used to stage processing appropriately to take advantage of available resources while minimizing the impact on the infrastructure. After the initial analysis of a VM, only changes since the previous analysis need to be subsequently analyzed.
Anecdotal data from customers suggests a disk space overhead of 10 percent of the data analyzed. Overhead is higher if the data consists entirely of rich-text files, and lower if the data is primarily executables and images.
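As a rough planning aid, the 10 percent figure can be turned into a quick estimate. The function name and the sample data sizes are hypothetical, and because the figure is anecdotal, the result should be read as a starting point rather than a sizing guarantee.

```python
def estimate_overhead_gb(data_analyzed_gb, overhead_percent=10):
    """Estimate disk space overhead as a percentage of the data analyzed."""
    return data_analyzed_gb * overhead_percent / 100


small_site = estimate_overhead_gb(2_000)  # 2 TB of analyzed data
large_site = estimate_overhead_gb(5_000)  # 5 TB of analyzed data
```

Under this assumption, 2,000 GB of analyzed data implies about 200 GB of overhead, and 5,000 GB implies about 500 GB. Passing a different `overhead_percent` models data sets that skew toward rich text or toward executables and images.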
DataGravity for Virtualization provides advanced analytics and insight into the data in your virtual environment, including how that data is being accessed. The platform uses this information to offer the data equivalent of a full MRI scan (a method for obtaining insightful images of opaque objects), with built-in anomaly detection that informs you about your data and automatically takes action to protect it.
DataGravity for Virtualization was designed with a deep understanding of virtualization, data, and security. The software’s adaptive design allows multiple inputs to feed the analysis engine, which transforms those inputs (metadata) to organize, answer, and respond to a broad set of data security and management challenges. The policy and action framework allows for deployment-specific customization.
The result is actionable insights designed to help IT, security, and governance, risk, and compliance (GRC) professionals identify and reduce risk, and proactively protect and secure their companies’ most valuable asset: their data.