By Muqaddas Mehmood

Graph Data Science and Analytics

Graph Data Science is a form of analytics built on an abstraction called a graph model. The accessibility of this model allows large volumes of data from many sources to be consolidated and connected rapidly, in ways that overcome the limitations of the source structures (or lack thereof). Graph analytics is an alternative to the traditional data warehouse model: a framework for absorbing both structured and unstructured data from various sources so that analysts can probe the data in an undirected manner.

Big data analytic systems should provide a platform that supports different analytic techniques, which can be adapted to help solve a variety of challenging problems. This suggests that these systems are high-performance, elastic, distributed data environments that let creative algorithms exploit varied modes of data management, in ways that differ from the traditional batch-oriented approach to data warehousing.


The Simplicity of Graph Science

Graph analytics is based on a model of representing individual entities and the numerous kinds of relationships that connect those entities. More precisely, it employs the graph abstraction for representing connectivity, consisting of a collection of vertices (which are also referred to as nodes or points) that represent the modeled entities, connected by edges (which are also referred to as links, connections, or relationships) that capture the way the two entities are related.
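As a minimal sketch of the abstraction described above, the snippet below stores vertices and undirected edges as adjacency sets. The class name `Graph`, its methods, and the sample entity names are illustrative assumptions, not part of any particular graph product.

```python
from collections import defaultdict


class Graph:
    """Minimal undirected, unlabeled graph stored as adjacency sets (illustrative only)."""

    def __init__(self):
        # Maps each vertex to the set of vertices it shares an edge with.
        self.adjacency = defaultdict(set)

    def add_edge(self, u, v):
        # An undirected edge is recorded in both directions.
        self.adjacency[u].add(v)
        self.adjacency[v].add(u)

    def neighbors(self, v):
        # Use .get so that asking about an unknown vertex does not create it.
        return self.adjacency.get(v, set())


# Hypothetical entities, chosen only for illustration.
g = Graph()
g.add_edge("alice", "bob")
g.add_edge("bob", "carol")
print(sorted(g.neighbors("bob")))  # ['alice', 'carol']
```

Each vertex here is just a key and each edge just a pairing, which is all the basic abstraction requires.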

The flexibility of the model is based on its simplicity. A simple unlabeled undirected graph, in which the edges between vertices neither reflect the nature of the relationship nor indicate their direction, has limited utility.

The following enhancements, among others, can enrich the meaning of the nodes and edges represented in the graph model:

· Vertices can be labeled to indicate the types of entities that are related.

· Edges can be labeled with the nature of the relationship.

· Edges can be directed to indicate the “flow” of the relationship.

· Weights can be added to the relationships represented by the edges.

· Additional properties can be attributed to both edges and vertices.

· Multiple edges can reflect multiple relationships between pairs of vertices.
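The enhancements listed above can be sketched together in a small property-graph structure. This is one possible layout under stated assumptions: the names `Vertex`, `Edge`, `PropertyGraph`, and the sample labels and properties are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    key: str
    label: str                                       # type of entity, e.g. "Person"
    properties: dict = field(default_factory=dict)   # additional attributes


@dataclass
class Edge:
    source: str                                      # edges are directed: source -> target
    target: str
    label: str                                       # nature of the relationship
    weight: float = 1.0                              # strength or cost of the relationship
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Directed, labeled, weighted multigraph (illustrative sketch)."""

    def __init__(self):
        self.vertices = {}
        self.edges = []  # a list permits several edges between the same pair

    def add_vertex(self, key, label, **properties):
        self.vertices[key] = Vertex(key, label, properties)

    def add_edge(self, source, target, label, weight=1.0, **properties):
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(source, target, label, weight, properties))

    def outgoing(self, key, label=None):
        # Edges leaving `key`, optionally filtered by relationship label.
        return [e for e in self.edges
                if e.source == key and (label is None or e.label == label)]


# Made-up sample data.
pg = PropertyGraph()
pg.add_vertex("alice", "Person", city="Lahore")
pg.add_vertex("acme", "Company")
pg.add_edge("alice", "acme", "WORKS_FOR", weight=0.8, since=2019)
pg.add_edge("alice", "acme", "INVESTS_IN", weight=0.2)
print([e.label for e in pg.outgoing("alice")])  # ['WORKS_FOR', 'INVESTS_IN']
```

Keeping edges in a list rather than a dictionary keyed by vertex pair is what lets two different relationships connect the same pair of vertices.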


Choosing Graph Analytics

Deciding whether a graph analytics solution is more appropriate than the other big data alternatives can be based on the following characteristics of the business problem:

· Connectivity: The solution to the business problem requires the analysis of relationships and connectivity between a variety of different types of entities.

· Undirected discovery: Solving the business problem involves iterative undirected analysis to seek out as-yet-unidentified patterns.

· Absence of structure: Multiple datasets to be subjected to the analysis are provided without any inherent imposed structure.

· Flexible semantics: The business problem exhibits dependence on contextual semantics that can be attributed to the connections and corresponding relationships.

· Extensibility: Because additional data can add to the knowledge embedded within the graph, there is a need for the ability to quickly add in new data sources or streaming data as needed for further interactive analysis.

· Knowledge embedded in the network: Solving the business problem involves the ability to employ critical features of the embedded relationships that can be inferred from the provided data.

· Ad hoc nature of the analysis: There is a need to run ad hoc queries to follow lines of reasoning.

· Predictable interactive performance: The ad hoc nature of the analysis creates a need for high performance because discovery in big data is a collaborative man/machine undertaking, and predictability is critical when the results are used for operational decision making.
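To make the connectivity and ad hoc query characteristics concrete, here is a hedged sketch of one such query: finding how two entities are linked, using a breadth-first search over an adjacency mapping. The function name `shortest_path` and the sample customers, accounts, and device are invented for illustration. A production graph platform would expose its own query interface instead.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return the path with the fewest hops from start to goal, or None if unconnected."""
    if start == goal:
        return [start]
    previous = {start: None}  # records how each vertex was first reached
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = current
            if neighbor == goal:
                # Walk back through `previous` to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Made-up data: two customers linked through accounts that share a device.
links = {
    "customer_17": ["account_A"],
    "account_A": ["customer_17", "device_9"],
    "device_9": ["account_A", "account_B"],
    "account_B": ["device_9", "customer_42"],
    "customer_42": ["account_B"],
}
print(shortest_path(links, "customer_17", "customer_42"))
# ['customer_17', 'account_A', 'device_9', 'account_B', 'customer_42']
```

An analyst following a line of reasoning could rerun this kind of query with different endpoints as new questions arise, which is why predictable response times matter.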
