
Multi-professional approach to Data Analytics

Data analytics, Enterprise Intelligence, Continuous Assurance, Regression Analysis, and Data Life Cycle are terms you may hear when discussing potential approaches to the Big Data Challenge. Unfortunately, the term “Big Data Challenge” is misleading, for it implies that there is a single problem in need of a solution, when in fact companies face a number of unique circumstances, each requiring its own tailored approach. In this post I aim to highlight the main areas of concern for Big Data specialists and some of the tools that have been developed to address these problems.

Before we begin, it is important to understand that a number of professions aim to fill the need for data analytics capability in business. Accountants, Actuaries, Internal Auditors, External Auditors, Statisticians, Mathematicians, Data Warehouse Specialists, Programmers, Risk Managers and Consultants all feel the need to contribute to the discussion. As you can imagine, there is a great variety of problems faced, and each profession has developed its own set of tools to cope with these challenges. Many professions struggle to adapt: statistical analysis has become more prominent, with Statisticians and Actuaries taking the lead and fewer professionals in accounting or consulting having the necessary skills. In other cases, professions come into conflict, with some professionals feeling that their domain is being taken over. As such, there is no single way to distinguish the underlying domains of the Big Data Challenge, but I will do my best to reconcile the various views.

What is the Big Data Challenge?

Most commonly, Big Data is described as an explosion in the amount or frequency of data generated by modern enterprises. However, this is not a useful definition, for it describes only the fact of the occurrence and not the repercussions of such a change.

I would postulate that this data explosion affects us in the following ways:

1. It is harder to find relevant information than when data was less abundant, because we need to dedicate more resources to searching.

2. It is harder to ensure consistency and compatibility of records than when data was less abundant, because there are more ways in which data is collected.

3. It is harder to detect meaningful patterns within the data than when data was less abundant, because the volume and speed of transactions require additional processing capabilities.

What solutions are out there?

As you can imagine, each organisation has its own unique challenges, and each challenge has several solutions, depending on the type of data, urgency, market conditions, and even the people involved. As such, it is very difficult to create discrete rules that would classify each type of problem and prescribe a particular solution. The framework below is meant to be a rough guide, rather than a prescription.

1. Getting data warehouses in order and enabling easier access

Believe it or not, data storage, data accuracy and ease of data access have been topics of discussion in the computer science profession for decades. Database structure has had a considerable evolutionary history over the past 50 years: in short, databases became quicker, more tolerant of errors and more flexible. Unfortunately, not all organisations have cutting-edge databases. A great variety of legacy systems and improper ways of using existing systems introduce a number of errors into datasets, errors that need to be remedied if further analysis is to take place. The explosion in data volumes exacerbated the situation by placing additional volume strain on these systems, as well as accuracy and operational requirements (as, for example, is the case for distributed databases). A number of new and established firms have responded to this challenge in a variety of ways, either by developing new database technologies or by dedicating more processing and accuracy-verification resources. This area has traditionally been addressed by IT professionals.
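The kind of accuracy remediation described above often begins with simple data-quality checks. The sketch below is a minimal, hypothetical illustration (the field names and formats are invented for this example, not drawn from any particular system) of flagging malformed legacy records before they enter an analysis pipeline:

```python
# Minimal sketch of a pre-analysis data-quality check.
# Field names ("customer_id", "amount") and the id format
# are hypothetical, chosen purely for illustration.
import re

records = [
    {"customer_id": "C001", "amount": "125.50"},
    {"customer_id": "c1",   "amount": "125,50"},  # legacy formatting
    {"customer_id": "C003", "amount": ""},        # missing value
]

ID_PATTERN = re.compile(r"^C\d{3}$")

def validate(record):
    """Return a list of data-quality issues found in one record."""
    issues = []
    if not ID_PATTERN.match(record["customer_id"]):
        issues.append("malformed id")
    try:
        float(record["amount"])
    except ValueError:
        issues.append("unparseable amount")
    return issues

report = {r["customer_id"]: validate(r) for r in records}
print(report)
```

Real remediation work layers many more rules (referential integrity, duplicate detection, unit consistency) on top of this basic pattern, but the principle is the same: make errors visible before analysis, not after.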

Further reading on this topic can be found here.

2. More advanced and specialised search engines

In a way, mini Big Data problems have been around for centuries. When the printing press was invented, an explosion in print media warranted the creation of libraries and, subsequently, catalogue systems. A similar dynamic gave birth to phone books. And Google, in its brilliance, brought order to the informational chaos of the early Internet. Since then, several new technologies have emerged to tackle the challenge of finding the correct piece of information within a cluster of related data. Examples of companies involved in this field include IBM (and its famous Watson computer), Sinequa (with its unified information access tools), and Recommind (with its automatic categorisation tools), just to name a few. Each approach uses different underlying technologies, and if your Big Data problem falls into the search engine category, you will need to do additional research to understand which technology would work best in your circumstances.
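Underneath most tools in this category sits some variant of an inverted index, the same idea as a library catalogue: map each term to the documents that contain it, so a query avoids scanning every record. The toy sketch below shows the core mechanism only; it is not any vendor's actual implementation, and real engines add ranking, stemming, and much more:

```python
# Toy inverted index: maps each word to the set of document ids
# that contain it, so queries need not scan every document.
from collections import defaultdict

docs = {
    1: "big data requires new storage",
    2: "search engines index big collections",
    3: "pattern detection in data streams",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(*terms):
    """Return ids of documents containing all query terms."""
    results = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*results) if results else set()

print(search("big", "data"))  # documents containing both terms
```

Building the index is a one-off cost paid at ingestion time; each query afterwards is a handful of set intersections, which is what makes search tractable at scale.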

3. Pattern recognition and detection – new and old data analysis techniques

Another domain of Big Data is the need (or the opportunity?) to detect patterns within the data, with a view to making forward-looking predictions or detecting anomalies. The range of situations where this capability might be useful is virtually limitless, from pricing, customer management and production planning to fraud detection and equipment monitoring. The methods that address this need, however, fall into three main categories.

The first method is data visualisation. This method is intuitive and appealing, since we perceive visual information very rapidly. The majority of data visualisation techniques focus on enabling rapid prototyping of visual models; some examples can be found here. These techniques allow analysts to pinpoint outliers and trends, but they rely heavily on personal interpretation. Additionally, not all phenomena can be expressed visually, with some patterns taking the form of multi-dimensional, multi-order relationships. Furthermore, a great deal of training and experience is needed for these visual models to be used correctly. The human brain excels at finding visual patterns, but in some cases it is susceptible to false positives, astrology being one example.
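Even the crudest visualisation can surface an outlier that a raw table hides. As a minimal, dependency-free stand-in for the rapid-prototyping tools mentioned above (the data is synthetic), a plain text histogram makes the anomaly obvious at a glance:

```python
# Minimal text histogram: a crude stand-in for visual prototyping.
# One glance reveals the outlier bucket that a raw table would hide.
values = [12, 14, 13, 15, 14, 13, 95, 14, 12, 13]  # synthetic data

def text_histogram(data, bucket_size=10):
    """Group values into buckets and render counts as bars."""
    buckets = {}
    for v in data:
        b = (v // bucket_size) * bucket_size
        buckets[b] = buckets.get(b, 0) + 1
    return {b: "#" * n for b, n in sorted(buckets.items())}

for bucket, bar in text_histogram(values).items():
    print(f"{bucket:>3}-{bucket + 9:<3} {bar}")
```

The lone bar at 90–99 is the outlier; spotting it in a column of raw numbers takes noticeably longer. This is the strength of the method, and also its weakness: the conclusion still rests on a human noticing and interpreting the shape.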

The second method is mathematical modelling. This approach leverages a number of well-known statistical techniques, starting from various types of regression and drawing heavily on differential equations. It has proven effective in a number of applications, such as integration with ERP systems. However, it is expensive and complex to implement. The level of mathematical expertise required and the specialised nature of the models often restrict this method to high-value, high-impact projects. Furthermore, most models of this type have limited dynamic flexibility: if the underlying relationships change, the model becomes obsolete. As such, this method is most appropriate for specialised applications in relatively stable environments.
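The simplest member of this family is ordinary least-squares regression. As a hedged illustration only (the data below is synthetic, constructed so the true relationship is y = 2 + 3x), the closed-form fit of a straight line can be done in a few lines:

```python
# Ordinary least-squares fit of y = a + b*x, closed form.
# Synthetic data, constructed so the true line is y = 2 + 3x.
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]

def fit_line(xs, ys):
    """Return intercept a and slope b minimising squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")
```

Production models are of course far richer (many variables, non-linear terms, time dynamics), which is exactly where the expertise cost mentioned above comes from; but every one of them shares this basic shape of fitting parameters to minimise an error measure.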

The third method is automated software modelling, sometimes called artificial intelligence modelling. Instead of hiring a team of mathematicians to build a model, several companies are developing software packages that are themselves capable of choosing which factors matter most in modelling a particular environment. The most notable example of a company engaged in this area is Numenta. While this approach can be orders of magnitude cheaper than traditional statistical approaches, its usefulness lies in high-velocity temporal data applications, such as modelling electricity usage, credit card transactions, or monitoring equipment status. This software is also capable of dynamically adapting to underlying changes in relationships within the data.
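Numenta's actual algorithms (hierarchical temporal memory) are far more sophisticated, but the flavour of the problem, flagging anomalies in a stream while continuously adapting to its recent behaviour, can be sketched with a simple rolling-window detector. This is a crude stand-in for illustration, not their method:

```python
# Rolling-window anomaly detector: flags points that deviate
# sharply from the mean of the recent window. A crude stand-in
# for adaptive temporal models -- not Numenta's actual algorithm.
from collections import deque
from statistics import mean, stdev

def detect_anomalies(stream, window=5, threshold=3.0):
    """Yield (index, value) for points more than `threshold`
    standard deviations from the mean of the preceding window."""
    recent = deque(maxlen=window)
    for i, v in enumerate(stream):
        if len(recent) == window:
            m, s = mean(recent), stdev(recent)
            if s > 0 and abs(v - m) > threshold * s:
                yield i, v
        recent.append(v)  # the window adapts as the stream drifts

readings = [10, 11, 10, 12, 11, 10, 55, 11, 10, 12]
print(list(detect_anomalies(readings)))
```

Because the window slides forward, the detector's notion of "normal" updates continuously, which is the key property that distinguishes this family of methods from the static models of the previous category.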

Final words

As can be seen from the above list of solutions, the Big Data Challenge is a fragmented problem. Each particular situation demands careful classification of the problem and the selection of appropriate tools to address it. I believe that these tools fall into the categories described above and that each category is evolving rapidly. The challenge facing many businesses today is navigating this complex environment, and I hope this article helps them do so.

~Alexey Mitko
