Archive for March, 2013

Multi-professional approach to Data Analytics

March 21, 2013

Data analytics, Enterprise Intelligence, Continuous Assurance, Regression Analysis, Data Life Cycle: these are terms you may hear when discussing potential approaches to the Big Data Challenge. Unfortunately, the term “Big Data Challenge” is misleading, for it implies a single problem in need of a solution, while in fact there are a number of unique circumstances that companies face, each requiring its own tailored approach. In this post I aim to highlight the main areas of concern for Big Data specialists and some of the tools that have been developed to address them.

Before we begin, it is important to understand that a number of professions aim to fill the need for data analytics capability in business. Accountants, Actuaries, Internal Auditors, External Auditors, Statisticians, Mathematicians, Data warehouse specialists, Programmers, Risk Managers and Consultants all feel the need to contribute to the discussion. As you can imagine, there is a great variety of problems, and each profession has developed its own set of tools to cope with them. Many professions struggle to adapt: statistical analysis has become more prominent, with Statisticians and Actuaries taking the lead, while fewer professionals in accounting or consulting have the necessary skills. In other cases, professions come into conflict, with some professionals feeling that their domain is being taken over. As such, there is no single way to distinguish the underlying domains of the Big Data Challenge, but I will do my best to reconcile the various views.

What is the Big Data Challenge?

Most commonly, Big Data is described as an explosion in the amount or frequency of data generated by modern enterprises. However, this is not a useful definition, for it describes only the fact of the change and not its repercussions.

I would postulate that this data explosion affects us in the following ways:

1. It is harder to find relevant information than when data was less abundant, because we need to dedicate more resources to searching.

2. It is harder to ensure consistency and compatibility of records than when data was less abundant, because there are more ways in which data is collected.

3. It is harder to detect meaningful patterns within the data than when data was less abundant, because the volume and speed of transactions require additional processing capabilities.

What solutions are out there?

As you can imagine, each organisation has its unique challenges, and each challenge has several solutions, depending on the type of data, urgency, market conditions and even the people involved. As such, it is very difficult to create discrete rules that would classify each type of problem and prescribe a particular solution. The framework below is intended as a rough guide rather than a prescription.

1. Getting data warehouses in order and enabling easier access

Believe it or not, data storage, data accuracy and ease of data access have been topics of discussion in the computer science profession for decades. Database structure has had a considerable evolutionary history over the past 50 years: in short, databases became quicker, more tolerant of errors and more flexible. Unfortunately, not all organisations have cutting-edge databases. A great variety of legacy systems and improper use of existing systems introduce a number of errors into datasets, errors that need to be remedied before further analysis can take place. The explosion in data volumes exacerbated the situation by placing additional strain on capacity, as well as on accuracy and operations (as, for example, is the case for distributed databases). A number of new and established firms have responded to this challenge in a variety of ways, either by developing new database technologies or by dedicating more resources to processing and accuracy verification. This area has traditionally been addressed by IT professionals.

Further reading on this topic can be found here.
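To make the accuracy problem concrete, here is a minimal Python sketch (the record layout and field names are entirely hypothetical) of the kind of cleaning pass legacy data often needs before any analysis: discarding records with missing key fields and de-duplicating the rest.

```python
def clean_records(records):
    """Drop records with missing key fields, and drop duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec.get("id"), rec.get("date"))
        if None in key or key in seen:
            continue  # skip incomplete or previously seen records
        seen.add(key)
        cleaned.append(rec)
    return cleaned

# Hypothetical raw extract from a legacy system.
raw = [
    {"id": 1, "date": "2013-03-01", "amount": 100},
    {"id": 1, "date": "2013-03-01", "amount": 100},  # duplicate entry
    {"id": 2, "amount": 50},                         # missing date
    {"id": 3, "date": "2013-03-02", "amount": 75},
]
print(len(clean_records(raw)))  # → 2
```

Real cleansing pipelines are far richer (type coercion, referential checks, reconciliation against source systems), but the shape of the work is the same: define what a valid record looks like, then filter and normalise towards it.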

2. More advanced and specialised search engines

In a way, mini Big Data problems have been around for centuries. When the printing press was invented, the explosion in print media warranted the creation of libraries and, subsequently, catalogue systems. A similar dynamic gave birth to phone books. And Google, in its brilliance, brought order to the informational chaos of the early Internet. Since then, several new technologies have emerged to tackle the challenge of finding the correct piece of information within a cluster of related data. Examples of companies in this field include IBM (with its famous Watson computer), Sinequa (with its unified information access tools) and Recommind (with its automatic categorisation tools), to name just a few. Each uses different underlying technologies, and if your Big Data problem falls into the search-engine category, you will need to do additional research to understand which technology would work best in your circumstances.
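The core idea behind most search technology is the inverted index: instead of scanning every document for each query, you precompute a map from terms to the documents containing them. A toy sketch in Python (the documents and ids are invented for illustration):

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query."""
    term_sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {
    1: "big data analytics",
    2: "data warehouse design",
    3: "search engine design",
}
idx = build_index(docs)
print(search(idx, "data design"))  # → {2}
```

Commercial engines layer ranking, stemming, synonym handling and distributed storage on top, but the index-then-intersect structure above is the common foundation.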

3. Pattern recognition and detection – new and old data analysis techniques

Another domain of Big Data is the need (or the opportunity?) to detect patterns within data, with a view to making forward-looking predictions or detecting anomalies. The range of situations where this capability might be useful is virtually limitless, from pricing, customer management and production planning to fraud detection and equipment monitoring. The methods that address this need fall into three main categories.

The first method is data visualisation. This method is intuitive and appealing, since we perceive visual information very rapidly. The majority of data visualisation techniques focus on enabling rapid prototyping of visual models; some examples can be found here. These techniques allow analysts to pinpoint outliers and trends, but they rely heavily on personal interpretation. Additionally, not all phenomena can be expressed visually, with some patterns taking the form of multi-dimensional, multi-order relationships. Furthermore, a great deal of training and experience is needed for visual models to be used correctly. The human brain excels at finding visual patterns, but it is also susceptible to finding false positives, astrology being one example.

The second method is mathematical modelling. This approach leverages a number of well-known statistical techniques, starting with various types of regression and drawing heavily on differential equations. It has proven effective in a number of applications, such as integration with ERP systems, but it is expensive and complex to implement. The level of mathematical expertise required and the specialised nature of the models often restrict this method to high-value, high-impact projects. Furthermore, most models of this type have limited dynamic flexibility: if the underlying relationships change, the model becomes obsolete. As such, this method is most appropriate for specialised applications in relatively stable environments.
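The simplest member of this family is ordinary least squares regression. A self-contained sketch, using made-up monthly figures that happen to follow a perfect line, shows the fitting step at its core:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical monthly volumes with a clear linear trend (y = 2x + 1).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # → 2.0 1.0
```

Real projects extend this to many variables, non-linear terms and confidence estimates, which is where the expertise and expense mentioned above come in; and, as the text notes, the fitted coefficients are only valid while the underlying relationship holds.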

The third method is automated software modelling, sometimes called artificial intelligence modelling. Instead of hiring a team of mathematicians to build a model, several companies are developing software packages that are themselves capable of choosing which factors matter most in modelling a particular environment. The most notable example of a company engaged in this area is Numenta. While this approach can be orders of magnitude cheaper than traditional statistical approaches, its usefulness lies with high-velocity temporal data applications, such as modelling electricity usage, processing credit card transactions or monitoring equipment status. This software is also capable of adapting dynamically to underlying changes in relationships within the data.
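The flavour of "adapting to underlying changes" can be illustrated with a much simpler device than Numenta's technology: an exponentially weighted baseline that tracks a data stream and flags readings that stray too far from it. This is a toy sketch with invented parameters, not a representation of any commercial product:

```python
def detect_anomalies(stream, alpha=0.3, tolerance=0.5):
    """Flag readings deviating from an exponentially weighted baseline.

    The baseline is updated only with normal readings, so it adapts to
    gradual change while spikes still stand out.
    """
    baseline = stream[0]
    anomalies = []
    for i, value in enumerate(stream[1:], start=1):
        if abs(value - baseline) > tolerance * abs(baseline):
            anomalies.append(i)  # spike: flag it, leave baseline alone
        else:
            baseline = alpha * value + (1 - alpha) * baseline
    return anomalies

# Hypothetical hourly electricity usage with a spike at index 4.
usage = [100, 102, 98, 101, 300, 99, 103]
print(detect_anomalies(usage))  # → [4]
```

Self-tuning commercial systems learn far richer temporal structure than a moving average, but the contrast with the previous section is the point: the model updates itself as data arrives instead of being rebuilt by hand.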

Final words

As can be seen from the above list of solutions, the Big Data Challenge is a fragmented problem. Each situation demands careful classification of the problem and selection of the appropriate tools to address it. I believe that these tools fall into the three categories described above and that each category is evolving rapidly. The challenge facing many businesses today is navigating this complex environment, and I hope this article helps them do so.

~Alexey Mitko

The Land of Hairless Carpets

March 19, 2013

During my Bachelor studies, my economics professor shared an interesting story, the lessons of which I am only beginning to grasp. Back in the early 1900s, when vacuum cleaners were a recent invention, many vacuum cleaner producers competed on the suction power of their devices. In the beginning, competing on power made perfect sense; after all, higher suction power meant better cleaning. As technology progressed, vacuum cleaners grew more powerful and eventually became capable of tearing hairs out of the carpets they cleaned. Unfortunately, consumers did not know at what point a vacuum cleaner became a carpet barber, so the Consumer Protection Agency had to step in and restrict how vacuum cleaners could be marketed.

What is the moral of the story? An initially important but subsequently outdated competitive dimension may siphon resources that could otherwise have gone into true research and innovation. The mistake of competing on an irrelevant factor is often made when a company loses sight of its purpose. If, in the example above, the company's purpose was to provide an easy and efficient cleaning tool, then the vacuum's power is an important factor, but only up to a point. As history has it, dust bags and cyclone vacuums were eventually created, and the overall weight of vacuum cleaners was reduced as well. Product innovation cycles through competitive factors; companies that fail to recognise this end up in the land of hairless carpets.

These principles are not limited to vacuum cleaners! Similar cycles can be observed in the mobile phone industry. Every time a new smartphone comes out, its hardware is carefully examined. Over the years, CPU power and RAM capacity were legitimate competitive dimensions. If your engineers could produce faster, lighter phones without increasing power consumption, the resulting improvements contributed directly to customer experience: the end product was more fluid and could boast a better graphics experience. But these competitive dimensions have diminishing returns. What if the human eye cannot tell the difference between a super-definition display and an ultra-definition one? What if all smartphones on the market are capable of super-smooth performance? After all, once response times become minuscule, even order-of-magnitude improvements become hard to notice. It is quite possible that the current smartphone race is reaching its hardware limits, and companies that are not careful may miss the next competitive dimension.


Categories: Alex, Authors, Strategy