Problems in implementing predictive analytics projects, or what happens «after the customer lights up»
The Iot.ru portal has published an article by Sergey Shcherbakov, Technical Director of Beltel Datanomics, in which he shares his experience of implementing predictive analytics projects.
Because Big Data analytics, predictive analytics and, more generally, the use of AI systems in business processes are all quite new, problems arise at literally every step.
Everything begins, of course, with mistrust and a refusal to believe that the «iron fool» can advise anything worthwhile (a typical objection: «the result you get is either absurd or trivial» (c) customer), and ends with the fact that, at the moment, the business simply does not use such methods for forecasting. Which is harder to change, a person or a business process, is a complex question that has to be answered case by case.
But the topic of mistrust and rejection at the «before» stage is one I will cover next time. Here I will try to highlight the main problems we face when piloting and implementing such systems.
Let’s start from afar, with methodology. There is a methodology called CRISP-DM, the Cross-Industry Standard Process for Data Mining (1999). It is industry-proven and the most common methodology for data analysis (and a development of the last century!). Its main stages are:
- Understanding the business task;
- Understanding the data;
- Data preparation;
- Modeling;
- Evaluating model accuracy;
- Integration into the business process.
Let’s say that we and the customer have agreed on the business task (although that is not easy either). Next, we face Problem # 1: data. The problem is simple and comes in the following variations:
- No data. Full stop. All that is left is to joke (although in fact it is sad). Many compare data with oil, but what comes to my mind more often is the old joke about beer in the USSR: beer came in two kinds, «there is beer» and «there is no beer». It is the same with data.
- There is data (which is good news), but the customer does not store it for long (a month, two or three), after which it is overwritten, erased, or its accuracy degrades … underline as applicable.
This problem cannot be solved by any mathematical means. The only recommendation is: collect the data. The fact that you do not analyze it now does not mean it cannot be analyzed in the future. How much money you are automatically erasing right now nobody knows and, most importantly, nobody ever will until you start collecting and analyzing that data.
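A minimal sketch of how one might check whether the stored history is long enough before starting a project. The function names and the 24-month threshold are assumptions chosen for illustration, not figures from the article; the right minimum depends on the task and its seasonality.

```python
from datetime import date

def months_of_history(timestamps):
    """Calendar-month difference between the oldest and newest record."""
    if not timestamps:
        return 0
    oldest, newest = min(timestamps), max(timestamps)
    return (newest.year - oldest.year) * 12 + (newest.month - oldest.month)

def enough_history(timestamps, min_months=24):
    """min_months is a hypothetical threshold; pick it per business task."""
    return months_of_history(timestamps) >= min_months

# A customer who keeps only the last few months of records.
records = [date(2023, 1, 15), date(2023, 2, 3), date(2023, 3, 28)]
print(months_of_history(records))
print(enough_history(records))
```

A check like this says nothing about data quality; it only flags the «stored for a month or two» situation early.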
Let’s assume that we have first-grade data (that is, the «there is data» kind) covering a statistically significant period. Let’s even leave aside the fact that it sometimes takes months to get that data (months!!! to export a file that is large, but still just an Excel sheet!). Even once we have received the data, we almost always face problem number 2: its reliability and completeness, in different variations. Most often the data is unreliable because of the notorious human factor: negligence, inattention, fraud or something else. With completeness, everything is clear. It is the same as with money: there is never enough data, and there is always a desire to know something more, whether it is competitors’ marketing campaigns in the past or the dollar exchange rate in the future. As a rule, the more relevant data there is, the more accurate the forecast will be.
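As an illustration of a first completeness and reliability check, the sketch below counts missing values per field and exact duplicate rows. The record layout (`day`, `store`, `units`) and the function name are hypothetical; real checks would be specific to the customer’s data.

```python
def quality_report(rows, required_fields):
    """Count missing values per field and exact duplicate rows."""
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0
    for row in rows:
        for field in required_fields:
            # Treat both None and empty strings as missing.
            if row.get(field) in (None, ""):
                missing[field] += 1
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}

# Hypothetical sales export with typical defects.
sales = [
    {"day": "2024-01-01", "store": "A", "units": 12},
    {"day": "2024-01-02", "store": "A", "units": None},
    {"day": "2024-01-02", "store": "A", "units": None},
    {"day": "2024-01-03", "store": "", "units": 7},
]
report = quality_report(sales, ["day", "store", "units"])
print(report)
```

Counts like these do not reveal fraud or careless entry on their own, but they show quickly how much of the data can be used as is.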
A small lyrical digression about incomplete data. The more I work in this field, the more I understand that if you have that notorious «Big Data», you are king (the old saying «who owns information owns the world» still holds). With Big Data you can do almost anything and «pull» a lot of useful, money-making things out of it. But in our harsh reality, businesses do not have this Big Data: collecting, processing and storing it, and spending resources on it, simply seemed unnecessary. So the task of «Big Data» analytics often turns into the task of analyzing whatever is available. By the way, did you know that the neural networks so loved by many journalists (and, as a consequence, by readers) require a huge amount of training data and show only mediocre results when trained on a limited sample?
In general, this is a topic for a separate discussion, but to sum up, I want to note that the Small Data problem is well known and has a number of mathematical solutions, as do methods for data recovery. All of this affects the accuracy of the model, but the solutions do exist.
I will not write much about model building and model accuracy. These are not problems; this is work, and even some fun, since most of the models and approaches were invented long ago and their pros, cons and limits of applicability for various tasks are well understood. The main routine questions arise, once again, with the data.
By the way, have I mentioned yet that all AI (machine learning included) is 80-90% work with data? So at the model-building stage everything revolves around data: transforming, normalizing, cleaning and supplementing it, then searching for additional unaccounted factors and adding them to the common data set, with more cleaning, transformation and normalization, and so on in a loop until the model’s performance is satisfactory. And where does the fun come in? It is very simple: while working with the data, very funny artifacts sometimes emerge, entirely non-obvious dependencies appear, and theories that customers have believed in for years are refuted. Digging down to the roots of this or that phenomenon is exactly the fun we get. Unfortunately, I cannot give any examples because of the NDAs we have signed.
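One small step of that loop, normalization, can be sketched as follows. This assumes z-score standardization, which is only one common choice and is not named in the article; missing values are left in place for a later cleaning step to handle.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize to zero mean and unit variance; None entries stay None."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    mu = mean(present)
    sigma = pstdev(present)
    if sigma == 0:
        # A constant column carries no signal; map it to zeros.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mu) / sigma for v in values]

raw = [10.0, 12.0, None, 14.0]
scaled = zscore(raw)
print(scaled)
```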
Finally, the last problem is embedding the model into the customer’s business process. Experience shows that this problem, if not harder to solve, often takes more time than all the math described above. Frankly, most of the difficulties stem from the human factor. For example:
- Optimizing the mathematical solution (i.e. the forecast) for the business task. A forecast never has 100% accuracy (if it does, check the data: the model was probably trained on data from the future), so a separate, interesting task is to choose a criterion for optimizing the forecast. The aim is not to improve the accuracy of the model (we have already done all we could there) but to increase its usefulness for the business. This is another subject to think about, discuss and work on. It all looks clear: the mathematicians and the business people sit down together, set the optimization criteria, fill in the missing information, and it is done. But it is not that simple. Have you ever tried to put a math analyst and a mid-level manager or financial expert (or even a top manager) in the same office? It sounds like the beginning of a joke, and it often looks like one too, but the point is not to laugh: we need to solve the problem.
- We have an excellent model with excellent results, but the customer has never forecast anything and simply has no business process for using these results (there is an idea but no process … it happens). So we need to set the established bureaucratic machine in motion, rewrite contracts, buy a pile of software and hardware, and establish new processes: a whole excellent project in itself … for the next five-year plan. A five-year plan? By then, according to some futurists, AI systems will already have enslaved us, and the first forecasting model will only just have been installed.
- In the end, plain mistrust of the «iron fool». It too makes mistakes sometimes, perhaps less often and less dramatically (especially if the customer has spent time «thinking» about item 1), but it does make mistakes. And the endless comparisons begin: «here it was wrong», «here it missed something», «here it gave a false alarm», etc.
- And a separate case: integration with the customer’s systems. There is nothing you will not find in our vast country. Yet, as usual, the brand of a solution says nothing about whether it is easier or harder to integrate with (that is, to obtain data from it and return forecast values to it). There are home-grown solutions where the customer’s programmers finish the integration in a couple of days, and there are solutions from the most famous brands where it takes months … Will you say again that it is the «human factor»? Maybe you are right. I have hardly ever met a system that could not be integrated with one way or another. The question is how much time and effort it will take.
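To make the first item in the list above more concrete, here is a minimal sketch of a business-oriented optimization criterion. It assumes a demand-forecasting setting in which under-forecasting (lost sales) costs three times as much per unit as over-forecasting (excess stock). The cost values, the numbers and the function names are all hypothetical.

```python
def mean_abs_error(actual, forecast):
    """Plain accuracy metric: average absolute deviation."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)

def business_cost(actual, forecast, over_cost=1.0, under_cost=3.0):
    """Total cost when over- and under-forecasting are priced differently."""
    total = 0.0
    for a, f in zip(actual, forecast):
        error = f - a
        total += over_cost * error if error > 0 else under_cost * -error
    return total

actual = [100, 120, 90]
cautious = [110, 130, 100]  # always 10 units too high
lean = [95, 115, 85]        # always 5 units too low

for name, forecast in (("cautious", cautious), ("lean", lean)):
    print(name, mean_abs_error(actual, forecast), business_cost(actual, forecast))
```

Under these assumed costs the «lean» forecast is the more accurate of the two, yet the «cautious» one is cheaper for the business, which is why the criterion has to be agreed with the people who know the real costs.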
Readers of this article may get the feeling that everyone’s data is in bad shape, people are clumsy, everything is crooked and integration is difficult. In fact, this is not the case. Acquiring data, processing it and integrating forecasts into the business can and should be done quickly and easily. What it takes is interested parties and support from top management.