With the growing number of available sensors and the increased data density accessible to companies involved in drilling for oil and gas, the processes and approaches used by data management and IT departments, and the impact of advanced analytics on these systems, need to be re-evaluated. As a part of this increase in data reliability and availability, Petrolink sees a parallel increase in variability and data density. When these changes in data are paired with the adoption of richer data analysis through deep machine learning (ML) and artificial intelligence (AI) processes, people must also change their approach to data management or risk failure due to an infrastructure that cannot keep up.

Shifting analysis

The first and foremost job of any real-time-centric data management infrastructure is to ensure the preservation of that real-time data. A close second responsibility is to make the data available to the processes that are essential to real-time process safety, real-time efficiency and shared real-time situational awareness.

With the introduction and adoption of analytical processes into drilling, the need to interact with the growing volume of data at a granular level is becoming more urgent. This type of deep, detailed analysis unlocks valuable insights and patterns that can help assess potential risks and, most importantly, improve overall functionality and productivity. This is where AI and ML technologies come in. These technologies allow companies to delve deep into massive volumes of data and uncover meaningful insights. However, there is an inherent risk: if these processes are pointed at the primary locations of the corporate data, their intense interactions can easily overwhelm the primary function of the data management infrastructure.

One approach is to work with copies of the real-time data operating in isolation. The industry refers to these copies as sandboxes, virtual copies or digital twins. Their purpose is to provide a subset or copy of the selected data in a safe and controlled environment. ML processes use these digital twins to minimize the risk they pose as they design, evolve and grow their algorithms and models. The work typically follows a cycle of mining the data, analyzing it and then reviewing the findings. In some cases new data are created; in others, data are destroyed or adjusted.
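A minimal sketch of this sandboxing idea follows. It assumes a SQLite-based primary store and a hypothetical drilling_data table with a timestamp column; the actual stores, schemas and tooling will differ from one company to the next.

```python
# Sketch: carving an isolated "digital twin" from a primary real-time
# store so ML experiments never touch production data.
# Table and column names (drilling_data, timestamp) are illustrative.
import sqlite3
import pandas as pd

PRIMARY_DB = "primary_realtime.db"   # production store (read-only here)
SANDBOX_DB = "sandbox_twin.db"       # isolated copy for ML work

def build_twin(start_ts: str, end_ts: str) -> None:
    """Copy a bounded time window of real-time data into the sandbox store."""
    with sqlite3.connect(PRIMARY_DB) as src:
        window = pd.read_sql_query(
            "SELECT * FROM drilling_data WHERE timestamp BETWEEN ? AND ?",
            src, params=(start_ts, end_ts),
        )
    with sqlite3.connect(SANDBOX_DB) as dst:
        # Replace any previous copy; experiments may mutate or delete rows
        # here without affecting the primary real-time pipeline.
        window.to_sql("drilling_data", dst, if_exists="replace", index=False)

if __name__ == "__main__":
    build_twin("2024-01-01T00:00:00", "2024-01-02T00:00:00")
```

Because the sandbox is a physically separate store, experiments can create, destroy or adjust data without any effect on the primary real-time pipeline.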

There are two primary risks to the data management infrastructure when ML and AI processes are used: risk to the data itself and risk to the systems managing the real-time infrastructure. For example, the machine running the database has a limited number of processor cycles, so if the central processing unit is flooded with queries, it is distracted from its primary job of listening for incoming data.

Scalability of the digital twin

Although a digital twin is a viable solution, it is also part of the problem. As Figure 1 shows, Process A in the ML algorithm needs a large quantity of information while Process B needs completely different information. The result is two simultaneous, very different and intense queries running against the digital twin. Very quickly, the digital twin cannot keep up, as it has a physical limit to the number of queries it can accommodate. A natural and logical option is to expand the digital twin, creating a digital triplet.

Then, as the number of processes in the ML algorithm increases, the expansion continues into digital quadruplets, quintuplets and so on. This scenario requires companies to purchase more servers and ultimately is not sustainable, as too much time is spent synchronizing information between the systems, and that synchronization cost escalates rapidly as the number of twins increases (see the worked example below). A second challenge is tied to how the insights gained from the various ML processes are fed back into the corporate or production environment where decisions are made.
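One way to see the scaling problem, under the illustrative assumption that every copy must be kept consistent with the primary and with every other copy (a full-mesh synchronization pattern), is to count the synchronization links:

```latex
% Pairwise synchronization links among the primary store and n twins
% (full-mesh assumption, for illustration only)
\[
  L(n) = \binom{n+1}{2} = \frac{(n+1)\,n}{2},
  \qquad
  L(1) = 1,\quad L(2) = 3,\quad L(4) = 10,\quad L(9) = 45.
\]
```

Each additional twin therefore adds more coordination work than the one before it, while the query capacity of each individual copy stays fixed.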

Making the data usable

Now that critical information resides in the digital twin, how does one get it back into the main system where it can be used? One answer is to move the results.

Although this solves the problem of getting the information where it is needed, there is no context about how or by whom it was created, so using that information takes a big leap of faith. The second option is to extend the production environment to include the digital twin. However, the system then has two sources of truth. The third answer is to move the ML algorithm to the production environment and run it there. This is a viable option and can be achieved by using industry standards such as the Predictive Model Markup Language (PMML). However, it does slow down the system. None of these three options is a perfect solution that will work all of the time.
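As a hedged illustration of the third option, the sketch below trains a toy model against sandbox data and exports it as PMML so that a production system with a PMML-compliant scoring engine could run it. It assumes scikit-learn and the open-source sklearn2pmml package (which requires a Java runtime); the feature names, values and model choice are placeholders, not any specific production workflow.

```python
# Sketch of option three: train a model offline, then export it as PMML
# so a PMML-compliant scoring engine in production can execute it.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Hypothetical training frame built from the digital twin's data.
frame = pd.DataFrame({
    "weight_on_bit":       [10.0, 12.5, 15.0, 18.0],
    "rotary_speed":        [120,  140,  160,  180],
    "rate_of_penetration": [30.0, 38.0, 45.0, 55.0],
})

pipeline = PMMLPipeline([
    ("model", DecisionTreeRegressor(max_depth=3)),
])
pipeline.fit(frame[["weight_on_bit", "rotary_speed"]],
             frame["rate_of_penetration"])

# Writes an XML document that any PMML-compliant engine can load and score.
sklearn2pmml(pipeline, "rop_model.pmml")
```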

Getting to automation

ML uses data to help understand what the most likely outcome will be; however, ML is only as accurate as the data it is fed. AI goes further, recognizing patterns it can follow to predict an outcome even without having all of the data that define the scenario. While ML requires constant tuning, AI moves far beyond what is currently known. The path to proper AI is a progression: descriptive, diagnostic, predictive and prescriptive (Figure 2). Companies looking at AI and automation should consider a stepwise approach that starts with analytics, which is data-intensive, people-intensive and process-intensive, then moves to ML and AI, and finally to automation.

Many industries around the world have already tapped into the power of AI and automation. According to McKinsey & Co., AI is predicted to get even better: advances in algorithmic research, together with increasingly powerful computer hardware, will allow AI to demonstrate autonomy and creativity.

In the coming years, AI-based machines will find ways to create solutions to complex problems within a given solution space. The drilling industry should be there too.