AI and Cloudomation


Getting started with using Artificial Intelligence (AI) and Machine Learning (ML) can be tricky. What are the most common challenges of using AI and ML? 

Packaged AI Products Rule!

The easiest way to use AI for your business is to use standardised, packaged AI products or services. This lets you get started quickly and gain first hands-on experience.

One of the biggest players in the AI field is Google. It provides a number of AI application programming interfaces (APIs): web endpoints through which you can use AI models. Examples are the Speech-to-Text and Text-to-Speech APIs: you send audio data and receive a text, or you send a text and receive an audio file. All of this without the need to develop and execute an AI model yourself!
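To give an impression of how simple such an API call is, here is a sketch of the request body for Google's Speech-to-Text v1 `speech:recognize` method. The field names follow the public REST documentation, but treat encodings, limits and authentication details as something to verify there before relying on this.

```python
import base64

def build_speech_request(audio_bytes, language="en-US", sample_rate=16000):
    """Build the JSON body for a Speech-to-Text "recognize" call.

    The payload shape follows Google's v1 speech:recognize method;
    check the official docs for supported encodings and size limits.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate,
            "languageCode": language,
        },
        # Audio is sent inline as base64-encoded bytes
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }

payload = build_speech_request(b"\x00\x01\x02")
print(sorted(payload["config"]))  # → ['encoding', 'languageCode', 'sampleRateHertz']
```

The request itself would then be a single HTTP POST to the API endpoint with your API key; the response contains the transcribed text.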

There are other companies that provide similar APIs. One of my favourites is OpenCalais: a very powerful natural language processing (NLP), named entity recognition and text tagging API. You send a text to this API and get back a set of standard tags. I can tell you, it is heaven when you need to automatically process and categorise large numbers of text documents.
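A sketch of the consuming side: the function below pulls tag names out of a Calais-style JSON response. The `_typeGroup`/`name` field names reflect the OpenCalais response format as I know it; verify them against the current API documentation, as the exact shape may differ.

```python
def extract_social_tags(calais_response):
    """Collect tag names from an OpenCalais-style response dict.

    Assumes the response is keyed by entity URIs, where social tags
    carry "_typeGroup": "socialTag" and a "name" field.
    """
    return sorted(
        item["name"]
        for item in calais_response.values()
        if isinstance(item, dict) and item.get("_typeGroup") == "socialTag"
    )

# A trimmed-down example response (illustrative only)
sample = {
    "doc": {"info": {}},
    "http://d.opencalais.com/genericHasher-1/abc": {
        "_typeGroup": "socialTag", "name": "Machine learning"},
    "http://d.opencalais.com/genericHasher-1/def": {
        "_typeGroup": "socialTag", "name": "Cloud computing"},
}
print(extract_social_tags(sample))  # → ['Cloud computing', 'Machine learning']
```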

Where does Cloudomation come in?

Cloudomation helps you tie these AI APIs into your existing processes. An example: do you need to tag documents with an AI API? Cloudomation can extract the documents from your document management system (DMS), send them to the tagging API, and return the documents with their tags into your DMS. All within seconds. 
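The flow above boils down to a fetch-tag-store loop. The sketch below shows that orchestration pattern with the connectors passed in as callables; `fetch`, `tag` and `store` are hypothetical placeholders for your DMS export, the tagging API call, and the DMS import, not a real Cloudomation interface.

```python
def tag_documents(fetch, tag, store):
    """Orchestrate one tagging run: pull documents, tag each, write back.

    fetch/tag/store are placeholders for real connector code
    (DMS export, tagging API call, DMS import).
    """
    tagged = 0
    for doc_id, text in fetch():
        store(doc_id, tag(text))
        tagged += 1
    return tagged

# Stub connectors, just to show the flow:
docs = {"doc-1": "Quarterly revenue report", "doc-2": "Maintenance schedule"}
tags = {}
count = tag_documents(
    fetch=lambda: docs.items(),
    tag=lambda text: [text.split()[0].lower()],  # stands in for the API call
    store=lambda doc_id, t: tags.update({doc_id: t}),
)
print(count, tags["doc-1"])  # → 2 ['quarterly']
```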

Faster Time to Production with Custom Models

If you already have your own Data Science team in your company, their focus will be on developing custom models based on your data. Challenges often arise when these models are ready to be put in production. You need to find ways to enable your Data Science team to incorporate their models into the existing processes in your organisation.

The large majority of Data Scientists use one of two programming languages: R and Python. I myself started my career as a Data Scientist and avid R enthusiast. Now, I prefer Python. This is the case with many Data Scientists, who develop skills in both languages during their careers. Without going into detail, each language has its advantages and its disadvantages. The big advantage of Python is that it is a fully-fledged programming language with functionality that goes far beyond “just” Data Science work.

Cloudomation is a platform for code-based automation. On the Cloudomation platform, automation scripts are written in Python. This means that any automation script on the Cloudomation platform is fully compatible with any AI model written in Python. It means that you can incorporate your models into an automated workflow on the Cloudomation platform without any additional effort. 

This is super convenient for light-weight statistical or machine learning (ML) models such as linear regression and cluster models. Any process that is automated on the Cloudomation platform can make use of ML models simply by copy-pasting the Python code for the model into your automation script. 
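To illustrate how little code such a light-weight model needs, here is a plain-Python least-squares fit, small enough to paste straight into an automation script. The data and the spend/conversions framing are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Toy data: campaign spend vs. conversions (illustrative numbers only)
spend = [1.0, 2.0, 3.0, 4.0]
conversions = [2.0, 4.0, 6.0, 8.0]

slope, intercept = fit_line(spend, conversions)
forecast = slope * 5.0 + intercept  # predict conversions at spend = 5
print(round(forecast, 1))  # → 10.0
```

For anything beyond a toy example you would of course reach for scikit-learn or statsmodels, but the principle stays the same: the model is just Python code inside the automation script.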

Imagine how easy this makes it to use your models on a daily basis.  Selection of target groups for marketing campaigns? Classification of customer service requests? No problem!

Complex Models in Production

Heavy-duty models like neural networks or random forests require more computing power and more data than a simple linear regression model. Putting them into production is a bit more complicated, as they cannot run directly on the Cloudomation platform as easily.

Common challenges include: 

  • Data delivery and preprocessing: AI models need fresh data, in large volumes, in the right format. Often, heavy preprocessing is required before the data is aggregated and cleaned sufficiently to be used by the AI model. 
  • Provision of infrastructure for running AI models – which is expensive, therefore we would like to only provision it when we actually need it.
  • Integration of AI models into existing processes: making the results of the AI model available to the next step in the process, e.g. a customer platform or app.

Data Delivery and Preprocessing

The first problem is often the delivery and preprocessing of data. Selecting, cleaning, restructuring and merging data takes up the majority of a Data Scientist’s time. It is the less glamorous part of the job, and many Data Scientists see it as an annoying duty rather than something they enjoy.

During model development, Data Scientists often work with sample data. For example, this can be data from one specific day, or from all your customers at one specific point in time. Data Scientists define data preprocessing routines in Python, after which the data is in a format that can be used in their model.

Once it comes to the point where the model should be run regularly, this becomes a problem. The data needs to be delivered on a regular basis for each model run. You need direct connections to your data sources, often several databases, file systems, blob stores, etc., sometimes in different networks, e.g. your private cloud and your in-house network. 

With Cloudomation, it becomes easy to tie these different data sources together, even across network boundaries. The Data Scientist’s Python scripts (or SQL statements) can be used directly in the Cloudomation automation script. The effort to get from the Data Scientist’s prototype to a production-ready data delivery pipeline is reduced drastically.

On-Demand Infrastructure for Model Calculation

When working with large data sets, Data Scientists quickly run into problems with their IT infrastructure: they run out of memory. What do they do? They add memory to their computing cluster. Until they run out of disk space. So they add disk space. Then they run out of CPU, so they add CPU nodes. Maybe they add some GPU nodes. And so on, you get the idea. Until we end up with a very big and very expensive computing cluster.

This makes the Data Scientists happy, and that is how it should be: if you want your Data Scientists to develop good big data models, you need to supply them with the processing power they need.

But you don’t have to pay for the computing cluster when your Data Scientists are at home and the cluster sits idle. 

The value of on-demand infrastructure really comes into play with expensive clusters like the ones required to run big data models. With Cloudomation, you can blueprint the infrastructure you want and create it at the push of one button. Yes, one button, not a configuration form nightmare that takes two hours to click through.

This can help you save money in two situations:

  1. On-demand infrastructure for model development: your Data Scientists deploy their mammoth-cluster via Cloudomation when they need it, and remove it again when they go home at night.
  2. On-demand infrastructure for model execution in production: Cloudomation deploys a compute cluster at the time the model execution is scheduled or triggered to run, and removes it again after model calculations have finished. It may be that the cluster only runs for a few minutes a day, saving you a lot of money.
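The lifecycle in point 2 can be sketched as a provision-run-teardown wrapper: the expensive cluster exists only while the model actually runs, and is torn down even if the model run fails. The `provision` and `teardown` callables below are hypothetical stand-ins for your cloud provider’s API, not a real Cloudomation interface.

```python
import contextlib

@contextlib.contextmanager
def on_demand_cluster(provision, teardown):
    """Keep the (expensive) cluster alive only for the duration of the run.

    provision/teardown are placeholders for real infrastructure calls;
    the finally clause guarantees teardown even if the model run fails.
    """
    cluster = provision()
    try:
        yield cluster
    finally:
        teardown(cluster)

# Stubbed provisioning, just to show the sequencing:
events = []
with on_demand_cluster(
        provision=lambda: events.append("up") or "cluster-1",
        teardown=lambda c: events.append("down")) as cluster:
    events.append(f"run model on {cluster}")
print(events)  # → ['up', 'run model on cluster-1', 'down']
```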

In both cases, you make sure that your models run at optimum performance and your Data Scientists can work productively, while keeping cost under control.

Now you have your data, your infrastructure, and your model. Only the last and most important step is missing: tying your model into your organisation’s processes.

Integration of Models into Production Processes

Depending on what the purpose of your model is, you will need to feed its results back into different systems within your organisation. If you have a model that predicts your customers’ churn probability, you will want to feed that into your marketing campaign tool so that you can send those with the highest likelihood of churn a tailored offer to keep them from leaving. If you have a predictive maintenance model, you will want to feed its results into your work planning tool to make sure that those machines predicted to require maintenance actually receive it.

There are many other processes for which it makes sense to use AI or ML models. Whatever your model does, there are always some steps needed after the model has been calculated to make sure that its results can be used.

This is another point where Cloudomation comes in very handy. Since Cloudomation can integrate with many different systems and tools, you can use it to take the results from your model calculation and forward it to one or several target systems: a cloud CRM tool, an on-premise database, a campaign tool, a maintenance work planner, or any other system.
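Staying with the churn example, the routing step can be sketched as a function that splits model output into per-system payloads: high-risk customers go to the campaign tool, everything gets archived in the database. The system names, threshold and payload shape are illustrative assumptions, not a prescribed format.

```python
def route_results(predictions, threshold=0.7):
    """Split churn predictions into per-target-system payloads.

    Customers at or above the threshold are routed to the campaign
    tool; all predictions are archived. Names and shapes are
    illustrative only.
    """
    campaign = [{"customer_id": c, "offer": "retention"}
                for c, p in predictions.items() if p >= threshold]
    archive = [{"customer_id": c, "churn_probability": p}
               for c, p in predictions.items()]
    return {"campaign_tool": campaign, "database": archive}

preds = {"A": 0.92, "B": 0.31, "C": 0.75}
payloads = route_results(preds)
print([p["customer_id"] for p in payloads["campaign_tool"]])  # → ['A', 'C']
```

Each payload would then be delivered by a connector for the respective target system (a REST call to the cloud CRM, an INSERT into the on-premise database, and so on).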

Bottom Line

Getting started with using Artificial Intelligence (AI) and Machine Learning (ML) in your organisation is challenging. First, you need to decide if you want to use AI services provided by vendors like Google, or if you want to develop your own AI and ML models in-house. If you do build up a Data Science team within your organisation, you need to provide your Data Scientists with sufficient computing power to build good models. And once your team has developed their first models, you need to face the challenge of putting their models into production: supplying data access and preprocessing for each model run, deciding on a schedule or trigger for model calculation, providing infrastructure for the model to run in production, and connecting the model to target systems where the results of the model should be used.

Using a tool like Cloudomation puts you in a position to overcome these challenges by providing you and your Data Science team with a tool set that fits into the way of working of Data Scientists. It helps you leverage your Data Science capabilities, control cost, and reduce the time and effort until your Data Science models can be used in production.