Predictive Analytics, Big Data, Hadoop, PMML: September 2012

Wednesday, September 26, 2012

Predictions in the Cloud

Moving a predictive model from the data scientist's desktop to the production environment is a "no brainer" with PMML, the Predictive Model Markup Language. Once expressed in PMML, a model can be operationally deployed in minutes.

ADAPA, the Zementis PMML-based scoring engine, allows predictive models to be put to work on a host of different platforms and systems, including the IBM SmartCloud Enterprise. Since ADAPA is offered on the IBM SmartCloud as a service, users pay only for the service and the capacity, on a monthly basis, eliminating the need for expensive software licenses and in-house hardware resources.

With PMML and ADAPA on the Cloud, one can deploy a predictive model in minutes, anywhere in the world, in any of the available data centers. Launching a virtual ADAPA server in the IBM SmartCloud corresponds to the traditional scenario of buying hardware and installing it in a server room. The only difference is that the server sits in the cloud, comes with ADAPA preinstalled, and launches on demand in just a few minutes, ready to be used. At any given time, you can have one or more instances running. Regardless of processing power, each instance type provides a single-tenant architecture: the service is implemented as a private, dedicated instance that encapsulates predictive models and business rules. Access to any instance (via HTTPS) is therefore private, and decision files and data never share the same engine with other clients.

Open standards and cloud computing make it easier for companies to tackle the big data challenge. Predictive analytics is finally delivering on its promise of transforming data into insights and value.

Monday, September 24, 2012

ACM Data Mining Talk: Representing Predictive Solutions with PMML

Dr. Alex Guazzelli's talk on PMML and Predictive Analytics to the ACM Data Mining Bay Area/SF group at the LinkedIn auditorium in Sunnyvale, CA.

Abstract:

Data mining scientists work hard to analyze historical data and to build the best predictive solutions out of it. IT engineers, on the other hand, are usually responsible for bringing these solutions to life, by recoding them into a format suitable for operational deployment. Given that data mining scientists and engineers tend to inhabit different information worlds, the process of moving a predictive solution from the scientist's desktop to the operational environment can get lost in translation and take months. The advent of data-mining-specific open standards such as the Predictive Model Markup Language (PMML) has turned this view upside down: the deployment of models can now be achieved by the same team who builds them, in a matter of minutes.

In this talk, Dr. Alex Guazzelli not only provides the business rationale behind PMML, but also describes its main components. Besides describing the most common modeling techniques, PMML has, since version 4.0 (released in 2009), also been capable of handling complex pre-processing tasks. As of version 4.1, released in December 2011, PMML also incorporates complex post-processing into its structure, as well as the ability to represent model ensembles, segmentation, chaining, and composition within a single language element. This combined representational power, in which an entire predictive solution (from pre-processing to model(s) to post-processing) can be represented in a single PMML file, attests to the language's refinement and maturity.

Presentation slides are available for download HERE.
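The abstract's central claim, that one PMML file can carry an entire predictive solution, can be illustrated with a small sketch. The Python below embeds a hypothetical PMML 4.1 document (the field names, coefficients and the `describe_sections` helper are invented for this example) and groups its elements by pipeline stage: `DataDictionary` for input data, `TransformationDictionary` for pre-processing, the model element itself, and its `Output` section for post-processing. It relies only on the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.1 document: field names and numbers are made up.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="Hypothetical example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="vibration_rms" optype="continuous" dataType="double"/>
    <DataField name="failure_risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="rms_scaled" optype="continuous" dataType="double">
      <Apply function="/">
        <FieldRef field="vibration_rms"/>
        <Constant>10</Constant>
      </Apply>
    </DerivedField>
  </TransformationDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="vibration_rms"/>
      <MiningField name="failure_risk" usageType="predicted"/>
    </MiningSchema>
    <Output>
      <OutputField name="risk" feature="predictedValue"/>
    </Output>
    <RegressionTable intercept="0.1">
      <NumericPredictor name="rms_scaled" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML 4.1 elements live in this namespace.
NS = {"p": "http://www.dmg.org/PMML-4_1"}

def describe_sections(pmml_text):
    """Map each stage of the predictive solution to the PMML elements carrying it."""
    root = ET.fromstring(pmml_text)
    return {
        "input data": [f.get("name") for f in root.findall("p:DataDictionary/p:DataField", NS)],
        "pre-processing": [f.get("name") for f in
                           root.findall("p:TransformationDictionary/p:DerivedField", NS)],
        # Model elements are direct children whose local name ends in "Model".
        "model": [m.tag.split("}")[1] for m in root if m.tag.endswith("Model")],
        "post-processing": [f.get("name") for f in root.findall(".//p:Output/p:OutputField", NS)],
    }
```

Calling `describe_sections(PMML_DOC)` should report one entry per stage, with `rms_scaled` as the pre-processing step and `RegressionModel` as the model. The helper only reads structure. It does not validate or score the file.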

Tuesday, September 18, 2012

Predictive Maintenance Solutions made possible by Big Data, Open Standards, and Analytics

Predictive analytics is an integral part of our daily lives. At this very moment, predictive solutions are busy at work, monitoring financial transactions for fraud and abuse, recommending movies and other products, or selecting the next best offer you will get from your favorite store. As much as it permeates our lives today, the application of predictive analytics is bound to increase. For example, boosted by Big Data and cost-efficient processing in the cloud, predictive maintenance applications are on their way towards becoming ubiquitous.

Predictive maintenance solutions are based on the idea that one can know in advance that a machine or piece of equipment is going to fail, and take proactive action to ensure process reliability and safety. Using data from sensors that capture vibration information from rotating equipment, my team built a predictive maintenance solution that alerted personnel of imminent breakdowns. For that, we used a combination of tools: R, an open-source statistical package, for data analysis; IBM SPSS Statistics for analysis and model building; and the Zementis ADAPA platform for model deployment. Since all these systems support PMML, the Predictive Model Markup Language, instead of spending time translating code from one system to another, we were able to concentrate on the problem itself and use the tools we trusted the most to get the job done.
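The signal-processing idea behind such a solution can be sketched in a few lines. The Python below is not the model the team built in R and SPSS; it swaps in a deliberately simple stand-in, assuming that a rise in the root-mean-square (RMS) amplitude of a window of vibration readings is an early warning sign. The window size, threshold and function names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of one window of vibration readings."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def flag_windows(signal, window, threshold):
    """Return indices of non-overlapping windows whose RMS exceeds threshold.

    A fixed threshold is a hypothetical stand-in for a trained model.
    """
    alerts = []
    for start in range(0, len(signal) - window + 1, window):
        if rms(signal[start:start + window]) > threshold:
            alerts.append(start // window)
    return alerts
```

For example, `flag_windows([0.1, -0.1, 0.1, -0.1, 2.0, -2.0, 2.0, -2.0], window=4, threshold=1.0)` flags only the second window, whose RMS is 2.0. In a real deployment, features such as this RMS value would feed a trained model rather than a fixed threshold.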

PMML is the de facto standard used to represent predictive analytics or data mining models. With PMML, a predictive solution may be built in one system and deployed in another where it can be put to work immediately. The adoption of PMML by all the major analytic vendors is a testimony to their commitment to interoperability and the advancement of predictive analytics as a critical factor to the betterment of society. PMML is developed by the Data Mining Group (DMG), a committee composed not only of commercial and open-source analytic companies, including IBM, SAS, Zementis, FICO, Salford Systems, Microstrategy, Togaware, KNIME and Rapid-I, but also of analytic users such as NASA, Visa, the San Diego Supercomputer Center, and Equifax.
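To show what "deployed in another" system can mean at its simplest, the sketch below reads a hypothetical PMML `RegressionModel` (as any PMML-producing tool might export it) and scores a record with nothing but Python's standard library. It handles only `NumericPredictor` terms of a single `RegressionTable`, with no normalization; full PMML consumers such as ADAPA implement much more of the specification. The field names and coefficients are invented.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical model, as it might be exported by a PMML-producing tool.
MODEL = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="vibration" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="temperature"/>
      <MiningField name="vibration"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="temperature" coefficient="0.02"/>
      <NumericPredictor name="vibration" coefficient="1.5" exponent="2"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

def score_regression(pmml_text, record):
    """Evaluate the first RegressionTable: intercept + sum(coef * x ** exponent)."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept", "0"))
    for pred in table.findall("p:NumericPredictor", NS):
        x = float(record[pred.get("name")])
        # exponent defaults to 1 when the attribute is absent.
        result += float(pred.get("coefficient")) * x ** int(pred.get("exponent", "1"))
    return result
```

With `{"temperature": 50, "vibration": 2}`, the score is 0.5 + 0.02 * 50 + 1.5 * 2**2, i.e. 7.5. The scoring code never needs to know which tool produced the file; it only needs to understand the PMML elements.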

Predictive analytics and open standards can provide yet another tool for safeguarding operations and ensuring safety and process reliability. While predictive analytics can alert us to problems before they actually happen, open standards such as PMML are key ingredients for ensuring that the building and deployment of predictive maintenance solutions is application-independent and therefore agile and transparent.

We recently wrote a series of two articles for the IBM developerWorks website covering PMML and predictive maintenance. To read both articles in their entirety, please refer to the following links:

1) What is PMML? Explore the power of predictive analytics and open standards

2) Representing predictive solutions in PMML: Move from raw data to predictions

Wednesday, September 12, 2012

Predictive model deployment and execution made easy with PMML

Developed by the Data Mining Group (DMG), an independent, vendor-led committee, PMML provides an open standard for representing data mining models. In this way, models can easily be shared between different applications, avoiding proprietary issues and incompatibilities. Currently, all major commercial and open source data mining tools support PMML. These include IBM/SPSS, SAS, KXEN, TIBCO, STATISTICA, Microstrategy, R, KNIME, and RapidMiner (for a list of PMML-compliant tools, see the list of PMML-powered tools at DMG.org).

PMML is an XML-based language which follows a very intuitive structure to describe data pre- and post-processing as well as predictive algorithms. Not only does PMML represent a wide range of statistical techniques, but it can also be used to represent input data as well as the data transformations necessary to transform raw data into meaningful features.
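As an illustration of how PMML turns raw data into features, the sketch below evaluates a `NormContinuous` transformation, the PMML element for piecewise-linear normalization defined by `LinearNorm` (`orig`, `norm`) pairs. The `temperature` field, the breakpoints and the helper names are hypothetical, and the sketch covers only this one transformation type. Inputs outside the first and last `orig` value are simply extrapolated from the nearest segment here; in the specification, the `outliers` attribute governs that behavior.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical TransformationDictionary: maps raw temperature (0..100)
# onto 0..1, with a steeper slope above 80.
TRANSFORMS = """<TransformationDictionary xmlns="http://www.dmg.org/PMML-4_1">
  <DerivedField name="temp_norm" optype="continuous" dataType="double">
    <NormContinuous field="temperature">
      <LinearNorm orig="0" norm="0"/>
      <LinearNorm orig="80" norm="0.5"/>
      <LinearNorm orig="100" norm="1"/>
    </NormContinuous>
  </DerivedField>
</TransformationDictionary>"""

def norm_continuous(points, x):
    """Piecewise-linear interpolation over (orig, norm) pairs sorted by orig."""
    # Find the segment containing x; outside the range, the outermost
    # segment is reused, which extrapolates linearly.
    i = 0
    while i < len(points) - 2 and x > points[i + 1][0]:
        i += 1
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def derive_features(transform_xml, raw):
    """Compute every NormContinuous DerivedField from a dict of raw inputs."""
    root = ET.fromstring(transform_xml)
    features = {}
    for field in root.findall("p:DerivedField", NS):
        norm = field.find("p:NormContinuous", NS)
        points = [(float(p.get("orig")), float(p.get("norm")))
                  for p in norm.findall("p:LinearNorm", NS)]
        features[field.get("name")] = norm_continuous(points, float(raw[norm.get("field")]))
    return features
```

A raw `temperature` of 40 maps to 0.25 and 90 maps to 0.75, reflecting the steeper slope above 80.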

PMML Conversion

Given that a tool may generate an older version of PMML (earlier than the latest), Zementis has worked out a way to convert older versions of PMML to the latest one, version 4.1. This conversion process is also used to validate a data mining model against the PMML specification for versions 2.0, 2.1, 3.0, 3.1, 3.2, 4.0 and 4.1. If validation is not successful, the conversion process returns a file containing explanations of why validation failed, embedded as comments in the PMML file.

Before actual conversion takes place, the validation phase needs to be successful, i.e. the model file needs to conform to the PMML specification as published by the DMG (for any of the older PMML versions listed above). For known PMML issues (from a variety of sources/vendors), the conversion process will actually correct the model file so that it can be converted appropriately.
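The first step of such a validation, working out which PMML version a file claims to be, can be sketched with the standard library. The Python below checks the root element's `version` attribute against the versions listed above and, assuming the DMG namespace convention `http://www.dmg.org/PMML-<major>_<minor>`, checks that the namespace agrees with it. Problems are then embedded as XML comments, mirroring the behavior described above. The function names are hypothetical, and a real converter must also rewrite elements that changed between versions, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# PMML versions named in the text above.
SUPPORTED = {"2.0", "2.1", "3.0", "3.1", "3.2", "4.0", "4.1"}

def check_version(pmml_text):
    """Return (version, problems) found on the PMML root element."""
    root = ET.fromstring(pmml_text)
    # ElementTree reports a namespaced tag as "{namespace}LocalName".
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
    else:
        namespace, local = "", root.tag
    problems = []
    if local != "PMML":
        problems.append(f"root element is {local}, not PMML")
    version = root.get("version", "")
    if version not in SUPPORTED:
        problems.append(f"unsupported PMML version {version!r}")
    else:
        # Assumed naming convention, e.g. 4.1 -> http://www.dmg.org/PMML-4_1
        expected = "http://www.dmg.org/PMML-" + version.replace(".", "_")
        if namespace != expected:
            problems.append(f"namespace {namespace!r} does not match version {version}")
    return version, problems

def annotate(pmml_text, problems):
    """Embed each problem as an XML comment ahead of the root element."""
    comments = "".join(f"<!-- {p} -->\n" for p in problems)
    if pmml_text.startswith("<?xml"):
        # Nothing may precede the XML declaration, so insert after it.
        end = pmml_text.index("?>") + 2
        return pmml_text[:end] + "\n" + comments + pmml_text[end:]
    return comments + pmml_text
```

A file declaring `version="3.2"` while using the 4.1 namespace yields a single mismatch message, which `annotate` places at the top of the file as a comment.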

The ADAPA Decision Engine

If you are using the ADAPA Decision Engine (or any of our scoring products), the conversion process described above is automatically executed every time a PMML file is uploaded. As a result, ADAPA understands PMML files generated by different vendors in all of these PMML versions. Besides syntactic validation, ADAPA also validates PMML from a semantic perspective.

And so, once a model is successfully uploaded in ADAPA, it is syntactically and semantically sound. For more details, click HERE.
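Semantic validation goes beyond checking that the XML is well formed. One simple semantic rule is that every field a model uses must be declared somewhere. The Python below is a hypothetical sketch of that single check, not ADAPA's validator: it flags `MiningField` entries missing from the `DataDictionary` and `NumericPredictor` entries that match neither a data field nor a derived field. It ignores nested models and `LocalTransformations`.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_1"}

def semantic_errors(pmml_text):
    """List field references in the top-level model that resolve to nothing."""
    root = ET.fromstring(pmml_text)
    data_fields = {f.get("name") for f in root.findall("p:DataDictionary/p:DataField", NS)}
    derived = {f.get("name") for f in
               root.findall("p:TransformationDictionary/p:DerivedField", NS)}
    errors = []
    # MiningSchema entries must name fields from the DataDictionary.
    for mf in root.findall("*/p:MiningSchema/p:MiningField", NS):
        if mf.get("name") not in data_fields:
            errors.append(f"MiningField {mf.get('name')!r} is not in the DataDictionary")
    # Regression terms may use raw or derived fields.
    for pred in root.findall("*/p:RegressionTable/p:NumericPredictor", NS):
        if pred.get("name") not in data_fields | derived:
            errors.append(f"NumericPredictor {pred.get('name')!r} refers to an unknown field")
    return errors

# Hypothetical, well-formed XML that is semantically broken: "pressure" is never declared.
BROKEN = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="vibration" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="vibration"/>
      <MiningField name="pressure"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0">
      <NumericPredictor name="pressure" coefficient="1"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""
```

`BROKEN` parses without complaint, so a purely syntactic check would accept it, yet `semantic_errors(BROKEN)` reports two problems, both about the undeclared `pressure` field.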

You can benefit from ADAPA today by signing up for your private ADAPA instance on the Amazon Cloud or on the IBM SmartCloud. You can also sign up for the ADAPA free trial.

Start executing your models right now!