Thursday, November 18, 2010

Ensuring safety and process reliability through predictive analytics and PMML

Predictive analytics is an integral part of our daily lives. At this very moment, predictive solutions are busy at work, monitoring financial transactions for fraud and abuse, recommending movies and other products, or selecting the next best offer you will get from your favorite store. As much as it permeates our lives today, the application of predictive analytics is bound to increase, especially in data-intensive fields such as predictive maintenance.

Predictive maintenance solutions are based on the idea that one can know in advance that a machine or piece of equipment is going to fail, and take proactive action to ensure process reliability and safety. Using data from sensors that capture vibration information from rotating equipment, my team built a predictive maintenance solution that alerted personnel to imminent breakdowns. For that, we used a combination of statistical tools: R, an open-source statistical package for data analysis; IBM SPSS Statistics for analysis and model building; and the Zementis ADAPA platform for model deployment. Since all these systems support PMML, the Predictive Model Markup Language, we did not have to spend time translating code from one system to another. Instead, we were able to concentrate on the problem itself and use the tools we trusted the most to get the job done.


PMML is the de facto standard used to represent predictive analytic or data mining models. With PMML, a predictive solution may be built in one system and deployed in another, where it can be put to work immediately. The adoption of PMML by the major analytic vendors is a testament to their commitment to interoperability and to the advancement of predictive analytics as a critical factor in the betterment of society. PMML is developed by the Data Mining Group (DMG), a committee composed not only of commercial and open-source analytic companies including IBM, SAS, Zementis, MicroStrategy, KNIME and Rapid-I, but also of analytic users such as NASA, Visa, and Equifax.

In the wake of the Gulf tragedy, predictive analytics and open standards can provide yet another tool for safeguarding operations and ensuring safety and process reliability. While predictive analytics can offer solutions that alert us to problems before they actually happen, open standards such as PMML are key ingredients for ensuring that the building and deployment of predictive maintenance solutions is application-independent and therefore agile and transparent.

We recently wrote a series of two articles for the IBM developerWorks website that cover PMML and predictive maintenance. To read both articles in their entirety, please refer to the following links:


1) What is PMML? Explore the power of predictive analytics and open standards

2) Representing predictive solutions in PMML: Move from raw data to predictions

Wednesday, November 17, 2010

Examining PMML 4.0: Pre-Processing and Data Manipulation

You may be wondering what all the fuss is about PMML and its 4.0 version. So, we decided to explore all that PMML 4.0 has to offer in a series of blog posts. In part I, we will be exploring its improved pre-processing capabilities.

All data mining models manipulate the raw data in one way or another before passing it through a neural network, support vector machine, or regression model. Therefore, a language that wants to represent all the computations that go into a model also needs to be able to represent the data transformations that were applied to the raw data before scoring takes place. PMML is this language! It is the Yin and Yang of data mining.

Let's first recap the pre-processing capabilities available in PMML 3.2. This version of PMML allows for the following out-of-the-box data transformations:
  • Normalization of continuous variables: this is accomplished via the NormContinuous element of PMML. It is mostly used to normalize a variable between 0 and 1. See example below (real PMML code) in which two variables are normalized, the first between 0 and 1 and the second between 0 and 4.
  • Normalizing Categorical Inputs: normally used to transform strings into numerical variables. This is accomplished by the element NormDiscrete. In the PMML example below, a categorical variable creates dummy variables that will be assigned values 1 or 0 depending on the category assumed by the input variable.
  • Discretization: this is used to transform continuous variables into strings. This is accomplished by the Discretize element. In the PMML example below, if the input variable is equal to 500, it is transformed to low; if equal to 5000, it is transformed to medium; and if equal to 50,000, it is transformed to high.
  • Value Mapping: this is accomplished in PMML by the use of a mapping table and the element MapValues. To make things more interesting, in the PMML example below, we combine elements MapValues and NormDiscrete to group small sets of categorical values. Specifically, we want to find out if the input variable belongs to a specific group of colors. We do that by using MapValues to map different colors to the same number. We then use the element NormDiscrete to create dummy variables which are used to indicate group membership.
  • Arithmetic Expressions: PMML offers a range of arithmetic functions (as well as string and date/time manipulation functions) that can be arranged in different ways to express complex arithmetic expressions. The example below implements the following expression:
ResultVar=maximum(round(InputVar1/3.3),2^(1+log(1.3*InputVar2+1)))
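
To make the semantics of these PMML 3.2 transformations concrete, here is a minimal Python sketch of what each one computes. It is not PMML and not the code from the original examples. The bin boundaries (1000 and 10000), the color table, and all function names are assumptions made up for illustration. We also assume that "log" means the natural logarithm and use Python's built-in round, which may treat ties differently from a PMML engine.

```python
import math

def norm_continuous(x, points):
    """Piecewise-linear interpolation between (orig, norm) anchor points,
    in the spirit of NormContinuous. `points` must be sorted by orig."""
    for (o1, n1), (o2, n2) in zip(points, points[1:]):
        if o1 <= x <= o2:
            return n1 + (x - o1) * (n2 - n1) / (o2 - o1)
    # Assumption: out-of-range inputs are simply rejected in this sketch.
    raise ValueError(f"{x} is outside the anchor range")

def norm_discrete(value, category):
    """Dummy variable in the spirit of NormDiscrete: 1 on a match, else 0."""
    return 1.0 if value == category else 0.0

def discretize(x, bins):
    """Map a number to a label, in the spirit of Discretize.
    `bins` is a sorted list of (upper_bound, label) pairs."""
    for upper, label in bins:
        if x <= upper:
            return label
    raise ValueError(f"{x} does not fall in any bin")

def map_values(value, table):
    """Table lookup in the spirit of MapValues; unknown keys give None."""
    return table.get(value)

# Hypothetical bin boundaries and color groups, chosen only for illustration.
BINS = [(1000, "low"), (10000, "medium"), (math.inf, "high")]
COLOR_GROUPS = {"red": 1, "orange": 1, "yellow": 1, "blue": 2, "green": 2}

def in_color_group(color, group):
    """MapValues followed by NormDiscrete: 1.0 if color belongs to group."""
    return norm_discrete(map_values(color, COLOR_GROUPS), group)

def result_var(input_var1, input_var2):
    """ResultVar = maximum(round(InputVar1/3.3), 2^(1+log(1.3*InputVar2+1)))."""
    return max(round(input_var1 / 3.3),
               2 ** (1 + math.log(1.3 * input_var2 + 1)))
```

For instance, norm_continuous(25, [(0, 0.0), (100, 4.0)]) scales a value from the range 0 to 100 onto the range 0 to 4, and discretize(5000, BINS) returns "medium".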

  • PMML 4.0 - Boolean Operations: Not only does PMML 4.0 allow Boolean operations to be fully expressed, it also allows them to be nested into IF-THEN-ELSE logic. These new built-in functions offer a vast new array of possibilities for representing data transformations in PMML. So, we devote the rest of this review to looking at transformations that can now be easily expressed in PMML 4.0.
We start with the PMML code below which implements the following logical and arithmetic operations:
IF InputVar1 == "Partner" THEN DerivedVar1 = "P" ELSE DerivedVar2 = 2 * InputVar2



Note that it uses the newly defined 4.0 functions: "if", "equal", and "not" as well as function "*".
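
As a rough illustration of the logic above, the following Python sketch mirrors the two derived variables. It assumes each derived variable is computed by its own "if" without an else branch, so that the variable not being assigned comes out as missing (None here). The function name derive is hypothetical, and this reading of the pseudo-code is our assumption rather than a transcription of the PMML.

```python
def derive(input_var1, input_var2):
    """Return (DerivedVar1, DerivedVar2) for one input record."""
    is_partner = input_var1 == "Partner"        # plays the role of "equal"
    # "if" with no else branch: the result is missing when the test fails.
    derived_var1 = "P" if is_partner else None
    # The second variable uses the negated test ("not") and "*".
    derived_var2 = 2 * input_var2 if not is_partner else None
    return derived_var1, derived_var2
```

With these assumptions, derive("Partner", 10) gives ("P", None), while any other value of InputVar1 leaves DerivedVar1 missing and doubles InputVar2.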

The PMML code below assumes that both "then" and "else" parts of the "if" use the same derived variable to implement the following operations:
IF InputVar1 == "Partner" THEN DerivedVar1 = 5.1 * InputVar2 ELSE DerivedVar1 = InputVar2 / 3.3
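
The same operation can be sketched in Python as a single conditional expression, which is roughly what a three-argument "if" expresses. The function name is hypothetical and the sketch ignores missing-value handling.

```python
def derived_var1(input_var1, input_var2):
    """One derived variable fed by both branches of the conditional."""
    if input_var1 == "Partner":
        return 5.1 * input_var2      # "then" branch
    return input_var2 / 3.3          # "else" branch
```

Because both branches write to the same variable, the result is never missing as long as both inputs are present.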

Finally, we end our list of PMML pre-processing examples by showing the use of 4.0 functions "isMissing" and "isIn" combined with function "if". The PMML example below implements the following operations:
IF InputVar is missing THEN DerivedVar = 1 ELSE (IF InputVar is in ("Partner", "Associate", "Colleague") THEN DerivedVar = 2 ELSE DerivedVar = 3)
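
To show how such nested logic can be evaluated, here is a small Python sketch that walks a hand-written PMML-style fragment. The fragment is our own reconstruction of the pseudo-code, not the original listing, and it has not been validated against the PMML schema. The evaluator handles only the three functions used here, and the names PMML_FRAGMENT, evaluate, and score are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment (an assumption, not the post's original listing).
PMML_FRAGMENT = """
<DerivedField name="DerivedVar" dataType="integer" optype="categorical">
  <Apply function="if">
    <Apply function="isMissing"><FieldRef field="InputVar"/></Apply>
    <Constant dataType="integer">1</Constant>
    <Apply function="if">
      <Apply function="isIn">
        <FieldRef field="InputVar"/>
        <Constant dataType="string">Partner</Constant>
        <Constant dataType="string">Associate</Constant>
        <Constant dataType="string">Colleague</Constant>
      </Apply>
      <Constant dataType="integer">2</Constant>
      <Constant dataType="integer">3</Constant>
    </Apply>
  </Apply>
</DerivedField>
"""

def evaluate(node, row):
    """Recursively evaluate FieldRef, Constant, and a few Apply functions."""
    if node.tag == "FieldRef":
        return row.get(node.get("field"))       # None stands for missing
    if node.tag == "Constant":
        text = node.text
        return int(text) if node.get("dataType") == "integer" else text
    if node.tag == "Apply":
        fn, args = node.get("function"), list(node)
        if fn == "if":
            # Only the branch that is selected gets evaluated.
            if evaluate(args[0], row):
                return evaluate(args[1], row)
            return evaluate(args[2], row) if len(args) > 2 else None
        values = [evaluate(arg, row) for arg in args]
        if fn == "isMissing":
            return values[0] is None
        if fn == "isIn":
            return values[0] in values[1:]
    raise ValueError(f"unsupported element or function: {node.tag}")

def score(input_var):
    """Compute DerivedVar for a single value of InputVar."""
    derived_field = ET.fromstring(PMML_FRAGMENT)
    expression = list(derived_field)[0]         # the outermost Apply
    return evaluate(expression, {"InputVar": input_var})
```

Calling score(None) takes the isMissing branch, score("Associate") takes the isIn branch, and any other string falls through to the final constant.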


We finish part I of our PMML tour hoping that this short description of its pre-processing capabilities can help you to easily navigate through all the data transformations available in PMML 4.0.

PMML in Action: A practical look at PMML, the standard to represent data mining models.

It is our pleasure to announce the publication of a new (first) book on PMML:


PMML (Predictive Model Markup Language) is the de facto standard used to represent and share predictive analytic solutions between applications. This enables data mining scientists and users alike to easily build, visualize, and deploy their solutions using different platforms and systems. This book presents PMML from a practical perspective. It contains a variety of code snippets so that concepts are made clear through the use of examples.

PMML in Action is a great way to learn how to represent your predictive models through a mature open standard. The book is divided into six parts, taking you on a PMML journey in which language elements and attributes are used to represent not only modeling techniques but also data transformations.

With PMML, users benefit from a single and concise standard to represent data and models, thus avoiding the need for custom code and proprietary solutions.

You too can join the PMML movement! Unleash the power of predictive analytics and data mining today! Available now on Amazon.com.

Reviews:

"The very first book that covers the industry standard for transferring and integrating predictive models across systems, this is a milestone for predictive analytics. If you want the long and short on engineering for versatility in how predictive models can be deployed and put to work, get started by curling up with this book."
Eric Siegel, Ph.D.
President, Prediction Impact, Inc.
Conference Chair, Predictive Analytics World


"Open standards facilitate innovation and progress (web is a great example). PMML (the Predictive Model Markup Language) is an open standard for predictive analytics and data mining, developed over more than 12 years and supported by most industry leaders. This easy to read book covers data transformations, many modeling methods (Associations, Clustering, Decision Trees, Neural Nets, Regression, SVM, and more), model ensembles, and verification. This book is your essential guide to PMML !"
Gregory Piatetsky, Ph.D.
Editor KDNuggets, Founder KDD/SIGKDD
KDNuggets.com


"Next generation enterprise are going to be driven by analytics, especially predictive analytics. Sharing and rapidly deploying predictive analytic models is essential and PMML is the open standard that delivers the interoperability and agility that these predictive enterprises need."
James Taylor
CEO, Decision Management Solutions
Co-author of “Smart (Enough) Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions”
JTonEDM.com


"PMML in Action may be destined to become an analog to the famous Kernighan and Ritchie book, "The C Programming Language", published in 1978. This book (affectionately known as K&R) became the standard guide for ANSI C programming practice. I expect that "PMML in Action" will function likewise in the burgeoning development of PMML in analytical tools now, and in the future. It is the "cookbook" for PMML programming. Julia Child made French cuisine kiss-simple for housewives to create. Now, programmers can follow the descriptions and practices in this book to implement analytical solutions in PMML as easily and efficiently as Julia enabled a housewife to make a French soufflé."
Robert A. Nisbet, Ph.D.
Co-author of “Handbook of Statistical Analysis & Data Mining Applications”

Welcome to the World of Predictive Analytics!

© Predictive Analytics by Zementis, Inc. - All Rights Reserved.
