Predictive Analytics, Big Data, Hadoop, PMML: 2012

Tuesday, December 11, 2012

Spotlight on Zementis

Zementis, Inc. is a company that makes software for the operational deployment and integration of predictive analytics and data-mining solutions. Its main products are the ADAPA Decision Engine, a platform for statistics and data processing, and the Universal PMML Plug-in for Hadoop and in-database scoring.

The name Zementis, symbolizing "concrete thoughts", is derived from the German word Zement (cement, concrete) and the Latin word Mentis (thought, intellect) and relates to the company's core competence in machine learning and AI.

Road to ADAPA

Founded in 2004 with the goal of providing predictive analytics to the marketplace, Zementis is composed of two main divisions, analytics and engineering. Although it started as a company focused on building predictive models, Zementis scientists soon realized that their models needed a platform in which they could be easily deployed and managed. From this need, the ADAPA Decision Engine came to be.

ADAPA initially supported only neural networks, but it soon became a platform for the deployment of a myriad of statistical techniques as well as data processing (download the ADAPA Product Datasheet for a list of supported techniques). From its inception, ADAPA has been based on open standards, including PMML, the Predictive Model Markup Language. As a member of the Data Mining Group (DMG), the committee defining PMML, Zementis has helped shape the standard as it has become the necessary vehicle for sharing predictive solutions between applications.

In 2008, ADAPA was launched as a service on the Amazon Elastic Compute Cloud (Amazon EC2) and is currently being used worldwide by companies and individuals who want to execute their predictive models and decision logic.

In 2012, the ADAPA cloud offering was extended to the IBM SmartCloud. In this way, IBM provides companies around the world with predictive decisions when and where they are needed.

Universal PMML Scoring Engine - UPPI

Building on the heritage of its ADAPA Decision Engine, Zementis launched the Universal PMML Plug-in (UPPI), a highly optimized, in-database scoring engine for predictive models that fully supports the PMML standard. With PMML, UPPI delivers a wide range of predictive analytics for high-performance scoring. It shortens time to market for predictive models and empowers users through instant model deployment. UPPI is available for several database platforms, including the IBM Netezza appliance, SAP Sybase IQ, EMC Greenplum, Teradata, and Teradata Aster.

The Universal PMML Scoring Engine is also available for Datameer for scoring in Hadoop.

Zementis Locations

Zementis HQ is located in San Diego, California. The company also has an office in Hong Kong for servicing clients in the Asia-Pacific region.

Wednesday, November 14, 2012

Universal PMML Scoring for Teradata and Aster

Big Data and PMML, the Predictive Model Markup Language, are hot topics these days. But when combined with in-database scoring, they take on a new and powerful meaning. It is no wonder, then, that Zementis is thrilled to announce its partnership with Teradata, a global leader in data warehousing and analytics.

Teradata and Zementis

Zementis is pleased to announce that its Universal PMML Scoring Engine (UPPI) will soon be available on the Teradata and Aster databases.

Zementis offers a range of products that make possible the deployment of predictive solutions and data mining models built with the tools of all the top commercial and open-source data mining vendors. Our products include the ADAPA Decisioning Engine for real-time scoring and UPPI, which is currently available for a host of database platforms as well as Hadoop/Datameer.

With UPPI for Teradata and UPPI for Aster, Zementis considerably expands the number of advanced platforms able to combine in-database scoring and data warehousing for rapid, on-the-fly predictive analytics on large volumes of data.

UPPI for Teradata and UPPI for Aster enable analytic enterprises to realize significant business value from new business models and help companies drive both top-line revenue growth and bottom-line cost savings.

Check out the Zementis website for webinars, presentations and product data sheets and to learn more about in-database scoring with UPPI.

Thursday, November 8, 2012

Model Deployment with PMML, the Predictive Model Markup Language

The idea behind this demo is to show you how easy it is to operationally deploy a predictive solution once it is represented in PMML, the Predictive Model Markup Language.

As a model building environment, I use KNIME to generate a neural network model for predicting customer churn. Once the data pre-processing and the model are represented in PMML, I deploy them in the Amazon Cloud using the ADAPA Scoring Engine and on top of Hadoop using the Universal PMML Plug-in (UPPI) for Datameer. So the very same model is readily available for execution on two very distinct Big Data platforms: cloud and Hadoop.

This ease of model deployment and interoperability between platforms is the power of PMML, the de facto standard for predictive analytics and data mining models.

Resources:

Download the KNIME workflow used to generate a sample neural network for predicting churn
Download the PMML file created during the demo

Tuesday, November 6, 2012

Big Data and Real-Time Scoring with ADAPA and the Universal PMML Plug-in

PMML, the Predictive Model Markup Language, allows predictive models to be easily moved into production and operationally deployed on-site, in the cloud, in-database, or on Hadoop. Zementis offers a range of products that make possible the deployment of predictive solutions and data mining models built in IBM SPSS, SAS, StatSoft STATISTICA, KNIME, SAP KXEN, R, etc. Our products include the ADAPA Scoring Engine and the Universal PMML Plug-in (UPPI).

SOLUTIONS FOR REAL-TIME SCORING AND BIG DATA

ADAPA, named after the Babylonian god of wisdom, is the first PMML-based, real-time predictive decisioning engine available on the market, and the first scoring engine accessible as a service on the Amazon Cloud and IBM SmartCloud. ADAPA on the Cloud combines the benefits of Software as a Service (SaaS) with the scalability of cloud computing. ADAPA is also available as a traditional software license for deployment on site.

As even the god of wisdom knows, not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in these cases is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together; hence, scores and predictions flow at an accelerated pace. ADAPA’s sister product, the Universal PMML Plug-in (UPPI), is the Zementis solution for Hadoop and in-database scoring. UPPI is available for the IBM Netezza appliance, SAP Sybase IQ, EMC Greenplum/Pivotal, Teradata, and Teradata Aster. It is also available for Hadoop/Datameer.

BROAD SUPPORT FOR PREDICTIVE ANALYTICS AND PMML

ADAPA and UPPI consume model files that conform to the PMML standard, versions 2.0 through 4.2. If your model development environment exports an older version of PMML, our products will automatically convert your file into a 4.2-compliant format.

Our products support an extensive collection of statistical and data mining algorithms. These include:

Neural Networks (Back-Propagation, Radial-Basis Function, and Neural-Gas)
Regression Models (Linear, Polynomial, and Logistic)
General Regression Models (General Linear, Ordinal Multinomial, Generalized Linear, Cox)
Support Vector Machines (for regression and multi-class and binary classification)
Decision Trees (for classification and regression)
Scorecards (including support for reason codes and complex attributes)
Association Rules
Ruleset Models (flat Decision Trees)
Clustering Models (Distribution-Based, Center-Based, and 2-Step Clustering)
Naive Bayes Classifiers
Multiple Models (model composition, chaining, segmentation, and ensemble - including Random Forest Models and Stochastic Boosting)
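To give a feel for what "reason codes" in a scorecard means, here is a minimal Python sketch of a scorecard that sums partial scores per characteristic and ranks reason codes by how far each characteristic fell below its baseline. It is loosely modeled on PMML's Scorecard element with the "pointsBelow" reason-code algorithm; the characteristics, bins, points, and reason codes are hypothetical, and this is not how ADAPA or UPPI implement scorecards internally.

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    low: float            # inclusive lower bound of the bin
    high: float           # exclusive upper bound of the bin
    partial_score: float  # points awarded when the value falls in this bin
    reason_code: str

@dataclass
class Characteristic:
    field: str
    baseline_score: float
    attributes: list

def score(record, characteristics, initial_score=0.0, max_reasons=2):
    """Return (total score, top reason codes ranked by points below baseline)."""
    total = initial_score
    reasons = []  # (baseline - partial score, reason code)
    for ch in characteristics:
        value = record[ch.field]
        # Pick the first bin whose [low, high) range contains the value.
        attr = next(a for a in ch.attributes if a.low <= value < a.high)
        total += attr.partial_score
        reasons.append((ch.baseline_score - attr.partial_score, attr.reason_code))
    # Largest shortfall relative to the baseline comes first.
    reasons.sort(key=lambda r: r[0], reverse=True)
    # This sketch only reports characteristics that scored below their baseline.
    return total, [code for diff, code in reasons[:max_reasons] if diff > 0]

# Hypothetical two-characteristic scorecard.
card = [
    Characteristic("age", 20.0, [
        Attribute(0, 25, 5.0, "RC_AGE"),
        Attribute(25, 200, 25.0, "RC_AGE"),
    ]),
    Characteristic("income", 30.0, [
        Attribute(0, 30000, 10.0, "RC_INCOME"),
        Attribute(30000, float("inf"), 40.0, "RC_INCOME"),
    ]),
]

print(score({"age": 22, "income": 20000}, card))  # (15.0, ['RC_INCOME', 'RC_AGE'])
```

Ranking by the gap to the baseline means the characteristic that cost the applicant the most points is reported first, which is what makes reason codes useful for explaining a decision.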

A myriad of functions for implementing data pre- and post-processing are also supported, including:

Text Mining (introduced in PMML 4.2)
Regular Expressions
Value Mapping
Discretization
Normalization
Scaling
Logical and Arithmetic Operators
Conditional Logic
Built-in Functions
Lookup Tables
Business Decisions and Thresholds
Custom Functions ... and much, much more
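Two of the pre-processing steps listed above, normalization and discretization, can be sketched in a few lines of Python. The normalization function is modeled on PMML's NormContinuous element (piecewise-linear interpolation through (original, normalized) points); the discretization function is modeled on PMML's Discretize bins. As simplifying assumptions, out-of-range values are extrapolated along the outermost segment and every bin is a half-open interval; the field values and labels are hypothetical.

```python
from bisect import bisect_right

def norm_continuous(x, points):
    """Piecewise-linear normalization through sorted (original, normalized) points.

    Values outside the range are extrapolated along the outermost segment.
    """
    origs = [p[0] for p in points]
    # Pick the segment [points[i], points[i + 1]], clamped to the first/last one.
    i = min(max(bisect_right(origs, x) - 1, 0), len(points) - 2)
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def discretize(x, bins, default=None):
    """Map x to the label of the first half-open bin [low, high) that contains it."""
    for low, high, label in bins:
        if low <= x < high:
            return label
    return default

# Hypothetical usage: scale income to [0, 1] and bucket age into groups.
print(norm_continuous(50000, [(0, 0.0), (100000, 1.0)]))  # 0.5
age_bins = [(0, 18, "minor"), (18, 65, "adult"), (65, 200, "senior")]
print(discretize(42, age_bins))  # adult
```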

Contact us today!

Visit us on the web: www.zementis.com

Or send us an e-mail at info@zementis.com

Wednesday, October 31, 2012

When Big Data and Predictive Analytics Collide

Big Data is usually defined in terms of Volume, Variety and Velocity (the so-called 3 Vs). Volume implies breadth and depth, while variety is simply the nature of the beast: on-line transactions, tweets, text, video, sound, ... Velocity, on the other hand, implies not only that data is being produced amazingly fast (according to IBM, 90% of the data that exists today was generated in the last 2 years), but also that it gets old pretty fast. In fact, some data varieties age faster than others.

To be able to tackle Big Data, systems and platforms need to be robust, scalable, and agile.

It is in this context that IntelliFest 2012 came to be. The conference theme this year was "Intelligence in the Cloud", exploring the use of applied AI in cloud computing, mobile apps, Big Data, and many other application areas. Among the many amazing speakers at IntelliFest were Stephen Grossberg from Boston University, Rajat Monga from Google, Carlos Serrano-Morales from Sparkling Logic, Paul Vincent from TIBCO, and Alex Guazzelli from Zementis.

Dr. Alex Guazzelli's talk on Big Data, Predictive Analytics, and PMML is now available for on-demand viewing on YouTube. The abstract follows below, together with several resources including the presentation slides and files used in the live demo.

Abstract:

Predictive analytics has been used for many years to learn patterns from historical data to literally predict the future. Well known techniques include neural networks, decision trees, and regression models. Although these techniques have been applied to a myriad of problems, the advent of big data, cost-efficient processing power, and open standards have propelled predictive analytics to new heights.

Big data involves large amounts of structured and unstructured data that are captured from people (e.g., on-line transactions, tweets, ... ) as well as sensors (e.g., GPS signals in mobile devices). With big data, companies can now start to assemble a 360-degree view of their customers and processes. Luckily, powerful and cost-efficient computing platforms such as the cloud and Hadoop are here to address the processing requirements imposed by the combination of big data and predictive analytics.

But creating predictive solutions is just part of the equation. Once built, they need to be transitioned to the operational environment where they are actually put to use. In the agile world we live in today, the Predictive Model Markup Language (PMML) delivers the necessary representational power for solutions to be quickly and easily exchanged between systems, allowing for predictions to move at the speed of business.

This talk will give an overview of the colliding worlds of big data and predictive analytics. It will do that by delving into the technologies and tools available in the market today that allow us to truly benefit from the barrage of data we are gathering at an ever-increasing pace.

Resources:

Download the presentation slides
Download the KNIME workflow used to generate a sample neural network for predicting churn
Download the PMML file created during the demo

Wednesday, October 17, 2012

Big data insights through predictive analytics, open-standards and cloud computing

Organizations increasingly recognize the value that predictive analytics and big data offer to their business. The complexity of development, integration, and deployment of predictive solutions, however, is often considered cost-prohibitive for many projects. In light of mature open source solutions, open standards, and SOA principles, we propose an agile model development life cycle that quickly leverages predictive analytics in operational environments.

Starting with data analysis and model development, you can effectively use the Predictive Model Markup Language (PMML) standard to move complex decision models from the scientist's desktop into a scalable production environment hosted in the cloud (Amazon EC2 and IBM SmartCloud Enterprise).

Expressing Models in PMML

PMML is an XML-based language used to define predictive models. It was specified by the Data Mining Group, an independent group of leading technology companies including Zementis. By providing a uniform standard to represent such models, PMML allows for the exchange of predictive solutions between different applications and various vendors.

Open source PMML-compliant statistical tools such as R, KNIME, and RapidMiner can be used to develop data mining models based on historical data. Once models are exported into a PMML file, they can then be imported into an operational decision platform and be ready for production use in a matter of minutes.

On-Demand Predictive Analytics

Both Amazon and IBM offer reliable, on-demand cloud computing infrastructure, on which we offer the ADAPA® Predictive Decisioning Engine following the Software as a Service (SaaS) paradigm. ADAPA imports models expressed in PMML and executes them in batch mode or in real time via web services.

Our service is implemented as a private, dedicated instance of ADAPA. Each client has access to his/her own ADAPA Engine instance via HTTP/HTTPS. In this way, models and data for one client never share the same engine with other clients.

The ADAPA Web Console

Each instance executes a single version of the ADAPA engine. The engine itself is accessible through the ADAPA Web Console, which allows for easy management of predictive models and data files. The instance owner can use the console to upload new models as well as to score or classify records in data files in batch mode. Real-time execution of predictive models is achieved through web services. The ADAPA Console offers a very intuitive interface divided into two main sections: model management and data management. These allow existing models to be used for generating decisions on different data sets. Also, new models can be easily uploaded and existing models can be removed in a matter of seconds.

Predicting in the Cloud

Using a SaaS solution to break down traditional barriers that currently slow the adoption of predictive analytics, our strategy translates predictive solutions into operational assets with minimal deployment costs and leverages the inherent scalability of utility computing.

In summary, ADAPA revolutionizes the world of predictive analytics and cracks the big data code, since it allows for:

Cost-effective and reliable service based on two outstanding cloud computing infrastructures: Amazon and IBM.

Secure execution of predictive models through dedicated and controlled instances including HTTPS and Web-Services security

On-demand computing. Choice of instance type and launch of multiple instances.

Superior time-to-market by providing rapid deployment of predictive solutions and an agile enterprise decision management environment.

Monday, October 8, 2012

ADAPA in the Cloud: Feature List

Broad support for predictive algorithms

ADAPA supports an extensive collection of statistical and data mining algorithms. These are:

Ruleset Models (flat Decision Trees)
Clustering Models (Distribution-Based, Center-Based, and 2-Step Clustering)
Decision Trees (for classification and regression) together with multiple missing value handling strategies (Default Child, Last Prediction, Null Prediction, Weighted Confidence, Aggregate Nodes)
Naive Bayes Classifiers
Association Rules
Neural Networks (Back-Propagation, Radial-Basis Function, and Neural-Gas)
Regression Models (Linear, Polynomial, and Logistic) and General Regression Models (General Linear, Ordinal Multinomial, Generalized Linear, Cox)
Support Vector Machines (for regression and multi-class and binary classification)
Scorecards (including reason codes and point allocation for categorical, continuous, and complex attributes)
Multiple Models (Segmentation, Ensembles - including Random Forest Models and Stochastic Boosting, Chaining and Model Composition)
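The decision tree entry above lists several missing-value handling strategies. The minimal Python sketch below illustrates three of them on a tiny tree; the strategy names mirror values of PMML's TreeModel missingValueStrategy attribute ("lastPrediction", "defaultChild", "nullPrediction"), but the tree, field names, and scores are hypothetical, and this is a simplified illustration, not ADAPA's evaluator.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    score: Optional[str]
    # Returns True/False, or None ("unknown") when a needed input is missing.
    predicate: Callable[[dict], Optional[bool]] = lambda record: True
    children: List["Node"] = field(default_factory=list)
    default_child: Optional[int] = None  # index of the child to follow on unknown

def less_than(name, threshold):
    """Build a simple predicate; evaluates to None if the field is missing."""
    def predicate(record):
        value = record.get(name)
        return None if value is None else value < threshold
    return predicate

def predict(node, record, strategy="lastPrediction"):
    """Walk the tree, applying a missing-value strategy when a predicate is unknown."""
    while node.children:
        chosen, unknown = None, False
        for child in node.children:
            result = child.predicate(record)
            if result is None:
                unknown = True
                break
            if result:
                chosen = child
                break
        if unknown:
            if strategy == "defaultChild" and node.default_child is not None:
                chosen = node.children[node.default_child]
            elif strategy == "lastPrediction":
                return node.score  # stop and return the current node's score
            else:
                return None  # "nullPrediction": give up without a score
        if chosen is None:
            return None  # no child matched; this sketch returns no prediction
        node = chosen
    return node.score

# Hypothetical risk tree: the root predicts "medium" until income is known.
tree = Node("medium", children=[
    Node("high", less_than("income", 30000)),
    Node("low"),
], default_child=1)

print(predict(tree, {"income": 20000}))              # high
print(predict(tree, {}, strategy="lastPrediction"))  # medium
print(predict(tree, {}, strategy="defaultChild"))    # low
```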

Model interfaces: pre- and post-processing

Additionally, ADAPA supports a myriad of functions for implementing data pre- and post-processing. These include:

Text Mining
Value Mapping
Discretization
Normalization
Scaling
Logical and Arithmetic Operators
Business Rules
Lookup Tables
Regular Expressions
Custom Functions

and much, much more.

If you think of anything ADAPA cannot do or something else you need to do in terms of data manipulation, let us know.

Automatic conversion (and correction) for older versions of PMML

ADAPA consumes model files that conform to PMML versions 2.0 through 4.2. If your model development environment exports an older version, ADAPA will automatically convert your file into a 4.2-compliant format. It will also correct a number of common problems found in PMML generated by some popular modeling tools, allowing the models to work as intended.

Web-based management and interactive execution of predictive models and business rules

Model management: Models and rule sets are deployed and managed through an intuitive, Web-based management console, the ADAPA Console.

Model verification: The ADAPA Console includes a model validation test, allowing models to be verified for correctness. By providing ADAPA a test file containing input data and expected results for a model, the engine will report any deviations from expected results, greatly enhancing traceability of errors and debugging of model deployment issues. The console also provides easy access to our rules testing framework in which business rules are submitted to regression testing and acceptance.
Batch-scoring: The console also provides functionality to upload a (compressed) CSV data file and batch-score it against any of the deployed models. Results are returned in the same format and may be downloaded for further processing and visualization.
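The batch-scoring and model-verification ideas above can be sketched with Python's standard csv module: score every row of a CSV file and append the result as a new column, then compare scores against an expected-results column and report deviations. The logistic "model", the column names `age`, `expected`, and `score`, and the tolerance are hypothetical stand-ins; this does not reproduce the ADAPA Console's actual file formats.

```python
import csv
import io
import math

def model(row):
    """Hypothetical logistic model standing in for a deployed PMML model."""
    z = -3.0 + 0.05 * float(row["age"])
    return 1.0 / (1.0 + math.exp(-z))

def batch_score(csv_text, score_fn, out_column="score"):
    """Score every row of a CSV and return CSV text with one extra column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + [out_column])
    writer.writeheader()
    for row in reader:
        row[out_column] = f"{score_fn(row):.6f}"
        writer.writerow(row)
    return out.getvalue()

def verify(csv_text, score_fn, expected_column="expected", tol=1e-6):
    """Return (row number, expected, actual) for rows that deviate beyond tol."""
    deviations = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        actual = score_fn(row)
        expected = float(row[expected_column])
        if abs(actual - expected) > tol:
            deviations.append((i, expected, actual))
    return deviations

data = "age,expected\n60,0.5\n20,0.9\n"
print(batch_score(data, model))
print(verify(data, model))  # the second row deviates from its expected value
```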

Simplified integration via SOA

Service Oriented Architecture (SOA) principles simplify integration with existing IT infrastructure. Since ADAPA publishes all deployed models as a Web-Service, you can score data records from within your own environment. With the simple execution of a web service call (SOAP or REST), you are able to leverage the power of predictive models and business rules on-demand or in real-time.
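Since REST is named as one integration option, here is a minimal Python sketch that only builds (and does not send) a scoring request using the standard library's `urllib.request`. The base URL, the `/models/<name>/score` path, the JSON payload shape, the model name `churn_nn`, and the Basic-auth credentials are all hypothetical placeholders, not ADAPA's documented API; the product documentation defines the real endpoints.

```python
import base64
import json
import urllib.request

def build_score_request(base_url, model_name, record, user, password):
    """Build (but do not send) a REST request asking a scoring engine to score one record.

    The URL layout and JSON body are hypothetical placeholders.
    """
    url = f"{base_url.rstrip('/')}/models/{model_name}/score"
    body = json.dumps({"record": record}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )

req = build_score_request("https://adapa.example.com/", "churn_nn",
                          {"tenure": 12, "calls": 3}, "demo", "secret")
print(req.get_method(), req.full_url)
# To actually send it over HTTPS: urllib.request.urlopen(req) returns the response.
```

Keeping request construction separate from sending makes the client easy to test and lets the same request be retried against another instance.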

Data scoring from inside Excel

The ADAPA Add-in for Microsoft Office Excel 2007, 2010, and 2013 allows you to easily score data using ADAPA on the Cloud. Once the Add-in is installed, all you need to do is to select your data in Excel, connect to ADAPA and start scoring right away. Your predictions will be made available as new columns.

On-demand predictive analytics solution

ADAPA in the Cloud is a fully hosted Software-as-a-Service (SaaS) solution. You only pay for the service and the capacity that is used, eliminating the necessity for expensive software licenses and in-house hardware resources. As the business grows, ADAPA in the Cloud provides a cost-effective expansion path, for example, by adding multiple ADAPA instances for scalability or failover. The SaaS model relieves you of the burden of managing a scalable, on-demand computing infrastructure.

Private instance for all your decisioning needs

We provide you with a single-tenant architecture. The service is implemented as a private, dedicated instance of ADAPA that encapsulates your predictive models and business rules. Only you have access to your private ADAPA instance(s) via HTTPS. Your decisioning files and data never share the same engine with other clients.

Trusted, secure, scalable cloud infrastructure

Zementis leverages FICO and Amazon EC2 for providing on-demand infrastructure for ADAPA in the Cloud. Cloud computing offers utility computing with virtually unlimited scalability.

Friday, October 5, 2012

Seamless Integration of Predictive Analytics and Business Rules

Operational deployment of predictive solutions includes exporting the data mining models you built in SAS, IBM SPSS, STATISTICA, KNIME, R, ... into PMML, the Predictive Model Markup Language. Once expressed in PMML, these models can be easily moved into production: on-site, in the cloud, on Hadoop, or in-database. Zementis offers a range of products that make this possible, including the ADAPA Decisioning Engine and the Universal PMML Plug-in. Besides providing a predictive analytics engine, ADAPA also encapsulates a rules engine, which allows predictive models to be seamlessly integrated with business rules.

In this demo, we show a pre-qualification app that uses predictive models and rules to analyze the risk of mortgage default on loan applications. An application is accepted or referred for a variety of loan products depending on its perceived risk. ADAPA is the engine driving this application in the back-end.

Once logged in, we use the ADAPA Web Console to download the mortgage solution files used throughout the demo. Predictive models expressed in PMML format are uploaded and verified in ADAPA along with rulesets expressed in tabular format. The ADAPA Web Console is used for managing predictive models, rulesets, and resource files as well as for batch-scoring. Real-time scoring is obtained via web services or the Java API.

Finally, we show how the ADAPA Add-in for Excel is used to score data directly from within Excel. This part of the demo features the scoring of loan and tax data as well as the visualization of results via dashboards.

Tuesday, October 2, 2012

Amazing In-database Analytics with PMML and UPPI

Not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in these cases is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together; hence, scores and predictions flow at an accelerated pace.

A new day has come!

Zementis is now offering its amazing Universal PMML Plug-in™ (UPPI) for in-database scoring for the IBM Netezza appliance, SAP Sybase IQ, EMC Greenplum, Teradata and Teradata Aster.

Amazing! Why?

For starters, it won't break your budget (feel free to contact us for details). Also, it is simple to deploy and maintain. Our Universal PMML Plug-in was designed from the ground up to take advantage of efficient in-database execution. Last but not least, as its name suggests, it is PMML-based. PMML, the Predictive Model Markup Language, is the standard for representing predictive models currently exported from all major commercial and open-source data mining tools. So, if you build your models in SAS, IBM SPSS, STATISTICA, or R, you are ready to start benefiting from in-database scoring right away.

The PMML plug-in seamlessly embeds models within your database. Data scoring requires nothing more than adding a simple function call to your SQL statements. You can score data against one model or against multiple models at the same time. There is no need to code complex data transformations and calculations in SQL or stored procedures. PMML and our Universal Plug-in can easily take care of that.
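The idea of scoring "by adding a simple function call to your SQL statements" can be illustrated with SQLite from Python's standard library, where `create_function` registers a Python callable as a SQL function. This is only an analogy under stated assumptions: SQLite stands in for the database, the `churn_score` function and its logistic coefficients are hypothetical, and UPPI's actual function names and signatures on each platform are not shown here.

```python
import math
import sqlite3

def churn_score(tenure, calls):
    """Hypothetical logistic model standing in for a PMML model."""
    z = 1.0 - 0.1 * tenure + 0.5 * calls
    return 1.0 / (1.0 + math.exp(-z))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, tenure REAL, calls REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 24, 0), (2, 2, 6), (3, 10, 2)])

# Register the model as a SQL function so scoring happens inside the query,
# next to the data, instead of exporting rows to a separate scoring system.
conn.create_function("churn_score", 2, churn_score)

rows = conn.execute(
    "SELECT id, churn_score(tenure, calls) AS score "
    "FROM customers ORDER BY score DESC"
).fetchall()
for customer_id, score in rows:
    print(customer_id, round(score, 3))
```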

Modeling techniques currently supported are:

Neural Networks
Support Vector Machines
Naive Bayes Classifiers
Ruleset Models
Clustering Models (including Two-Step Clustering)
Decision Trees
Regression Models (including Cox Regression Models)
Scorecards (including reason codes)
Association Rules
Multiple Models (model composition, chaining, segmentation, and ensemble - including Random Forest models)

UPPI also provides extensive data pre- and post-processing capabilities.

In addition to all these predictive techniques, UPPI accepts PMML models of all versions (2.0, 2.1, 3.0, 3.1, 3.2, 4.0, 4.1 and 4.2) generated by any of the major commercial and open source mining tools (SAS, SPSS/IBM, STATISTICA, MicroStrategy, Microsoft, Oracle, KXEN, Salford Systems, TIBCO, R/Rattle, KNIME, RapidMiner, etc.). It does not get more universal than this!

Wednesday, September 26, 2012

Predictions in the Cloud

Moving a predictive model from the data scientist's desktop to the production environment is a "no brainer" with PMML, the Predictive Model Markup Language. Once expressed in PMML, a model can be operationally deployed in minutes.

ADAPA, the Zementis PMML-based scoring engine, allows for predictive models to be put to work in a host of different platforms and systems, including the IBM SmartCloud Enterprise. Since ADAPA is offered on the IBM SmartCloud as a service, users only pay for the service and the capacity on a monthly basis, eliminating the necessity for expensive software licenses and in-house hardware resources.

With PMML and ADAPA on the Cloud, one can deploy a predictive model in minutes anywhere in the world in any of the available data centers. The process of launching a virtual ADAPA server in the IBM SmartCloud corresponds to the traditional scenario of buying hardware and installing it in a server room. The only difference is that the server in this case sits in the cloud, comes with a preinstalled version of ADAPA, and launches in just a few minutes, on-demand and ready to be used. At any given time, you can have one or more instances running. Independent of processing power, each instance type provides a single-tenant architecture. The service is implemented as a private, dedicated instance that encapsulates predictive models and business rules. In this way, access (via HTTPS) to any instance is private. As a consequence, decision files and data never share the same engine with other clients.

Open-standards and cloud computing make it easier for companies to tackle the big data challenge. Predictive analytics is finally delivering on its promise of transforming data into insights and value.

Monday, September 24, 2012

ACM Data Mining Talk: Representing Predictive Solutions with PMML

Dr. Alex Guazzelli's talk on PMML and Predictive Analytics to the ACM Data Mining Bay Area/SF group at the LinkedIn auditorium in Sunnyvale, CA.

Abstract:

Data mining scientists work hard to analyze historical data and to build the best predictive solutions out of it. IT engineers, on the other hand, are usually responsible for bringing these solutions to life, by recoding them into a format suitable for operational deployment. Given that data mining scientists and engineers tend to inhabit different information worlds, the process of moving a predictive solution from the scientist's desktop to the operational environment can get lost in translation and take months. The advent of data mining specific open standards such as the Predictive Model Markup Language (PMML) has turned this view upside down: the deployment of models can now be achieved by the same team who builds them, in a matter of minutes.

In this talk, Dr. Alex Guazzelli not only provides the business rationale behind PMML, but also describes its main components. Besides describing the most common modeling techniques, PMML has, since version 4.0 (released in 2009), also been capable of handling complex pre-processing tasks. As of version 4.1, released in December 2011, PMML has also incorporated complex post-processing into its structure, as well as the ability to represent model ensembles, segmentation, chaining, and composition within a single language element. This combined representational power, in which an entire predictive solution (from pre-processing to model(s) to post-processing) can be represented in a single PMML file, attests to the language's refinement and maturity.
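To make segmentation and ensembles concrete, here is a minimal Python sketch of two ways a multiple-model element can combine member models. The two strategies loosely mirror PMML's "selectFirst" and "average" combination methods; the predicates, member models, and field names are hypothetical, and a full PMML evaluator supports many more methods (voting, weighted averages, model chains, and so on).

```python
from statistics import mean

def segmented_predict(record, segments):
    """selectFirst-style segmentation: use the first segment whose predicate holds."""
    for predicate, model in segments:
        if predicate(record):
            return model(record)
    return None  # no segment applies to this record

def ensemble_predict(record, models):
    """Average-style ensemble: the mean of all member predictions."""
    return mean(m(record) for m in models)

# Hypothetical segments: one model for the "north" region, a catch-all otherwise.
segments = [
    (lambda r: r["region"] == "north", lambda r: 0.2 * r["x"]),
    (lambda r: True, lambda r: 0.5 * r["x"]),
]
print(segmented_predict({"region": "north", "x": 10}, segments))  # 2.0
print(segmented_predict({"region": "south", "x": 10}, segments))  # 5.0

members = [lambda r: r["x"], lambda r: 2 * r["x"], lambda r: 3 * r["x"]]
print(ensemble_predict({"x": 10}, members))  # 20
```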

Presentation slides are available for download HERE.

Tuesday, September 18, 2012

Predictive Maintenance Solutions made possible by Big Data, Open Standards, and Analytics

Predictive analytics is an integral part of our daily lives. At this very moment, predictive solutions are busy at work, monitoring financial transactions for fraud and abuse, recommending movies and other products, or selecting the next best offer you will get from your favorite store. As much as it permeates our lives today, the application of predictive analytics is bound to increase. For example, boosted by Big Data and cost-efficient processing in the cloud, predictive maintenance applications are on their way towards becoming ubiquitous.

Predictive maintenance solutions are based on the idea that one is able to know that a machine or equipment is going to fail, and take proactive actions to ensure process reliability and safety. By using data from sensors that capture vibration information from rotating equipment, my team built a predictive maintenance solution that alerted personnel to imminent breakdowns. For that, we used a combination of statistical tools. For example, we used R, an open-source statistical package for data analysis, IBM SPSS Statistics for analysis and model building, and the Zementis ADAPA platform for model deployment. Since all these systems support PMML, the Predictive Model Markup Language, instead of spending time translating code from one system to another, we were able to concentrate on the problem itself and use the tools we trusted the most to get the job done.

PMML is the de facto standard used to represent predictive analytics or data mining models. With PMML, a predictive solution may be built in one system and deployed in another where it can be put to work immediately. The adoption of PMML by all the major analytic vendors is a testimony to their commitment to interoperability and the advancement of predictive analytics as a critical factor to the betterment of society. PMML is developed by the Data Mining Group (DMG), a committee composed not only of commercial and open-source analytic companies including IBM, SAS, Zementis, FICO, Salford Systems, Microstrategy, Togaware, KNIME and Rapid-I, but also of analytic users such as NASA, Visa, the San Diego Supercomputer Center, and Equifax.

Predictive analytics and open standards can provide yet another tool for safeguarding operations and ensuring safety and process reliability. While predictive analytics can offer solutions to alert us of problems before they actually happen, open standards such as PMML are key ingredients for ensuring that the building and deployment of predictive maintenance solutions is application-independent and therefore agile and transparent.

We recently wrote a series of two articles for the IBM developerWorks website that covers PMML and predictive maintenance. To read both articles in their entirety, please refer to the following links:

1) What is PMML? Explore the power of predictive analytics and open standards

2) Representing predictive solutions in PMML: Move from raw data to predictions

Wednesday, September 12, 2012

Predictive model deployment and execution made easy with PMML

Developed by the Data Mining Group (DMG), an independent, vendor-led committee, PMML provides an open standard for representing data mining models. In this way, models can easily be shared between different applications, avoiding proprietary issues and incompatibilities. Currently, all major commercial and open-source data mining tools support PMML. These include IBM/SPSS, SAS, KXEN, TIBCO, STATISTICA, Microstrategy, R, KNIME, and RapidMiner (for a list of PMML-compliant tools, see the list of PMML-powered tools at DMG.org).

PMML is an XML-based language with a very intuitive structure for describing data pre- and post-processing as well as predictive algorithms. Not only does PMML represent a wide range of statistical techniques, but it can also describe the input data and the transformations needed to turn raw data into meaningful features.
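To make that structure concrete, here is a stripped-down, hypothetical PMML 4.1 fragment: a DataDictionary declaring one raw input field and a TransformationDictionary deriving a scaled feature from it with NormContinuous. The sketch reads it with Python's standard ElementTree and applies the piecewise-linear normalization by hand. The field names and breakpoints are invented for illustration, and this is not how any particular scoring engine is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.1 documents.
NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical fragment: one input field and one derived feature.
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="illustrative fragment"/>
  <DataDictionary numberOfFields="1">
    <DataField name="temperature" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="temp_scaled" optype="continuous" dataType="double">
      <NormContinuous field="temperature">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="100" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
</PMML>
""".strip()

def norm_continuous(elem, value):
    """Piecewise-linear mapping defined by the LinearNorm points of a NormContinuous element.

    Values beyond the first/last point are extrapolated from the nearest segment,
    mirroring the default outliers="asIs" behavior.
    """
    points = [(float(ln.get("orig")), float(ln.get("norm")))
              for ln in elem.findall("p:LinearNorm", NS)]
    if len(points) < 2:
        raise ValueError("NormContinuous needs at least two LinearNorm points")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if value <= x1:
            break  # value falls in (or before) this segment
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)

root = ET.fromstring(PMML_DOC)
fields = [df.get("name") for df in root.findall("p:DataDictionary/p:DataField", NS)]
derived = root.find("p:TransformationDictionary/p:DerivedField", NS)
norm = derived.find("p:NormContinuous", NS)

print(fields)                                            # ['temperature']
print(derived.get("name"), norm_continuous(norm, 25.0))  # temp_scaled 0.25
```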

PMML Conversion

Since a tool may generate an older version of PMML rather than the latest one, Zementis has developed a way to convert older versions of PMML to the latest, version 4.1. This conversion process is also used to validate a data mining model against the PMML specification for versions 2.0, 2.1, 3.0, 3.1, 3.2, 4.0 and 4.1. If validation is not successful, the conversion process returns a file in which the reasons for the failure are embedded as comments in the PMML.

Before actual conversion takes place, the validation phase needs to be successful, i.e. the model file needs to conform to the PMML specification as published by the DMG (for any of the older PMML versions listed above). For known PMML issues (from a variety of sources/vendors), the conversion process will actually correct the model file so that it can be converted appropriately.
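The very first step of such a pipeline, recognizing which PMML version a file claims to be, can be sketched with Python's standard library. This is not the Zementis converter: it only inspects the root element and its version attribute, and performs none of the schema validation or vendor-specific corrections described above. The function name `inspect_pmml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# PMML versions named in the post; 4.1 is the conversion target.
SUPPORTED_VERSIONS = {"2.0", "2.1", "3.0", "3.1", "3.2", "4.0", "4.1"}
TARGET_VERSION = "4.1"

def inspect_pmml(xml_text):
    """Return (version, needs_conversion) for a PMML document.

    Raises ValueError if the root element is not <PMML> or the version is not one of
    SUPPORTED_VERSIONS. This is a first-pass check only, not schema validation.
    """
    root = ET.fromstring(xml_text)
    local_name = root.tag.rsplit("}", 1)[-1]  # strip a "{namespace}" prefix, if any
    if local_name != "PMML":
        raise ValueError(f"root element is <{local_name}>, not <PMML>")
    version = root.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported or missing PMML version: {version!r}")
    return version, version != TARGET_VERSION

print(inspect_pmml('<PMML xmlns="http://www.dmg.org/PMML-3_2" version="3.2"/>'))  # ('3.2', True)
```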

The ADAPA Decision Engine

If you are using the ADAPA Decision Engine (or any of our scoring products), the conversion process described above runs automatically every time a PMML file is uploaded. In this way, ADAPA accepts PMML files generated by different vendors in any of the PMML versions listed above. Besides syntactic validation, ADAPA also validates PMML from a semantic perspective.

As a result, once a model is successfully uploaded into ADAPA, it is both syntactically and semantically sound. For more details, click HERE.
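Schema validation alone cannot catch every inconsistency. As a simplified illustration of the kind of semantic check meant here (not ADAPA's actual rule set), the sketch below flags MiningField entries that refer to a field never declared in the DataDictionary. For simplicity it assumes every mining field must be a DataDictionary field; the document and field names are hypothetical.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Element name without its "{namespace}" prefix."""
    return tag.rsplit("}", 1)[-1]

def undeclared_mining_fields(xml_text):
    """Names used in any MiningSchema that are not declared as a DataField.

    Simplifying assumption: every MiningField must refer to a DataDictionary field.
    """
    root = ET.fromstring(xml_text)
    declared = {el.get("name") for el in root.iter() if local(el.tag) == "DataField"}
    used = [el.get("name") for el in root.iter() if local(el.tag) == "MiningField"]
    return [name for name in used if name not in declared]

# Hypothetical model whose schema uses "income", which the DataDictionary never declares.
DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <Node score="low"><True/></Node>
  </TreeModel>
</PMML>
""".strip()

print(undeclared_mining_fields(DOC))  # ['income']
```

A file like this could pass a purely syntactic check, yet no engine could score it, because it would never know where the "income" input comes from.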

You can benefit from ADAPA today by signing up for your private ADAPA instance on the Amazon Cloud or on the IBM SmartCloud. You can also sign up for the ADAPA free trial.

Start executing your models right now!

Friday, August 31, 2012

Zementis is proud to announce PMML 4.1 support

PMML 4.1, the latest version of the Predictive Model Markup Language, is loaded with new and powerful features.

Zementis is proud to announce support for PMML 4.1 throughout its scoring products, including:

ADAPA on Site

ADAPA on the Cloud (Amazon and IBM SmartCloud)

UPPI for in-database scoring (IBM Netezza, SAP Sybase IQ, EMC Greenplum)

UPPI for Hadoop (Datameer).

We have also updated our PMML conversion process so that it now converts PMML files from older versions to version 4.1. In this way, every time a PMML file is presented to ADAPA or UPPI, it is automatically converted to PMML 4.1.

Our support for PMML 4.1 includes:

1) Scorecards (including reason or adverse codes and point allocation for complex attributes)

2) Post-processing: you can now transform scores into business decisions as well as output generic data manipulation steps

3) Multiple Models: a powerful yet simpler way to express model segmentation, composition, chaining and ensembles, including Random Forest models

4) Is the model scorable? The "isScorable" attribute was added to mark models that are not meant for production deployment but are nonetheless an important part of the model-building cycle

5) New built-in functions (for pre- and post-processing).
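To make the scorecard feature concrete, the following sketch mimics how a PMML 4.1 Scorecard using reasonCodeAlgorithm="pointsBelow" is evaluated: the final score is initialScore plus the partial score of the first matching attribute in each characteristic, and reason codes are ranked by how far each partial score falls below its characteristic's baselineScore. Predicates are plain Python lambdas rather than parsed PMML, the characteristics, points and reason codes are hypothetical, and reporting only codes that actually lost points is a simplification of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Attribute:
    predicate: Callable[[Dict[str, float]], bool]  # stands in for a PMML predicate
    partial_score: float
    reason_code: str

@dataclass
class Characteristic:
    name: str
    baseline_score: float
    attributes: List[Attribute] = field(default_factory=list)

def score_card(record, characteristics, initial_score=0.0, max_reasons=2):
    """Return (score, reason_codes) in the style of reasonCodeAlgorithm="pointsBelow"."""
    total = initial_score
    points_below: Dict[str, float] = {}
    for ch in characteristics:
        # The first attribute whose predicate is true contributes its partial score.
        attr = next((a for a in ch.attributes if a.predicate(record)), None)
        if attr is None:
            raise ValueError(f"no attribute of {ch.name!r} matches the record")
        total += attr.partial_score
        # Points lost relative to the baseline are accumulated per reason code.
        points_below[attr.reason_code] = (
            points_below.get(attr.reason_code, 0.0) + ch.baseline_score - attr.partial_score
        )
    # Rank reason codes by points lost; only codes that lost points are reported.
    ranked = sorted(points_below.items(), key=lambda kv: kv[1], reverse=True)
    reasons = [code for code, lost in ranked if lost > 0][:max_reasons]
    return total, reasons

# Hypothetical two-characteristic scorecard.
CARD = [
    Characteristic("age", 20, [
        Attribute(lambda r: r["age"] < 25, 5, "RC_AGE"),
        Attribute(lambda r: r["age"] >= 25, 25, "RC_AGE"),
    ]),
    Characteristic("income", 30, [
        Attribute(lambda r: r["income"] < 1000, 10, "RC_INCOME"),
        Attribute(lambda r: r["income"] >= 1000, 40, "RC_INCOME"),
    ]),
]

print(score_card({"age": 22, "income": 800}, CARD, initial_score=100))
# (115, ['RC_INCOME', 'RC_AGE'])
```

Here income lost 20 points against its baseline and age lost 15, so income is reported as the primary reason for the low score.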

With this new release and version update, ADAPA and UPPI can be used not only for deployment and execution of predictive solutions, but also for data analysis and processing before model training.

If you have any questions about PMML 4.1 and the features supported in our products, please contact us or check out our PMML 4.1 forum for detailed support information.

Thursday, August 9, 2012

Agile Deployment of Predictive Analytics on Hadoop: Faster Insights through Open Standards

This joint Datameer/Zementis presentation, given at the 2012 Hadoop Summit, outlines the benefits of the PMML standard as a key element of data science best practices and its application in the context of distributed processing. In a live demonstration, we showcase how Datameer and the Zementis Universal PMML Plug-in (UPPI) take advantage of a highly parallel Hadoop architecture to efficiently derive predictions from very large volumes of data.

Watch it now on YouTube:

http://www.youtube.com/watch?v=r_g99-kP_BE

Session Abstract:

While Hadoop provides an excellent platform for data aggregation and general analytics, it can also provide the right platform for advanced predictive analytics against vast amounts of data, preferably with low latency and in real time. This drives the business need for comprehensive solutions that combine the aspects of big data with an agile integration of data mining models. Facilitating this convergence is the Predictive Model Markup Language (PMML), a vendor-independent standard to represent and exchange data mining models that is supported by all major data mining vendors and open source tools (see figure below).

PMML is an XML-based language developed by the Data Mining Group (DMG) which provides a way for applications to define statistical and data mining models and to share models between PMML-compliant applications. It provides applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. PMML allows users to develop models within one vendor's application and use other vendors' applications to visualize, analyze, evaluate or otherwise use the models. Previously, this was very difficult, but with PMML, the exchange of models between compliant applications is now straightforward.
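Part of why Hadoop suits model scoring is that a PMML model scores each record independently, so every task can load the same model and score its own split of the data. The sketch below imitates that map/reduce shape on a single machine. It is not how Datameer or UPPI are implemented: the linear model, its coefficients, the threshold and the records are hypothetical, and the thread pool merely stands in for a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical linear model; in practice each worker would load the same PMML file.
COEFFS = {"x1": 0.5, "x2": -0.25}
INTERCEPT = 1.0
THRESHOLD = 1.0

def score(record):
    return INTERCEPT + sum(COEFFS[name] * record[name] for name in COEFFS)

def map_partition(partition):
    """'Map' step: score every record of one split and count the decisions."""
    return Counter("high" if score(r) > THRESHOLD else "low" for r in partition)

def partitions(records, n):
    """Split records into at most n contiguous slices, like input splits."""
    if not records:
        return []
    size = -(-len(records) // n)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def parallel_score(records, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_split = pool.map(map_partition, partitions(records, n_workers))
        # 'Reduce' step: merge the per-split counts.
        return sum(per_split, Counter())

RECORDS = [{"x1": i, "x2": 1} for i in range(10)]
print(parallel_score(RECORDS))  # Counter({'high': 9, 'low': 1})
```

Because no split needs to see any other split's records, adding workers (or Hadoop nodes) increases throughput without changing the model.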

Wednesday, August 1, 2012

TOP 10 PMML Resources

We offer you a host of free on-line resources that allow you to expand your PMML skills. With these, you can learn how to best operationalize your predictive models, not only on your own infrastructure, but also on the cloud, in-database, or on Hadoop.

Your peers are already communicating predictive analytics with PMML. Learn how you too can benefit from it.

1) BOOK: We have recently published the 2nd edition of our PMML book. Entitled "PMML in Action", the book is available on amazon.com in paperback and Kindle formats.

2) BLOGS: Another great resource for PMML related material is the predictive-analytics.info blog site. Besides highlighting the standard itself, this site also discusses the latest PMML support offered by producers and consumers.

3) VIDEOS: We have been busy producing informative webinars with our partners. You can find all our past webinars (including joint webinars with IBM SPSS and Revolution) by visiting our videos page.

4) ARTICLES: White papers (including joint papers with KNIME and EMC), peer-reviewed articles and invited articles. Check them out on the Zementis articles page.

5) TOOLS: Our tools page contains the description and link to the Transformations Generator, which allows you to graphically design your transformations and export them into PMML.

6) FORUMS: A place to ask questions and discuss model deployment. Explore and join our community forums.

7) EXAMPLES: On the DMG PMML Examples page, you can find not only typical predictive models such as neural networks and decision trees, but also association rules and random forest models.

8) PRESENTATION: Our PMML presentation to the ACM Data Mining Bay Area/SF group, given at LinkedIn earlier this year, is available for on-demand viewing on YouTube. Presentation slides can be downloaded HERE.

9) NEWSLETTER: The latest information on PMML and model deployment. Our Deploy! Newsletter is now in its 21st issue.

10) GROUP: Last, but not least, you are welcome to join the PMML discussion group on LinkedIn, now with close to 3,000 members and growing fast.

Monday, July 16, 2012

Predicting the future ... in four parts

I recently finished writing a four-part article series about predictive analytics entitled Predicting the Future. The topic is near and dear to my heart, since I have been working in the field since my undergraduate years back in Brazil (more than 20 years ago) and, lately, through my work with PMML, the Predictive Model Markup Language.

The four articles have just been published in their entirety by IBM on the developerWorks website, together with a video in which I introduce each article.

The articles themselves can be found here:

Predicting the future, Part 1: What is predictive analytics?
Predicting the future, Part 2: Predictive modeling techniques
Predicting the future, Part 3: Create a predictive solution
Predicting the future, Part 4: Put a predictive solution to work

And, if you are interested in learning about open-standards and predictive analytics, I would also recommend the following articles:

Predictive Analytics in Healthcare: The importance of open standards
What is PMML? Explore the Power of Predictive Analytics and Open Standards
Representing predictive solutions in PMML: Move from raw data to predictions

Enjoy!

Friday, July 13, 2012

Webcast: Predictive Analytics on Hadoop

UPDATE: Thanks for your interest in our joint webinar with Datameer: Predictive Analytics on Hadoop. If you were not able to attend or would like to watch it again at your own pace, just click HERE.

To extract value and insight from "Big Data", leading organizations increasingly leverage predictive analytics. By using statistical techniques that uncover important patterns present in historical data, companies are able to predict the future. In doing so, they become more precise, consistent and automated in everyday business decisions.

Please join the Datameer/Zementis webcast entitled Predictive Analytics on Hadoop: Gaining Faster Insights through Open Standards to learn how to efficiently derive predictions from very large volumes of structured and unstructured data.

WHEN: Thursday, July 19, 2012, 10:00 am PT / 1:00 pm ET

Free registration

In this webinar, we showcase the technical capabilities of the Universal PMML Plug-in for Datameer, a solution that combines open standards and Hadoop to reduce complexity and accelerate time-to-market for predictive analytics in any industry and for any business application.

Leave this webinar knowing:

The benefits of the Predictive Model Markup Language (PMML) standard as a data science best practice for data mining
How to leverage predictive analytics in the context of big data
How to reduce the cost and complexity of predictive analytics

You can register HERE

Friday, June 29, 2012

Synergies and Value Proposition between the R Statistical Package and Zementis ADAPA

The ADAPA Decision Engine provides additional value to all your predictive assets. It is complementary to R, since it extends your modeling environment into the IT operational domain.

ADAPA® is compatible with R through PMML, the Predictive Model Markup Language, which is the de facto standard to represent predictive models. PMML allows for models to be developed in one application and deployed on another, as long as both are PMML-compliant.

Immediate benefits of using ADAPA

Once a model built in R is saved as a PMML file, it can be directly uploaded into ADAPA. With ADAPA, you can:

Execute your models independently of R
Overcome memory and speed limitations imposed by R
Produce scores in real-time (using Web Services or Java API), on-demand, or batch-mode
Tap into all the advantages of cloud computing with ADAPA on Cloud (IBM SmartCloud or Amazon EC2)
Execute your models directly from Excel, by using the ADAPA Add-in for Excel
Benefit from using other PMML-compliant model development tools such as KNIME and RapidMiner
Deploy your models in minutes, not months (no need for recoding models into production)
Manage models via Web Services or a Web console
Upload one or many models into ADAPA at once
Use rules to implement model segmentation
Benefit from the seamless integration of business rules and predictive models through PMML
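In PMML, rule-based segmentation is typically expressed with a MiningModel whose Segmentation element lists segments, each guarded by a predicate. With multipleModelMethod="selectFirst", the first segment whose predicate is true supplies the score. The sketch below imitates that behavior with Python callables; it is not ADAPA's rules engine, and the segment predicates and per-segment models are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]
# (predicate, model) pairs, in the order the segments are listed.
Segment = Tuple[Callable[[Record], bool], Callable[[Record], float]]

def select_first(record: Record, segments: List[Segment]) -> Optional[float]:
    """Score with the model of the first segment whose predicate is true."""
    for predicate, model in segments:
        if predicate(record):
            return model(record)
    return None  # no segment applies to this record

# Hypothetical segmentation: newer customers get their own model; everyone else falls through.
SEGMENTS: List[Segment] = [
    (lambda r: r["tenure_months"] < 12, lambda r: 0.30 + 0.01 * r["late_payments"]),
    (lambda r: True,                    lambda r: 0.10 + 0.02 * r["late_payments"]),
]

print(select_first({"tenure_months": 3, "late_payments": 2}, SEGMENTS))
```

Ordering matters with selectFirst: the catch-all segment must come last, or it would shadow the more specific rule.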

R PMML support

R offers support for PMML through the pmml package available on CRAN. Zementis is a proud contributor to the pmml package, which was featured in an article we wrote for The R Journal (to download the article, click HERE). The pmml package allows users to export a multitude of predictive models to PMML (for details, click HERE).

We have put together a video that shows how easy it is to export PMML models from R. It uses a simple R script to build a decision tree model with rpart and exports it to PMML using the pmml package. To read the post and watch the video, click HERE.

A common industry standard

PMML allows for the decoupling of two very important modeling phases: development and operational deployment. With PMML, scientists can focus on data analysis and model building using best-of-breed model development tools, whereas operational deployment and actual use of the model are made extremely easy with ADAPA.

For example, if a data mining scientist develops a decision tree model using the R rpart package, all he or she needs to do to deploy the model operationally is save it as a PMML file and upload it into ADAPA. Once in ADAPA, the decision tree model is available for all to use, directly by business users and applications. A business user may, for instance, use the model directly from within Excel to score customers for a marketing campaign.

In this way, PMML allows the model development environment to be used for just that: model development. Scoring, in real time or in batch mode, from anywhere and at any time, is handled by ADAPA.
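What makes this decoupling possible is that any PMML consumer can evaluate a tree from its XML alone, without access to R. The sketch below walks a hand-written TreeModel similar in shape to what a decision tree export produces; the field names, split value and scores are hypothetical. It supports only True/False and numeric SimplePredicate tests and follows noTrueChildStrategy="returnLastPrediction", whereas a production engine such as ADAPA also handles compound predicates, missing values and much more.

```python
import operator
import xml.etree.ElementTree as ET

# SimplePredicate operators supported by this sketch (numeric comparisons only).
OPS = {"equal": operator.eq, "notEqual": operator.ne, "lessThan": operator.lt,
       "lessOrEqual": operator.le, "greaterThan": operator.gt, "greaterOrEqual": operator.ge}

def local(tag):
    """Element name without its "{namespace}" prefix."""
    return tag.rsplit("}", 1)[-1]

def predicate_holds(node, record):
    """Evaluate the predicate element of a Node (it precedes any child Nodes)."""
    for child in node:
        kind = local(child.tag)
        if kind == "True":
            return True
        if kind == "False":
            return False
        if kind == "SimplePredicate":
            value = float(record[child.get("field")])
            return OPS[child.get("operator")](value, float(child.get("value")))
    raise ValueError("node has no supported predicate")

def score_tree(xml_text, record):
    root = ET.fromstring(xml_text)
    node = next(el for el in root.iter() if local(el.tag) == "Node")  # root of the tree
    while True:
        children = [c for c in node if local(c.tag) == "Node"]
        chosen = next((c for c in children if predicate_holds(c, record)), None)
        if chosen is None:
            return node.get("score")  # leaf reached, or no child is true
        node = chosen

# Hypothetical tree: respond "yes" when income is at least 50000.
TREE = """
<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="respond" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification" noTrueChildStrategy="returnLastPrediction">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="respond" usageType="predicted"/>
    </MiningSchema>
    <Node score="no">
      <True/>
      <Node score="yes">
        <SimplePredicate field="income" operator="greaterOrEqual" value="50000"/>
      </Node>
      <Node score="no">
        <SimplePredicate field="income" operator="lessThan" value="50000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
""".strip()

print(score_tree(TREE, {"income": 72000}))  # yes
```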

Thursday, June 28, 2012

Learn how the IBM / Zementis Partnership simplifies Predictive Analytics

Benefiting from Interoperability

PMML, the Predictive Model Markup Language, has become the de facto standard to represent not only predictive models, but also data pre- and post-processing. In so doing, it allows for the interchange of models among different tools and environments, avoiding proprietary issues and incompatibilities.

Model Building: IBM SPSS

IBM SPSS Modeler and IBM SPSS Statistics are extremely powerful data analysis and model building environments. This power is backed up by their support of PMML. In either tool, predictive models as well as data transformations can be easily exported into PMML. IBM SPSS Statistics, for example, allows for automatic data preparation, which can be exported into PMML and subsequently merged into the final PMML file for the entire solution.
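The post does not say how that merge is performed. As a hypothetical illustration using only Python's standard library, the sketch below copies the DerivedField definitions from one PMML file's TransformationDictionary (for example, a data-preparation export) into a model file, inserting the dictionary right after the DataDictionary as the PMML element order requires. It does not reconcile DataDictionary differences or rename clashing fields, and it is not the procedure IBM SPSS Statistics itself uses.

```python
import copy
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_1"
ET.register_namespace("", PMML_NS)  # serialize with a default namespace instead of "ns0:"

def q(tag):
    """Qualified PMML 4.1 tag name."""
    return f"{{{PMML_NS}}}{tag}"

def merge_transformations(model_xml, prep_xml):
    """Copy DerivedFields from prep_xml's TransformationDictionary into model_xml."""
    model = ET.fromstring(model_xml)
    prep_dict = ET.fromstring(prep_xml).find(q("TransformationDictionary"))
    if prep_dict is None:
        return ET.tostring(model, encoding="unicode")
    target = model.find(q("TransformationDictionary"))
    if target is None:
        # PMML element order: the TransformationDictionary follows the DataDictionary.
        position = list(model).index(model.find(q("DataDictionary"))) + 1
        target = ET.Element(q("TransformationDictionary"))
        model.insert(position, target)
    existing = {df.get("name") for df in target.findall(q("DerivedField"))}
    for df in prep_dict.findall(q("DerivedField")):
        if df.get("name") not in existing:  # skip fields the model already defines
            target.append(copy.deepcopy(df))
    return ET.tostring(model, encoding="unicode")

# Hypothetical model file and data-preparation file.
MODEL = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header/>
  <DataDictionary numberOfFields="1"><DataField name="age" optype="continuous" dataType="double"/></DataDictionary>
  <RegressionModel functionName="regression"><MiningSchema><MiningField name="age"/></MiningSchema><RegressionTable intercept="0"/></RegressionModel>
</PMML>"""

PREP = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header/>
  <DataDictionary numberOfFields="1"><DataField name="age" optype="continuous" dataType="double"/></DataDictionary>
  <TransformationDictionary>
    <DerivedField name="age_std" optype="continuous" dataType="double">
      <NormContinuous field="age"><LinearNorm orig="0" norm="0"/><LinearNorm orig="100" norm="1"/></NormContinuous>
    </DerivedField>
  </TransformationDictionary>
</PMML>"""

merged = ET.fromstring(merge_transformations(MODEL, PREP))
print([child.tag.rsplit("}", 1)[-1] for child in merged])
# ['Header', 'DataDictionary', 'TransformationDictionary', 'RegressionModel']
```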

View on-demand replay of the joint IBM SPSS/Zementis webcast focusing on the synergies between IBM SPSS and Zementis ADAPA (presented, May 14th, 2012).

Discover the benefits of executing your IBM SPSS models in ADAPA

Model Execution: ADAPA on the IBM SmartCloud

Once exported in PMML, your IBM SPSS models can be readily deployed in the Zementis ADAPA Scoring Engine, where they can be put to work immediately. To minimize total cost of ownership, model execution in ADAPA is now available as a service through the IBM SmartCloud.

View on-demand replay of the joint IBM/Zementis webcast focusing on predictive analytics deployment and execution on the IBM SmartCloud (presented, May 24th, 2012).

Review IBM developerWorks article about executing predictive solutions using ADAPA on the IBM SmartCloud.

Discover features and capabilities of ADAPA on SmartCloud

In-database Scoring: UPPI for IBM Netezza

Predictive solutions expressed in PMML can also be put to work inside the database with the Zementis Universal PMML Plug-in (UPPI) which is now available for IBM Netezza. Since UPPI transforms your complex predictive solutions into SQL functions, these can be readily used in any query and generate instant business decisions and insights where and when you need them.
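UPPI's own SQL interface is not described here, so the following is only an analogy using SQLite from Python's standard library: a hypothetical churn model is registered as a SQL function with `sqlite3.Connection.create_function` and then called inside an ordinary query, which is the idea behind in-database scoring. The model, its coefficients, the table and the 0.5 cutoff are invented for illustration; IBM Netezza's user-defined functions work differently.

```python
import math
import sqlite3

def churn_score(tenure, late_payments):
    """Hypothetical logistic model standing in for a PMML-defined predictive solution."""
    z = -1.0 - 0.05 * tenure + 0.8 * late_payments
    return 1.0 / (1.0 + math.exp(-z))

conn = sqlite3.connect(":memory:")
conn.create_function("churn_score", 2, churn_score)  # expose the model to SQL
conn.execute("CREATE TABLE customers (id INTEGER, tenure REAL, late_payments INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 36, 0), (2, 3, 4), (3, 12, 1)])

# The model is used like any built-in SQL function, here to select at-risk customers.
rows = conn.execute(
    "SELECT id FROM customers WHERE churn_score(tenure, late_payments) > 0.5 ORDER BY id"
).fetchall()
print(rows)  # [(2,)]
```

Scoring where the data lives avoids exporting the table to a separate scoring system and back.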

Review the UPPI for IBM Netezza Product Data Sheet

Discover all the features and UPPI supported databases

Explore the IBM Netezza Analytics website

Thursday, June 7, 2012

IBM/Zementis Webinar: Have You Fully Tapped the Business Value of Predictive Analytics?

The analysis of "Big Data" to support your business objectives is new to most companies. To extract value and insight from "Big Data", leading organizations increasingly leverage predictive analytics. By using statistical techniques that uncover important patterns present in historical data, companies are able to predict the future. In doing so, they become more precise, consistent and automated in everyday business decisions.

In this webinar, Ed Bottini, IBM’s Global SmartCloud Services Ecosystem Leader, and Dr. Michael Zeller, Zementis CEO, showcase the technical capabilities of the Zementis ADAPA Decision Engine on the IBM SmartCloud, a solution which combines open standards and cloud computing to reduce complexity and accelerate time-to-market for predictive analytics in any industry and for any business application.

Wednesday, June 6, 2012

Agile Deployment of Predictive Analytics on Hadoop: Faster Insights through Open Standards

Join us for the 2012 Hadoop Summit at the San Jose Convention Center on June 13-14.

Ulrich Rueckert, Data Scientist at Datameer, and Michael Zeller, Zementis CEO, will be presenting on Wednesday, June 13, 1:30-2:10 pm.

Session Abstract:

While Hadoop provides an excellent platform for data aggregation and general analytics, it can also provide the right platform for advanced predictive analytics against vast amounts of data, preferably with low latency and in real time. This drives the business need for comprehensive solutions that combine the aspects of big data with an agile integration of data mining models. Facilitating this convergence is the Predictive Model Markup Language (PMML), a vendor-independent standard to represent and exchange data mining models that is supported by all major data mining vendors and open source tools (see figure below).

PMML is an XML-based language developed by the Data Mining Group (DMG) which provides a way for applications to define statistical and data mining models and to share models between PMML-compliant applications. It provides applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. PMML allows users to develop models within one vendor's application and use other vendors' applications to visualize, analyze, evaluate or otherwise use the models. Previously, this was very difficult, but with PMML, the exchange of models between compliant applications is now straightforward.

This joint Datameer/Zementis presentation will outline the benefits of the PMML standard as a key element of data science best practices and its application in the context of distributed processing. In a live demonstration, we will showcase how Datameer and the Zementis Universal PMML Plug-in take advantage of a highly parallel Hadoop architecture to efficiently derive predictions from very large volumes of data.

Session attendees will learn:

How to leverage predictive analytics in the context of big data
Introduction to the Predictive Model Markup Language (PMML) open standard for data mining
How to reduce cost and complexity of predictive analytics

Tuesday, May 29, 2012

Synergies and value proposition between IBM SPSS and Zementis ADAPA

The ADAPA Decision Engine provides additional value to all your predictive assets. It is complementary to IBM SPSS Modeler and IBM SPSS Statistics, since it extends these modeling environments into the IT operational domain.

ADAPA is compatible with Modeler and Statistics through PMML, the Predictive Model Markup Language, which is the de facto standard to represent predictive models. PMML allows for models to be developed in one application and deployed on another, as long as both are PMML-compliant.

Immediate benefits of using ADAPA

Once a model built in any of the IBM SPSS tools is saved as a PMML file, it can be directly uploaded into ADAPA. With ADAPA, you can:

Execute your models independently of the IBM SPSS model development tool
Overcome any speed limitations
Dramatically lower your infrastructure cost
Tap into all the advantages of cloud computing with ADAPA on the Cloud (IBM SmartCloud or Amazon EC2)
Produce scores in real-time (using Web Services or Java API), on-demand, or batch-mode
Execute your models directly from Excel, by using the ADAPA Add-in for Excel
Benefit from using other PMML-compliant model development tools such as R, KNIME, or SAS
Deploy your models in minutes, not months (no need for recoding models into production)
Manage models via Web Services or a Web console
Upload one or many models into ADAPA at once
Use rules to implement model segmentation
Benefit from the seamless integration of business rules and predictive models

IBM SPSS PMML support

IBM SPSS offers vast support for PMML through IBM SPSS Modeler (formerly known as Clementine) and Statistics. Both systems allow users to export a multitude of models in PMML (for details, click HERE). IBM products such as DB2 Intelligent Miner and ILOG JRules also offer support for PMML.

A common industry standard

For example, if a data mining scientist develops a decision tree model using IBM SPSS Modeler, all he or she needs to do to deploy the model operationally is save it as a PMML file and upload it into ADAPA. Once in ADAPA, the decision tree model is available for all to use, directly by business users and applications. A business user may, for instance, use it directly from within Excel to score customers for a marketing campaign.

In this way, PMML allows the model development environment to be used for just that: model development. Scoring, in real time or in batch mode, from anywhere and at any time, is handled by ADAPA.

Predictive Analytics at the Speed of Business

Decision Management Solutions/Zementis Webinar (presented, May 3rd, 2012)

Organizations are looking to maximize the value of their analytics investment. They need to accelerate the deployment process, reduce costs and get the analytic insight where they need it, when they need it. Increasingly, organizations must deploy and manage many predictive models, use those models in real time and integrate predictive analytics into a wide range of operational systems – in the cloud, on-premise, for Hadoop and in-database.

In this webinar you will learn how Decision Management and ADAPA – a proven approach and real-time infrastructure – transform passive models into operational success. This webinar is jointly presented by James Taylor, CEO of Decision Management Solutions and Dr. Alex Guazzelli, Vice President of Analytics at Zementis.

Presentation (on YouTube):

Demo (on YouTube):

Presentation and demo cover:

The current challenges in getting a return on your predictive analytic investment
The role of decision management in applying analytics when and where they are needed
The roles of predictive analytics and business rules technologies in decision management
How real-time infrastructure and rapid deployment maximize analytic value
The importance of continuous monitoring and improvement in delivering ongoing results

Decision Management Solutions & Zementis are leaders in Decision Management, providing consulting services in Decision Management, business rules and predictive analytics as well as a flexible platform for deploying predictive analytics on premise, in the cloud, for Hadoop or in-database.

Download slides
