Predictive Analytics, Big Data, Hadoop, PMML: ADAPA

Wednesday, May 28, 2014

Scoring Data from MySQL or SQL Server using KNIME and ADAPA

The video below shows the use of KNIME for handling data (reading data from a flat file and/or a database) as well as model building (data massaging and training a neural network). It also highlights how easy and straightforward it is to move a predictive model represented in PMML, the Predictive Model Markup Language, into the Zementis ADAPA Scoring Engine. ADAPA is then used for model deployment and scoring. PMML is the de facto standard to represent data mining models. It allows for predictive models to be moved between applications and systems without the need for model re-coding.

When training a model, scientists rely on historical data, but when the model is used on a regular basis, it is moved or deployed in production, where it is presented with new data. ADAPA provides a scalable and blazing fast scoring engine for models in production. And, although KNIME data mining nodes are typically used by scientists to build models, its database and REST nodes can simply be used to create a flow for reading data from a database (MySQL, SQL Server, Oracle, ...) and passing it for scoring in ADAPA via its REST API.

Use-cases are:

Read data from a flat file, use KNIME for data pre-processing and building of a neural network model. Export the entire predictive workflow as a PMML file and then take this PMML file and upload and score it in ADAPA via its Admin Web Console.
Read data from a database (MySQL, SQL Server, Oracle, ...), build the model in KNIME, export the model as a PMML file and deploy it in ADAPA using its REST API. This use-case also shows new or testing data flowing from the database into ADAPA for scoring via a sequence of KNIME nodes. The video also shows a case in which one can use KNIME nodes to simply read a PMML file produced in any PMML-compliant data mining tool (R, SAS EM, SPSS, ...), upload it in ADAPA using the REST API and score new data from MySQL in ADAPA, also through the REST interface. Note that in this case, the model has already been trained and we are just using KNIME to deploy the existing PMML file in ADAPA for scoring.
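
The database-to-scoring flow in the second use-case can also be pictured outside KNIME. What follows is only a minimal Python sketch, not the workflow shown in the video: an in-memory SQLite table stands in for MySQL or SQL Server, and the endpoint URL, model name, table name and column names are all hypothetical placeholders (the actual REST paths are in the ADAPA documentation). The requests are only constructed, never sent.

```python
import json
import sqlite3
import urllib.request

# Hypothetical endpoint: NOT the real ADAPA REST API path.
SCORING_URL = "https://adapa.example.com/score/ChurnModel"


def fetch_records(conn):
    """Read the rows to be scored and return them as a list of dicts."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT customer_id, calls, minutes FROM new_customers ORDER BY customer_id"
    )
    return [dict(row) for row in rows]


def build_request(record):
    """Wrap one record in a JSON POST request (constructed, not sent)."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        SCORING_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Stand-in database with two made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE new_customers (customer_id INTEGER, calls INTEGER, minutes REAL)"
)
conn.executemany(
    "INSERT INTO new_customers VALUES (?, ?, ?)",
    [(1, 12, 340.5), (2, 3, 41.0)],
)

records = fetch_records(conn)
requests_to_send = [build_request(r) for r in records]
print(len(requests_to_send), "scoring requests prepared")
```

In a real flow each request would be sent to the scoring engine and the returned score written back next to the record; that part depends on the actual API and is left out here.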

Zementis and SAP HANA: Real-time Scoring for Big Data

The Zementis partnership with SAP is manifesting itself in a number of ways. Two weeks ago we were part of the SAP Big Data Bus parked outside Wells Fargo in San Francisco. This week, we would like to share with you three new developments.

1) ADAPA is now being offered at the SAP HANA Marketplace.

2) An interview with our CEO, Mike Zeller, was just featured by SAP on the SAP Blogs.

3) Zementis was again part of the SAP Big Data Bus and the "Big Data Theatre". This time, the bus was parked outside US Bank in Englewood, Colorado. We were engaged in a myriad of conversations with the many people that came through the bus about how ADAPA and SAP HANA work together to bring predictive analytics and real-time scoring to transactional data and millions of accounts, in any industry.

Visit the Zementis ADAPA for SAP HANA page for more details on the Zementis and SAP real-time solution for predictive analytics.

Friday, April 18, 2014

Real-time scoring of transactional data with ADAPA for SAP HANA

At the recent DEMO Enterprise 2014 conference, Zementis announced its participation in the SAP® Startup Focus program and launched ADAPA for SAP HANA, a standards-based predictive analytics scoring engine.

ADAPA for SAP HANA provides a simple plug-and-play platform to deploy the most complex predictive models and execute them in real-time, even in the context of Big Data.

In joining the SAP HANA Startup Focus program, Zementis set out to address two key challenges related to the operational deployment of predictive analytics: agile deployment and scalable execution.

Transactional data has for years pushed the boundaries of predictive analytics. The financial industry, for example, has been using transactional data to detect fraud and abuse for decades with complex custom solutions. Real-time scoring is paramount for companies to be able to predict and prevent fraudulent activity before it actually happens. Likewise, the Internet of Things (IoT) demands effective processing of sensor data to employ predictive maintenance for detecting issues before they turn into device failures.

To solve these challenges, Zementis combined its ADAPA predictive analytics scoring engine with SAP HANA in a true plug-and-play platform which is universally applicable across all industries. ADAPA serves scoring requests and executes predictive models, while HANA offloads complex model preprocessing and the computation of aggregates.

In this scenario, real-time execution critically depends on HANA serving complex data lookups and aggregate profile computation in a few milliseconds. In a high-volume environment, such aggregates or lookups may have to be computed over millions of transactions.
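
To make the idea of an aggregate profile concrete, here is a minimal Python sketch. It uses an in-memory SQLite table purely as a stand-in for the database (it says nothing about how HANA is queried or how fast it is), and the table name, column names and the chosen aggregates are assumptions made for illustration.

```python
import sqlite3

# Stand-in transaction table with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (account_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("A", 20.0), ("A", 180.0), ("B", 55.0)],
)


def account_profile(conn, account_id):
    """Compute per-account aggregates that a model could use as inputs."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0.0), COALESCE(MAX(amount), 0.0) "
        "FROM transactions WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return {"txn_count": row[0], "total_amount": row[1], "max_amount": row[2]}


profile = account_profile(conn, "A")
print(profile)
```

The resulting dictionary is the kind of record that would be handed to the scoring engine together with the incoming transaction.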

ADAPA provides scalable real-time scoring of the core model, plus agility for model deployment through the Predictive Model Markup Language (PMML) industry standard. Clients are able to instantly deploy existing predictive models from various data mining tools. For example, you can take a complex predictive model from SAS Enterprise Miner, export it in PMML format and simply make it available for real-time scoring in ADAPA for SAP HANA. The same process, of course, applies to most commercial tools, e.g. SAP Predictive Analysis, KXEN, IBM SPSS, as well as open source tools like R and KNIME.

The unique aspect of the Zementis / SAP platform is that it combines the benefits of an open standard for predictive analytics with the power of in-memory computing.

For more product details, please see http://zementis.com/saphana.htm

Wednesday, October 9, 2013

CIO Review: Zementis selected as one of the top 20 most promising big data companies

Selected by a distinguished panel composed of CEOs, CIOs, VCs, industry analysts and the editorial board of CIO Review, Zementis has been named by CIO Review as one of the "Top 20 Most Promising Big Data Companies in 2013." Congratulations Zementis!

Read CIO Review - FULL ARTICLE

That comes as no surprise since Zementis is all about kicking down barriers for the fast deployment and execution of predictive solutions. By leveraging the PMML (Predictive Model Markup Language) standard, Zementis' products allow for predictive models built anywhere (IBM SPSS, KXEN, KNIME, R, SAS, ...) to be deployed right away on-site, in the cloud (Amazon, IBM, FICO), in-database (Pivotal/Greenplum, SAP Sybase IQ, IBM PureData for Analytics/Netezza, Teradata and Teradata Aster) or in Hadoop (Hive or Datameer).

Predictive analytics has been used for many years to learn patterns from historical data to literally predict the future. Well known techniques include neural networks, decision trees, and regression models. Although these techniques have been applied to a myriad of problems, the advent of big data, cost-efficient processing power, and open standards have propelled predictive analytics to new heights.

Big data involves large amounts of structured and unstructured data that are captured from people (e.g., on-line transactions, tweets, ... ) as well as sensors (e.g., GPS signals in mobile devices). With big data, companies can now start to assemble a 360 degree view of their customers and processes. Luckily, powerful and cost-efficient computing platforms such as the cloud and Hadoop are here to address the processing requirements imposed by the combination of big data and predictive analytics.

Creating predictive solutions is just part of the equation. Once built, they need to be transitioned to the operational environment where they are actually put to use. In the agile world we live in today, the Predictive Model Markup Language (PMML) delivers the necessary representational power for solutions to be quickly and easily exchanged between systems, allowing for predictions to move at the speed of business.

Zementis' PMML-based products, ADAPA for real-time scoring and UPPI for big data scoring, are designed from the ground up to deliver the agility necessary for models to be easily deployed in a variety of platforms and to be put to work right away.

Zementis ADAPA and UPPI kick down the barriers for big data adoption!

Wednesday, October 2, 2013

R PMML Support: BetteR than EveR

How does it work? Simple! Once you build your model in R using any of the PMML supported model types, pass the model object as an input parameter to the pmml package as shown in the figure below.

[Figure: pmml package]

The pmml package offers export for a variety of model types, including:

   •   ksvm (kernlab): Support Vector Machines
   •   nnet: Neural Networks
   •   rpart: C&RT Decision Trees
   •   lm & glm (stats): Linear and Binary Logistic Regression Models
   •   arules: Association Rules
   •   kmeans and hclust: Clustering Models
   •   multinom (nnet): Multinomial Logistic Regression Models
   •   glm (stats): Generalized Linear Models for classification and regression with
         a wide variety of link functions
   •   randomForest: Random Forest Models for classification and regression
   •   coxph (survival): Cox Regression Models to calculate survival and stratified
         cumulative hazards
   •   naiveBayes (e1071): Naive Bayes Classifiers
   •   glmnet: Linear ElasticNet Regression Models
   •   ada: Stochastic Boosting (coming soon)
   •   svm (e1071): Support Vector Machines (coming soon)

The pmml package can also export data transformations built with the pmmlTransformations package (see below). It can also be used to merge two distinct PMML files into one. For example, if transformations and model were saved into separate PMML files, it can combine both files, as described in Chapter 5 of the PMML book, PMML in Action.

Data Transformations - the R pmmlTransformations Package

The pmmlTransformations package transforms data and, when used in conjunction with the pmml package, allows for data transformations to be exported together with the predictive model in a single PMML file. Transformations currently supported are:

   •   Min-max normalization
   •   Z-score normalization
   •   Dummy-fication of categorical variables
   •   Value Mapping
   •   Variable renaming
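
For readers unfamiliar with these transformations, the following minimal Python sketch shows what four of them compute. It is not the pmmlTransformations API (that package is written in R); the function names are made up for illustration, and the use of the sample standard deviation in the z-score is an assumption.

```python
from statistics import mean, stdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score(values):
    """Center on the mean and divide by the sample standard deviation.

    Needs at least two values that are not all identical.
    """
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def dummy_encode(values, prefix):
    """Turn one categorical column into 0/1 indicator columns."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


def map_values(values, mapping, default=None):
    """Replace each value using a lookup table, with a fallback default."""
    return [mapping.get(v, default) for v in values]


ages = [20, 30, 40]
print(min_max(ages))
print(z_score(ages))
print(dummy_encode(["red", "blue", "red"], "color"))
print(map_values(["M", "F", "?"], {"M": "male", "F": "female"}, "unknown"))
```

The point of exporting such steps to PMML is that the parameters learned on the training data (minimum, maximum, mean, standard deviation, category list) travel with the model and are applied unchanged to new data at scoring time.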

To learn more about this package, check out the paper we presented at the KDD 2013 PMML Workshop.

Tuesday, September 10, 2013

Predictive model deployment with PMML

Model deployment used to be a big task. Predictive models, once built, needed to be re-coded into production to be able to score new data. This process was prone to errors and could easily take up to six months. Re-coding of predictive models has no place in the big data era we live in. Since data is changing rapidly, model deployment needs to be instantaneous and error-free.

PMML, the Predictive Model Markup Language, is the standard to represent predictive models. Given that PMML can be produced by all the top commercial and open-source data mining tools (e.g., FICO Model Builder, SAS EM, IBM SPSS, R, KNIME, ...), a predictive model can be easily moved into the production environment once it is represented as a PMML file.

Zementis offers ADAPA for real-time scoring and UPPI for big data scoring which make the entire model deployment process a no-brainer. Given that ADAPA and UPPI are universal PMML consumers (accept any version of PMML produced by any PMML-compliant tool), they can make predictive models instantly available for execution inside the production environment.

Check out the Zementis website for details.

Tuesday, May 7, 2013

The Zementis Partnership with FICO

Stuart Wells, FICO CTO, announced the strategic partnership between Zementis and FICO at FICO World on May 2, 2013. FICO clients will now benefit from the outstanding Zementis scoring technology.

How? The Zementis ADAPA scoring engine provides a highly scalable framework to deploy, integrate, and execute complex data mining and predictive models based on the PMML standard. Models built in most commercial and open source data mining tools, such as FICO Model Builder or R, can now instantly be deployed in the FICO Analytic Cloud.

Customers, application developers and FICO partners will be able to extract value and insight from their predictive models and data immediately, using ADAPA and PMML. This will result in quicker time to innovation and value on their analytic applications.

Read the press release!

Predictive Analytics Deployment

Zementis offers software solutions that enable scalable, real-time execution of predictive analytics across a variety of platforms based on the PMML standard. These include:

ADAPA Scoring Engine: Our solution for real-time scoring. ADAPA is available for on-site deployment as a traditional license or as a service in the Amazon Elastic Compute Cloud (EC2) and IBM SmartCloud Enterprise. And now, with our FICO partnership, ADAPA will also be available in the FICO Analytic Cloud.

UPPI, the Universal PMML Plug-in: The leading solution for Big Data, UPPI provides scoring in-database and for Hadoop. It is available for EMC Greenplum, IBM Netezza, SAP Sybase IQ, Teradata/Aster as well as Hadoop/Hive and Datameer.

Friday, April 12, 2013

The Zementis Partnership with Infocom in Japan

It is our pleasure to announce a strategic partnership with Infocom. If you missed out on our press release, here is the headline:

Zementis and Infocom partner to deliver predictive analytic solutions in Japan.

Dedicated to the Japanese market, Infocom combines strong expertise in data mining and predictive analytics with extensive delivery and consulting capabilities.

Zementis offers software solutions that enable scalable, real-time execution of predictive analytics across a variety of platforms based on the PMML standard. These include the ADAPA Scoring Engine available for on-site deployment or in the cloud, and UPPI, the Universal PMML Plug-in for in-database scoring and Hadoop (available for IBM Netezza, Teradata/Aster, EMC Greenplum, SAP Sybase IQ as well as Hadoop and Datameer).

Infocom will market, distribute and support Zementis's predictive analytics software in Japan.

To take a look at the press release, click HERE.

Additional Online Resources

Visit the Zementis resources pages for videos and articles on our products and PMML
Follow @Zementis on Twitter
Join the PMML discussion forum on LinkedIn

Thursday, March 7, 2013

Making the case for PMML and ADAPA

If you are not familiar with PMML, the Predictive Model Markup Language, you may be wondering what all the fuss is about ...

PMML is the de facto standard to represent data mining and predictive analytic solutions. With PMML, one can easily share a predictive solution among PMML-compliant applications and systems. For example, you can build your model in R, export it in PMML, and use ADAPA, the Zementis Scoring Engine, to deploy it in production.

Many data mining models are a one-time affair. You use historical data to build the model and use it to analyze ... historical data. Wait! That sounds more like descriptive analytics, not predictive analytics. Well, that is sort of true. To be truly predictive, a data mining model needs to be applied to new data. These are the models that need to be operationally deployed and, from my point of view, these are the solutions that are truly revolutionizing the way we do business and live in the Big Data world.

If you then want to use your data mining model to make predictions when presented with new data, it needs to be a dynamic asset. It cannot be static. You need to be able to build it and instantly put it to use. And, that's where PMML and ADAPA come in handy.

Obviously, a few data mining tools try to lock you in. You happily build the model using tool A, just to realize that you need the same tool to execute it. In this case, you are missing out. Here are some of the benefits of moving your predictive model to ADAPA:

Overcome speed/memory limitations
Dramatically lower your infrastructure cost
Tap into all the advantages of cloud computing with ADAPA on the Cloud (IBM SmartCloud or Amazon EC2)
Produce scores in real-time (using Web Services or Java API), on-demand, or batch-mode
Execute your models directly from Excel, by using the ADAPA Add-in for Excel
Benefit from using a set of PMML-compliant model development tools (best of breed)
Deploy your models in minutes
Manage models via Web Services or a Web console
Upload one or many models into ADAPA at once
Benefit from the seamless integration of business rules and predictive models (yes, for those who need it, ADAPA comes with a business rules engine)

PMML and ADAPA allow you to use best of breed tools (not the same old tool) for the job at hand. Also, you can leverage the expertise from a diverse group of data scientists. That means, not all your data scientists need to be experts on a single tool. They can use different tools that share one thing in common, the PMML standard. And, once represented in PMML, models can be easily understood by all team members. PMML allows for transparency and, in doing so, fosters best practices.

Why not benefit from: 1) an open standard to represent data mining models; and 2) a proven scoring engine that consumes any version of PMML and makes it available for execution right away, in real-time?

Also keep in mind that ADAPA's sister product, the Universal PMML Plug-in (UPPI), allows you to move the same PMML file in-database or into Hadoop. UPPI is currently available for EMC Greenplum, SAP Sybase IQ, IBM Netezza, and Teradata/Aster. With UPPI for in-database scoring, there is no need to move your data outside the database. Data and models reside inside it and so there is minimal data movement and maximum scoring speed. UPPI is also available for Datameer and will soon be available for Hadoop/Hive.

Making a model operational in minutes has never been easier! And, it is all because of PMML and scoring tools such as ADAPA and UPPI.

Tuesday, December 11, 2012

Spotlight on Zementis

Zementis, Inc. is a company that makes software for the operational deployment and integration of predictive analytics and data-mining solutions. Its main products are the ADAPA Decision Engine, a platform for statistics and data processing, and the Universal PMML Plug-in for Hadoop and in-database scoring.

The name Zementis, symbolizing "concrete thoughts", is derived from the German word Zement (cement, concrete) and the Latin word Mentis (thought, intellect) and relates to the company's core competence in machine learning and AI.

Road to ADAPA

Founded in 2004 with the goal of providing predictive analytics to the marketplace, Zementis is composed of two main divisions, analytics and engineering. Although it started as a company focused on building predictive models, Zementis scientists soon realized that their models needed a platform in which they could be easily deployed and managed. From this need, the ADAPA Decision Engine came to be.

ADAPA initially supported only neural networks, but it soon became a platform for the deployment of a myriad of statistical techniques as well as data processing (download the ADAPA Product Datasheet for a list of supported techniques). From its inception, ADAPA has been based on open standards, including PMML, the Predictive Model Markup Language. As a member of the Data Mining Group (DMG), the committee defining PMML, Zementis has helped shape the standard as it becomes the necessary vehicle for the sharing of predictive solutions between applications.

In 2008, ADAPA was launched as a service on the Amazon Elastic Compute Cloud (Amazon EC2) and is currently being used worldwide by companies and individuals who want to execute their predictive models and decision logic.

In 2012, the ADAPA cloud offering was extended to the IBM SmartCloud. In this way, IBM provides companies around the world with predictive decisions when and where they are needed.

Universal PMML Scoring Engine - UPPI

Building on the heritage of its ADAPA Decision Engine, Zementis launched the Universal PMML Plug-in (UPPI), a highly optimized, in-database scoring engine for predictive models, fully supporting the PMML standard. With PMML, UPPI delivers a wide range of predictive analytics for high performance scoring. It shortens time to market for predictive models and empowers users through instant deployment of predictive models. UPPI is available for the following DB platforms:

The Universal PMML Scoring Engine is also available for Datameer for scoring in Hadoop.

Zementis Locations

Zementis HQ is located in San Diego, California. It also has an office in Hong Kong for servicing clients in the Asia-Pacific region.

References

R. Nisbet, J. Elder, and G. Miner. Handbook of Statistical Analysis and Data Mining Applications. Academic Press, 2009.

A. Guazzelli, M. Zeller, W. Lin, and G. Williams. PMML: An Open Standard for Sharing Models. The R Journal, Volume 1/1, May 2009.

A. Guazzelli, K. Stathatos, and M. Zeller. Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing. The ACM SIGKDD Explorations Newsletter, Volume 11/1, July 2009.

A. Guazzelli, T. Jena, W. Lin, and M. Zeller. The PMML Path Towards True Interoperability in Data Mining. In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2011.

A. Guazzelli, W. Lin, and T. Jena. PMML in Action (2nd Edition): Unleashing the Power of Open Standards for Data Mining and Predictive Analytics. CreateSpace, 2012.

Thursday, November 8, 2012

Model Deployment with PMML, the Predictive Model Markup Language

The idea behind this demo is to show you how easy it is to operationally deploy a predictive solution once it is represented in PMML, the Predictive Model Markup Language.

As a model building environment, I use KNIME to generate a neural network model for predicting customer churn. Once data pre-processing and model are represented in PMML, I go on to deploy it in the Amazon Cloud using the ADAPA Scoring Engine and on top of Hadoop using the Universal PMML Plug-in (UPPI) for Datameer. So, the very same model is readily available for execution in two very distinct Big Data platforms: cloud and Hadoop.

The ease of model deployment and interoperability between platforms is the power of PMML, the de facto standard for predictive analytics and data mining models.

Resources:

Download the KNIME workflow used to generate a sample neural network for predicting churn
Download the PMML file created during the demo

Tuesday, November 6, 2012

Big Data and Real-Time Scoring with ADAPA and the Universal PMML Plug-in

PMML, the Predictive Model Markup Language, allows for predictive models to be easily moved into production and operationally deployed on-site, in the cloud, in-database or Hadoop. Zementis offers a range of products that make possible the deployment of predictive solutions and data mining models built in IBM SPSS, SAS, StatSoft STATISTICA, KNIME, SAP KXEN, R, etc. Our products include the ADAPA Scoring Engine and the Universal PMML Plug-in (UPPI).

SOLUTIONS FOR REAL-TIME SCORING AND BIG DATA

ADAPA, named after the Babylonian god of wisdom, is the first PMML-based, real-time predictive decisioning engine available on the market, and the first scoring engine accessible on the Amazon Cloud and IBM SmartCloud as a service. ADAPA on the Cloud combines the benefits of Software as a Service (SaaS) with the scalability of cloud computing. ADAPA is also available as a traditional software license for deployment on site.

As even the god of wisdom knows, not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in these cases is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together; hence, scores and predictions flow at an accelerated pace. ADAPA’s sister product, the Universal PMML Plug-in (UPPI), is the Zementis solution for Hadoop and in-database scoring. UPPI is available for the IBM Netezza appliance, SAP Sybase IQ, EMC Greenplum/Pivotal, Teradata, and Teradata Aster. It is also available for Hadoop/Datameer.
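
The idea of scoring where the data lives can be illustrated with a small Python sketch. This is not UPPI and does not reflect its interface: SQLite's user-defined functions are used only as a stand-in to show a model being invoked from SQL, and the scoring formula, table and column names are invented for the example.

```python
import sqlite3


def churn_score(calls, minutes):
    """Made-up scoring function standing in for a deployed model."""
    return min(1.0, 0.05 * calls + 0.001 * minutes)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, calls INTEGER, minutes REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 10, 200.0), (2, 1, 50.0)],
)

# Register the function so that SQL statements can call it on each row.
conn.create_function("churn_score", 2, churn_score)

scores = conn.execute(
    "SELECT customer_id, churn_score(calls, minutes) "
    "FROM customers ORDER BY customer_id"
).fetchall()
print(scores)
```

The query returns one score per row without the caller ever exporting the table, which is the property the paragraph above describes.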

BROAD SUPPORT FOR PREDICTIVE ANALYTICS AND PMML

ADAPA and UPPI consume model files that conform to the PMML standard, versions 2.0 through 4.2. If your model development environment exports an older version of PMML, our products will automatically convert your file into a 4.2-compliant format.

Our products support an extensive collection of statistical and data mining algorithms. These include:

Neural Networks (Back-Propagation, Radial-Basis Function, and Neural-Gas)
Regression Models (Linear, Polynomial, and Logistic)
General Regression Models (General Linear, Ordinal Multinomial, Generalized Linear, Cox)
Support Vector Machines (for regression and multi-class and binary classification)
Decision Trees (for classification and regression)
Scorecards (including support for reason codes and complex attributes)
Association Rules
Ruleset Models (flat Decision Trees)
Clustering Models (Distribution-Based, Center-Based, and 2-Step Clustering)
Naive Bayes Classifiers
Multiple Models (model composition, chaining, segmentation, and ensemble - including Random Forest Models and Stochastic Boosting)

A myriad of functions for implementing data pre- and post-processing are also supported, including:

Text Mining (introduced in PMML 4.2)
Regular Expressions
Value Mapping
Discretization
Normalization
Scaling
Logical and Arithmetic Operators
Conditional Logic
Built-in Functions
Lookup Tables
Business Decisions and Thresholds
Custom Functions ... and much much more

Contact us today!

Visit us on the web: www.zementis.com

Or send us an e-mail at info@zementis.com

Wednesday, October 31, 2012

When Big Data and Predictive Analytics Collide

Big Data is usually defined in terms of Volume, Variety and Velocity (the so-called 3 Vs). Volume implies breadth and depth, while variety is simply the nature of the beast: on-line transactions, tweets, text, video, sound, ... Velocity, on the other hand, implies that data is being produced amazingly fast (according to IBM, 90% of the data that exists today was generated in the last 2 years), but that it also gets old pretty fast. In fact, a few data varieties tend to age quicker than others.

To be able to tackle Big Data, systems and platforms need to be robust, scalable, and agile.

It is in this context that IntelliFest 2012 came to be. The conference theme this year was "Intelligence in the Cloud", exploring the use of applied AI in cloud computing, mobile apps, Big Data, and many other application areas. Among several amazing speakers at IntelliFest were Stephen Grossberg from Boston University, Rajat Monga from Google, Carlos Serrano-Morales from Sparkling Logic, Paul Vincent from TIBCO, and Alex Guazzelli from Zementis.

Dr. Alex Guazzelli's talk on Big Data, Predictive Analytics, and PMML is now available for on-demand viewing on YouTube. The abstract follows below, together with several resources including the presentation slides and files used in the live demo.

Abstract:

Predictive analytics has been used for many years to learn patterns from historical data to literally predict the future. Well known techniques include neural networks, decision trees, and regression models. Although these techniques have been applied to a myriad of problems, the advent of big data, cost-efficient processing power, and open standards have propelled predictive analytics to new heights.

Big data involves large amounts of structured and unstructured data that are captured from people (e.g., on-line transactions, tweets, ... ) as well as sensors (e.g., GPS signals in mobile devices). With big data, companies can now start to assemble a 360 degree view of their customers and processes. Luckily, powerful and cost-efficient computing platforms such as the cloud and Hadoop are here to address the processing requirements imposed by the combination of big data and predictive analytics.

But creating predictive solutions is just part of the equation. Once built, they need to be transitioned to the operational environment where they are actually put to use. In the agile world we live in today, the Predictive Model Markup Language (PMML) delivers the necessary representational power for solutions to be quickly and easily exchanged between systems, allowing for predictions to move at the speed of business.

This talk will give an overview of the colliding worlds of big data and predictive analytics. It will do that by delving into the technologies and tools available in the market today that allow us to truly benefit from the barrage of data we are gathering at an ever-increasing pace.

Resources:

Download the presentation slides
Download the KNIME workflow used to generate a sample neural network for predicting churn
Download the PMML file created during the demo

Friday, October 5, 2012

Seamless Integration of Predictive Analytics and Business Rules

Operational deployment of predictive solutions includes exporting the data mining models you built in SAS, IBM SPSS, STATISTICA, KNIME, R, ... into PMML, the Predictive Model Markup Language. Once in the PMML standard, these models can be easily moved into production: on-site, in the cloud, Hadoop or in-database. Zementis offers a range of products that make this possible. These include the ADAPA Decisioning Engine and the Universal PMML Plug-in. Besides providing a predictive analytics engine, ADAPA also encapsulates a rules engine which allows for predictive models to be seamlessly integrated with business rules.

In this demo, we show a pre-qualification app that uses predictive models and rules to analyze the risk of mortgage default on loan applications. An application is accepted or referred for a variety of loan products depending on its perceived risk. ADAPA is the engine driving this application in the back-end.
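
How a model score and business rules can work together in such a pre-qualification flow is sketched below in Python. This is an illustration only, not the demo application or the ADAPA rules engine: the logistic coefficients, the product names and the risk limits are all invented.

```python
import math

# Hypothetical products and the highest default risk each one tolerates.
MAX_RISK_BY_PRODUCT = {
    "30-year fixed": 0.10,
    "15-year fixed": 0.15,
    "5/1 ARM": 0.25,
}


def default_risk(loan_to_value, debt_to_income):
    """Stand-in for a deployed predictive model: a made-up logistic score."""
    z = -6.0 + 4.0 * loan_to_value + 5.0 * debt_to_income
    return 1.0 / (1.0 + math.exp(-z))


def prequalify(risk, max_risk_by_product=MAX_RISK_BY_PRODUCT):
    """Rule layer: accept when the risk is within the product limit, else refer."""
    return {
        product: "accepted" if risk <= limit else "referred"
        for product, limit in max_risk_by_product.items()
    }


risk = default_risk(loan_to_value=0.8, debt_to_income=0.2)
decisions = prequalify(risk)
print(round(risk, 3), decisions)
```

Keeping the rule table separate from the model means the limits can be changed by the business without retraining or redeploying the model.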

Once logged in, we use the ADAPA Web Console to download the mortgage solution files which are used throughout the demo. Predictive models expressed in PMML format are uploaded and verified in ADAPA along with rulesets expressed in tabular format. The ADAPA Web Console is used for managing predictive models, rulesets, and resource files as well as for batch-scoring. Real-time scoring is obtained via web services or the Java API.

Finally, we show how the ADAPA Add-in for Excel is used to score data directly from within Excel. This part of the demo features the scoring of loan and tax data as well as the visualization of results via dashboards.

Wednesday, September 12, 2012

Predictive model deployment and execution made easy with PMML

Developed by the Data Mining Group (DMG), an independent, vendor-led committee, PMML provides an open standard for representing data mining models. In this way, models can easily be shared between different applications, avoiding proprietary issues and incompatibilities. Currently, all major commercial and open source data mining tools support PMML. These include IBM/SPSS, SAS, KXEN, TIBCO, STATISTICA, Microstrategy, R, KNIME, and RapidMiner (for a list of PMML-compliant tools, see the list of PMML-powered tools at DMG.org).

PMML is an XML-based language which follows a very intuitive structure to describe data pre- and post-processing as well as predictive algorithms. Not only does PMML represent a wide range of statistical techniques, but it can also be used to represent input data as well as the data transformations necessary to transform raw data into meaningful features.
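
To give a feel for that structure, the following Python sketch embeds a tiny hand-written PMML 4.1 document (a one-predictor linear regression) and reads it with the standard library XML parser. The document and its field names are made up, and the few lines that evaluate it are a toy meant to show where the numbers live, not a PMML scoring engine.

```python
import xml.etree.ElementTree as ET

PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Toy linear regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_1"}

root = ET.fromstring(PMML_DOC)

# The data dictionary declares every field the model knows about.
fields = [f.get("name") for f in root.findall("p:DataDictionary/p:DataField", NS)]

# The regression table holds the intercept and one coefficient per predictor.
table = root.find("p:RegressionModel/p:RegressionTable", NS)
intercept = float(table.get("intercept"))
coefficients = {
    p.get("name"): float(p.get("coefficient"))
    for p in table.findall("p:NumericPredictor", NS)
}


def predict(inputs):
    """Evaluate the toy regression: intercept + sum(coefficient * input)."""
    return intercept + sum(c * inputs[name] for name, c in coefficients.items())


print(fields, predict({"x": 3.0}))
```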

PMML Conversion

Given that a tool may generate an older version of PMML (earlier than the latest), Zementis has worked out a way to convert older versions of PMML to the latest, version 4.1. This conversion process is also used to validate a data mining model against the PMML specification for versions 2.0, 2.1, 3.0, 3.1, 3.2, 4.0 and 4.1. If validation is not successful, the conversion process gives back a file containing explanations for why the validation failed as comments embedded in the PMML file.

Before actual conversion takes place, the validation phase needs to be successful, i.e. the model file needs to conform to the PMML specification as published by the DMG (for any of the older PMML versions listed above). For known PMML issues (from a variety of sources/vendors), the conversion process will actually correct the model file so that it can be converted appropriately.
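
The very first step of such a pipeline, finding out which PMML version a file declares, can be sketched in a few lines of Python. This is only a pre-check under simplifying assumptions: it reads the version attribute of the root element and compares it with the versions listed above; it does not validate the file against the PMML schema and performs no conversion. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SUPPORTED_VERSIONS = ("2.0", "2.1", "3.0", "3.1", "3.2", "4.0", "4.1")
TARGET_VERSION = "4.1"


def inspect_pmml(pmml_text):
    """Report the declared PMML version and whether conversion would be needed."""
    root = ET.fromstring(pmml_text)
    # Strip the XML namespace, if any, from the root tag.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "PMML":
        raise ValueError(f"root element is {local_name!r}, expected 'PMML'")
    version = root.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported or missing PMML version: {version!r}")
    return {"version": version, "needs_conversion": version != TARGET_VERSION}


old_file = '<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2"/>'
print(inspect_pmml(old_file))
```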

The ADAPA Decision Engine

If you are using the ADAPA Decision Engine (or any of our scoring products), the conversion process described above is automatically executed every time a PMML file is uploaded. By doing that, ADAPA understands PMML files generated by different vendors in all the different PMML versions. Besides syntactic validation, ADAPA also validates PMML from a semantic perspective.

And so, once a model is successfully uploaded in ADAPA, it is syntactically and semantically sound. For more details, click HERE.

You can benefit from ADAPA today by signing up for your private ADAPA instance on the Amazon Cloud or on the IBM SmartCloud. You can also sign up for the ADAPA free trial.

Start executing your models right now!

Friday, August 31, 2012

Zementis is proud to announce PMML 4.1 support

PMML 4.1, the latest version of the Predictive Model Markup Language, is loaded with new and powerful features.

Zementis is proud to announce support for PMML 4.1 throughout its scoring products, including:

ADAPA on Site

ADAPA on the Cloud (Amazon and IBM SmartCloud)

UPPI for in-database scoring (IBM Netezza, SAP Sybase IQ, EMC Greenplum)

UPPI for Hadoop (Datameer).

We have also updated our PMML conversion process so that it now converts PMML files from older versions to version 4.1. In this way, every time a PMML file is presented to ADAPA or UPPI, it is automatically converted to PMML 4.1.

Our support for PMML 4.1 includes:

1) Scorecards (including reason or adverse codes and point allocation for complex attributes)

2) Post-processing: you can now transform scores into business decisions as well as output generic data manipulation steps

3) Multiple Models: a powerful and yet simpler way for the expression of model segmentation, composition, chaining and ensemble, which includes Random Forest models

4) Is the model scorable? The "isScorable" flag was added as a way to flag models not destined for production deployment, but that are nonetheless an important part of the model building cycle

5) New built-in functions (for pre- and post-processing).
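
As an illustration of item 4, the Python sketch below lists the models in a PMML file together with their "isScorable" setting, treating a missing attribute as true (the default in PMML 4.1). The sample document is a stripped-down skeleton written for this example and is not schema-valid; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Top-level PMML children that are not models.
NON_MODEL_ELEMENTS = {
    "Header",
    "MiningBuildTask",
    "DataDictionary",
    "TransformationDictionary",
    "Extension",
}

SAMPLE = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header/>
  <DataDictionary/>
  <TreeModel modelName="draft_tree" functionName="classification" isScorable="false"/>
  <RegressionModel modelName="final_reg" functionName="regression"/>
</PMML>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def scorable_flags(pmml_text):
    """Map each model name to True/False according to its isScorable attribute."""
    root = ET.fromstring(pmml_text)
    flags = {}
    for child in root:
        if local_name(child.tag) in NON_MODEL_ELEMENTS:
            continue
        value = child.get("isScorable", "true")  # absent means scorable
        flags[child.get("modelName")] = value in ("true", "1")
    return flags


flags = scorable_flags(SAMPLE)
print(flags)
```

A deployment script could use such a check to skip models that were exported for documentation or review only.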

With this new release and version update, ADAPA and UPPI can be used not only for deployment and execution of predictive solutions, but also for data analysis and processing before model training.

If you have any questions about PMML 4.1 and all the features supported in our products, please make sure to contact us or feel free to check out our PMML 4.1 forum for detailed support information.