Predictive Analytics, Big Data, Hadoop, PMML: Universal PMML Plug-in

Wednesday, January 8, 2014

Zementis and Teradata Announce In-database Scoring for Big Data

As a result of its partnership with Teradata, Zementis is excited to announce the availability of the Universal PMML Plug-in (UPPI) for Teradata analytic platforms. It does not get easier than this! Simply deploy your predictive models built in R, IBM SPSS, SAS EM, ... and score your big data, directly in-database, where it resides.

The Zementis Universal PMML Plug-in (UPPI) enables the execution of standards-based predictive analytics directly within the Teradata Unified Data Architecture™. Users can now easily deploy predictive models built in R, IBM SPSS, SAS EM and other popular analytic tools on Aster and/or Teradata to achieve scale. The bridge between these systems is PMML, the Predictive Model Markup Language standard. It allows for models to be instantly moved from the scientist's desktop to the database where they will be executed.

As described by Teradata's Chris Twogood, VP for Product and Services Marketing, "by partnering with Zementis, we are able to offer high performance, enterprise-level predictive analytics scoring for the major analytics tools that support PMML. With Zementis and PMML, we are eliminating the need for customers to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today."

Available for Teradata and Teradata Aster databases, UPPI leverages the massively parallel databases as a scalable, high-performance scoring engine that easily processes petabyte-scale data volumes. UPPI takes full advantage of the high-performance data warehouse with its massively parallel processing capabilities for rapid execution of standards-based predictive analytics based on the PMML standard.

Models built in most commercial and open source data mining tools can now instantly be deployed in Teradata or Aster. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.

Read the full press release!

Friday, November 8, 2013

Big Data Scoring with UPPI for IBM Pure Data (for Analytics and Hadoop)

In-database scoring is one of the most straightforward ways to gain insights from Big Data. It is no surprise then that the Zementis Universal PMML Plug-in (UPPI) is now being offered for a variety of database platforms. These include IBM Pure Data for Analytics (Netezza), Pivotal/Greenplum, SAP Sybase IQ, Teradata and Teradata Aster. Zementis also offers UPPI for Hadoop/Hive, including IBM Pure Data for Hadoop as well as InfoSphere BigInsights. It is in this context that we travelled to Vegas to attend the IBM Information on Demand (IOD) Conference.

I must say, I am always impressed by the IBM universe of products and tools that are being offered for analytics (descriptive and predictive) as well as Big Data in general. Zementis had a booth inside the Pure Data exhibit area, next to all the Pure Data appliances. As you can imagine, traffic was solid not just because of all the blinking lights but also because the conference itself attracts a lot of people. I believe there were 14 thousand attendees this year.

Why in-database scoring? Well, simple. Not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in this case is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together; hence, scores and predictions flow at an accelerated pace.

Why scoring in Hadoop? Big Data and Hadoop are somewhat synonymous terms these days, since the latter offers an important technological platform to tackle the challenge of analyzing large volumes of data. In fact, predictive analytics is paramount for companies to extract value and insight from such data. By offering the Universal PMML Plug-in (UPPI) for Hadoop, Zementis takes a big step in making its technology available for companies around the globe to easily deploy, execute, and integrate scalable standards-based predictive analytics on a massively parallel scale through the use of Hive, a data warehouse system for Hadoop.

UPPI brings together essential technologies, offering the best combination of open standards and scalability for the application of predictive analytics. It fully supports the Predictive Model Markup Language (PMML), the de facto standard for data mining applications, which enables the integration of predictive models from IBM/SPSS, SAS, R, and many more.

Thursday, March 7, 2013

Making the case for PMML and ADAPA

If you are not familiar with PMML, the Predictive Model Markup Language, you may be wondering what all the fuss is about ...

PMML is the de facto standard to represent data mining and predictive analytic solutions. With PMML, one can easily share a predictive solution among PMML-compliant applications and systems. For example, you can build your model in R, export it in PMML, and use ADAPA, the Zementis Scoring Engine, to deploy it in production.
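To make the idea of a shareable model a bit more concrete, here is a minimal sketch in Python. It assumes a tiny, hand-written PMML 4.1 regression document (the model name, fields, and coefficients are made up for illustration and were not exported from any real tool) and scores one record by reading the intercept and coefficients straight from the XML. It is not how ADAPA works internally; it only shows that everything needed to compute a score travels inside the PMML file.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML document (illustration only).
PMML_DOC = """<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header copyright="example" description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="spend_model" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.002"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace used by PMML 4.1 documents.
NS = {"p": "http://www.dmg.org/PMML-4_1"}


def score(pmml_text, record):
    """Score one record against the first RegressionTable in the document.

    Only plain NumericPredictor terms (exponent 1) are handled here.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return result


# intercept + 0.5 * age + 0.002 * income
prediction = score(PMML_DOC, {"age": 40, "income": 50000})
print(prediction)
```

Any tool that understands the same XML vocabulary can compute the same score, which is the whole point of exchanging models in PMML rather than in a tool-specific format.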

Many data mining models are a one-time affair. You use historical data to build the model and use it to analyze ... historical data. Wait! That sounds more like descriptive analytics, not predictive analytics. Well, that is sort of true. To be truly predictive, a data mining model needs to be applied to new data. These are the models that need to be operationally deployed and, from my point of view, these are the solutions that are truly revolutionizing the way we do business and live in the Big Data world.

If you want then to use your data mining model to make predictions when presented with new data, it needs to be a dynamic asset. It cannot be static. You need to be able to build it and instantly put it to use. And, that's where PMML and ADAPA come in handy.

Obviously, a few data mining tools try to lock you in. You happily build the model using tool A, just to realize that you need the same tool to execute it. In this case, you are missing out. Here are some of the benefits of moving your predictive model to ADAPA:

Overcome speed/memory limitations
Dramatically lower your infrastructure cost
Tap into all the advantages of cloud computing with ADAPA on the Cloud (IBM SmartCloud or Amazon EC2)
Produce scores in real-time (using Web Services or Java API), on-demand, or batch-mode
Execute your models directly from Excel, by using the ADAPA Add-in for Excel
Benefit from using a set of PMML-compliant model development tools (best of breed)
Deploy your models in minutes
Manage models via Web Services or a Web console
Upload one or many models into ADAPA at once
Benefit from the seamless integration of business rules and predictive models (yes, for those who need it, ADAPA comes with a business rules engine)

PMML and ADAPA allow you to use best of breed tools (not the same old tool) for the job at hand. Also, you can leverage the expertise from a diverse group of data scientists. That means, not all your data scientists need to be experts on a single tool. They can use different tools that share one thing in common, the PMML standard. And, once represented in PMML, models can be easily understood by all team members. PMML allows for transparency and, in doing so, fosters best practices.

Why not benefit from: 1) an open standard to represent data mining models; and 2) a proven scoring engine that consumes any version of PMML and makes it available for execution right away, in real-time?

Also keep in mind that ADAPA's sister product, the Universal PMML Plug-in (UPPI), allows you to move the same PMML file in-database or into Hadoop. UPPI is currently available for EMC Greenplum, SAP Sybase IQ, IBM Netezza, and Teradata/Aster. With UPPI for in-database scoring, there is no need to move your data outside the database. Data and models reside inside it and so there is minimal data movement and maximum scoring speed. UPPI is also available for Datameer and will soon be available for Hadoop/Hive.

Making a model operational in minutes has never been easier! And, it is all because of PMML and scoring tools such as ADAPA and UPPI.

Monday, February 25, 2013

The Zementis Partnership with Teradata

The partnership between Zementis and Teradata allows customers with a variety of data mining tools to efficiently deploy predictive models based on the Predictive Model Markup Language (PMML) standard. Focused on Big Data applications, the Universal PMML Plug-in (UPPI) for Teradata enables scalable execution of standards-based predictive analytics directly within the Teradata data warehouse.

To read more about the benefits of running your predictive solutions inside Teradata and Teradata Aster, please visit:

http://www.teradata.com/templates/Partners/PartnerProfile.aspx?id=12884902321

PMML Scoring

Zementis offers a range of products that make possible the deployment of predictive solutions and data mining models built in all the top commercial and open-source data mining vendors. Our products include the ADAPA Scoring Engine for real-time scoring and UPPI, which is currently available for a host of database platforms as well as Hadoop/Datameer. For a list of available platforms, please visit our in-database products page.

Rationale

Not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in this case is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together; hence, scores and predictions flow at an accelerated pace.

Friday, August 31, 2012

Zementis is proud to announce PMML 4.1 support

PMML 4.1, the latest version of the Predictive Model Markup Language, is loaded with new and powerful features.

Zementis is proud to announce support for PMML 4.1 throughout its scoring products, including:

ADAPA on Site

ADAPA on the Cloud (Amazon and IBM SmartCloud)

UPPI for in-database scoring (IBM Netezza, SAP Sybase IQ, EMC Greenplum)

UPPI for Hadoop (Datameer).

We have also updated our PMML conversion process so that it now converts PMML files from older versions to version 4.1. In this way, every time a PMML file is presented to ADAPA or UPPI, it is automatically converted to PMML 4.1.
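As a rough illustration of the first step such a conversion has to take, the sketch below reads the version attribute from the root element of a PMML document and decides whether the file predates a target version. The two sample documents and the function names are hypothetical, and the actual conversion logic used by ADAPA and UPPI is not reproduced here; this only detects which files would need converting.

```python
import xml.etree.ElementTree as ET

# Minimal hypothetical documents; real PMML files carry much more content.
OLD_DOC = '<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2"><Header copyright="example"/></PMML>'
NEW_DOC = '<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1"><Header copyright="example"/></PMML>'


def pmml_version(pmml_text):
    """Return the `version` attribute of the root PMML element."""
    root = ET.fromstring(pmml_text)
    # ElementTree reports namespaced tags as "{uri}PMML"; keep the local part.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "PMML":
        raise ValueError("not a PMML document")
    return root.get("version")


def needs_conversion(pmml_text, target="4.1"):
    """True when the document's version is older than the target version."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))

    return as_tuple(pmml_version(pmml_text)) < as_tuple(target)


print(pmml_version(OLD_DOC), needs_conversion(OLD_DOC))
print(pmml_version(NEW_DOC), needs_conversion(NEW_DOC))
```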

Our support for PMML 4.1 includes:

1) Scorecards (including reason or adverse codes and point allocation for complex attributes)

2) Post-processing: you can now transform scores into business decisions as well as output generic data manipulation steps

3) Multiple Models: a powerful and yet simpler way for the expression of model segmentation, composition, chaining and ensemble, which includes Random Forest models

4) Is the model scorable? The "isScorable" flag was added as a way to flag models not destined for production deployment, but that are nonetheless an important part of the model building cycle

5) New built-in functions (for pre- and post-processing).
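For readers new to scorecards, the following Python sketch shows the general mechanics behind item 1: each characteristic contributes a partial score, and reason codes are ranked by how far the partial score falls below that characteristic's baseline (the "points below" idea). The data structure, field names, and reason codes are invented for this example, and keeping only positive shortfalls is a simplification assumed here; it is not the PMML syntax nor the ADAPA implementation.

```python
# Hypothetical scorecard: every characteristic has a baseline score and a list
# of attributes given as (predicate, partial_score, reason_code).
SCORECARD = {
    "initial_score": 0,
    "characteristics": [
        {
            "name": "age",
            "baseline": 20,
            "attributes": [
                (lambda r: r["age"] < 30, 5, "RC_AGE_YOUNG"),
                (lambda r: r["age"] >= 30, 20, "RC_AGE_OK"),
            ],
        },
        {
            "name": "income",
            "baseline": 30,
            "attributes": [
                (lambda r: r["income"] < 40000, 10, "RC_INCOME_LOW"),
                (lambda r: r["income"] >= 40000, 30, "RC_INCOME_OK"),
            ],
        },
    ],
}


def score_with_reasons(card, record):
    """Return (total_score, reason_codes ranked by shortfall vs. baseline)."""
    total = card["initial_score"]
    shortfalls = []
    for characteristic in card["characteristics"]:
        for predicate, points, code in characteristic["attributes"]:
            if predicate(record):
                total += points
                # "Points below": how much this attribute lost vs. the baseline.
                difference = characteristic["baseline"] - points
                if difference > 0:
                    shortfalls.append((difference, code))
                break  # the first matching attribute wins
    shortfalls.sort(key=lambda item: item[0], reverse=True)
    return total, [code for _, code in shortfalls]


total, reasons = score_with_reasons(SCORECARD, {"age": 25, "income": 30000})
print(total, reasons)
```

The ranked reason codes are what make a scorecard explainable: alongside the score, the caller learns which characteristics pulled it down the most.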

With this new release and version update, ADAPA and UPPI can be used not only for deployment and execution of predictive solutions, but also for data analysis and processing before model training.

If you have any questions about PMML 4.1 and all the features supported in our products, please make sure to contact us or feel free to check out our PMML 4.1 forum for detailed support information.

Thursday, August 9, 2012

Agile Deployment of Predictive Analytics on Hadoop: Faster Insights through Open Standards

This joint Datameer/Zementis presentation given at the 2012 Hadoop Summit outlines the benefits of the PMML standard as a key element of data science best practices and its application in the context of distributed processing. In a live demonstration, we showcase how Datameer and the Zementis Universal PMML Plug-in (UPPI) take advantage of a highly parallel Hadoop architecture to efficiently derive predictions from very large volumes of data.

Watch it now on YouTube:

http://www.youtube.com/watch?v=r_g99-kP_BE

Session Abstract:

While Hadoop provides an excellent platform for data aggregation and general analytics, it also can provide the right platform for advanced predictive analytics against vast amounts of data, preferably with low latency and in real-time. This drives the business need for comprehensive solutions that combine the aspects of big data with an agile integration of data mining models. Facilitating this convergence is the Predictive Model Markup Language (PMML), a vendor-independent standard to represent and exchange data mining models that is supported by all major data mining vendors and open source tools.

PMML is an XML-based language developed by the Data Mining Group (DMG) which provides a way for applications to define statistical and data mining models and to share models between PMML compliant applications. It provides applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. PMML allows users to develop models within one vendor's application, and use another vendor's applications to visualize, analyze, evaluate or otherwise use the models. Previously, this was very difficult, but with PMML, the exchange of models between compliant applications is now straightforward.
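Because a PMML file is plain XML, a consuming application can find out what it has been handed without knowing anything about the tool that produced it. The sketch below, written in Python against a small hand-made document (the application name "HypotheticalTool" and the tree model are invented for illustration), reports the producing application, the declared data fields, and the model elements present.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML document (illustration only).
DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header copyright="example">
    <Application name="HypotheticalTool" version="1.0"/>
  </Header>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="demo" functionName="classification">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <Node score="a"><True/></Node>
  </TreeModel>
</PMML>"""

# Top-level PMML children that are not models.
NON_MODEL_ELEMENTS = {
    "Header", "MiningBuildTask", "DataDictionary",
    "TransformationDictionary", "Extension",
}


def local(tag):
    """Drop the "{namespace}" prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def describe(pmml_text):
    """Summarize producer, data fields, and model types of a PMML document."""
    root = ET.fromstring(pmml_text)
    info = {"application": None, "fields": [], "models": []}
    for child in root:
        name = local(child.tag)
        if name == "Header":
            for sub in child:
                if local(sub.tag) == "Application":
                    info["application"] = sub.get("name")
        elif name == "DataDictionary":
            info["fields"] = [f.get("name") for f in child if local(f.tag) == "DataField"]
        elif name not in NON_MODEL_ELEMENTS:
            info["models"].append(name)
    return info


print(describe(DOC))
```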

Thursday, June 28, 2012

Learn how the IBM / Zementis Partnership simplifies Predictive Analytics

Benefiting from Interoperability

PMML, the Predictive Model Markup Language, has become the de facto standard to represent not only predictive models, but also data pre- and post-processing. In so doing, it allows for the interchange of models among different tools and environments, avoiding proprietary issues and incompatibilities.

Model Building: IBM SPSS

IBM SPSS Modeler and IBM SPSS Statistics are extremely powerful data analysis and model building environments. This power is backed up by their support of PMML. In either tool, predictive models as well as data transformations can be easily exported into PMML. IBM SPSS Statistics, for example, allows for automatic data preparation which can be exported into PMML and subsequently merged into the final PMML file for the entire solution.

View the on-demand replay of the joint IBM SPSS/Zementis webcast focusing on the synergies between IBM SPSS and Zementis ADAPA (presented May 14th, 2012).

Discover the benefits of executing your IBM SPSS models in ADAPA

Model Execution: ADAPA on the IBM SmartCloud

Once exported in PMML, your IBM SPSS models can be readily deployed in the Zementis ADAPA Scoring Engine, where they can be put to work immediately. To minimize total cost of ownership, model execution in ADAPA is now available as a service through the IBM SmartCloud.

View the on-demand replay of the joint IBM/Zementis webcast focusing on predictive analytics deployment and execution on the IBM SmartCloud (presented May 24th, 2012).

Review IBM developerWorks article about executing predictive solutions using ADAPA on the IBM SmartCloud.

Discover features and capabilities of ADAPA on SmartCloud

In-database Scoring: UPPI for IBM Netezza

Predictive solutions expressed in PMML can also be put to work inside the database with the Zementis Universal PMML Plug-in (UPPI) which is now available for IBM Netezza. Since UPPI transforms your complex predictive solutions into SQL functions, these can be readily used in any query and generate instant business decisions and insights where and when you need them.
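The idea of a model that behaves like a SQL function can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in database, not IBM Netezza, and the function name churn_score, the table, and the coefficients are all hypothetical. It says nothing about how UPPI itself registers or names its functions; it only shows what "score inside the query" looks like from the analyst's side.

```python
import sqlite3


def churn_score(tenure_months, monthly_spend):
    """Hypothetical linear model standing in for a deployed predictive model."""
    return 0.8 - 0.01 * tenure_months - 0.002 * monthly_spend


conn = sqlite3.connect(":memory:")

# Expose the model as a two-argument SQL function inside the database engine.
conn.create_function("churn_score", 2, churn_score)

conn.execute(
    "CREATE TABLE customers (id INTEGER, tenure_months REAL, monthly_spend REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 3, 20.0), (2, 48, 120.0)],
)

# The score is computed row by row as part of an ordinary query,
# so the data never has to be exported to a separate scoring tool.
rows = conn.execute(
    "SELECT id, churn_score(tenure_months, monthly_spend) AS score "
    "FROM customers ORDER BY score DESC"
).fetchall()

for customer_id, value in rows:
    print(customer_id, round(value, 2))
```

Once scoring is just another function call, it can be combined with filters, joins, and aggregates like any other column expression.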

Review the UPPI for IBM Netezza Product Data Sheet

Discover all the features and UPPI supported databases

Explore the IBM Netezza Analytics website

Tuesday, April 12, 2011

Universal PMML Plug-in for EMC Greenplum Database

It is our pleasure to announce a new Zementis product, the Universal PMML Plug-in for in-database scoring. Available now for the EMC Greenplum Database, a high-performance massively parallel processing (MPP) database, the plug-in leverages the Predictive Model Markup Language (PMML) to execute predictive models directly within EMC Greenplum, for highly optimized in-database scoring.

Developed by the Data Mining Group (DMG), PMML is supported by all major data mining vendors, e.g., IBM SPSS, SAS, Teradata, FICO, STATISTICA, MicroStrategy, TIBCO and Revolution Analytics as well as open source tools like R, KNIME and RapidMiner. With PMML, models built in any of these data mining tools can now instantly be deployed in the EMC Greenplum database. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.

"By partnering with Zementis, a true PMML innovator, we are able to offer a vendor-agnostic solution for moving enterprise-level predictive analytics into the database execution environment," said Dr. Steven Hillion, Vice President of Analytics at EMC Greenplum. "With Zementis and PMML, the de-facto standard for representing data mining models, we are eliminating the need to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today."

Want to learn more?

To learn more about how the EMC Greenplum Database and the Universal PMML Plug-in work together, feel free to:

Visit the PMML Plug-in product page
Download the white paper

The Universal PMML Plug-in for the EMC Greenplum Database is available now. Contact us today for more information.