Predictive Analytics, Big Data, Hadoop, PMML: 2013

Friday, November 8, 2013

Big Data Scoring with UPPI for IBM PureData (for Analytics and Hadoop)

In-database scoring is one of the most straightforward ways to gain insights from Big Data. It is no surprise then that the Zementis Universal PMML Plug-in (UPPI) is now being offered for a variety of database platforms. These include IBM PureData for Analytics (Netezza), Pivotal/Greenplum, SAP Sybase IQ, Teradata and Teradata Aster. Zementis also offers UPPI for Hadoop/Hive, including IBM PureData for Hadoop as well as InfoSphere BigInsights. It is in this context that we travelled to Las Vegas to attend the IBM Information on Demand (IOD) Conference.

I must say, I am always impressed by the IBM universe of products and tools that are being offered for analytics (descriptive and predictive) as well as Big Data in general. Zementis had a booth inside the PureData exhibit area, next to all the PureData appliances. As you can imagine, traffic was solid not just because of all the blinking lights but also because the conference itself attracts a lot of people. I believe there were 14 thousand attendees this year.

Why in-database scoring? Well, simple. Not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in this case is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together, so scores and predictions flow at an accelerated pace.

Why scoring in Hadoop? Big Data and Hadoop are somewhat synonymous terms these days, since the latter offers an important technological platform to tackle the challenge of analyzing large volumes of data. In fact, predictive analytics is paramount for companies to extract value and insight from such data. By offering the Universal PMML Plug-in (UPPI) for Hadoop, Zementis takes a big step in making its technology available for companies around the globe to easily deploy, execute, and integrate scalable standards-based predictive analytics on a massive parallel scale through the use of Hive, a data warehouse system for Hadoop.

UPPI brings together essential technologies, offering the best combination of open standards and scalability for the application of predictive analytics. It fully supports the Predictive Model Markup Language (PMML), the de facto standard for data mining applications, which enables the integration of predictive models from IBM/SPSS, SAS, R, and many more.

Saturday, November 2, 2013

1-Click Launch for Big Data Scoring: ADAPA on AWS Marketplace

Clients benefit from our solutions by being able to use PMML, the Predictive Model Markup Language, to move their predictive models from IBM SPSS, R, SAS EM, ... and deploy them instantly on a variety of platforms, including the Amazon Elastic Compute Cloud (Amazon EC2).

ADAPA on the Amazon Cloud offers the power of our real-time PMML-based scoring engine. It comes pre-installed on a virtual server in the cloud. We call that an "ADAPA Instance".

The AWS (Amazon Web Services) Marketplace gives you the power of having ADAPA at your fingertips on three different types of virtual machines. Once you select the machine type and the cloud region in which you want it to run (US, Europe, Latin America or Asia-Pacific), all you need to select is 1-Click Launch and moments later your ADAPA instance is up and running, ready for deployment and execution.

Visit us at the AWS Marketplace!

Big Data Scoring through ADAPA with S3 Processing

Zementis makes it super easy to score your big data by connecting your Amazon S3 (Simple Storage Service) bucket to your predictive models deployed in ADAPA on the Amazon Cloud. ADAPA with S3 Processing is intended for mission critical applications that require very high throughput of predictive analytics. While ADAPA provides real-time scoring via a Web-services API, S3 Processing addresses use cases with scoring requirements that involve tens or hundreds of millions of rows at a time.

Wednesday, October 9, 2013

CIO Review: Zementis selected as one of the top 20 most promising big data companies

Selected by a distinguished panel comprising CEOs, CIOs, VCs, industry analysts and the editorial board of CIO Review, Zementis has been named by CIO Review as one of the "Top 20 Most Promising Big Data Companies in 2013." Congratulations Zementis!

Read CIO Review - FULL ARTICLE

That comes as no surprise since Zementis is all about kicking down barriers for the fast deployment and execution of predictive solutions. By leveraging the PMML (Predictive Model Markup Language) standard, Zementis' products allow for predictive models built anywhere (IBM SPSS, KXEN, KNIME, R, SAS, ...) to be deployed right away on-site, in the cloud (Amazon, IBM, FICO), in-database (Pivotal/Greenplum, SAP Sybase IQ, IBM PureData for Analytics/Netezza, Teradata and Teradata Aster) or in Hadoop (Hive or Datameer).

Predictive analytics has been used for many years to learn patterns from historical data and apply them to predict future outcomes. Well-known techniques include neural networks, decision trees, and regression models. Although these techniques have been applied to a myriad of problems, the combination of big data, cost-efficient processing power, and open standards has propelled predictive analytics to new heights.

Big data involves large amounts of structured and unstructured data that are captured from people (e.g., on-line transactions, tweets, ...) as well as sensors (e.g., GPS signals in mobile devices). With big data, companies can now start to assemble a 360-degree view of their customers and processes. Luckily, powerful and cost-efficient computing platforms such as the cloud and Hadoop are here to address the processing requirements imposed by the combination of big data and predictive analytics.

Creating predictive solutions is just part of the equation. Once built, they need to be transitioned to the operational environment where they are actually put to use. In the agile world we live in today, the Predictive Model Markup Language (PMML) delivers the necessary representational power for solutions to be quickly and easily exchanged between systems, allowing for predictions to move at the speed of business.

Zementis' PMML-based products, ADAPA for real-time scoring and UPPI for big data scoring, are designed from the ground up to deliver the agility necessary for models to be easily deployed on a variety of platforms and to be put to work right away.

Zementis ADAPA and UPPI kick down the barriers for big data adoption!

Wednesday, October 2, 2013

R PMML Support: BetteR than EveR

How does it work? Simple! Once you build your model in R using any of the PMML supported model types, pass the model object as an input parameter to the pmml function as shown in the figure below.

pmml package

The pmml package offers export for a variety of model types, including:

   •   ksvm (kernlab): Support Vector Machines
   •   nnet: Neural Networks
   •   rpart: C&RT Decision Trees
   •   lm & glm (stats): Linear and Binary Logistic Regression Models
   •   arules: Association Rules
   •   kmeans and hclust: Clustering Models
   •   multinom (nnet): Multinomial Logistic Regression Models
   •   glm (stats): Generalized Linear Models for classification and regression with
         a wide variety of link functions
   •   randomForest: Random Forest Models for classification and regression
   •   coxph (survival): Cox Regression Models to calculate survival and stratified
         cumulative hazards
   •   naiveBayes (e1071): Naive Bayes Classifiers
   •   glmnet: Linear ElasticNet Regression Models
   •   ada: Stochastic Boosting (coming soon)
   •   svm (e1071): Support Vector Machines (coming soon)

The pmml package can also export data transformations built with the pmmlTransformations package (see below). It can also be used to merge two distinct PMML files into one. For example, if transformations and model were saved into separate PMML files, it can combine both files, as described in Chapter 5 of the PMML book - PMML in Action.

Data Transformations - the R pmmlTransformations Package

The pmmlTransformations package transforms data and, when used in conjunction with the pmml package, allows for data transformations to be exported together with the predictive model in a single PMML file. Transformations currently supported are:

   •   Min-max normalization
   •   Z-score normalization
   •   Dummy-fication of categorical variables
   •   Value Mapping
   •   Variable renaming

To learn more about this package, check out the paper we presented at the KDD 2013 PMML Workshop.

Tuesday, September 10, 2013

Predictive model deployment with PMML

Model deployment used to be a big task. Predictive models, once built, needed to be re-coded into production to be able to score new data. This process was prone to errors and could easily take up to six months. Re-coding of predictive models has no place in the big data era we live in. Since data is changing rapidly, model deployment needs to be instantaneous and error-free.

PMML, the Predictive Model Markup Language, is the standard to represent predictive models. Given that PMML can be produced by all the top commercial and open-source data mining tools (e.g., FICO Model Builder, SAS EM, IBM SPSS, R, KNIME, ...), a predictive model can be easily moved into the production environment once it is represented as a PMML file.
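
To make the idea of a model "represented as a PMML file" more concrete, here is a minimal Python sketch that reads a tiny hand-written PMML regression model and scores one record with it. It is only an illustration under stated assumptions: the model, its field names (x1, x2, y) and its coefficients are invented, the helper names load_regression and score are hypothetical, and the code has nothing to do with how ADAPA or UPPI are implemented.

```python
import xml.etree.ElementTree as ET

# Namespace of PMML 4.1 documents.
PMML_NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

# Assumption: a hand-written, hypothetical two-input linear regression model.
PMML_DOC = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="demo" functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_regression(pmml_text):
    """Extract the intercept and the numeric terms from the first RegressionTable."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find(".//pmml:RegressionTable", PMML_NS)
    intercept = float(table.get("intercept"))
    terms = [
        (p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", PMML_NS)
    ]
    return intercept, terms


def score(model, record):
    """Score one record: intercept + sum(coefficient * value ** exponent)."""
    intercept, terms = model
    return intercept + sum(coef * record[name] ** exp for name, coef, exp in terms)


model = load_regression(PMML_DOC)
prediction = score(model, {"x1": 2.0, "x2": 4.0})
print(prediction)
```

The point of the sketch is that the consumer needs nothing from the tool that built the model: everything required to score is in the XML. A real PMML consumer handles far more (other model types, transformations, missing values, targets) than this toy reader does.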

Zementis offers ADAPA for real-time scoring and UPPI for big data scoring, which make the entire model deployment process a no-brainer. Given that ADAPA and UPPI are universal PMML consumers (they accept any version of PMML produced by any PMML-compliant tool), they can make predictive models instantly available for execution inside the production environment.

Check out the Zementis website for details.

Predictive Models with PMML - Upcoming workshop at UCSD Extension - Oct 24-25

October 24-25, 2013
San Diego Supercomputer Center (SDSC), UC San Diego Campus

TO REGISTER, FOLLOW THE LINK BELOW:

http://extension.ucsd.edu/studyarea/index.cfm?vAction=singleCourse&vCourse=CSE-41184&vStudyAreaID=4

The Predictive Model Markup Language (PMML) is the de facto standard to represent data mining and predictive analytic models. With PMML, one can easily share a predictive solution among PMML-compliant applications and systems.

Developed in partnership with the San Diego Supercomputer Center’s (SDSC) Predictive Analytics Center of Excellence (PACE), this 2-day, hands-on workshop will explore how the PMML language allows for models to be deployed in minutes. You will get to know its business value and the data mining tools and companies supporting PMML. You will also begin to understand the language elements and capabilities and learn how to effectively extract the most out of your PMML code.

Workshop Benefits

Practice PMML on SDSC’s Gordon with the guidance of world class instructors from industry and academia.
Learn how to represent an entire data mining solution using open-standards
Understand how to use PMML effectively as a vehicle for model logging, versioning and deployment
Identify and correct issues with PMML code as well as add missing computations to auto-generated PMML code
PLUS…Receive a comprehensive tour of SDSC to discover its inner workings, extensive capabilities and current projects.

Instructors

Alex Guazzelli, Ph.D., Vice President of Analytics, Zementis, Inc.
Natasha Balac, Ph.D., Director of PACE, SDSC, UC San Diego
Paul Rodriguez, Ph.D., Research Programmer Analyst, SDSC, UC San Diego

Scholarships Available!
Thanks to the generous underwriting of Zementis, three (3) half-tuition scholarships are available. Learn more and apply

Note: Students should have a fundamental knowledge of data mining methods and basic experience with a computer programming language. Students must bring a laptop (Mac or PC) each day to fully participate during the hands-on portion of the workshop.

Course Number: CSE-41184 Credit: 2 units

This course is part of the following Certificate Program(s):

Data Mining

Wednesday, August 21, 2013

R and PMML Support

A PMML package for R that exports all kinds of predictive models is available directly from CRAN.
Traditionally, the pmml package offered support for the following data mining algorithms:

ksvm (kernlab): Support Vector Machines
nnet: Neural Networks
rpart: C&RT Decision Trees
lm & glm (stats): Linear and Binary Logistic Regression Models
arules: Association Rules
kmeans and hclust: Clustering Models

Recently, it has been expanded to support:

multinom (nnet): Multinomial Logistic Regression Models
glm (stats): Generalized Linear Models for classification and regression with a wide variety of link functions
randomForest: Random Forest Models for classification and regression
coxph (survival): Cox Regression Models to calculate survival and stratified cumulative hazards
naiveBayes (e1071): Naive Bayes Classifiers
glmnet: Linear ElasticNet Regression Models

The pmml package can also export data transformations built with the pmmlTransformations package (see below). It can also be used to merge two distinct PMML files into one. For example, if transformations and model were saved into separate PMML files, it can combine both files into one, as described in Chapter 5 of the PMML book - PMML in Action.

How does it work?

Simple: once you build your model using any of the supported model types, pass the model object as an input parameter to the pmml function as shown in the figure below:

Example - sequence of R commands used to build a linear regression model using lm and the Iris dataset:

Documentation

For more on the pmml package, please take a look at the paper we published in The R Journal. For that, just follow the link below:
1) Paper: PMML: An Open Standard for Sharing Models
Also, make sure to check out the package's documentation from CRAN:
2) CRAN: pmml Package

R PMML Transformations Package

This is a brand new R package. Called pmmlTransformations, it transforms data and, when used in conjunction with the pmml package, allows for data transformations to be exported together with the predictive model in a single PMML file. Transformations currently supported are:

Min-max normalization
Z-score normalization
Dummy-fication of categorical variables
Value Mapping
Discretization (binning)
Variable renaming
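
As a rough illustration of what each of these transformations computes, here is a minimal Python sketch. It is not the pmmlTransformations API: the function names (min_max, z_score, dummify, map_values, discretize, rename) and the sample values are made up for this example, and the z-score uses the sample standard deviation as one reasonable choice.

```python
from bisect import bisect_right
from statistics import mean, stdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale linearly so the smallest value maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to spread out
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def z_score(values):
    """Center on the mean and divide by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def dummify(name, values):
    """Turn one categorical variable into one 0/1 indicator per level."""
    levels = sorted(set(values))
    return [{f"{name}_{lvl}": int(v == lvl) for lvl in levels} for v in values]


def map_values(values, mapping, default=None):
    """Replace each value by its mapped counterpart (default when unmapped)."""
    return [mapping.get(v, default) for v in values]


def discretize(values, cut_points, labels):
    """Bin continuous values; labels needs one more entry than cut_points."""
    return [labels[bisect_right(cut_points, v)] for v in values]


def rename(record, old, new):
    """Return a copy of the record with one variable renamed."""
    renamed = dict(record)
    renamed[new] = renamed.pop(old)
    return renamed


print(min_max([2.0, 4.0, 6.0]))
print(z_score([1.0, 2.0, 3.0]))
print(dummify("color", ["red", "blue"]))
print(map_values(["M", "F"], {"M": "male", "F": "female"}))
print(discretize([5, 10, 25], [10, 20], ["low", "mid", "high"]))
print(rename({"age": 42}, "age", "customer_age"))
```

In this sketch a value equal to a cut point falls into the upper bin; a real binning specification would state explicitly which side of each interval is closed.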

If you would like to contribute code to the pmmlTransformations package, please feel free to contact us.

How does it work?

The pmmlTransformations package works in tandem with the pmml package so that data pre-processing can be represented together with the model in the resulting PMML code.

In R, as shown in the figure below, this process includes three steps:

With the use of the pmmlTransformations package, transform the raw input data as appropriate
Use transformed and raw data as inputs to the modeling function/package (hclust, nnet, glm, ...)
Output the entire solution (data pre-processing + model) in PMML using the pmml package

Example - sequence of R commands used to build a linear regression model using lm with transformed data

Documentation

For more on the pmmlTransformations package, please take a look at the paper we wrote for the KDD 2013 PMML Workshop. For that, just follow the link below:
1) KDD Paper: The R pmmlTransformations Package

Also, make sure to check out the package's documentation from CRAN:
2) CRAN: pmmlTransformations Package

Wednesday, August 7, 2013

Data Transformations - from R to PMML - The pmmlTransformations Package

We are very excited to announce the availability of the R pmmlTransformations package. This package allows you to export data transformations together with your model from R into a PMML file, which can then be deployed in the Zementis ADAPA or UPPI scoring engines. Real-time or big data scoring made easy with R, PMML, and Zementis.

The pmmlTransformations package provides R users with functions that greatly enhance the available data mining capabilities and PMML support by allowing transformations to be performed on the data before it is used for modeling. The pmmlTransformations package works in tandem with the pmml package so that data pre-processing can be represented together with the model in the resulting PMML code.

In R, this process includes three steps:

With the use of the pmmlTransformations package, transform the raw input data as appropriate
Use transformed and raw data as inputs to the modeling function/package (hclust, nnet, glm, ...)
Output the entire solution (data pre-processing + model) in PMML using the pmml package

The pmmlTransformations package is available for download from CRAN, as is the pmml package. Give it a try!

Want to learn more? Check out the paper we published about the pmmlTransformations package at KDD 2013.

Wednesday, July 10, 2013

PMML Workshop at KDD 2013 and UCSD Extension PMML Class

KDD 2013 PMML Workshop

Join us for the KDD PMML Workshop to be held in Chicago on August 11. Organized by the Data Mining Group (DMG), this workshop will feature invited talks and presentations of selected papers.

Zementis will be presenting two papers about PMML support in R: coding and representing data transformations and models through the pmmlTransformations and pmml packages.

UCSD PMML Class (Coming this Fall)

UCSD Extension has teamed up with the San Diego Supercomputer Center Predictive Analytics Center of Excellence (PACE) and Zementis to offer a PMML class to the data mining community on October 24 and 25.

For more information about this great opportunity to learn the standard that is revolutionizing how predictive solutions are documented and deployed, refer to the UCSD Extension catalog.

Tuesday, May 7, 2013

The Zementis Partnership with FICO

Stuart Wells, FICO CTO, announced the strategic partnership between Zementis and FICO at FICO World on May 2, 2013. FICO clients will now benefit from the outstanding Zementis scoring technology.

How? The Zementis ADAPA scoring engine provides a highly scalable framework to deploy, integrate, and execute complex data mining and predictive models based on the PMML standard. Models built in most commercial and open source data mining tools, such as FICO Model Builder or R, can now instantly be deployed in the FICO Analytic Cloud.

Customers, application developers and FICO partners will be able to extract value and insight from their predictive models and data immediately, using ADAPA and PMML. This will result in quicker time to innovation and value on their analytic applications.

Read the press release!

Predictive Analytics Deployment

Zementis offers software solutions that enable scalable, real-time execution of predictive analytics across a variety of platforms based on the PMML standard. These include:

ADAPA Scoring Engine: Our solution for real-time scoring. ADAPA is available for on-site deployment as a traditional license or as a service in the Amazon Elastic Compute Cloud (EC2) and IBM SmartCloud Enterprise. And now, with our FICO partnership, ADAPA will also be available in the FICO Analytic Cloud.

UPPI, the Universal PMML Plug-in: The leading solution for Big Data, UPPI provides scoring in-database and for Hadoop. It is available for EMC Greenplum, IBM Netezza, SAP Sybase IQ, Teradata/Aster as well as Hadoop/Hive and Datameer.

Friday, April 12, 2013

The Zementis Partnership with Infocom in Japan

It is our pleasure to announce a strategic partnership with Infocom. If you missed out on our press release, here is the headline:

Zementis and Infocom partner to deliver predictive analytic solutions in Japan.

Dedicated to the Japanese market, Infocom combines strong expertise in data mining and predictive analytics with extensive delivery and consulting capabilities.

Zementis offers software solutions that enable scalable, real-time execution of predictive analytics across a variety of platforms based on the PMML standard. These include the ADAPA Scoring Engine available for on-site deployment or in the cloud, and UPPI, the Universal PMML Plug-in for in-database scoring and Hadoop (available for IBM Netezza, Teradata/Aster, EMC Greenplum, SAP Sybase IQ as well as Hadoop and Datameer).

Infocom will market, distribute and support Zementis' predictive analytics software in Japan.

To take a look at the press release, click HERE.

Additional Online Resources

Visit the Zementis resources pages for videos and articles on our products and PMML
Follow @Zementis on Twitter
Join the PMML discussion forum on LinkedIn

Thursday, April 11, 2013

Predictive Model Markup Language (PMML) Workshop at KDD 2013 in Chicago

Please join us for a Predictive Model Markup Language (PMML) workshop at KDD 2013 in Chicago on August 11, 2013, to exchange exciting new developments, leading practices, and high impact applications in big data, knowledge discovery and data mining which utilize the PMML standard.

The annual ACM SIGKDD conference on Knowledge Discovery and Data Mining (KDD) is the premier international forum for data mining and big data researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. We invite submission of papers describing implementations of the Predictive Model Markup Language (PMML). Submitted papers will go through a competitive peer review process. Please consult the workshop website for full details regarding paper preparation and submission guidelines.

PMML workshop website
http://kdd13pmml.wordpress.com/

KDD conference web site
http://www.kdd.org/kdd2013/

Thursday, March 21, 2013

R PMML Support: BetteR than EveR

Once represented as a PMML file, a predictive solution (data transformations + model) can be readily moved into the operational environment where it can be put to work immediately. That's the promise of PMML.

R is living up to that promise through its strong PMML export capabilities. The latest addition to the list of supported model types is Naive Bayes classifiers. More specifically, the R PMML package allows for PMML export for Naive Bayes models built using the naiveBayes function of the e1071 package.

For more details and for a complete list of supported model types (as well as data pre-processing), click HERE.

Thursday, March 7, 2013

Making the case for PMML and ADAPA

If you are not familiar with PMML, the Predictive Model Markup Language, you may be wondering what all the fuss is about ...

PMML is the de facto standard to represent data mining and predictive analytic solutions. With PMML, one can easily share a predictive solution among PMML-compliant applications and systems. For example, you can build your model in R, export it in PMML, and use ADAPA, the Zementis Scoring Engine, to deploy it in production.

Many data mining models are a one-time affair. You use historical data to build the model and use it to analyze ... historical data. Wait! That sounds more like descriptive analytics, not predictive analytics. Well, that is sort of true. To be truly predictive, a data mining model needs to be applied to new data. These are the models that need to be operationally deployed and, from my point of view, these are the solutions that are truly revolutionizing the way we do business and live in the Big Data world.

If you want then to use your data mining model to make predictions when presented with new data, it needs to be a dynamic asset. It cannot be static. You need to be able to build it and instantly put it to use. And, that's where PMML and ADAPA come in handy.

Obviously, a few data mining tools try to lock you in. You happily build the model using tool A, just to realize that you need the same tool to execute it. In this case, you are missing out. Here are some of the benefits of moving your predictive model to ADAPA:

Overcome speed/memory limitations
Dramatically lower your infrastructure cost
Tap into all the advantages of cloud computing with ADAPA on the Cloud (IBM SmartCloud or Amazon EC2)
Produce scores in real-time (using Web Services or Java API), on-demand, or batch-mode
Execute your models directly from Excel, by using the ADAPA Add-in for Excel
Benefit from using a set of PMML-compliant model development tools (best of breed)
Deploy your models in minutes
Manage models via Web Services or a Web console
Upload one or many models into ADAPA at once
Benefit from the seamless integration of business rules and predictive models (yes, for those who need it, ADAPA comes with a business rules engine)

PMML and ADAPA allow you to use best of breed tools (not the same old tool) for the job at hand. Also, you can leverage the expertise from a diverse group of data scientists. That means, not all your data scientists need to be experts on a single tool. They can use different tools that share one thing in common, the PMML standard. And, once represented in PMML, models can be easily understood by all team members. PMML allows for transparency and, in doing so, fosters best practices.

Why not benefit from: 1) an open standard to represent data mining models; and 2) a proven scoring engine that consumes any version of PMML and makes it available for execution right away, in real-time?

Also keep in mind that ADAPA's sister product, the Universal PMML Plug-in (UPPI), allows you to move the same PMML file in-database or into Hadoop. UPPI is currently available for EMC Greenplum, SAP Sybase IQ, IBM Netezza, and Teradata/Aster. With UPPI for in-database scoring, there is no need to move your data outside the database. Data and models reside inside it, so there is minimal data movement and maximum scoring speed. UPPI is also available for Datameer and will soon be available for Hadoop/Hive.

Making a model operational in minutes has never been easier! And, it is all because of PMML and scoring tools such as ADAPA and UPPI.

Monday, February 25, 2013

The Zementis Partnership with Teradata

The partnership between Zementis and Teradata allows customers with a variety of data mining tools to efficiently deploy predictive models based on the Predictive Model Markup Language (PMML) standard. Focused on Big Data applications, the Universal PMML Plug-in (UPPI) for Teradata enables scalable execution of standards-based predictive analytics directly within the Teradata data warehouse.

To read more about the benefits of running your predictive solutions inside Teradata and Teradata Aster, please visit:

http://www.teradata.com/templates/Partners/PartnerProfile.aspx?id=12884902321

PMML Scoring

Zementis offers a range of products that make it possible to deploy predictive solutions and data mining models built in all the top commercial and open-source data mining tools. Our products include the ADAPA Scoring Engine for real-time scoring and UPPI, which is currently available for a host of database platforms as well as Hadoop/Datameer. For a list of available platforms, please visit our in-database products page.

Rationale

Not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in this case is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together, so scores and predictions flow at an accelerated pace.

Thursday, January 24, 2013

R PMML Support: Data Transformations

R and PMML Export

R is becoming the tool of choice for many data scientists. It is no wonder that many commercial and open-source statistical tools are also embracing R.

Predictive Models

Robust predictive analytic techniques are but one set of tools available to data scientists in R. Another important capability is PMML export for a host of predictive models.

By using the pmml package (version 1.2.33 or higher), users can export PMML from R for:

Random Forest Models
Neural Networks
Clustering Models
Cox Regression Models
Linear and Logistic Regression Models
Support Vector Machines
Association Rules
Generalized Linear Models
Random Survival Forest Models

Data Transformations

And now, another R package extends this functionality by providing PMML export for data transformations. The new pmmlTransformations package has just made its way to CRAN (the Comprehensive R Archive Network).

Want to apply a Z-scoring normalization procedure to your continuous input variables before presenting them to a neural network? No problem. Use the pmmlTransformations package in conjunction with the pmml package (version 1.2.33 or higher) to export the entire process (pre-processing + model) into a PMML file.
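
For readers curious what such a normalization can look like once it is written down in PMML, the Python sketch below builds a DerivedField with a NormContinuous element, which PMML defines as piecewise-linear interpolation between LinearNorm points. Two points, one at the mean (mapped to 0) and one at mean plus one standard deviation (mapped to 1), are enough to describe a z-score. This is an illustration only, not output of the pmmlTransformations package: the helper names z_score_field and apply_norm, the field name "age", and the statistics are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def z_score_field(field, mean, sd, derived_name=None):
    """Build a PMML-style DerivedField that encodes (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    derived = ET.Element(
        "DerivedField",
        name=derived_name or f"{field}_z",
        optype="continuous",
        dataType="double",
    )
    norm = ET.SubElement(derived, "NormContinuous", field=field)
    # Two points define the line: mean -> 0 and mean + sd -> 1.
    ET.SubElement(norm, "LinearNorm", orig=repr(mean), norm="0.0")
    ET.SubElement(norm, "LinearNorm", orig=repr(mean + sd), norm="1.0")
    return derived


def apply_norm(derived, x):
    """Evaluate a two-point NormContinuous, extrapolating outside the points."""
    points = [
        (float(p.get("orig")), float(p.get("norm")))
        for p in derived.iter("LinearNorm")
    ]
    (o1, n1), (o2, n2) = points[0], points[-1]
    return n1 + (x - o1) * (n2 - n1) / (o2 - o1)


# Assumption: hypothetical training statistics for an "age" input.
age_z = z_score_field("age", mean=40.0, sd=10.0)
print(ET.tostring(age_z, encoding="unicode"))
print(apply_norm(age_z, 60.0))
```

Because the transformation travels inside the same file as the model, a scoring engine can accept raw inputs and apply the normalization itself, using the statistics captured at training time.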

To look at the package's documentation in CRAN, click HERE.

Agile Predictive Analytics Deployment

Zementis offers a host of products for the agile deployment and execution of your PMML-based solutions. Our ADAPA and UPPI scoring engines are available for:

Hadoop: Datameer and Hadoop/Hive
In-database: EMC Greenplum, IBM Netezza, SAP Sybase IQ, Teradata, and Teradata Aster
Cloud: Amazon EC2 and IBM SmartCloud Enterprise
On-site: On your own servers

Real-time or Big Data requirements? Zementis has you covered.

Contact us today for more information or to schedule a presentation/demo.

Wednesday, January 9, 2013

PMML, Big Data, and Hadoop: Predictive Analytics at Work!

Big Data and Hadoop are somewhat synonymous terms these days, since the latter offers an important technological platform to tackle the challenge of analyzing large volumes of data. By the same token, predictive analytics is paramount for companies to extract value and insight from big data. It is in this context that Zementis brings its standards-based predictive scoring engine into a variety of Big Data platforms, including the cloud as well as in-database. By offering the Universal PMML Plug-in (UPPI) for Hadoop, Zementis takes a big step in making its technology available for companies around the globe to easily deploy, execute, and integrate scalable standards-based predictive analytics on a massive parallel scale through the use of Hive, a data warehouse system for Hadoop, and Datameer, an end-to-end BI solution that works on top of Hadoop.

UPPI brings together essential technologies, offering the best combination of open standards and scalability for the application of predictive analytics. It fully supports the Predictive Model Markup Language (PMML), the de facto standard for data mining applications, which enables the integration of predictive models from IBM/SPSS, SAS, R, and many more.

UPPI for Hadoop/Hive

Hive makes it possible for large datasets stored in Hadoop compatible systems to be easily analyzed. Since it provides a mechanism to project structure onto the data, Hive allows for queries to be made using a SQL-like language called HiveQL.

Once deployed in UPPI, predictive models turn into UDFs (User-defined Functions). These can then be invoked directly in HiveQL. In this way, UPPI offers Hadoop users the best combination of open standards and scalability for the application of predictive analytics.
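
The UDF idea can be tried out on a small scale without a Hadoop cluster. The Python sketch below uses SQLite, from the standard library, purely as a stand-in for Hive: a scoring function is registered as a SQL function and then called from a query, so the data is scored where it lives. Everything specific here is an assumption made for illustration: the table, the function name score_model, and the linear model with its coefficients. It does not show UPPI's actual function names or HiveQL syntax.

```python
import sqlite3


def make_scorer(intercept, coefficients):
    """Return a scoring function for a hypothetical linear model."""
    def score(*inputs):
        return intercept + sum(c * x for c, x in zip(coefficients, inputs))
    return score


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age REAL, income REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 30.0, 50.0), (2, 45.0, 80.0)],
)

# Register the model as a two-argument SQL function (the UDF).
conn.create_function("score_model", 2, make_scorer(1.0, [0.5, 0.25]))

# Scoring now happens inside the query; no rows are exported first.
rows = conn.execute(
    "SELECT id, score_model(age, income) FROM customers ORDER BY id"
).fetchall()
print(rows)
```

The design point carries over to larger systems: once the model is callable from the query language, scoring becomes one more expression in a SELECT, and the engine decides how to parallelize it.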

UPPI for Hadoop/Hive delivers instant and scalable scoring for Big Data while retaining compatibility with most major data mining tools through the PMML Standard. It also brings the scalability of Hadoop to the execution of predictive analytics.

UPPI for Datameer

Zementis and Datameer have partnered to deliver standards-based execution of predictive analytics on a massive parallel scale. This joint solution combines the Zementis plug-in for execution of predictive models with the power and scale of Datameer, an end-to-end BI solution that includes data source integration, an analytics engine, visualization and dashboarding.

Datameer uses Apache Hadoop, a Java-based framework that supports the parallel storage and processing of large data sets in a distributed environment, as its back-end storage and processing engine to scale cost-effectively to 4000 servers and petabytes of data. It provides wizard-based data integration for large datasets of structured and unstructured data; integrated analytics with a familiar spreadsheet-like interface and over 200 built-in analytic functions; and drag-and-drop reporting and dashboarding visualization for end users. Open APIs for data integration, analytics and dashboarding make it easy to access custom data sources and to utilize advanced or custom analytics, like predictive modeling, as well as custom visualizations.

Predictive Scoring for Hadoop - Advantages

UPPI for Datameer delivers instant and scalable scoring for Big Data while retaining compatibility with most major data mining tools through the PMML Standard. Through its versatile deployment solution, the Zementis/Datameer partnership:

Brings the scalability of Hadoop to the execution of predictive analytics
Supports PMML to avoid time-consuming and expensive one-off predictive analytics projects
Integrates data from multiple data sources and formats without complex data and schema mappings that are time consuming to set up and difficult to change
Provides cost effective storage and processing of large volumes of highly granular data that predictive applications often require
Brings together a 100% standards-based approach to analytics that lowers total cost of ownership and increases reuse, control, and flexibility for orchestrating critical day-to-day business decisions.

Welcome to the World of Predictive Analytics!