Thursday, January 24, 2013

R PMML Support: Data Transformations


R and PMML Export 
  
R is becoming the tool of choice for many data scientists. It is no wonder that many commercial and open-source statistical tools are also embracing R.

Predictive Models

Robust predictive analytic techniques are only one of the toolsets available to data scientists in R. Another important capability is exporting PMML for a host of predictive models.

By using the pmml package (version 1.2.33 or higher), users can export PMML from R for:
  • Random Forest Models
  • Neural Networks
  • Clustering Models
  • Cox Regression Models
  • Linear and Logistic Regression Models
  • Support Vector Machines
  • Association Rules
  • Generalized Linear Models
  • Random Survival Forest Models

Data Transformations

And now, another R package extends this functionality by providing PMML export for data transformations. The new pmmlTransformations package has just made its way to CRAN (the Comprehensive R Archive Network). 

Want to apply a Z-scoring normalization procedure to your continuous input variables before presenting them to a neural network? No problem. Use the pmmlTransformations package in conjunction with the pmml package (version 1.2.33 or higher) to export the entire process (pre-processing + model) into a PMML file. 
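As a rough illustration of what exporting a Z-scoring step amounts to, the sketch below computes the mean and standard deviation of one continuous input and writes the normalization as a PMML `DerivedField` that uses `NormContinuous` with two `LinearNorm` anchor points. It is written in Python with only the standard library, not with the pmmlTransformations package itself, whose function calls are not shown in this post. The field name `sepal_length`, the `derived_` prefix, the sample values, and all helper names are assumptions made for the example, and the choice of anchor points (mean and mean + sd) is just one valid option.

```python
import statistics
import xml.etree.ElementTree as ET


def zscore_params(values):
    """Return the mean and sample standard deviation of a numeric column."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return mean, sd


def zscore_derived_field(field, values):
    """Build a PMML DerivedField that maps `field` to (x - mean) / sd."""
    mean, sd = zscore_params(values)
    derived = ET.Element(
        "DerivedField",
        name="derived_" + field,  # hypothetical naming scheme
        dataType="double",
        optype="continuous",
    )
    norm = ET.SubElement(derived, "NormContinuous", field=field)
    # Two anchor points define the line: mean -> 0 and mean + sd -> 1.
    # The orig values are increasing as long as sd > 0.
    ET.SubElement(norm, "LinearNorm", orig=repr(mean), norm="0")
    ET.SubElement(norm, "LinearNorm", orig=repr(mean + sd), norm="1")
    return derived


def apply_linear_norm(derived, x):
    """Evaluate the two-point linear map stored in the DerivedField."""
    (o1, n1), (o2, n2) = [
        (float(p.get("orig")), float(p.get("norm")))
        for p in derived.iter("LinearNorm")
    ]
    return n1 + (x - o1) * (n2 - n1) / (o2 - o1)


sepal_length = [5.1, 4.9, 6.3, 5.8, 7.1]  # made-up sample values
field_xml = zscore_derived_field("sepal_length", sepal_length)
print(ET.tostring(field_xml, encoding="unicode"))
```

Because the transformation is stored as two points on a line, a consumer of the file only needs to interpolate (or extrapolate) linearly to reproduce `(x - mean) / sd` for any input, which is what `apply_linear_norm` does here.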

The package's documentation is available on CRAN.

Agile Predictive Analytics Deployment

Once represented as a PMML file, a predictive solution (data transformations + model) can be readily moved into the operational environment where it can be put to work immediately. That's the promise of PMML.

Zementis offers a host of products for the agile deployment and execution of your PMML-based solutions. Our ADAPA and UPPI scoring engines are available for:
  • Hadoop: Datameer and Hadoop/Hive
  • In-database: EMC Greenplum, IBM Netezza, SAP Sybase IQ, Teradata, and Teradata Aster
  • Cloud: Amazon EC2 and IBM SmartCloud Enterprise
  • On-site: On your own servers
Real-time or Big Data requirements? Zementis has you covered.

Contact us today for more information or to schedule a presentation/demo.

Wednesday, January 9, 2013

PMML, Big Data, and Hadoop: Predictive Analytics at Work!


Big Data and Hadoop are close to synonymous these days, since Hadoop offers an important technological platform for analyzing large volumes of data. By the same token, predictive analytics is paramount for companies that want to extract value and insight from big data. It is in this context that Zementis brings its standards-based predictive scoring engine to a variety of Big Data platforms, including the cloud and in-database deployments. By offering the Universal PMML Plug-in (UPPI) for Hadoop, Zementis takes a big step toward making its technology available to companies around the globe. With it, they can deploy, execute, and integrate scalable, standards-based predictive analytics on a massively parallel scale through Hive, a data warehouse system for Hadoop, and Datameer, an end-to-end BI solution that works on top of Hadoop.

UPPI brings together essential technologies, offering the best combination of open standards and scalability for the application of predictive analytics. It fully supports the Predictive Model Markup Language (PMML), the de facto standard for data mining applications, which enables the integration of predictive models from IBM/SPSS, SAS, R, and many more.

UPPI for Hadoop/Hive

Hive makes it possible to easily analyze large datasets stored in Hadoop-compatible systems. Because it provides a mechanism to project structure onto the data, Hive allows queries to be written in a SQL-like language called HiveQL.

Once deployed in UPPI, predictive models turn into UDFs (User-Defined Functions). These can then be invoked directly in HiveQL. In this way, UPPI offers Hadoop users the best combination of open standards and scalability for the application of predictive analytics.

UPPI for Hadoop/Hive delivers instant and scalable scoring for Big Data while retaining compatibility with most major data mining tools through the PMML standard. It also brings the scalability of Hadoop to the execution of predictive analytics.



UPPI for Datameer

Zementis and Datameer have partnered to deliver standards-based execution of predictive analytics on a massively parallel scale. This joint solution combines the Zementis plug-in for the execution of predictive models with the power and scale of Datameer, an end-to-end BI solution that includes data source integration, an analytics engine, visualization, and dashboarding.

Datameer uses Apache Hadoop, a Java-based framework that supports the parallel storage and processing of large data sets in a distributed environment, as its back-end storage and processing engine, scaling cost-effectively to 4000 servers and petabytes of data. It provides wizard-based data integration for large structured and unstructured datasets, integrated analytics with a familiar spreadsheet-like interface and over 200 built-in analytic functions, and drag-and-drop reporting and dashboard visualization for end users. Open APIs for data integration, analytics, and dashboarding make it easy to access custom data sources and to use advanced or custom analytics, such as predictive modeling, as well as custom visualizations.

Predictive Scoring for Hadoop - Advantages

UPPI for Datameer delivers instant and scalable scoring for Big Data while retaining compatibility with most major data mining tools through the PMML Standard. Through its versatile deployment solution, the Zementis/Datameer partnership:
  • Brings the scalability of Hadoop to the execution of predictive analytics
  • Supports PMML to avoid time-consuming and expensive one-off predictive analytics projects
  • Integrates data from multiple data sources and formats without complex data and schema mappings that are time-consuming to set up and difficult to change
  • Provides cost-effective storage and processing of the large volumes of highly granular data that predictive applications often require
  • Brings together a 100% standards-based approach to analytics that lowers total cost of ownership and increases reuse, control, and flexibility for orchestrating critical day-to-day business decisions

Welcome to the World of Predictive Analytics!

© Predictive Analytics by Zementis, Inc. - All Rights Reserved.




