The DMG (Data Mining Group) has just released PMML 4.0, the latest and greatest version of the Predictive Model Markup Language.
Zementis, together with SPSS, SAS, IBM, Open Data Group, Salford Systems, MicroStrategy, and all the other contributing members of the DMG, is proud to be part of the making of PMML, the de facto standard for representing data mining models.
Not only can PMML represent a wide range of statistical techniques, but it can also represent the data transformations needed to turn raw data into meaningful feature detectors. In this way, PMML offers a single, concise standard for representing both data manipulation and modeling.
Improved Pre-Processing Capabilities
PMML 4.0 extends the pre-processing capabilities supported by older versions by adding a range of Boolean and comparison operations (e.g., and, or, not, equal, notEqual, greaterOrEqual, ...) to the list of built-in functions. These, combined with an IF-THEN-ELSE function, which is also new to PMML, allow for the representation of a wide range of feature detectors.
For examples of how to use these new pre-processing capabilities, as well as all the standard PMML transformations, please check the PMML Data Pre-Processing Primer.
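To give a flavor of how the new Boolean functions and IF-THEN-ELSE combine, here is a minimal sketch in Python. It embeds a small derived field that uses the built-in functions "if", "and", "not", "equal", and "greaterOrEqual", and walks it with a toy evaluator. The field names (ageGroup, age, status), the constant values, and the evaluate helper are all hypothetical and made up for illustration; this is not a scoring engine and covers only a handful of functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical derived field: "retired" if age >= 65 and status is not
# "employed", otherwise "active". Field names and values are illustrative.
DERIVED_FIELD = """
<DerivedField name="ageGroup" optype="categorical" dataType="string">
  <Apply function="if">
    <Apply function="and">
      <Apply function="greaterOrEqual">
        <FieldRef field="age"/>
        <Constant dataType="double">65</Constant>
      </Apply>
      <Apply function="not">
        <Apply function="equal">
          <FieldRef field="status"/>
          <Constant dataType="string">employed</Constant>
        </Apply>
      </Apply>
    </Apply>
    <Constant dataType="string">retired</Constant>
    <Constant dataType="string">active</Constant>
  </Apply>
</DerivedField>
"""

# Toy implementations of a few built-in functions (not the full list).
FUNCTIONS = {
    "and": lambda *args: all(args),
    "or": lambda *args: any(args),
    "not": lambda a: not a,
    "equal": lambda a, b: a == b,
    "notEqual": lambda a, b: a != b,
    "greaterOrEqual": lambda a, b: a >= b,
}


def evaluate(node, record):
    """Recursively evaluate a FieldRef, Constant, or Apply element."""
    if node.tag == "FieldRef":
        return record[node.get("field")]
    if node.tag == "Constant":
        text = node.text.strip()
        return float(text) if node.get("dataType") == "double" else text
    if node.tag == "Apply":
        name = node.get("function")
        args = list(node)
        if name == "if":
            # First child is the condition, second the THEN branch,
            # and the optional third the ELSE branch.
            if evaluate(args[0], record):
                return evaluate(args[1], record)
            return evaluate(args[2], record) if len(args) > 2 else None
        return FUNCTIONS[name](*(evaluate(arg, record) for arg in args))
    raise ValueError(f"unsupported element: {node.tag}")


expression = ET.fromstring(DERIVED_FIELD)[0]  # the outer Apply element
print(evaluate(expression, {"age": 70, "status": "retired"}))   # retired
print(evaluate(expression, {"age": 70, "status": "employed"}))  # active
```

Nesting Apply elements is what makes this expressive: each condition is just another function call, so an arbitrarily complex business rule can be stored alongside the model that consumes it.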
Time Series Models
PMML 4.0 also extends the existing standard by allowing for the representation of Time Series Models. In particular, it lets data miners and data mining tools represent Exponential Smoothing models, and it offers placeholders for ARIMA, Seasonal Trend Decomposition, and Spectral Analysis, which are to be supported in the near future.
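For readers less familiar with the technique, the sketch below shows simple (level-only) exponential smoothing in Python. It illustrates the kind of model the standard can now describe, not the PMML encoding itself; the function name, the alpha value, and the sample series are assumptions chosen for the example, and trend or seasonal components are left out.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Uses s[0] = x[0] and s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    levels = [float(series[0])]
    for value in series[1:]:
        levels.append(alpha * value + (1.0 - alpha) * levels[-1])
    return levels


levels = simple_exponential_smoothing([10.0, 12.0, 14.0], alpha=0.5)
print(levels)      # [10.0, 11.0, 12.5]
print(levels[-1])  # one-step-ahead forecast: 12.5
```

A larger alpha makes the forecast react faster to recent observations, while a smaller one smooths out more of the noise.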
Other additions are Model Explanation and Multiple Models. Model Explanation allows evaluation and model performance measures to be part of the PMML file itself. In this way, a PMML file can define not only data manipulation and models, but also the associated ROC Graph, Gains/Lift Charts, Confusion Matrix, Field Correlations, Univariate Statistics, and more.
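As a reminder of what one of these measures contains, here is a small Python sketch that builds a confusion matrix from actual and predicted labels and derives accuracy from it. The helper names and the sample labels are hypothetical, and the sketch says nothing about how the matrix is laid out inside a PMML file.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Fraction of records where the predicted label equals the actual one."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total


actual = ["yes", "yes", "no", "no"]
predicted = ["yes", "no", "no", "no"]
matrix = confusion_matrix(actual, predicted)
print(matrix[("yes", "no")])  # 1 record was actually "yes" but predicted "no"
print(accuracy(matrix))       # 0.75
```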
Multiple Models allows for model composition, ensembles, and segmentation. It replaces the old Model Composition element to offer great flexibility for combining different model types, such as regression and decision trees.
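The idea behind segmentation and ensembles can be sketched in a few lines of Python: each segment pairs a predicate with a model, and a combination method decides how the outputs of the matching segments are merged. The method names "selectFirst" and "average" are meant to loosely mirror the standard's combination methods; the score_segments function, the segment definitions, and the record fields are assumptions made up for this illustration.

```python
def score_segments(segments, record, method="average"):
    """Score a record against (predicate, model) pairs.

    "selectFirst" returns the output of the first matching segment;
    "average" returns the mean output of all matching segments.
    """
    if method not in ("selectFirst", "average"):
        raise ValueError(f"unsupported method: {method}")
    outputs = []
    for predicate, model in segments:
        if predicate(record):
            outputs.append(model(record))
            if method == "selectFirst":
                return outputs[0]
    if not outputs:
        return None  # no segment applies to this record
    return sum(outputs) / len(outputs)


# Hypothetical segments: a regional model plus a catch-all model.
segments = [
    (lambda r: r["region"] == "north", lambda r: 2.0 * r["x"]),
    (lambda r: True, lambda r: r["x"] + 1.0),
]

north = {"region": "north", "x": 3.0}
print(score_segments(segments, north, method="selectFirst"))  # 6.0
print(score_segments(segments, north, method="average"))      # 5.0
```

The same structure serves both purposes: with "selectFirst" the segments act as a router that picks one specialized model per record, while with "average" they behave as an ensemble.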
Extending Existing Elements
Last, but not least, PMML 4.0 offers a range of extensions to existing elements, such as the addition of multi-class classification for Support Vector Machines, improved representation for Association Rules, and the addition of Cox Regression Models.
There is no doubt that PMML is here to stay. The announcement of PMML 4.0 attests to the commitment of the leading data mining vendors to be able to represent their solutions through a single language, a language that can be understood by all. It is our vision that users will be free to share models among many solutions, benefiting from an environment in which interoperability is truly attainable.
For more information on PMML and a list of useful links, please check PMML 101. Also, check the article "PMML: An Open Standard for Sharing Models" just published in The R Journal.
We also invite the entire community to join our ongoing PMML discussion at the AnalyticBridge website.