The recent release of Chorus 6.2.2 brings five new ETL and Machine Learning operators to your analytics toolbox. These are quiet features that may not have caught your attention yet, but they make a big impact on the analytics functionality you have at your fingertips. Consider this blog post a highlight reel to welcome these new operators to the product!
Four of the new operators appear in your sidebar by default in 6.2.2: Association Rules, Resampling, Regression Evaluator, and Replace Outliers. Regression Evaluator and Replace Outliers were already available for Hadoop; this release extends the same functionality to their Database source counterparts.
The Resampling and Association Rules operators are new tools for the pre-processing and modeling steps of analysis. Resampling lets you rebalance the values in your dataset to support model training, and Association Rules is a modeling operator that identifies patterns in datasets. A common use is market basket analysis: determining which items are purchased together or in sequence.
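To make those two ideas concrete, here is a minimal Python sketch of what they mean outside of Chorus. It is not how the Chorus operators are implemented: the function names `pair_rules` and `downsample_to_balance`, their parameters, and the sample data are all hypothetical, and the rule mining only looks at pairs of items rather than larger item sets.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Find simple 'A -> B' rules between pairs of items (hypothetical helper).

    support    = share of transactions containing both A and B
    confidence = share of transactions containing A that also contain B
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can produce a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def downsample_to_balance(rows, column, seed=0):
    """Down-sample every value of `column` to the size of the rarest value."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Made-up shopping baskets for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Down-sampling is only one way to rebalance a column; up-sampling the rarer values, which the operator also supports, would keep every original row and duplicate some of the minority ones instead.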
6.2.2 also brings a new type of operator, geared specifically to analyses in the financial sector. The Quandl operator provides fast API access to current market data available on quandl.com. Operators such as Monte Carlo Simulation, GARCH, and Compute Returns will be added to the product in upcoming releases to further support financial analysis use cases.
Our Data Science and Engineering teams build new operators in the product using the same framework that is publicly available to all Chorus users. To learn more about making your own custom operators, read our documentation page, or reach out to your Alpine account manager to ask about available training or professional services.
Chorus 6.2.2 Operator Directory
| Operator | Category | Data Source | Description |
|---|---|---|---|
| Association Rules | Model | Hadoop | Association Rules modeling is a pattern identification algorithm. It is useful when analyzing unsupervised transactional data that is categorical in nature. Finding frequent patterns can be applied to a variety of business use cases, such as shopping basket analysis, cross-marketing, product clustering, catalog design, store layout, sales campaign analysis, Web log (“click stream”) analysis, and DNA sequence analysis. |
| Quandl | | Hadoop | The Quandl operator downloads datasets from Quandl, a leading financial and economic data provider, and generates tabular datasets on HDFS for consumption by other operators. This can be used to analyze financial market data such as equities or inflation to build trading strategy models and recession forecasts. |
| Regression Evaluator | | | The Regression Evaluator operator computes several commonly used statistical measures of model accuracy. Outputs include Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Coefficient of Determination (R²). This makes it a helpful way to compare different regression models. |
| Replace Outliers | | | The Replace Outliers operator assists in cleaning datasets. It reduces the range of values for numeric columns, replacing all values above and below a certain percentile threshold with the maximum or minimum value within that threshold. This can be used instead of filtering on an absolute minimum and maximum value. |
| Resampling | Sample | Hadoop | The Resampling operator is a Sample operator that changes the distribution of values in a single column. It can be used either to balance all values in the selected column or to change the proportion of one value. Up-sampling and down-sampling are critical tools to ensure sufficiently balanced data during model training. |
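If you want a feel for what the Regression Evaluator and Replace Outliers descriptions above amount to, the following Python sketch computes the same named metrics and clips a column at percentile bounds. It is an illustration only, not the Chorus implementation: the function names and parameters are hypothetical, the percentile uses a simple nearest-rank definition that may differ from the one the product uses, and MAPE assumes no actual value is zero.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE, MAPE (in percent) and R2 for two equal-length lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Assumption: no actual value is zero, otherwise MAPE is undefined.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e * e for e in errors)
    r2 = 1 - ss_residual / ss_total
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


def replace_outliers(values, lower_pct=5.0, upper_pct=95.0):
    """Clip values to the observed values at the lower and upper percentiles."""
    ordered = sorted(values)
    n = len(ordered)

    def rank(pct):
        # Nearest-rank percentile index, kept inside the list bounds.
        return min(n - 1, max(0, math.ceil(pct * n / 100) - 1))

    low, high = ordered[rank(lower_pct)], ordered[rank(upper_pct)]
    # Values outside [low, high] are replaced by the nearest bound;
    # the original row order is preserved.
    return [min(max(v, low), high) for v in values]


# Made-up numbers for illustration only.
print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
print(replace_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 10, 90))
```

Clipping rather than filtering keeps every row in the dataset, which is the advantage the Replace Outliers description points to over setting an absolute minimum and maximum.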