
Meet the New Operators in Chorus 6.2

With each release of Chorus, we see product enhancements for usability, integrations, security and more. But there is also continuous growth in the ETL and machine learning algorithms available to users. Many of these quietly slip into the operator list in your Chorus sidebar without notice, so we’d like to take this opportunity to introduce you to the new operators available in Chorus 6.2.

The new operators in this release cover a wide range of functionality, including database versions of the Correlation Filter and Unpivot, new and improved LDA operators, Window Functions, and two new Decision Tree operators. As always, we love to take your feedback into consideration when creating new operators, so if there’s an algorithm or model you’d like to see, let us know!

The new operators in 6.2 represent a variety of best practices. The new Decision Tree and LDA operators replace previous versions of these models. As academic research and open source assets (e.g. MADlib) improve, our team at Alpine Data strives to adapt and offer the best algorithms possible. The new decision tree operators now enable the analysis of both categorical and numeric dependent variables. And the new LDA operators leverage MLlib’s LDA algorithm with the Online Variational Bayes optimizer, which is more memory-friendly, especially with a large number of documents or a large vocabulary.
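To make the categorical versus numeric distinction concrete, here is a minimal sketch of how a CART-style tree can score a candidate split in each case. It assumes Gini impurity for a categorical dependent variable and variance for a numeric one, which are common textbook choices; the function names are hypothetical, and the operators themselves rely on MADlib and may use different criteria or settings.

```python
from collections import Counter


def gini_impurity(labels):
    """Impurity of a node whose dependent variable is categorical."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def variance(values):
    """Impurity of a node whose dependent variable is numeric."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_score(left, right, impurity):
    """Weighted impurity of the two child nodes; lower means a better split."""
    total = len(left) + len(right)
    return (len(left) * impurity(left) + len(right) * impurity(right)) / total


# Classification: a split that separates the classes perfectly scores 0.
pure_split = split_score(["yes", "yes"], ["no", "no"], gini_impurity)

# Regression: the same scoring, with variance as the impurity measure.
numeric_split = split_score([1.0, 3.0], [10.0, 14.0], variance)
```

The only thing that changes between the two tree types in this sketch is the impurity function passed to `split_score`; the search over candidate splits stays the same.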

Alpine Data engineers and data scientists built these operators using the same custom operator framework that is available to our Chorus users. To learn more about how you can get started with developing your own custom operators, read our documentation page. Or reach out to your account manager if your company is interested in training or a professional services engagement to help achieve its goals.

Chorus 6.2 Operator Directory

| Operator | Category | Support | Description |
| --- | --- | --- | --- |
| Correlation Filter | Transform | Database/Hadoop | Correlation Filter is a Transformation operator that allows the user to filter numeric columns so the remaining columns are not correlated strongly with each other. |
| Decision Tree Classification – CART | Model | Database/Hadoop | The Decision Tree Classification (CART) operator uses the MADlib built-in function, tree_train(), to generate a decision tree that predicts the value of a categorical column based on several independent columns. |
| Decision Tree Regression – CART | Model | Database | Like the Decision Tree Classification – CART operator, the Decision Tree Regression – CART operator generates a tree that predicts the value of a dependent column based on several independent columns. In this operator, the dependent column is numeric rather than categorical. |
| LDA Trainer | Model | Hadoop | LDA (Latent Dirichlet Allocation) is an unsupervised text-mining algorithm used to analyze collections of unstructured documents. In LDA, each document may be viewed as a mixture of various topics. |
| LDA Predictor | Transform | Hadoop | LDA Predictor is a prediction operator that uses both the model trained by the LDA Trainer and a tabular dataset. |
| Unpivot | Transform | Database/Hadoop | The Unpivot operator allows you to select columns to be unpivoted. The selected columns are removed from the input and ‘flattened’ into two new columns, one holding the original column names and the other holding their values. |
| Window Functions: Aggregate, Lag Lead, Rank | Transform | Hadoop | Unlike the regular Aggregation operator, Window Function: Aggregate lets you compute aggregated variables for each input row, based on the specified frame. Lag Lead returns the value of the column that is n rows before (lag) or after (lead) the current row. Rank returns each row’s rank when the rows are ordered by a specified column. |
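The three window functions can be illustrated with a minimal plain-Python sketch. This is not how the operators are implemented; the function names, the `sales` column, the frame of "current row plus one preceding row", and the tie handling in `rank` are all assumptions chosen for illustration.

```python
def lag(values, n=1, default=None):
    """Value n rows before each row; default where no such row exists."""
    return [values[i - n] if i - n >= 0 else default for i in range(len(values))]


def lead(values, n=1, default=None):
    """Value n rows after each row; default where no such row exists."""
    return [values[i + n] if i + n < len(values) else default
            for i in range(len(values))]


def rank(values):
    """1-based rank in ascending order; ties share a rank and leave a gap."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def moving_sum(values, preceding=1):
    """Windowed aggregate: sum of the current row and the rows just before it."""
    return [sum(values[max(0, i - preceding): i + 1]) for i in range(len(values))]


sales = [10, 30, 20, 30]  # hypothetical column, already in row order

lagged = lag(sales)          # [None, 10, 30, 20]
led = lead(sales)            # [30, 20, 30, None]
ranked = rank(sales)         # [1, 3, 2, 3]
running = moving_sum(sales)  # [10, 40, 50, 50]
```

Each function returns one value per input row, which is the difference from a regular aggregation that collapses many rows into one.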