Alpine’s latest product release adds ten new operators, including the much-requested Python Execute operator. The rest fall into two primary themes: (1) data movement, with operators for copying between databases, moving data to and from Excel, exporting to your Tableau server, and connecting to Hive; and (2) an enhanced data analysis experience, with new and improved ARIMA time series and neural network operators, plus fuzzy join functionality to simplify data blending.
You can read an overview of each operator at the bottom of this blog post, and our documentation has a detailed review of the new functionality. But first, I’d like to highlight a few operators that I’m personally very excited to see join the product!
Since Chorus 6.0, we’ve focused on Python support, including integrated Python notebooks, PySpark functionality, scheduling, and the full gamut of collaboration features in the Chorus platform. New in 6.3, we offer the Python Execute operator, which lets you insert Python notebooks directly into a workflow, much like our R, SQL, HQL, and Pig Execute operators. You simply select the desired notebook in the operator dropdown and define how its inputs and outputs should be connected to the surrounding operators in your analysis.
Excel support is another long-awaited customer request available in 6.3. You can keep important Excel files in your Chorus workspace and pull them directly into a Hadoop cluster or relational database with Import Excel. The operator cleans the file and gets it ready for downstream analysis. Its “sister” operator, Export to Excel, stores outputs from selected operators in a multi-tabbed Excel file in your Chorus workspace. From there, team members can comment on, share, and download the workbook.
Export to Tableau provides another way to explore your analytic results in third-party applications. The operator creates a TDE file, which is then pushed to your specified Tableau server. The new data source will be there waiting for you the next time you log in to create a chart or dashboard. If you are interested in using Export to Tableau, reach out to the Alpine Support team for instructions on how to install this operator package.
The Alpine team is continuously developing new operators to increase the breadth of advanced analytics functions available to our customers. If you are interested in learning more about how we select and build these operators, or if you would like to request a specific algorithm for your analysis, please reach out to your Alpine representative.
Chorus 6.3 Operator Directory
ARIMA Time Series (Model, Database)
The ARIMA Time Series operator applies the ARIMA algorithm to an input time series dataset and generates step forecasts for simulation or predictive modeling. Applications include predicting future retail sales, modeling the evolution of financial market prices, forecasting weather trends, and predicting IT server loads.
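The mechanics of step forecasting can be illustrated with a toy autoregressive model. The sketch below fits an AR(1) equation by least squares and rolls it forward; it is a heavily simplified stand-in for full ARIMA (which adds differencing and moving-average terms), not Alpine's implementation, and the sales figures are made up:

```python
# Toy AR(1) step forecaster: fits y[t] = a + b*y[t-1] by ordinary least
# squares, then iterates the fitted equation to produce step forecasts.
# A minimal sketch only -- full ARIMA also handles differencing (the "I")
# and moving-average (the "MA") components.

def fit_ar1(series):
    """Least-squares fit of y[t] = a + b*y[t-1]."""
    x = series[:-1]          # lagged values
    y = series[1:]           # next values
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

def forecast(series, steps):
    """Generate `steps` forecasts by iterating the fitted equation."""
    a, b = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out

sales = [100, 104, 109, 113, 118, 122, 127]   # illustrative retail series
print(forecast(sales, 3))
```

Because the toy series trends upward, the fitted slope is close to 1 with a positive intercept, so each forecast step continues the trend.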
Copy Between Databases (Load Data, Database)
The Copy Between Databases operator copies data from a table on one database server to another database server. This is particularly helpful for transferring tables from development data sources to production data sources.
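At its core, a cross-database table copy is a read followed by a bulk insert on the target. The sketch below mimics the idea with two local SQLite databases standing in for separate servers; the table and column names are illustrative, and this is not Alpine's copy engine:

```python
import sqlite3

# Sketch of a table copy between two databases, using two in-memory
# SQLite databases as stand-ins for separate "development" and
# "production" servers. Table and column names are illustrative.

def copy_table(src_conn, dst_conn, table):
    """Read all rows from src_conn.table and bulk-insert into dst_conn."""
    rows = src_conn.execute(f"SELECT * FROM {table}").fetchall()
    cols = len(rows[0]) if rows else 0
    placeholders = ",".join("?" * cols)
    dst_conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})", rows)
    dst_conn.commit()

dev = sqlite3.connect(":memory:")   # "development" source
prod = sqlite3.connect(":memory:")  # "production" target
dev.execute("CREATE TABLE users (id INTEGER, name TEXT)")
dev.executemany("INSERT INTO users VALUES (?, ?)",
                [(1, "ada"), (2, "grace")])
prod.execute("CREATE TABLE users (id INTEGER, name TEXT)")

copy_table(dev, prod, "users")
print(prod.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2
```

A production-grade copy would also stream rows in batches and translate type differences between the two database dialects, which is the kind of detail the operator handles for you.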
Export to Excel (Tools, Database / Hadoop)
This operator exports multiple inputs as separate tabs in an Excel workbook stored in the current workspace. If Excel is your preferred way of sharing results, use this operator to keep all of your project data within the Chorus workspace and take advantage of its version-control and collaboration features.
Export to Tableau (Tools, Hadoop)
The Export to Tableau operator converts a tabular dataset to the TDE format and publishes it as a data source on your Tableau Server. You can use it to connect outputs from your analytic workflows directly to Tableau charts and dashboards. This is an add-on operator; reach out to Alpine Support for more information on adding this functionality to your instance.
Fuzzy Join (Transform, Hadoop)
This operator performs a fuzzy matching join, connecting two datasets on nearly matching string values. User-entered data often contains slight variations in otherwise matching strings; the Fuzzy Join operator uses preprocessing steps and a specified similarity threshold to join such strings correctly across two tables.
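The general idea can be sketched with Python's standard-library difflib. This is a simplified, single-machine stand-in for the operator's distributed implementation; the preprocessing steps, 0.8 threshold, and sample data are all illustrative:

```python
from difflib import SequenceMatcher

# Sketch of a fuzzy join: pair rows whose key strings are "close enough"
# after light preprocessing (trimming, lowercasing). The 0.8 threshold
# is an illustrative similarity cutoff, not a product default.

def normalize(s):
    return s.strip().lower()

def fuzzy_join(left, right, threshold=0.8):
    """Join (key, value) rows from two tables on similar keys."""
    joined = []
    for lkey, lval in left:
        for rkey, rval in right:
            score = SequenceMatcher(
                None, normalize(lkey), normalize(rkey)).ratio()
            if score >= threshold:
                joined.append((lkey, rkey, lval, rval))
    return joined

customers = [("Acme Corp", 101), ("Globex", 102)]
invoices = [("ACME Corp.", 5000), ("Initech", 1200)]
print(fuzzy_join(customers, invoices))
```

Here "Acme Corp" and "ACME Corp." match despite the differing case and trailing period, while the unrelated keys fall below the threshold and are dropped, which is exactly the behavior you want when blending hand-entered data.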
Hive Table (Load Data, Hadoop)
Hive Table is a source operator that connects to a Hive database, allowing its data to be incorporated into a workflow.
Import Excel (Load Data, Database / Hadoop)
The Import Excel operator parses data from an Excel workbook in your workspace and creates a CSV dataset in your Hadoop file system or relational database. Use it to pull data from Excel into your workflow, where you can continue analysis with the transformation and machine learning techniques of your choice.
Load to Hive (Load Data, Hadoop)
The Load to Hive operator saves a table directly to a Hive database. It can serve as a checkpoint for important results in your Hive Hadoop workflow.
Neural Network (Model, Hadoop)
This operator implements the Multilayer Perceptron Classifier (MLPC) algorithm, a feedforward neural network consisting of multiple layers of nodes in a directed graph. It is a general-purpose classification algorithm that can be applied to a wide range of problems.
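The layered, feedforward structure can be shown in a few lines: a forward pass through one hidden layer, with hand-picked weights that happen to classify XOR. This is purely didactic; a real MLP classifier, including this operator, learns its weights from training data rather than having them set by hand:

```python
# Minimal feedforward pass for a multilayer perceptron with one hidden
# layer. The weights are hand-chosen so the tiny network classifies XOR;
# an actual MLP classifier learns such weights from data (e.g. via
# backpropagation).

def relu(z):
    return max(0.0, z)

def mlp_forward(x1, x2):
    # Hidden layer: two ReLU units.
    h1 = relu(x1 + x2)        # fires on any active input
    h2 = relu(x1 + x2 - 1)    # fires only when both inputs are 1
    # Output layer: linear combination, thresholded at 0.5.
    out = h1 - 2 * h2
    return 1 if out > 0.5 else 0

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", mlp_forward(a, b))   # XOR truth table
```

XOR is the classic example of a problem a single-layer model cannot solve, which is why the hidden layer, and hence the "multilayer" in MLPC, matters.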
Python Execute (Tools, Database / Hadoop)
The Python Execute operator allows you to run a Jupyter notebook stored in the current workspace as part of an Alpine workflow. Take your Python and PySpark analyses a step further by incorporating them into a workflow and using substitution inputs.