Deploy models into production environments easily by leveraging both PMML and PFA industry standards.
Track and version analytic workflows, and assign roles and data permissions to everyone involved in the model development process, from data engineers to data scientists to business users.
Schedule daily, nightly, or monthly runs to keep your workflows up to date and running in production.
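The scheduling cadences above can be sketched with a small helper. This is purely illustrative (the function name and the 2 a.m. nightly hour are assumptions, not the platform's actual scheduler API):

```python
from datetime import datetime, timedelta

def next_run(last_run: datetime, cadence: str) -> datetime:
    """Return the next scheduled run after last_run.

    'daily' and 'nightly' both advance one day (a nightly run is a
    daily run pinned to an off-peak hour); 'monthly' advances to the
    same day of the next month, clamped to the 28th for safety.
    """
    if cadence in ("daily", "nightly"):
        nxt = last_run + timedelta(days=1)
        if cadence == "nightly":
            nxt = nxt.replace(hour=2, minute=0, second=0, microsecond=0)
        return nxt
    if cadence == "monthly":
        year = last_run.year + (last_run.month == 12)
        month = last_run.month % 12 + 1
        return last_run.replace(year=year, month=month,
                                day=min(last_run.day, 28))
    raise ValueError(f"unknown cadence: {cadence}")

run = datetime(2024, 1, 31, 14, 30)
print(next_run(run, "nightly"))   # 2024-02-01 02:00:00
print(next_run(run, "monthly"))   # 2024-02-28 14:30:00
```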
Bring the power of machine learning to the applications you use every day without exposing users to the underlying complexities of algorithms and data access.
Visual Workflow Editor
Empower data scientists and business analysts to build complex ETL & Machine Learning workflows using a drag-and-drop interface. Hundreds of operators are shipped out-of-the-box, and custom functions can be created using the Extensibility SDK.
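Custom operators built with an extensibility SDK typically amount to registering a function so the visual editor can offer it in its palette. A minimal sketch of that pattern, with hypothetical names (this is not the actual Chorus Extensibility SDK API):

```python
# Registry the (hypothetical) workflow editor would read its
# operator palette from.
OPERATORS = {}

def operator(name, category="Custom"):
    """Register a callable as a drag-and-drop workflow operator."""
    def register(fn):
        OPERATORS[name] = {"fn": fn, "category": category}
        return fn
    return register

@operator("Z-Score", category="Transform")
def z_score(values):
    """Standardize a column of numbers to zero mean, unit variance."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

# The editor could now list "Z-Score" under "Transform" and apply it
# to a column flowing through the workflow.
print(OPERATORS["Z-Score"]["fn"]([1.0, 2.0, 3.0]))
```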
Parallel Compute Engine
Fully utilize parallel compute clusters to run analytics efficiently. Automatically optimize analytic workloads based on your big data environment, all while leveraging MapReduce, Spark, or SQL.
Build data pipelines in Python within containerized notebooks and leverage the processing power of Spark. Interact seamlessly with kerberized clusters to make sure your data is secure as you analyze it.
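In a notebook, such a pipeline is usually chained as a sequence of transformations before work is handed off to the cluster. A stdlib-only sketch of that staged pattern (in production the stages would be Spark DataFrame operations via PySpark, not list comprehensions):

```python
from functools import reduce

def pipeline(*stages):
    """Compose stages left-to-right into one callable, mirroring how
    notebook cells chain transformations over a dataset."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

clean = pipeline(
    lambda rows: [r.strip().lower() for r in rows],  # normalize text
    lambda rows: [r for r in rows if r],             # drop blank rows
    lambda rows: sorted(set(rows)),                  # deduplicate
)

print(clean(["  Spark ", "SQL", "", "spark"]))  # ['spark', 'sql']
```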
Spark & SQL
Leverage the power of Spark and SQL without worrying about configuring custom parameters for every workflow you build.
Big Data - HDP & DB
Connect to any data source remotely, whether it is a database or a Hadoop cluster, and allow users to provision sandboxes and build models without needing to move their data.
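The "no data movement" idea boils down to pushing computation to the source as SQL so only small results cross the wire. A sketch using sqlite3 as a stand-in for a remote database or Hive (the table and query are illustrative):

```python
import sqlite3

# sqlite3 stands in for a remote database; in practice the connection
# would target the warehouse or Hadoop cluster where the data lives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# Push the aggregation down as SQL: the model sees summary rows,
# and the raw data never leaves the source system.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 150.0), ('west', 75.0)]
```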
Chorus makes model deployment easy by providing turn-key deployment options that can have you up and running within minutes, without needing to go through complex configuration of your data sources and production environments.
Chorus can be readily deployed to either on-premise or cloud environments and supports not only the typical Hadoop distributions, but also AWS Elastic MapReduce (EMR). Chorus easily leverages data residing in Amazon Redshift or MySQL instances, sourcing and syncing data to and from S3. Models built in Chorus can be pushed into real-time scoring engines at the click of a button by using either PMML or PFA.
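PFA (Portable Format for Analytics) makes this portability possible by expressing a scoring function as a JSON document (input/output schema plus an "action") that any compliant engine can execute. A toy evaluator below handles just enough of the format, the "+" and "*" calls and the "input" symbol, to illustrate the idea; a real scoring engine implements the full specification:

```python
import json

# A minimal PFA-style document: score(x) = 2x + 1
doc = json.loads("""
{
  "input": "double",
  "output": "double",
  "action": {"+": [{"*": ["input", 2.0]}, 1.0]}
}
""")

def evaluate(expr, inp):
    """Recursively evaluate a tiny PFA-like expression tree."""
    if expr == "input":
        return inp
    if isinstance(expr, dict):
        (op, args), = expr.items()
        vals = [evaluate(a, inp) for a in args]
        return vals[0] + vals[1] if op == "+" else vals[0] * vals[1]
    return expr  # numeric literal

print(evaluate(doc["action"], 3.0))  # 7.0
```

Because the model is plain JSON rather than code tied to a training library, the same document can be handed to any PFA-aware scoring engine.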
Chorus’ in-cluster approach to compute makes the platform highly performant and scalable. Data scientists and analysts can run algorithms at scale without moving any data or optimizing their algorithms based on complicated database logic.
In addition to analyzing data in place, Chorus utilizes parallel compute clusters to run analytics efficiently. The Parallel Workflow Engine optimizes analytics workflows based on whether the environment is Hadoop or in-database, taking advantage of MapReduce, Spark, or SQL where appropriate. The Distributed Execution Engine then pushes the code into the data platform in parallel across the entire cluster.
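The scatter/gather shape behind that in-cluster execution can be sketched with the standard library (the function name is illustrative; a real engine distributes work across cluster nodes rather than local threads):

```python
from concurrent.futures import ThreadPoolExecutor

def run_partitioned(task, partitions, workers=4):
    """Apply the same task to every data partition in parallel
    (scatter), then return the partial results for merging (gather),
    the basic shape MapReduce and Spark jobs share."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, partitions))

# Example: count rows per partition, then sum the partial counts.
partitions = [[1, 2, 3], [4, 5], [6]]
counts = run_partitioned(len, partitions)
print(counts)       # [3, 2, 1]
print(sum(counts))  # 6
```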
Chorus leverages the latest big data security technologies to make sure your data is treated with the highest level of security.
Deploying analytics solutions under security frameworks such as Kerberos, Ranger, and Sentry can be challenging, and many open-source implementations are ill-suited to these environments. Chorus can be deployed easily to interact with secured clusters, enabling data scientists to perform advanced analytics using key technologies such as Hive and Spark, while working seamlessly with the data security policies created by IT.