Analytics Blog

Using Hive to Perform Advanced Analytics in Hadoop

Hadoop data warehouses continue to gain popularity, with solutions such as Hive, Impala and HAWQ now frequently deployed at customer sites. Access to these warehouses is typically tightly controlled using Ranger or Sentry, ensuring comprehensive data security. Due to the ease with which data can be governed in Hive, an increasing number of…

Deploying Machine Learning to the Cloud

While enterprises have traditionally deployed Hadoop clusters in their own data centers, a growing number are creating clusters in the cloud. Cloud providers such as AWS and GCP make it almost effortless to spin up and tear down Hadoop clusters on demand, providing a cost-effective approach to big data systems. However, the current analytics solutions offered…

How to Use the YARN API to Determine Resources Available for Spark Application Submission (Part 1)

At Alpine we continue to deliver new enterprise analytics features within Chorus. With Chorus 6.1 we introduced sophisticated auto-tuning for Spark jobs. Chorus automatically determines the settings needed to launch a Spark application by using information on the size of the data being analyzed, the analytical operations being used in the…