Analytics Blog

Category: Engineering

Using Hive to Perform Advanced Analytics in Hadoop

Hadoop data warehouses have continued to gain popularity, with solutions such as Hive, Impala and HAWQ now frequently deployed at customer sites. Access to these warehouses is typically tightly controlled using Ranger or Sentry, which helps ensure comprehensive data security. Due to the ease with which data can be governed in Hive, an increasing number of…

How to Use the YARN API to Determine Resources Available for Spark Application Submission (Part 1)

At Alpine, we continue to deliver new enterprise analytic features within Chorus. With Chorus 6.1, we launched sophisticated auto-tuning for Spark jobs. Chorus automatically determines the settings needed to launch a Spark application by using information on the size of the data being analyzed, the analytical operations being used in the…

An Introduction to PFA

With Chorus 6.1, we have introduced support for PFA, the Portable Format for Analytics. Before we get into what PFA is, let’s make some observations about the data science process. There are a few important questions we can ask about the process in general: 1) What is our processing model? 2) What are our…