Analytics Blog

Announcing Chorus 6.1

Last week we announced the availability of Chorus 6.1. With this latest release, we’ve continued to deliver new enterprise analytics features, including several marquee items: enterprise data governance, Spark Autotuning, and support for PFA, a developing model interchange specification.

Enterprise data governance: Chorus 6.1 introduces support for administrators to exert fine-grained control over…


Shifting from Pandas to Spark DataFrames Pt 2

Welcome to the second part of this introduction to Spark DataFrames!

Using Spark DataFrames

If you have successfully installed Spark, you should be able to launch the Spark/Scala shell with the “spark-shell” command from any terminal window. A few things happen automatically when you launch this shell. This command defaults to local mode, which means that…


Shifting from Pandas to Spark DataFrames

Like most data scientists, I frequently use a lot of the same tools: Python, Pandas, scikit-learn, R, relational databases, Hadoop and so on. As part of Alpine Data’s Labs team, I am regularly exposed to the tools other companies use, and every company has a different stack. This won’t come as a surprise, but…


Machine learning: clustering and classification on the campaign trail

*This post originally appeared in Computerworld.*

Discovering and targeting micro-populations for politics and profit

As the election season rampages on, we categorize voters into broad demographics (soccer moms, NASCAR dads, blacks, whites, ALICEs, yuppies) in an attempt to understand and discuss this complex, churning electorate. In doing so, we’re tapping into something fundamental about…


The Need for Open Standards in Predictive Analytics

This week I had the opportunity to participate in a panel discussion at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining. The panel discussion was part of the “Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data” organized by the DMG (Data Mining Group). The panel session…