Analytics Blog

Category: Spark

All About Deep Tech: Model Operationalization

Model operationalization is a core component of effective data science, and is a key focus at Alpine Data. In previous blogs, I’ve written frequently about model ops, especially the support Chorus provides for exporting models using the PFA and PMML formats. However, what about scoring on data platforms that don’t yet provide PFA or PMML… Read more »
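For readers new to these formats: a PFA document is just a JSON or YAML description of a scoring function, and any conformant engine can execute it. The snippet below is a minimal sketch, assuming the Titus PFA engine for Python; the engine choice and the toy one-line model are illustrative, not a description of Chorus's export pipeline.

```python
from titus.genpy import PFAEngine

# A tiny PFA document: accept a double, return input + 10.
pfa_doc = """
input: double
output: double
action:
  - {+: [input, 10]}
"""

# fromYaml returns a list of engine instances; we need just one here.
engine, = PFAEngine.fromYaml(pfa_doc)
print(engine.action(5.0))  # 15.0
```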


All About Deep Tech: Execution Frameworks

The “All About Deep Tech” blog series is about just what the title suggests: in-depth looks at the cool technology that makes Chorus run smoothly and how we leverage the latest and greatest in the big data industry to keep our customers ahead of the curve. If you missed our last post on AdaptiveML, be… Read more »


Shifting from Pandas to Spark DataFrames Pt 2

Welcome to the second part of this introduction to Spark DataFrames! If you successfully installed Spark, you should be able to launch the Spark/Scala shell with the “spark-shell” command from any terminal window. A few things happen automatically when you launch this shell. This command defaults to local mode, which means that… Read more »
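As a sketch of what the shell sets up for you: the interactive shells pre-create the session object that a standalone program must build itself. Here is a rough Python (PySpark) equivalent, assuming Spark 2.x’s SparkSession; the app name and toy data are invented for illustration.

```python
from pyspark.sql import SparkSession

# The interactive shells (spark-shell, pyspark) pre-create a session
# like this one; in a standalone script you build it yourself.
# "local[*]" mirrors the shell's default local mode: one machine,
# all available cores, no cluster required.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("dataframe-intro")   # hypothetical app name
         .getOrCreate())

df = spark.createDataFrame([("alice", 34), ("bob", 45)], ["name", "age"])
df.filter(df.age > 40).show()

spark.stop()
```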


Shifting from Pandas to Spark DataFrames

Like most data scientists, I rely on many of the same tools: Python, Pandas, scikit-learn, R, relational databases, Hadoop, and so on. As part of Alpine Data’s Labs team, I am regularly exposed to the tools other companies use, and every company has a different stack. This won’t come as a surprise, but… Read more »
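To give a flavor of the shift the post walks through, here is a minimal side-by-side sketch: the same group-and-sum written first with Pandas, then with a Spark DataFrame. Column names and data are invented for illustration.

```python
import pandas as pd
from pyspark.sql import SparkSession, functions as F

# Pandas: eager and in-memory on a single machine.
pdf = pd.DataFrame({"user": ["a", "b", "a"], "amount": [10.0, 5.0, 7.5]})
print(pdf.groupby("user")["amount"].sum())

# Spark: lazy and distributed; nothing runs until an action like show().
spark = SparkSession.builder.master("local[*]").getOrCreate()
sdf = spark.createDataFrame(pdf)
sdf.groupBy("user").agg(F.sum("amount").alias("amount")).show()
```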


Getting Better Performance with Pyspark

Holden Karau is the Principal Software Engineer at IBM’s Spark Technology Center. She is an expert in Apache Spark, the open source cluster computing system, and in making data processing fast to run and fast to write. Holden has co-authored two books on the subject, Learning Spark and Fast Data Processing with Spark, and is currently working on… Read more »
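On the topic of the talk, one widely cited PySpark performance pattern is to prefer Spark’s built-in column expressions over Python UDFs, because a UDF forces every row across the JVM/Python boundary. The sketch below is a generic illustration of that trade-off, not an excerpt from Holden’s material.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.range(1000000).withColumn("x", F.rand())

# Slow path: a Python UDF serializes each row to a Python worker and back.
double_udf = F.udf(lambda v: v * 2.0, DoubleType())
slow = df.withColumn("x2", double_udf("x"))

# Fast path: a built-in expression runs inside the JVM and is optimized
# by Catalyst, with no per-row Python round trip.
fast = df.withColumn("x2", F.col("x") * 2.0)

# Cache a result that several downstream actions will reuse.
fast.cache()
print(fast.count())
print(fast.agg(F.avg("x2")).first())
```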