Analytics Blog

Hackathon at Alpine Data Labs

At Alpine Data Labs, we recently completed our first Developer Hackathon! The purpose of the hackathon was to give developers time between releases to work on pet projects, fix annoyances, and try out prospective technologies or features that interested them. Developers chose what to work on, as long as it had some connection to Alpine products.


We’d like to share with you some of the projects that our team chose to work on.

Improved features

Look for the following in the upcoming Alpine Chorus release!

Operators with notes

Jason decided to create a note field on each operator for documenting what an operator does. Protip: hover over an operator to see the note as a tooltip! He also updated the Hadoop Join operator to generate unique names to avoid name collisions.

Alpine Chorus Improvements

Robbie concentrated on updates to Alpine Chorus, the open source project that Alpine Data Labs maintains and uses as our front-end collaboration layer. The SQL editor now offers SQL suggestions, other file types such as Pig and Ruby now have syntax highlighting, and Markdown is now a recognized file type, which makes Alpine Chorus almost a wiki. Finally, we've upgraded several of the libraries that Alpine Chorus uses.

Vertica JDBC Integration

John worked on integrating Vertica into Alpine Chorus. In the next release, users will be able to explore data from Vertica and copy data between Vertica and supported Hadoop clusters.

Machine Learning on Spark

Sung tested various implementations of Elastic Net Logistic Regression and Linear Regression on Spark in preparation for adding a Spark Linear Regression operator to Alpine Chorus.

DB implemented L-BFGS, a quasi-Newton optimization algorithm used to fit regression models, and did significant performance testing on it. The L-BFGS optimizer has been contributed back to Spark MLlib and is now part of Spark 1.0.

If you don’t already have Alpine Chorus, sign up here to get started and see for yourself!

Exploration Work

Many of the projects were personal explorations of new functionality or new technologies. While not all of them will make it into our next product release, we hope to finish them in the near future.

Workflow Domain Specific Language (DSL) in Scala

Marek created a prototype of an Alpine Chorus workflow Domain Specific Language (DSL) to make command line and interactive processing easier for users. With it, a query could be as simple as “use mysource get data with features A and B from mytable limit 100”, run against either a Hadoop or a database source.
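To make the idea concrete, here is a minimal Python sketch of how a query like the one above could be parsed and rendered as SQL. It is not Marek's prototype (which was written in Scala): the grammar, the `parse_query` and `to_sql` function names, and the generated SQL are hypothetical, and only the database case is shown. A Hadoop source would need a different back end.

```python
import re

# Hypothetical grammar, modeled on the example query:
#   use <source> get data with features <f1> and <f2> ... from <table> [limit <n>]
QUERY_PATTERN = re.compile(
    r"^use\s+(?P<source>\w+)\s+get\s+data\s+with\s+features\s+(?P<features>.+?)"
    r"\s+from\s+(?P<table>\w+)(?:\s+limit\s+(?P<limit>\d+))?$",
    re.IGNORECASE,
)

def parse_query(text):
    """Split a DSL query into its parts; raise ValueError if it does not match."""
    match = QUERY_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized query: {text!r}")
    # Features are separated by commas or the word "and", e.g. "A, B and C".
    features = [
        f
        for f in re.split(r"\s*(?:,|\band\b)\s*", match.group("features"), flags=re.IGNORECASE)
        if f
    ]
    limit = int(match.group("limit")) if match.group("limit") else None
    return {
        "source": match.group("source"),
        "features": features,
        "table": match.group("table"),
        "limit": limit,
    }

def to_sql(query):
    """Render a parsed query as SQL for a (hypothetical) database source."""
    sql = f"SELECT {', '.join(query['features'])} FROM {query['table']}"
    if query["limit"] is not None:
        sql += f" LIMIT {query['limit']}"
    return sql
```

For the query from the post, `to_sql(parse_query("use mysource get data with features A and B from mytable limit 100"))` produces `SELECT A, B FROM mytable LIMIT 100`. A regular expression keeps the sketch short; a real DSL would more likely use a proper parser so that it can grow beyond one sentence shape.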

Interactive Analysis

Chester integrated Marek’s DSL with Scala-js to prototype an interactive web interface. DSL code can be entered in the browser and sent to Alpine Chorus to start workflows, and the results are then displayed in the browser.

Database Performance Improvements

Michael prototyped index creation for database operator output, which can significantly improve the performance of subsequent operators.
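A small sketch of why indexing an operator's output can help, using Python's built-in `sqlite3` module as a stand-in for the databases Alpine Chorus actually works with (an assumption for illustration only). The table `operator_output`, its columns, and the index name are hypothetical. SQLite's `EXPLAIN QUERY PLAN` shows a downstream filter switching from a full table scan to an index search once the index exists.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical output table written by an upstream operator.
conn.execute("CREATE TABLE operator_output (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO operator_output VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

# A hypothetical downstream operator that filters on customer_id.
downstream = "SELECT SUM(amount) FROM operator_output WHERE customer_id = 42"

before = query_plan(conn, downstream)  # no index yet: full table scan
# Index the column the downstream operator filters on.
conn.execute("CREATE INDEX idx_output_customer ON operator_output (customer_id)")
after = query_plan(conn, downstream)   # now a search using idx_output_customer

print(before)
print(after)
```

The exact plan wording differs between SQLite versions, but the first plan reports a `SCAN` of the table and the second a `SEARCH` using `idx_output_customer`. Creating the index costs time and space once, which pays off when several later operators read the same output.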

Internal Improvements

Much of the Hackathon work will not be so obvious to our users. Developers also concentrated on improvements to our underlying development infrastructure.

Making Hadoop Installation Easy with SBT

In addition to his Vertica work, John integrated several command line Hadoop installations into a simple command line interface. We can now quickly install and run multiple versions of Hadoop for easy use by developers and QA. We are currently making this part of our CI environment so that we can automatically test every Hadoop flavor we support, every day.

Integration Test Improvement

After concentrating on database improvements, Michael switched gears and did a lot of additional work improving our automated testing framework.

Code Cleanup and Refactoring

Jenny spent part of her Hackathon time cleaning up code: reorganizing it and removing dead code, particularly from our discontinued Eclipse client. She also contributed date/time processing fixes to the Apache Pig project, based on her work supporting Hadoop date/time data types within Alpine Chorus.

-----
The Alpine Hackathon was a great success, as the projects above show. The freedom to choose their own projects let developers do exploratory work that is usually not possible under our normal release-driven schedule. We are excited to make this a regular tradition for our team!


Be sure to subscribe to this blog to receive alerts for new posts in this series. You can subscribe at the top right of this page or add this to your Feedly or RSS reader -> http://www.alpinenow.com/feed/.