
Multinomial Logistic Regression With Apache Spark

We wanted to share with you a great tutorial that was given a few weeks ago by our very own DB Tsai, machine learning engineer at Alpine Data Labs, on our implementation of Multinomial Logistic Regression with Apache Spark.

Logistic regression can model not only binary outcomes but, with some extension, multinomial outcomes as well. In this presentation, you’ll learn the basic idea of binary logistic regression step by step and then see it extended to the multinomial case. You’ll also see how easy Spark makes it to parallelize this iterative algorithm: caching the training data in memory as an RDD lets it scale horizontally with the number of training examples.
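To make the parallelization idea concrete, here is a minimal sketch in plain Python rather than Spark. It is not the implementation from the talk. It assumes the symmetric softmax form of multinomial logistic regression (the talk may use a pivot-class form instead), and the names `softmax_probs`, `partition_gradient`, and `add_gradients` are hypothetical. The point is that the gradient is a sum over training examples, so each partition can compute its own partial sum and the partial sums can be added together, which is the pattern a cached RDD lets you repeat on every iteration.

```python
import math
from functools import reduce

def softmax_probs(weights, x):
    """Class probabilities for one example; weights holds one vector per class."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def partition_gradient(weights, partition):
    """Gradient of the negative log-likelihood summed over one partition."""
    n_classes, n_features = len(weights), len(weights[0])
    grad = [[0.0] * n_features for _ in range(n_classes)]
    for x, label in partition:
        probs = softmax_probs(weights, x)
        for c in range(n_classes):
            err = probs[c] - (1.0 if c == label else 0.0)
            for j in range(n_features):
                grad[c][j] += err * x[j]
    return grad

def add_gradients(a, b):
    """Combine two partial gradients (the 'reduce' step)."""
    return [[ai + bi for ai, bi in zip(ra, rb)] for ra, rb in zip(a, b)]

# Toy data, invented for illustration: (features, label) pairs in two partitions.
partitions = [
    [([1.0, 0.5], 0), ([1.0, -1.2], 1)],
    [([1.0, 2.0], 2)],
]
weights = [[0.0, 0.0] for _ in range(3)]  # 3 classes, 2 features

# Each partition is processed independently, then the results are summed.
total_gradient = reduce(
    add_gradients, (partition_gradient(weights, p) for p in partitions)
)
print(total_gradient)
```

With two classes and one weight vector fixed at zero, `softmax_probs` reduces to the familiar sigmoid of binary logistic regression, which is the sense in which the multinomial model extends the binary one.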

However, there is a mathematical limitation on scaling vertically, that is, with the number of training features, and many recent applications in document classification and computational linguistics are of this type. You’ll see how this problem can be addressed by using the L-BFGS optimizer instead of the Newton optimizer.
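A rough way to see why the choice of optimizer matters: Newton's method works with the full Hessian, whose size grows with the square of the number of parameters, while L-BFGS keeps only a short history of vector pairs. The sketch below, again in plain Python and not taken from the talk, counts the stored entries for each and shows the standard L-BFGS two-loop recursion that turns that history into a search direction. The function names and the history length of 10 are assumptions chosen for illustration.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def newton_storage(n_params):
    """Entries in the dense Hessian that Newton's method has to form."""
    return n_params * n_params

def lbfgs_storage(n_params, history=10):
    """Entries kept by L-BFGS: `history` pairs of (s, y) vectors."""
    return 2 * history * n_params

def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: approximate -H^{-1} * grad from recent (s, y) pairs.

    s_hist holds parameter steps and y_hist the matching gradient changes,
    ordered from oldest to newest.
    """
    q = list(grad)
    alphas, rhos = [], []
    # First loop: newest pair to oldest.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append(alpha)
        rhos.append(rho)
    # Scale by an initial inverse-Hessian guess built from the newest pair.
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for s, y, alpha, rho in zip(s_hist, y_hist, reversed(alphas), reversed(rhos)):
        beta = rho * dot(y, r)
        r = [ri + si * (alpha - beta) for ri, si in zip(r, s)]
    return [-ri for ri in r]

# One million parameters: a dense Hessian versus ten (s, y) pairs.
print(newton_storage(1_000_000))  # 1000000000000
print(lbfgs_storage(1_000_000))   # 20000000
```

With no history, `lbfgs_direction` falls back to plain steepest descent; as pairs accumulate, the direction picks up curvature information without a Hessian ever being stored.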


If you don’t already have Alpine, sign up here to get started!
