In our routine customer engagements in the financial services industry, we are repeatedly asked how difficult it is to integrate analytics platforms like Alpine with industry-standard data providers such as Thomson Reuters, Bloomberg, FactSet or Quandl. Often enough, this sparks a follow-up question about integrating the Alpine platform with third-party APIs.
Quick and efficient integration of external APIs with the Alpine platform is made possible by the Alpine Extensions SDK, a Scala-based framework that enables data scientists and engineers to extend Alpine and create their own operators for use within an Alpine workflow. Such operators can act as data sources, as custom transformations, or as encapsulations of a proprietary machine learning or data mining algorithm. You can find more information and guidance on creating your own custom operator in the official Alpine documentation, available here.
To demonstrate how easily one can integrate third-party APIs, I chose to build an operator that connects to the Quandl API. Quandl offers a clean, well-designed, publicly accessible API for financial and economic data, with key features such as rate limits and access control that make it enterprise-worthy. You can find more details about the Quandl API here.
Since Quandl functions as a data aggregation platform, it organizes its data collections into “databases” representing a certain theme. Individual “datasets” within such Quandl databases are assigned names by either the external provider or by Quandl itself. For our purposes, it is enough to know that most data on the Quandl platform can be accessed using two pieces of information: the name of the “database”, and the name of the “dataset” within the database. Quandl’s Datatables API provides an advanced form of data access that is not limited to numeric data; this API route is also supported by our operator but we will ignore it in this post.
Finally, since Quandl practices strict access control via API key to impose rate limits and track dataset usage metrics, our operator makes it mandatory for users to specify an API key.
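To make the "database plus dataset plus API key" idea concrete, here is a minimal sketch of how a request URL could be assembled. It is written in Python purely for brevity (the operator itself is Scala) and is not the operator's actual code. The base URL and the `api_key` query parameter reflect my understanding of Quandl's v3 time-series route and should be checked against the Quandl documentation; the function name `build_dataset_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of Quandl's v3 time-series route (verify against Quandl's docs).
QUANDL_BASE_URL = "https://www.quandl.com/api/v3/datasets"


def build_dataset_url(database, dataset, api_key):
    """Build a CSV download URL from a database name, a dataset name and an API key.

    Hypothetical helper for illustration only. Like the operator described
    above, it refuses to proceed without an API key.
    """
    if not api_key or not api_key.strip():
        raise ValueError("a Quandl API key is required")
    if not database or not dataset:
        raise ValueError("both a database name and a dataset name are required")

    # Percent-encode the two path segments so unusual characters cannot break the URL.
    path = "{}/{}/{}.csv".format(
        QUANDL_BASE_URL, quote(database, safe=""), quote(dataset, safe="")
    )
    return path + "?" + urlencode({"api_key": api_key.strip()})
```

For example, `build_dataset_url("WIKI", "AAPL", "my-key")` produces a URL ending in `/WIKI/AAPL.csv?api_key=my-key`, where `WIKI` and `AAPL` are illustrative database and dataset names.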
The Quandl operator acts as a source operator: like the database or Hadoop source operators, it does not accept input connections from other operators and only passes data on to downstream operators in the workflow. You can learn about building a source operator here.
Here’s an example of a workflow using the Quandl operator to download financial data:
This is what the input dialog of the operator looks like:
Upon execution, the Quandl operator opens a connection to the Quandl API server, fetches the requested data, determines the schema, parses the downloaded data as per the inferred schema, and saves it as a Spark-ready file for further downstream processing.
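The schema-inference and parsing steps can be sketched roughly as follows. This is an illustrative Python sketch, not the operator's Scala implementation: it assumes the download is CSV text with a header row, and the three type names (`long`, `double`, `string`) and the function names are my own simplifications. The network fetch and the final write to a Spark-ready file are left out.

```python
import csv
import io

# Hypothetical mapping from inferred type name to a Python conversion function.
_CASTERS = {"long": int, "double": float, "string": str}


def _infer_type(values):
    """Return the narrowest of long/double/string that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for type_name in ("long", "double"):
        try:
            for v in non_empty:
                _CASTERS[type_name](v)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Infer a list of (column name, type name) pairs from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no data to infer a schema from")
    header, data = rows[0], rows[1:]
    # Transpose rows into columns; with no data rows every column is empty.
    columns = list(zip(*data)) if data else [()] * len(header)
    return [(name, _infer_type(col)) for name, col in zip(header, columns)]


def parse_rows(csv_text, schema):
    """Convert each data row according to the inferred schema; empty cells become None."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return [
        [None if cell == "" else _CASTERS[type_name](cell)
         for cell, (_, type_name) in zip(row, schema)]
        for row in reader
    ]
```

Given a small made-up sample such as `"Date,Open,Volume\n2016-09-01,106.14,26701500\n"`, `infer_schema` would classify the three columns as string, double and long respectively. A real operator would likely also want a dedicated date type.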
This is what the output of the operator looks like:
I spent a total of four hours writing Scala code to build the Quandl operator. The Extensions SDK is packed with tutorials, documentation, and walk-through code samples that make it easy for Scala beginners to start building tools for their own needs.
In general, integrating external APIs with the Alpine platform can be accomplished quickly and efficiently. The Extensions SDK provides deep programmatic access to Alpine platform components, so the custom operators you write are first-class citizens of the Alpine platform.
If you would like to learn more about this, come meet us at Strata+Hadoop World in NYC at the end of this month. Our team will be available to answer any of your questions about the product and to give you a personalized demonstration of some of these features.