Just a few days ago, the Netflix developer team announced that it was open-sourcing Metaflow, its in-house Python library for data science projects. In the announcement, the team noted that Metaflow has been used at Netflix for more than two years across a variety of data science projects, with real-life use cases ranging from natural language processing to operations research.
Netflix built and nurtured Metaflow to boost data scientists' productivity across an extensive range of projects, from classical statistics to deep learning. By providing a unified API over the infrastructure stack, it supports every stage of a data science project, from prototype to production. This is why it is fair to say that Metaflow is a framework designed to help data scientists first and machines second.
Great data science projects always focus on making the value of data science available in the user's context rather than on the engineering behind it.
Metaflow can be used in conjunction with top data science libraries such as PyTorch, TensorFlow, or scikit-learn. Models are written in plain, idiomatic Python, so there is no significant learning curve.
Metaflow also makes it easy to design a workflow and run it at scale. It maintains versions and keeps track of all experiments and data with ease.
To scale up further, Metaflow offers built-in integrations with AWS cloud services for storage, compute, and machine learning. Most notably, this requires no code changes.
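To make the idea of a workflow with tracked runs concrete, here is a minimal sketch in plain Python. It is not Metaflow's API: `run_workflow`, `RUN_HISTORY`, and the three step functions are hypothetical names invented for this illustration, and a real Metaflow flow is written differently. The sketch only shows steps running in order while each run's results are kept under its own run ID.

```python
import itertools

# Hypothetical in-memory record of every run; not part of Metaflow.
RUN_HISTORY = {}
_run_ids = itertools.count(1)


def load(state):
    # Stand-in for reading from a data warehouse (made-up values).
    state["values"] = [1.0, 2.0, 3.0, 4.0]
    return state


def train(state):
    # Stand-in "model": the mean of the values, scaled by a parameter.
    state["model"] = state["scale"] * sum(state["values"]) / len(state["values"])
    return state


def report(state):
    state["summary"] = f"model={state['model']:.2f}"
    return state


def run_workflow(steps, **params):
    """Run the steps in order and keep the final state under a new run ID."""
    run_id = next(_run_ids)
    state = dict(params)
    for step in steps:
        state = step(state)
    RUN_HISTORY[run_id] = state
    return run_id


first = run_workflow([load, train, report], scale=1.0)
second = run_workflow([load, train, report], scale=2.0)

# Both experiments remain available for comparison.
print(RUN_HISTORY[first]["summary"])
print(RUN_HISTORY[second]["summary"])
```

Keeping every run under a separate ID is what lets two experiments with different parameters be compared side by side afterwards.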
The models play just a small role in a data science project. Most production-grade projects depend heavily on a robust infrastructure stack. With Metaflow, the Netflix team of data scientists and developers touches on every layer of that stack.
It starts with access to data from the data warehouse, which can be anything from a simple folder of files to a full database or a multi-petabyte data lake. The modeling code processes this data in a compute environment, while a job scheduler takes care of orchestrating the entire process.
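The role of the job scheduler can be illustrated with a small sketch. It uses Python's standard `graphlib` module (Python 3.9 or later) to run tasks in dependency order. The task names and the `run_pipeline` helper are made up for this example, and a production scheduler would also handle retries, parallelism, and remote compute.

```python
from graphlib import TopologicalSorter


def fetch_data(results):
    # Stand-in for a query against the data warehouse (made-up values).
    return [3, 1, 2]


def clean_data(results):
    return sorted(results["fetch_data"])


def fit_model(results):
    data = results["clean_data"]
    return {"min": data[0], "max": data[-1]}


# Each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "fetch_data": set(),
    "clean_data": {"fetch_data"},
    "fit_model": {"clean_data"},
}
TASKS = {"fetch_data": fetch_data, "clean_data": clean_data, "fit_model": fit_model}


def run_pipeline(dependencies, tasks):
    """Execute every task only after the tasks it depends on have finished."""
    results = {}
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name](results)
        order.append(name)
    return order, results


order, results = run_pipeline(DEPENDENCIES, TASKS)
print(order)
print(results["fit_model"])
```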
Next, the developer team structures the code as an object hierarchy, for example as Python modules or packages, so that it can be executed. For each machine learning model, the code version and the input data are recorded.
Once the ML model is deployed to production, the pertinent questions for developers are how to keep the code performing reliably and how to track its performance.
To address these concerns, Metaflow offers a comprehensive approach to managing the stack. The framework is fairly prescriptive about the lower levels of the stack but has very little to say about the actual data science at the top. Best of all, developers can use it with a whole array of machine learning and data science libraries, such as TensorFlow, PyTorch, or scikit-learn.
Metaflow keeps the learning curve low by letting you write models and business logic in idiomatic Python. At the same time, it leverages existing infrastructure wherever that is needed.
Metaflow stands out as a framework for building and deploying data science workflows, and it comes loaded with an array of built-in features. Let’s have a quick look at some of its capabilities.
Metaflow automatically captures snapshots of the code, data, and dependencies in a data store backed by S3, although the local filesystem is supported as well. This makes it possible to restart workflows, recreate past results, and inspect a workflow right in a notebook, which enhances productivity for data science professionals and developers.
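The snapshot idea can be sketched with the standard library alone. The example below is a toy built on assumptions, not Metaflow's datastore: `save_snapshot` and `load_snapshot` are hypothetical helpers that write JSON files to a local temporary directory, keyed by a hash of their content, so that an earlier result can be read back later.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# A temporary local directory stands in for the data store (S3 in Metaflow's case).
STORE = Path(tempfile.mkdtemp(prefix="snapshots_"))


def save_snapshot(artifacts):
    """Write the artifacts as JSON and return a content-derived key."""
    payload = json.dumps(artifacts, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(payload).hexdigest()
    (STORE / f"{key}.json").write_bytes(payload)
    return key


def load_snapshot(key):
    """Read back the artifacts stored under the given key."""
    return json.loads((STORE / f"{key}.json").read_text(encoding="utf-8"))


# Parameters and scores are made-up values for illustration only.
run_a = save_snapshot({"params": {"alpha": 0.1}, "score": 0.82})
run_b = save_snapshot({"params": {"alpha": 0.5}, "score": 0.79})

# Later, for example in a notebook, an earlier result can be recovered by key.
restored = load_snapshot(run_a)
print(restored["score"])
```

Deriving the key from the content means that saving identical artifacts twice yields the same key, so nothing is stored more than once.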
Here are some of the real-life project scenarios in which Metaflow can help
You can find the project home page at metaflow.org and the code at github.com/Netflix/metaflow. Comprehensive documentation is available at docs.metaflow.org, including a tutorial to help you get off to a good start. The online presentations and tutorials can help you find your feet in data science projects with Metaflow.
After serving its role inside Netflix, Metaflow is now available as an open-source data science project. Teams can use it to the fullest for flexibility and productivity in data science projects across different use cases. And because it is now open source, your own Metaflow-based projects can contribute back to it.
WRITTEN BY: Atman Rathod
Atman Rathod is the Founding Director at CMARIX Technolabs Pvt. Ltd., a leading web and mobile app development company with 17+ years of experience. Having…