Kubeflow 1.0 Machine Learning Toolkit for Kubernetes Goes Live

The Kubeflow community this week announced the first major release of its open-source machine learning (ML) toolkit for Kubernetes.

With Kubeflow 1.0, the maintainers of the project are "graduating" a core set of stable applications needed to develop, build, train and deploy ML models on Kubernetes efficiently.

The list of graduating apps in this release includes:

  • Kubeflow's UI, the central dashboard providing quick access to the components deployed in a Kubeflow cluster;
  • The Jupyter notebook controller, which lets users create a Notebook custom resource (a notebook being a shared document with live code, equations, visualizations, and narrative text);
  • TensorFlow Operator (TFJob), a Kubernetes custom resource for running TensorFlow training jobs on Kubernetes;
  • PyTorch Operator, for distributed training;
  • kfctl, the Kubeflow command-line interface (CLI) that's used to install and configure Kubeflow for deployment and upgrades;
  • Profile controller and UI for multiuser management.

"Kubeflow's goal is to make it easy for machine learning (ML) engineers and data scientists to leverage cloud assets (public or on-premise) for ML workloads," said Thea Lamkin, Google's open source strategist for AI/ML, in a blog post. "You can use Kubeflow on any Kubernetes-conformant cluster."

In the Kubeflow Community User Survey, the results of which were published last December, the ability to use Jupyter notebooks emerged as a popular feature request among data scientists and ML engineers.

"With Kubeflow 1.0, users can use Jupyter to develop models," Lamkin said. "They can then use Kubeflow tools like fairing (Kubeflow's python SDK) to build containers and create Kubernetes resources to train their models. Once they have a model, they can use KFServing to create and deploy a server for inference."

Distributed training was another popular feature request. Kubeflow 1.0 provides Kubernetes custom resources that make distributed training with TensorFlow and PyTorch simple.
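To give a sense of what such a custom resource looks like, here is a minimal sketch in Python that builds a TFJob manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The job name, container image, namespace, and worker count are hypothetical placeholders, and the helper function is an illustration rather than part of any Kubeflow SDK.

```python
import json


def tfjob_manifest(name, image, workers=2, namespace="kubeflow"):
    """Build a TFJob manifest (kubeflow.org/v1) as a plain dict.

    All argument values are illustrative assumptions, not Kubeflow defaults.
    """
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "tfReplicaSpecs": {
                # One replica group; the operator creates one pod per replica.
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {
                        "spec": {
                            "containers": [
                                # The TFJob operator expects the training
                                # container to be named "tensorflow".
                                {"name": "tensorflow", "image": image}
                            ]
                        }
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and image, for illustration only.
    manifest = tfjob_manifest("mnist-train", "example.com/mnist-train:latest")
    print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file and passing it to kubectl apply -f on a cluster where the TensorFlow Operator is installed would submit the job; the operator then creates the worker pods and wires up the distributed training configuration.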

Since it was open sourced at KubeCon USA in 2017, the Kubeflow project has grown "beyond our wildest expectations," Lamkin said, with the support of hundreds of contributors and 30 participating organizations, including Microsoft, Google, IBM, Cisco, Intel, and LinkedIn.

The project evolved from an effort to open source the way Google ran its TensorFlow ML library internally, based on a pipeline called TensorFlow Extended. "It began as just a simpler way to run TensorFlow jobs on Kubernetes," the website explains, "but has since expanded to be a multi-architecture, multi-cloud framework for running entire machine learning pipelines."

"Ultimately, we want to have a set of simple manifests that give you an easy to use ML stack anywhere Kubernetes is already running, and that can self-configure based on the cluster it deploys into," the site states.

"The Kubeflow 1.0 release is a significant milestone, as it positions Kubeflow to be a viable ML Enterprise platform," said Jeff Fogarty, data science engineer at U.S. Bank. "Kubeflow 1.0 delivers material productivity enhancements for ML researchers."

The community has several more applications under development, which are planned for point updates of Kubeflow 1.0, including:

  • Pipelines (beta) for defining complex ML workflows;
  • Metadata (beta) for tracking datasets, jobs, and models;
  • Katib (beta) for hyperparameter tuning;
  • Distributed operators for other frameworks, such as XGBoost.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].