Tensorflow

Kubeflow is a framework released by Google for deploying and managing tensorflow tasks in Kubernetes clusters. Its main features include:

JupyterHub service for managing Jupyter notebooks
Tensorflow Training Controller for managing training tasks
TF Serving container for model services

Deployment

Before deploying, ensure that:

A Kubernetes cluster or Minikube is set up, with the kubectl command-line tool configured
ksonnet version 0.8.0 or higher is installed

For Kubernetes clusters with RBAC enabled, first create a cluster role binding for admins:

kubectl create clusterrolebinding tf-admin --clusterrole=cluster-admin --serviceaccount=default:tf-job-operator

Then run the following commands to deploy:

ks init my-kubeflow
cd my-kubeflow
ks registry add kubeflow github.com/google/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/core
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/tf-job
ks generate core kubeflow-core --name=kubeflow-core
ks apply default -c kubeflow-core

If you have multiple Kubernetes clusters, you can switch to another cluster to deploy, for example:

kubectl config use-context gke
ks env add gke
ks apply gke -c kubeflow-core

After a while, you can see the public IP of the tf-hub-lb service, which is the access address for JupyterHub:

kubectl get svc tf-hub-lb

For clusters that do not support LoadBalancer Service, you can also access it through port forwarding (http://127.0.0.1:8100):

kubectl port-forward tf-hub-0 8100:8000

By default, JupyterHub can be logged in with any username and password. After logging in, you can use custom images to start the Notebook Server, such as:

gcr.io/kubeflow/tensorflow-notebook-cpu
gcr.io/kubeflow/tensorflow-notebook-gpu

Training Example

Using CPU:

ks generate tf-cnn cnn --name=cnn
ks apply gke -c cnn

Using GPU:

ks param set cnn num_gpus 1
ks param set cnn num_workers 1
ks apply default -c cnn

Model Deployment

MODEL_COMPONENT=serveInception
MODEL_NAME=inception
MODEL_PATH=gs://cloud-ml-dev_jlewi/tmp/inception
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME} --namespace=default --model_path=${MODEL_PATH}

ks apply gke -c ${MODEL_COMPONENT}

Reference Documents

Tensorflow

Kubeflow: Google's Kubernetes-Based Framework for Managing TensorFlow Tasks

Kubeflow, crafted by Google, is an exceptional tool for deploying and overseeing TensorFlow processes within Kubernetes environments. It boasts a suite of impressive features, such as:

JupyterHub services for the seamless running of Jupyter notebooks
A dedicated Tensorflow Training Controller for orchestrating training operations
A ready-to-serve TF Serving container aimed at model deployment

How to Deploy

Before ushering into the deployment phase, ensure the following prerequisites are met:

An operational Kubernetes cluster or Minikube, along with the adeptly configured kubectl CLI
Installation of ksonnet version 0.8.0 or higher is complete

In the case of Kubernetes clusters that are fortified with RBAC, kick off by assembling an admin-level cluster role binding:

kubectl create clusterrolebinding tf-admin --clusterrole=cluster-admin --serviceaccount=default:tf-job-operator

Subsequently, embark on the deployment journey with these commands:

ks init my-kubeflow
cd my-kubeflow
ks registry add kubeflow github.com/google/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/core
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/tf-job
ks generate core kubeflow-core --name=kubeflow-core
ks apply default -c kubeflow-core

Got more than one Kubernetes cluster? No problem! Simply swap over to another and proceed with the deployment, take for instance:

kubectl config use-context gke
ks env add gke
ks apply gke -c kubeflow-core

Hang tight for a bit, and soon the tf-hub-lb service's public IP surfaces, serving as your gateway to JupyterHub:

kubectl get svc tf-hub-lb

In scenarios where the LoadBalancer Service isn't in the cards, reach your destination via port forwarding (http://127.0.0.1:8100):

kubectl port-forward tf-hub-0 8100:8000

JupyterHub's doors are open to any username and password by default. Once inside, spark up your Notebook Server using custom images like:

gcr.io/kubeflow/tensorflow-notebook-cpu
gcr.io/kubeflow/tensorflow-notebook-gpu

Training Showcase

Flexing CPU Muscles:

ks generate tf-cnn cnn --name=cnn
ks apply gke -c cnn

Tapping into GPU Power:

ks param set cnn num_gpus 1
ks param set cnn num_workers 1
ks apply default -c cnn

Model On the Move

MODEL_COMPONENT=serveInception
MODEL_NAME=inception
MODEL_PATH=gs://cloud-ml-dev_jlewi/tmp/inception
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME} --namespace=default --model_path=${MODEL_PATH}

ks apply gke -c ${MODEL_COMPONENT}

Handy Guides

上一页Spark 下一页Serverless

最后更新于1年前