Tensorflow
Kubeflow is a framework released by Google for deploying and managing TensorFlow tasks on Kubernetes clusters. Its main features include:
JupyterHub service for managing Jupyter notebooks
TensorFlow Training Controller for managing training tasks
TF Serving container for serving models
Deployment
Before deploying, ensure that:
A Kubernetes cluster or Minikube is set up, with the kubectl command-line tool configured
ksonnet version 0.8.0 or higher is installed
For Kubernetes clusters with RBAC enabled, first create a cluster role binding for admins:
kubectl create clusterrolebinding tf-admin --clusterrole=cluster-admin --serviceaccount=default:tf-job-operator
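As an optional sanity check, you can confirm the binding exists and that the operator's service account has the expected permissions (the tfjobs resource name assumes the tf-job operator's CRD is installed):

```shell
# Show the binding just created
kubectl get clusterrolebinding tf-admin -o wide

# Check whether the tf-job-operator service account can create TFJob resources
kubectl auth can-i create tfjobs --as=system:serviceaccount:default:tf-job-operator
```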
Then run the following commands to deploy:
ks init my-kubeflow
cd my-kubeflow
ks registry add kubeflow github.com/google/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/core
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/tf-job
ks generate core kubeflow-core --name=kubeflow-core
ks apply default -c kubeflow-core
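Once the apply finishes, you can verify that the core components came up; the exact pod and service names vary by Kubeflow version, so this is just an illustrative check:

```shell
# List the pods and services created by kubeflow-core
kubectl get pods
kubectl get svc
```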
If you have multiple Kubernetes clusters, you can switch to another cluster to deploy, for example:
kubectl config use-context gke
ks env add gke
ks apply gke -c kubeflow-core
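To confirm which ksonnet environments are configured and which cluster each points at, list them:

```shell
# Show all ksonnet environments (e.g. default, gke) and their API server URLs
ks env list
```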
After a while, you can see the public IP of the tf-hub-lb service, which is the access address for JupyterHub:
kubectl get svc tf-hub-lb
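If you only want the external address itself, a jsonpath query extracts it once the cloud provider has assigned one (on some providers the field is .hostname rather than .ip):

```shell
# Print just the external IP of the tf-hub-lb service
kubectl get svc tf-hub-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```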
For clusters that do not support LoadBalancer Services, you can instead access it via port forwarding (http://127.0.0.1:8100):
kubectl port-forward tf-hub-0 8100:8000
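With the forward running, you can check from another terminal that JupyterHub is answering on the local port:

```shell
# Expect an HTTP response (e.g. a redirect to the login page)
curl -I http://127.0.0.1:8100
```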
By default, JupyterHub accepts any username and password. After logging in, you can start a Notebook Server from a custom image, such as:
gcr.io/kubeflow/tensorflow-notebook-cpu
gcr.io/kubeflow/tensorflow-notebook-gpu
Training Example
Using CPU:
ks generate tf-cnn cnn --name=cnn
ks apply gke -c cnn
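The tf-cnn prototype creates a TFJob custom resource. You can watch its progress with commands like the following (the resource name and its printed status fields depend on the tf-job operator version):

```shell
# List TFJob resources and inspect the cnn job's status and events
kubectl get tfjobs
kubectl describe tfjob cnn
```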
Using GPU:
ks param set cnn num_gpus 1
ks param set cnn num_workers 1
ks apply default -c cnn
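Before applying, you can confirm the parameters took effect:

```shell
# Show the current parameter values for the cnn component
ks param list cnn
```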
Model Deployment
MODEL_COMPONENT=serveInception
MODEL_NAME=inception
MODEL_PATH=gs://cloud-ml-dev_jlewi/tmp/inception
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME} --namespace=default --model_path=${MODEL_PATH}
ks apply gke -c ${MODEL_COMPONENT}
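After the serving component is applied, the model is typically exposed over gRPC (port 9000 in the tf-serving prototype's defaults; verify against your deployed service). One way to reach it for local testing is port forwarding; the pod name below is a placeholder to be looked up first:

```shell
# Find the serving pod, then forward its gRPC port locally
kubectl get pods
kubectl port-forward <inception-serving-pod-name> 9000:9000
```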