Kubernetes Guide

Tensorflow


Last updated 1 year ago


Kubeflow is a framework released by Google for deploying and managing TensorFlow workloads on Kubernetes clusters. Its main features include:

  • A JupyterHub service for managing Jupyter notebooks

  • A TensorFlow Training Controller for managing training jobs

  • A TF Serving container for serving models

Deployment

Before deploying, ensure that:

  • A Kubernetes cluster or Minikube is set up, with the kubectl command-line tool configured

  • ksonnet version 0.8.0 or higher is installed

On clusters with RBAC enabled, first bind the cluster-admin role to the tf-job-operator service account in the default namespace:

kubectl create clusterrolebinding tf-admin --clusterrole=cluster-admin --serviceaccount=default:tf-job-operator
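
If you prefer to keep cluster configuration in version-controlled manifests, the same binding can also be written declaratively. The sketch below is an illustrative equivalent and does not come from the Kubeflow docs. It assumes the `rbac.authorization.k8s.io/v1` API (Kubernetes 1.8 or later), and the file name `tf-admin-clusterrolebinding.yaml` is an arbitrary choice. The script only writes the manifest; applying it still requires access to the cluster.

```shell
# Write a declarative equivalent of the clusterrolebinding command to a file.
# Assumes the rbac.authorization.k8s.io/v1 API (Kubernetes 1.8 or later).
cat > tf-admin-clusterrolebinding.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tf-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tf-job-operator
  namespace: default
EOF

# Review the file, then apply it against the cluster:
#   kubectl apply -f tf-admin-clusterrolebinding.yaml
```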

Then run the following commands to deploy:

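# Create a ksonnet application and switch into its directory
ks init my-kubeflow
cd my-kubeflow
# Add the Kubeflow registry and install the core, tf-serving and tf-job packages
ks registry add kubeflow github.com/google/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/core
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/tf-job
# Generate the kubeflow-core component and apply it to the default environment
ks generate core kubeflow-core --name=kubeflow-core
ks apply default -c kubeflow-core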

If you have multiple Kubernetes clusters, switch the kubectl context to another cluster, add a ksonnet environment for it, and deploy to that environment. For example:

kubectl config use-context gke
ks env add gke
ks apply gke -c kubeflow-core
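
When deploying to several clusters, the context-switching steps above can be wrapped in a small bash function. This is a sketch only. `deploy_kubeflow_core` is a hypothetical name and is not part of ksonnet or kubectl. The function assumes each ksonnet environment is named after its kubectl context, and it must be run from inside the `my-kubeflow` directory.

```shell
# Hypothetical helper (not part of ksonnet or kubectl): deploy kubeflow-core to
# the cluster behind a kubectl context, using a ksonnet environment with the
# same name. Run it from inside the my-kubeflow application directory.
deploy_kubeflow_core() {
  if [ "$#" -ne 1 ]; then
    echo "usage: deploy_kubeflow_core <kubectl-context>" >&2
    return 2
  fi
  local context="$1"
  # Switch kubectl to the target cluster before creating the environment,
  # in the same order as the example above.
  kubectl config use-context "$context" || return 1
  # Register a ksonnet environment for that cluster.
  ks env add "$context" || return 1
  # Apply the kubeflow-core component to the new environment.
  ks apply "$context" -c kubeflow-core
}
```

For example, `deploy_kubeflow_core gke` runs the same three commands shown above.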

After a while, the tf-hub-lb service is assigned a public IP. This is the address for accessing JupyterHub:

kubectl get svc tf-hub-lb
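
Provisioning a cloud load balancer can take minutes, so a script may need to poll until the address appears. The function below is a hypothetical helper that reads `.status.loadBalancer.ingress[0].ip` through kubectl's JSONPath output. It assumes the provider reports an IP address; some providers fill in a `hostname` field instead, and the helper would then never see a value.

```shell
# Hypothetical helper: poll a Service until its LoadBalancer reports an
# external IP, then print that IP.
# Arguments: service name, max attempts (default 60), seconds between attempts (default 5).
wait_for_lb_ip() {
  local svc="$1" attempts="${2:-60}" interval="${3:-5}" ip="" i=0
  while [ "$i" -lt "$attempts" ]; do
    # Empty until the cloud provider has provisioned the load balancer.
    ip="$(kubectl get svc "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for an external IP on service $svc" >&2
  return 1
}
```

For example, `wait_for_lb_ip tf-hub-lb` prints the address once it is assigned. With the defaults, it gives up with exit status 1 after roughly five minutes.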

On clusters that do not support LoadBalancer Services, use port forwarding instead and open http://127.0.0.1:8100:

kubectl port-forward tf-hub-0 8100:8000
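
For scripted use, the port-forward can run in the background while the script waits for JupyterHub to answer. This sketch assumes bash and curl are available. `open_jupyterhub_tunnel` and the `TUNNEL_PID` variable are hypothetical names; the pod name `tf-hub-0` and the ports are taken from the command above.

```shell
# Hypothetical helper: forward a local port (default 8100) to port 8000 of the
# tf-hub-0 pod in the background, then wait until JupyterHub answers.
# The PID of the background port-forward is stored in TUNNEL_PID.
open_jupyterhub_tunnel() {
  local local_port="${1:-8100}" attempts="${2:-20}" i=0
  kubectl port-forward tf-hub-0 "${local_port}:8000" >/dev/null 2>&1 &
  TUNNEL_PID=$!
  while [ "$i" -lt "$attempts" ]; do
    # curl exits 0 on any HTTP response, including redirects to the login page.
    if curl -s -o /dev/null "http://127.0.0.1:${local_port}/"; then
      echo "JupyterHub is reachable at http://127.0.0.1:${local_port}"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  # Give up and stop the background port-forward.
  kill "$TUNNEL_PID" 2>/dev/null
  return 1
}
```

When you are done, run `kill "$TUNNEL_PID"` to close the tunnel.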

By default, JupyterHub accepts any username and password. After logging in, you can start a Notebook Server from a custom image, such as:

  • gcr.io/kubeflow/tensorflow-notebook-cpu

  • gcr.io/kubeflow/tensorflow-notebook-gpu

Training Example

Using CPU:

ks generate tf-cnn cnn --name=cnn
ks apply gke -c cnn

Using GPU (reusing the cnn component generated above):

ks param set cnn num_gpus 1
ks param set cnn num_workers 1
ks apply default -c cnn

Model Deployment

MODEL_COMPONENT=serveInception
MODEL_NAME=inception
MODEL_PATH=gs://cloud-ml-dev_jlewi/tmp/inception
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME} --namespace=default --model_path=${MODEL_PATH}

ks apply gke -c ${MODEL_COMPONENT}
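
To deploy other models the same way, the steps above can be wrapped in a small function. This is a sketch that assumes bash. `deploy_tf_serving_model` is a hypothetical name, and the function keeps the `default` namespace used in the original command.

```shell
# Hypothetical wrapper around the tf-serving steps above so that other models
# can be deployed the same way. Arguments: component name, model name,
# model path, and ksonnet environment.
deploy_tf_serving_model() {
  if [ "$#" -ne 4 ]; then
    echo "usage: deploy_tf_serving_model <component> <model-name> <model-path> <env>" >&2
    return 2
  fi
  local component="$1" name="$2" model_path="$3" env="$4"
  # Generate a tf-serving component that serves the model stored at model_path.
  ks generate tf-serving "$component" --name="$name" --namespace=default --model_path="$model_path" || return 1
  # Deploy the component to the chosen ksonnet environment.
  ks apply "$env" -c "$component"
}

# Equivalent to the commands above:
# deploy_tf_serving_model serveInception inception gs://cloud-ml-dev_jlewi/tmp/inception gke
```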

Reference Documents


  • Introducing Kubeflow - A Composable, Portable, Scalable ML Stack Built for Kubernetes

  • https://github.com/google/kubeflow