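Starting with v1.8, Kubernetes supports native Apache Spark applications (this requires a Spark release with Kubernetes support, such as v2.3). Jobs can be submitted to Kubernetes directly with the spark-submit command. For example, to compute pi: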
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar
Or use the Python version:
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace <k8s-namespace> \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.4.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar \
  --py-files local:///opt/spark/examples/src/main/python/sort.py \
  local:///opt/spark/examples/src/main/python/pi.py 10
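For orientation, both example jobs above estimate pi with a Monte Carlo method: they sample random points in a square and count how many land inside the inscribed circle. The sketch below is a single-process illustration of that idea, not the code shipped with Spark; the names `estimate_pi`, `samples` and `seed` are chosen here for illustration. In the Spark examples the sampling is spread across executors, and the trailing `10` in the Python command is, to our understanding, the number of partitions.

```python
import random


def estimate_pi(samples, seed=None):
    """Estimate pi by sampling points uniformly in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # A point lies inside the unit circle when x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The circle/square area ratio is pi/4, so scale the hit rate by 4.
    return 4.0 * inside / samples


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(100000, seed=42))
```

The estimate gets tighter as `samples` grows, which is why the work is worth distributing over several executors.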
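The Kubernetes examples repository on GitHub provides a detailed procedure for deploying Spark. Because it involves many steps, this guide simplifies parts of it so that less configuration is needed during installation.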
Prerequisites: a Kubernetes cluster (see Cluster Deployment) and a working kube-dns.
namespace-spark-cluster.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: "spark-cluster"
  labels:
    name: "spark-cluster"
$ kubectl create -f examples/staging/spark/namespace-spark-cluster.yaml
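At this point the original guide switches the kubectl context to the spark-cluster namespace. For convenience we skip that step and instead set the spark-cluster namespace explicitly in every deployment that follows.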
Create a replication controller to run the Spark Master service:
spark-master-controller.yaml
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-master-controller
  namespace: spark-cluster
spec:
  replicas: 1
  selector:
    component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: gcr.io/google_containers/spark:1.5.2_v1
          command: ["/start-master"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
$ kubectl create -f spark-master-controller.yaml
Create the master service:
spark-master-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: spark-master
  namespace: spark-cluster
spec:
  ports:
    - port: 7077
      targetPort: 7077
      name: spark
    - port: 8080
      targetPort: 8080
      name: http
  selector:
    component: spark-master
$ kubectl create -f spark-master-service.yaml
Check that the Master is running:
$ kubectl get pod -n spark-cluster
spark-master-controller-qtwm8   1/1   Running   0   6d
$ kubectl logs spark-master-controller-qtwm8 -n spark-cluster
17/08/07 02:34:54 INFO Master: Registered signal handlers for [TERM, HUP, INT]
17/08/07 02:34:54 INFO SecurityManager: Changing view acls to: root
17/08/07 02:34:54 INFO SecurityManager: Changing modify acls to: root
17/08/07 02:34:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/08/07 02:34:55 INFO Slf4jLogger: Slf4jLogger started
17/08/07 02:34:55 INFO Remoting: Starting remoting
17/08/07 02:34:55 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:7077]
17/08/07 02:34:55 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
17/08/07 02:34:55 INFO Master: Starting Spark master at spark://spark-master:7077
17/08/07 02:34:55 INFO Master: Running Spark version 1.5.2
17/08/07 02:34:56 INFO Utils: Successfully started service 'MasterUI' on port 8080.
17/08/07 02:34:56 INFO MasterWebUI: Started MasterWebUI at http://10.2.6.12:8080
17/08/07 02:34:56 INFO Utils: Successfully started service on port 6066.
17/08/07 02:34:56 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
17/08/07 02:34:56 INFO Master: I have been elected leader! New state: ALIVE
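The last line of the log is the one that matters: the Master reports that it has been elected leader and is in state ALIVE. If you want to automate this check, a minimal sketch might look like the following. The function name `master_is_alive` is hypothetical, and the match relies only on the wording seen in the Spark 1.5.2 log above; other Spark versions may phrase it differently.

```python
def master_is_alive(log_text):
    """Return True if the log contains the Master's 'New state: ALIVE' line."""
    for line in log_text.splitlines():
        if "Master:" in line and "New state: ALIVE" in line:
            return True
    return False


# Sample lines copied from the log output shown above.
sample_log = (
    "17/08/07 02:34:55 INFO Master: Running Spark version 1.5.2\n"
    "17/08/07 02:34:56 INFO Master: I have been elected leader! New state: ALIVE\n"
)
print(master_is_alive(sample_log))
```

The text passed in would typically be the captured output of the kubectl logs command for your own master pod.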
Once the master has been created and is running, we can inspect the state of the Spark cluster through the web UI that Spark provides. To reach it, we deploy a specialized proxy:
spark-ui-proxy-controller.yaml
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-ui-proxy-controller
  namespace: spark-cluster
spec:
  replicas: 1
  selector:
    component: spark-ui-proxy
  template:
    metadata:
      labels:
        component: spark-ui-proxy
    spec:
      containers:
        - name: spark-ui-proxy
          image: elsonrodriguez/spark-ui-proxy:1.0
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
          args:
            - spark-master:8080
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 120
            timeoutSeconds: 5
$ kubectl create -f spark-ui-proxy-controller.yaml
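Expose the proxy through a service. The original guide uses the LoadBalancer type; here we change it to NodePort. If your Kubernetes cluster runs on a cloud provider, you can follow the original approach instead.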
spark-ui-proxy-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: spark-ui-proxy
  namespace: spark-cluster
spec:
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30080
  selector:
    component: spark-ui-proxy
  type: NodePort
$ kubectl create -f spark-ui-proxy-service.yaml
After deployment, you can use kubectl proxy to view the state of your Spark cluster:
$ kubectl proxy --port=8001
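The UI is then available at http://localhost:8001/api/v1/proxy/namespaces/spark-cluster/services/spark-master:8080/. This only works while kubectl proxy keeps running. Because we configured a NodePort earlier, the UI can also be reached on port 30080 of any node (for example http://10.201.2.34:30080).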
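The two access paths differ only in how the URL is assembled. The sketch below builds both forms and offers a simple reachability probe. All function names are hypothetical, the proxy path layout is copied from the URL in this guide, and newer Kubernetes releases may expect a different layout, so check it against your API server version.

```python
from http.client import HTTPException
from urllib.request import urlopen


def proxy_url(namespace, service, port, proxy_port=8001):
    """URL served through a local `kubectl proxy`, using the path layout from this guide."""
    return "http://localhost:%d/api/v1/proxy/namespaces/%s/services/%s:%d/" % (
        proxy_port, namespace, service, port)


def node_port_url(node_ip, node_port):
    """URL served directly by a NodePort service on any cluster node."""
    return "http://%s:%d" % (node_ip, node_port)


def is_reachable(url, timeout=5):
    """Return True if an HTTP GET on the URL succeeds with status 200."""
    try:
        response = urlopen(url, timeout=timeout)
        try:
            return response.getcode() == 200
        finally:
            response.close()
    except (OSError, HTTPException):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


print(proxy_url("spark-cluster", "spark-master", 8080))
print(node_port_url("10.201.2.34", 30080))
```

The node IP here is the example address used above; substitute the address of one of your own nodes before calling `is_reachable`.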
Before starting the workers, make sure the Master is running.
spark-worker-controller.yaml
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-worker-controller
  namespace: spark-cluster
spec:
  replicas: 2
  selector:
    component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: gcr.io/google_containers/spark:1.5.2_v1
          command: ["/start-worker"]
          ports:
            - containerPort: 8081
          resources:
            requests:
              cpu: 100m
$ kubectl create -f spark-worker-controller.yaml
replicationcontroller "spark-worker-controller" created
Check the status from the command line:
$ kubectl get pod -n spark-cluster
spark-master-controller-qtwm8     1/1   Running   0   6d
spark-worker-controller-4rxrs     1/1   Running   0   6d
spark-worker-controller-z6f21     1/1   Running   0   6d
spark-ui-proxy-controller-d4br2   1/1   Running   4   6d
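With several pods in play it can be handy to scan the listing for anything that is not Running. The following is a rough sketch that parses text in the column layout shown above (name, ready, status, restarts, age). The function name `pods_not_running` is hypothetical, and the parser assumes whitespace-separated columns with the status in third position.

```python
def pods_not_running(listing):
    """Return (name, status) pairs for pods whose status column is not 'Running'."""
    problems = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, an echoed "$ kubectl ..." command and a NAME header row.
        if len(fields) < 5 or fields[0] in ("$", "NAME"):
            continue
        name, status = fields[0], fields[2]
        if status != "Running":
            problems.append((name, status))
    return problems


# Listing copied from the output above.
sample_listing = """\
$ kubectl get pod -n spark-cluster
spark-master-controller-qtwm8     1/1   Running   0   6d
spark-worker-controller-4rxrs     1/1   Running   0   6d
spark-worker-controller-z6f21     1/1   Running   0   6d
spark-ui-proxy-controller-d4br2   1/1   Running   4   6d
"""
print(pods_not_running(sample_listing))
```

An empty result means every listed pod reported Running; the restart count (4 for the UI proxy here) is not inspected by this sketch.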
You can also check through the web UI service created above.
At this point the Spark cluster itself is essentially complete.
We can use the Zeppelin UI to run jobs directly from a web notebook. For details, see Zeppelin UI and Spark architecture.
zeppelin-controller.yaml
kind: ReplicationController
apiVersion: v1
metadata:
  name: zeppelin-controller
  namespace: spark-cluster
spec:
  replicas: 1
  selector:
    component: zeppelin
  template:
    metadata:
      labels:
        component: zeppelin
    spec:
      containers:
        - name: zeppelin
          image: gcr.io/google_containers/zeppelin:v0.5.6_v1
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
$ kubectl create -f zeppelin-controller.yaml
replicationcontroller "zeppelin-controller" created
Then deploy a Service for it in the same way:
zeppelin-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: zeppelin
  namespace: spark-cluster
spec:
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30081
  selector:
    component: zeppelin
  type: NodePort
$ kubectl create -f zeppelin-service.yaml
Here the NodePort is set to 30081, so the Zeppelin UI can likewise be reached on port 30081 of any node.
To access pyspark from the command line (remember to replace the pod name with your own):
$ kubectl exec -it zeppelin-controller-8f14f -n spark-cluster pyspark
Python 2.7.9 (default, Mar 1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
17/08/14 01:59:22 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Python version 2.7.9 (default, Mar 1 2015 12:57:24)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
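Once the prompt appears, a quick way to confirm that the shell can actually schedule work is to run a tiny job against the `sc` object that the banner announces. The sketch below is one possible smoke test; the helper name `sum_of_squares` is made up for illustration, and it only uses the `parallelize`, `map` and `reduce` calls of the RDD API.

```python
from operator import add


def sum_of_squares(sc, n=100):
    """Distribute 1..n as an RDD, square each value, and add the results."""
    numbers = sc.parallelize(range(1, n + 1))
    return numbers.map(lambda x: x * x).reduce(add)


# In the pyspark shell, where `sc` already exists:
#   >>> sum_of_squares(sc)
#   338350
```

The value 338350 is simply 1^2 + 2^2 + ... + 100^2, so any other result (or a hang) suggests that the job did not run correctly on the cluster.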
From here you can start using the Spark service. Corrections are welcome if you find any errors.
The Zeppelin image is very large, so pulling it takes some time. The image size problem is being worked on; see issue #17231 for details.
On GKE, kubectl port-forward can be somewhat unstable. If Zeppelin shows its status as Disconnected, the port-forward has probably failed and you need to restart it; see #12179 for details.