Apache DolphinScheduler is a distributed, open-source task scheduling system designed to help users automate the scheduling and management of complex workflows. DolphinScheduler supports multiple task types and can run in both single-node and cluster environments. Below we walk through automated packaging and deployment of DolphinScheduler on a single node and in a cluster.
Automated Packaging
Prerequisites: Maven and JDK.

Run the following shell script to pull and package the code. The packaged output will be located at /opt/action/dolphinscheduler/dolphinscheduler-dist/target/apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz.
```bash
# Clone the repository and switch to the dev branch
clone_init(){
sudo su - root <<EOF
cd /opt/action
git clone [email protected]:apache/dolphinscheduler.git
cd dolphinscheduler
git fetch origin dev
git checkout -b dev origin/dev
EOF
}

# Build the project
build(){
sudo su - root <<EOF
cd /opt/action/dolphinscheduler
mvn -B clean install -Prelease -Dmaven.test.skip=true -Dcheckstyle.skip=true -Dmaven.javadoc.skip=true
EOF
}
```
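The two functions above are only defined, never invoked. A minimal entry point might look like the sketch below; the `main` wrapper itself is an illustrative addition, and it assumes /opt/action exists and the current user can run sudo:

```shell
# Illustrative driver for the packaging steps above; the function names
# match the script in this post, main() is an added convenience wrapper.
main(){
  set -e        # stop on the first failed step
  clone_init    # clone the repo and switch to the dev branch
  build         # run the Maven build
  echo "build finished"
}
```

Source the script and call `main` to run both steps in order.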
Standalone Deployment
Required Environment

Set up the required environment: JDK, ZooKeeper, and MySQL.
Initialize the ZooKeeper environment (version 3.8 or later is recommended). The installation package can be downloaded from the official website.
```bash
sudo su - root <<EOF
# Navigate to the /opt directory (you can choose your installation directory)
cd /opt
# Extract the ZooKeeper installation package
tar -xvf apache-zookeeper-3.8.0-bin.tar.gz
# Rename the extracted folder
sudo mv apache-zookeeper-3.8.0-bin zookeeper
# Navigate to the zookeeper directory
cd zookeeper/
# Create a directory called zkData under /opt/zookeeper to store ZooKeeper data files
mkdir zkData
# Navigate to the conf folder
cd conf/
# Copy zoo_sample.cfg to zoo.cfg, as ZooKeeper reads only zoo.cfg as its configuration file
cp zoo_sample.cfg zoo.cfg
# Point dataDir at the zkData directory created above
sed -i 's/\/tmp\/zookeeper/\/opt\/zookeeper\/zkData/g' zoo.cfg
# Stop any previously running ZooKeeper service
ps -ef | grep QuorumPeerMain | grep -v grep | awk '{print "kill -9 " $2}' | sh
# Use 'vim zoo.cfg' for any further edits, then start ZooKeeper
sh /opt/zookeeper/bin/zkServer.sh start
EOF
```
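Once started, it is worth confirming that ZooKeeper is actually serving. A small check along these lines (the path assumes the /opt/zookeeper layout above):

```shell
# Query ZooKeeper's status; zkServer.sh exits non-zero when the service
# is not running, so we can report health from its exit code.
zk_status(){
  if sh /opt/zookeeper/bin/zkServer.sh status > /dev/null 2>&1; then
    echo "zookeeper is up"
  else
    echo "zookeeper is down"
  fi
}
```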
JDK and MySQL installation details are omitted for brevity.
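For the MySQL part, DolphinScheduler at minimum needs a metadata database its services can reach. A hedged sketch of the bootstrap SQL follows; the database name matches the scripts later in this post, while the user and the `%` host grant are placeholders you should adapt:

```shell
# Emit the SQL that creates the DolphinScheduler metadata database.
# DS_DB matches the database used later in this post; DS_USER is a placeholder.
DS_DB=dolphinscheduler
DS_USER=root

ds_init_sql(){
  cat <<SQL
CREATE DATABASE IF NOT EXISTS ${DS_DB} DEFAULT CHARACTER SET utf8mb4;
GRANT ALL PRIVILEGES ON ${DS_DB}.* TO '${DS_USER}'@'%';
FLUSH PRIVILEGES;
SQL
}

# Feed the statements to the mysql client, for example:
# ds_init_sql | mysql -uroot -p
```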
Initialization Configuration

Initialize the configuration files
- Create directories for the initialization files (this example uses /opt/action/tool):

```bash
mkdir -p /opt/action/tool
mkdir -p /opt/Dsrelease
```
- Create an initialization file named common.properties in /opt/action/tool.
```properties
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# user data local directory path, please make sure the directory exists and have read write permissions
data.basedir.path=/tmp/dolphinscheduler

# resource storage type: HDFS, S3, NONE
resource.storage.type=HDFS

# resource store on HDFS/S3 path, resource file will store to this hadoop hdfs path, self configuration,
# please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.upload.path=/dolphinscheduler

# whether to startup kerberos
hadoop.security.authentication.startup.state=false

# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf

# login user from keytab username
[email protected]

# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab

# kerberos expire time, the unit is hour
kerberos.expire.time=2

# resource view suffixs
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js

# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
hdfs.root.user=root

# if resource.storage.type=S3, the value like: s3a://dolphinscheduler;
# if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
fs.defaultFS=file:///
aws.access.key.id=minioadmin
aws.secret.access.key=minioadmin
aws.region=us-east-1
aws.endpoint=http://localhost:9000

# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088

# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx

# if resourcemanager HA is enabled or resourcemanager is not used, keep the default value;
# if resourcemanager is single, you only need to replace aws2 with the actual resourcemanager hostname
yarn.application.status.address=http://aws2:%s/ws/v1/cluster/apps/%s

# job history status url when application number threshold is reached (default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://aws2:19888/ws/v1/history/mapreduce/jobs/%s

# datasource encryption enable
datasource.encryption.enable=false

# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
#data-quality.error.output.path=/tmp/data-quality-error-data

# whether hive SQL is executed in the same session
support.hive.oneSession=false

# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions;
# if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true

# network interface preferred like eth0, default: empty
#dolphin.scheduler.network.interface.preferred=

# network IP gets priority, default: inner outer
#dolphin.scheduler.network.priority.strategy=default

# system env path
#dolphinscheduler.env.path=dolphinscheduler_env.sh

# development state
development.state=false

# rpc port
alert.rpc.port=50052

# Url endpoint for zeppelin RESTful API
zeppelin.rest.url=http://localhost:8080
```
- Create an initialization file named core-site.xml in /opt/action/tool.
```xml
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://aws1</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>aws1:2181</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
```
- Create an initialization file named hdfs-site.xml in /opt/action/tool.
```xml
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/bigdata/hadoop/ha/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/bigdata/hadoop/ha/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>aws2:50090</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/opt/bigdata/hadoop/ha/dfs/secondary</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>aws1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.aws1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.aws1.nn1</name>
    <value>aws1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.aws1.nn2</name>
    <value>aws2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.aws1.nn1</name>
    <value>aws1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.aws1.nn2</name>
    <value>aws2:50070</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>aws1:50010</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>aws1:50020</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>aws1:50075</value>
  </property>
  <property>
    <name>dfs.datanode.https.address</name>
    <value>aws1:50475</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://aws1:8485;aws2:8485;aws3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/bigdata/hadoop/ha/dfs/jn</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.aws1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_dsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
```
- Upload the initialization JAR files mysql-connector-java-8.0.16.jar and ojdbc8.jar to /opt/action/tool.
Replace the initialization files
```bash
replace_file(){   # function name assumed; the original opening line is missing
cd /opt/Dsrelease
sudo rm -r $today/
echo "rm -r $today"
cd /opt/release
cp $packge_tar /opt/Dsrelease
cd /opt/Dsrelease
tar -zxvf $packge_tar
mv $packge $today

p_api_lib=/opt/Dsrelease/$today/api-server/libs/
p_master_lib=/opt/Dsrelease/$today/master-server/libs/
p_worker_lib=/opt/Dsrelease/$today/worker-server/libs/
p_alert_lib=/opt/Dsrelease/$today/alert-server/libs/
p_tools_lib=/opt/Dsrelease/$today/tools/libs/
p_st_lib=/opt/Dsrelease/$today/standalone-server/libs/
p_api_conf=/opt/Dsrelease/$today/api-server/conf/
p_master_conf=/opt/Dsrelease/$today/master-server/conf/
p_worker_conf=/opt/Dsrelease/$today/worker-server/conf/
p_alert_conf=/opt/Dsrelease/$today/alert-server/conf/
p_tools_conf=/opt/Dsrelease/$today/tools/conf/
p_st_conf=/opt/Dsrelease/$today/standalone-server/conf/

cp $p0 $p4 $p_api_lib
cp $p0 $p4 $p_master_lib
cp $p0 $p4 $p_worker_lib
cp $p0 $p4 $p_alert_lib
cp $p0 $p4 $p_tools_lib
cp $p0 $p4 $p_st_lib
echo "cp $p0 $p_api_lib"

cp $p1 $p2 $p3 $p_api_conf
cp $p1 $p2 $p3 $p_master_conf
cp $p1 $p2 $p3 $p_worker_conf
cp $p1 $p2 $p3 $p_alert_conf
cp $p1 $p2 $p3 $p_tools_conf
cp $p1 $p2 $p3 $p_st_conf
echo "cp $p1 $p2 $p3 $p_api_conf"
}

define_param(){
packge_tar=apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz
packge=apache-dolphinscheduler-dev-SNAPSHOT-bin
p0=/opt/action/tool/mysql-connector-java-8.0.16.jar
p1=/opt/action/tool/common.properties
p2=/opt/action/tool/core-site.xml
p3=/opt/action/tool/hdfs-site.xml
p4=/opt/action/tool/ojdbc8.jar
today=`date +%m%d`
}
```
Replace the configuration file contents
```bash
replace_config(){   # function name assumed; the original opening line is missing
sed -i 's/spark2/spark/g' /opt/Dsrelease/$today/worker-server/conf/dolphinscheduler_env.sh
cd /opt/Dsrelease/$today/bin/env/
sed -i '$a\export SPRING_PROFILES_ACTIVE=permission_shiro' dolphinscheduler_env.sh
sed -i '$a\export DATABASE="mysql"' dolphinscheduler_env.sh
sed -i '$a\export SPRING_DATASOURCE_DRIVER_CLASS_NAME="com.mysql.jdbc.Driver"' dolphinscheduler_env.sh
# Customize the MySQL configuration here
sed -i '$a\export SPRING_DATASOURCE_URL="jdbc:mysql://ctyun6:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true&allowPublicKeyRetrieval=true"' dolphinscheduler_env.sh
sed -i '$a\export SPRING_DATASOURCE_USERNAME="root"' dolphinscheduler_env.sh
sed -i '$a\export SPRING_DATASOURCE_PASSWORD="root@123"' dolphinscheduler_env.sh
echo "JDBC configuration replaced successfully"
# Customize the ZooKeeper configuration here
sed -i '$a\export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}' dolphinscheduler_env.sh
sed -i '$a\export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-ctyun6:2181}' dolphinscheduler_env.sh
echo "ZooKeeper configuration replaced successfully"

sed -i 's/resource.storage.type=HDFS/resource.storage.type=NONE/' /opt/Dsrelease/$today/master-server/conf/common.properties
sed -i 's/resource.storage.type=HDFS/resource.storage.type=NONE/' /opt/Dsrelease/$today/worker-server/conf/common.properties
sed -i 's/resource.storage.type=HDFS/resource.storage.type=NONE/' /opt/Dsrelease/$today/alert-server/conf/common.properties
sed -i 's/resource.storage.type=HDFS/resource.storage.type=NONE/' /opt/Dsrelease/$today/api-server/conf/common.properties
sed -i 's/hdfs.root.user=root/resource.hdfs.root.user=root/' /opt/Dsrelease/$today/master-server/conf/common.properties
sed -i 's/hdfs.root.user=root/resource.hdfs.root.user=root/' /opt/Dsrelease/$today/worker-server/conf/common.properties
sed -i 's/hdfs.root.user=root/resource.hdfs.root.user=root/' /opt/Dsrelease/$today/alert-server/conf/common.properties
sed -i 's/hdfs.root.user=root/resource.hdfs.root.user=root/' /opt/Dsrelease/$today/api-server/conf/common.properties
sed -i 's/fs.defaultFS=file:/resource.fs.defaultFS=file:/' /opt/Dsrelease/$today/master-server/conf/common.properties
sed -i 's/fs.defaultFS=file:/resource.fs.defaultFS=file:/' /opt/Dsrelease/$today/worker-server/conf/common.properties
sed -i 's/fs.defaultFS=file:/resource.fs.defaultFS=file:/' /opt/Dsrelease/$today/alert-server/conf/common.properties
sed -i 's/fs.defaultFS=file:/resource.fs.defaultFS=file:/' /opt/Dsrelease/$today/api-server/conf/common.properties
sed -i '$a\resource.hdfs.fs.defaultFS=file:///' /opt/Dsrelease/$today/api-server/conf/common.properties
echo "common.properties configuration replaced successfully"

# Adjust the master/worker heap sizes (api-server and alert-server can be tuned the same way);
# size them to your hardware, but keep the rule Xms = Xmx = 2 * Xmn
cd /opt/Dsrelease/$today/
sed -i 's/Xms4g/Xms2g/g' worker-server/bin/start.sh
sed -i 's/Xmx4g/Xmx2g/g' worker-server/bin/start.sh
sed -i 's/Xmn2g/Xmn1g/g' worker-server/bin/start.sh
sed -i 's/Xms4g/Xms2g/g' master-server/bin/start.sh
sed -i 's/Xmx4g/Xmx2g/g' master-server/bin/start.sh
sed -i 's/Xmn2g/Xmn1g/g' master-server/bin/start.sh
echo "master and worker memory settings updated"
}
```
Delete the HDFS configuration
```bash
delete_hdfs_conf(){   # function name assumed; the original opening line is missing
echo "start to delete hdfs configuration"
sudo rm /opt/Dsrelease/$today/api-server/conf/core-site.xml
sudo rm /opt/Dsrelease/$today/api-server/conf/hdfs-site.xml
sudo rm /opt/Dsrelease/$today/worker-server/conf/core-site.xml
sudo rm /opt/Dsrelease/$today/worker-server/conf/hdfs-site.xml
sudo rm /opt/Dsrelease/$today/master-server/conf/core-site.xml
sudo rm /opt/Dsrelease/$today/master-server/conf/hdfs-site.xml
sudo rm /opt/Dsrelease/$today/alert-server/conf/core-site.xml
sudo rm /opt/Dsrelease/$today/alert-server/conf/hdfs-site.xml
echo "finished deleting hdfs configuration"
}
```
Initialize MySQL
```bash
init_mysql(){
sql_path="/opt/Dsrelease/$today/tools/sql/sql/dolphinscheduler_mysql.sql"
sourceCommand="source $sql_path"
echo $sourceCommand
echo "start source:"
mysql -hlocalhost -uroot -proot@123 -D "dolphinscheduler" -e "$sourceCommand"
echo "finished source"
}
```
Start the DolphinScheduler services
```bash
stop_all_server(){
cd /opt/Dsrelease/$today
./bin/dolphinscheduler-daemon.sh stop api-server
./bin/dolphinscheduler-daemon.sh stop master-server
./bin/dolphinscheduler-daemon.sh stop worker-server
./bin/dolphinscheduler-daemon.sh stop alert-server
ps -ef | grep api-server | grep -v grep | awk '{print "kill -9 " $2}' | sh
ps -ef | grep master-server | grep -v grep | awk '{print "kill -9 " $2}' | sh
ps -ef | grep worker-server | grep -v grep | awk '{print "kill -9 " $2}' | sh
ps -ef | grep alert-server | grep -v grep | awk '{print "kill -9 " $2}' | sh
}

run_all_server(){
cd /opt/Dsrelease/$today
./bin/dolphinscheduler-daemon.sh start api-server
./bin/dolphinscheduler-daemon.sh start master-server
./bin/dolphinscheduler-daemon.sh start worker-server
./bin/dolphinscheduler-daemon.sh start alert-server
}
```
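After run_all_server, it is worth verifying that all four daemons actually came up. A small sketch of such a check, matching the *-server process names used above:

```shell
# Report the status of the four DolphinScheduler daemons by scanning the
# process list; prints one line per service.
check_servers(){
  for s in api-server master-server worker-server alert-server; do
    if ps -ef | grep "$s" | grep -v grep > /dev/null; then
      echo "$s is running"
    else
      echo "$s is NOT running"
    fi
  done
}
```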
Cluster Deployment

- Open the external ports for MySQL and ZooKeeper.
- Deploy the cluster and start the services

Copy the initialized folders to the target servers and start the assigned services on each node to complete the cluster deployment. Make sure all nodes share a single ZooKeeper and MySQL instance.
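The copy-and-start step can be sketched as follows. The hostnames, the root SSH user, and the per-node role split are placeholders, and passwordless SSH between nodes is assumed:

```shell
# Copy the prepared release to a node and start only the roles assigned to it.
today=$(date +%m%d)                 # same date-stamped directory as above
release=/opt/Dsrelease/$today

deploy_node(){                      # usage: deploy_node <host> <service>...
  host=$1; shift
  scp -r "$release" "root@$host:/opt/Dsrelease/"
  for svc in "$@"; do
    ssh "root@$host" "/opt/Dsrelease/$today/bin/dolphinscheduler-daemon.sh start $svc"
  done
}

# Example topology (hostnames are placeholders): masters and the API server on
# node1, workers and alerts on node2; every node points at the shared
# ZooKeeper and MySQL configured in dolphinscheduler_env.sh.
# deploy_node node1 master-server api-server
# deploy_node node2 worker-server alert-server
```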
Join the Community

You are welcome to fill out this form to become an Apache DolphinScheduler speaker: https://forms.gle/MS2fQDhG4n9Hh2yv5 :)
There are many ways to participate in and contribute to the DolphinScheduler community, including:
Documentation, translation, Q&A, tests, code, articles, keynote talks, and more.
We suggest that your first contribution (a documentation or code PR) be something simple, used to get familiar with the submission process and the community's collaboration style.
The community has therefore compiled the following list of issues suitable for newcomers: https://github.com/apache/dolphinscheduler/contribute
A list of issues beyond the newcomer level:
https://github.com/apache/dolphinscheduler/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22+
How to contribute:
GitHub repository: https://github.com/apache/dolphinscheduler
Official website: https://dolphinscheduler.apache.org/
Mailing list: [email protected]
Twitter: @DolphinSchedule
YouTube: https://www.youtube.com/@apachedolphinscheduler
Slack: https://s.apache.org/dolphinscheduler-slack
Contributor guide: https://dolphinscheduler.apache.org/en-us/community/index.html
Your star matters a lot to the project; please feel free to give Apache DolphinScheduler a star ❤️