Detailed MLflow Installation and Deployment

1. Install Docker

# Install the yum utilities
sudo yum install -y yum-utils
# Add the Docker yum repository (the Aliyun URL is a mirror of the official one)
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Refresh the yum cache
sudo yum makecache fast
# Install Docker
yum install -y docker-ce docker-ce-cli containerd.io
# Check the installation
docker info
# Create the daemon configuration with registry mirrors for faster image pulls inside China
mkdir -p /etc/docker
cat <<EOF > /etc/docker/daemon.json
{
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn", "http://hub-mirror.c.163.com"],
  "max-concurrent-downloads": 10,
  "log-driver": "json-file",
  "log-level": "warn",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "data-root": "/var/lib/docker"
}
EOF
# Start the service
systemctl start docker
# Enable start on boot
systemctl enable docker
# Check the service status
systemctl status docker

2. Install MinIO with Docker

# Pull the image
docker pull minio/minio
# Run the container; change the host ports if 9000 or 9090 is already in use.
# The web console listens on 9000 and the S3 API on 9090, so both ports are published.
docker run -d -p 9000:9000 -p 9090:9090 --name minio \
  -e "MINIO_ACCESS_KEY=minio" \
  -e "MINIO_SECRET_KEY=minio123" \
  -v /opt/minio/data:/data \
  -v /opt/minio/config:/root/.minio \
  minio/minio server /data \
  --console-address ":9000" --address ":9090"

3. Open the MinIO console

  • Address: <IP of the installation node>:9000
  • Username: minio
  • Password: minio123
  • Create a bucket: click Create Bucket, enter the name mlflow, and create it
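
The bucket can also be created from the command line instead of the console. The following is a minimal sketch, assuming the MinIO client `mc` is installed and that the S3 API port given to `--address` in step 2 (9090) is reachable from where the commands run. The function names `minio_endpoint` and `create_mlflow_bucket`, the alias `local`, and the `MINIO_HOST` / `MINIO_API_PORT` variables are illustrative names, not part of the original setup.

```shell
# Assumed defaults; adjust them to your deployment.
MINIO_HOST="${MINIO_HOST:-localhost}"
MINIO_API_PORT="${MINIO_API_PORT:-9090}"   # port passed to --address in step 2

# Print the S3 API endpoint URL for a given host and port.
minio_endpoint() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Register the server under the alias "local", then create the mlflow bucket.
# --ignore-existing makes the call succeed if the bucket is already there.
# Requires the mc binary and the credentials from step 2.
create_mlflow_bucket() {
  mc alias set local "$(minio_endpoint "$MINIO_HOST" "$MINIO_API_PORT")" minio minio123 &&
    mc mb --ignore-existing local/mlflow
}
```

After loading these definitions into the shell, calling `create_mlflow_bucket` should leave a bucket named mlflow that then shows up in the console.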
4. Install Anaconda3

# Download the installer
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.11-Linux-x86_64.sh
# Run the installer; press Enter at each prompt and answer yes
bash Anaconda3-2021.11-Linux-x86_64.sh
# Add conda to the PATH
vim /etc/profile
# Append the following line at the end of the file, adjusting the path to your actual Anaconda install location
export PATH=/root/anaconda3/bin:$PATH
# Apply the change
source /etc/profile
# Switch to the Tsinghua mirrors; otherwise environment creation may be slow or fail depending on the network
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes

5. Create and activate the conda environment

# Create a conda environment with Python 3.8; this can take a while
conda create -n mlflow-1.11.0 python==3.8
# If the following message appears, wait while conda automatically retries with the next repodata source:
# Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
# Open a new terminal, then activate the environment
conda activate mlflow-1.11.0

6. Install the required packages

# Install the Python packages the MLflow tracking server needs, one at a time
pip install mlflow==1.11.0
pip install mysqlclient==1.4.6
pip install boto3

7. Start the MLflow tracking server

# Export the MinIO URL plus the access key ID and secret key; the tracking server needs them to upload model files.
# The endpoint must be MinIO's S3 API port (the one passed to --address in step 2, 9090), not the console port 9000,
# and that port must be published with -p on the MinIO container.
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
export MLFLOW_S3_ENDPOINT_URL=http://localhost:9090
# Create the mlflow database (run this statement in the MySQL client)
create database if not exists `mlflow`;
# Start the MLflow server; replace the MySQL credentials with your own
mlflow server \
  --backend-store-uri mysql://<mysql_user>:'<mysql_password>'@localhost/mlflow \
  --host 0.0.0.0 -p 5002 \
  --default-artifact-root s3://mlflow

8. Problems you may hit at startup

# Problem 1:
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
# Cause: an incompatible protobuf version. Fix: in the current conda environment, uninstall protobuf and reinstall a pinned version.
pip uninstall protobuf
pip install protobuf==3.19.0

# Problem 2:
ImportError: libmysqlclient.so.20: cannot open shared object file: No such file or directory
# Cause: libmysqlclient.so.20 is missing from /usr/lib64/. Fix: find where libmysqlclient.so.20 lives on the system, then symlink it to /usr/lib64/libmysqlclient.so.20.
[root@node1 ~]# find / -name "libmysqlclient.so.20"
/usr/local/mysql/lib/libmysqlclient.so.20
[root@node1 ~]# ln -s /usr/local/mysql/lib/libmysqlclient.so.20 /usr/lib64/libmysqlclient.so.20

# Problem 3:
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (2002, "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)")
# Cause: mysql.sock is not present under /tmp. Fix: find the actual mysql.sock file, then create /tmp/mysql.sock as a symlink to it.
[root@node1 ~]# find / -name "mysql.sock"
/var/lib/mysql/mysql.sock
[root@node1 ~]# ln -s /var/lib/mysql/mysql.sock /tmp/mysql.sock
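
Another easy-to-miss cause of startup or upload failures is launching the server from a new terminal in which the exports from step 7 are no longer set. The sketch below is a small preflight check for that case; the function name `check_mlflow_env` is hypothetical, and it only verifies that the three variables are non-empty, not that their values are correct.

```shell
# Hypothetical helper: returns 0 when every variable exported in step 7 is set
# and non-empty; otherwise prints the missing names to stderr and returns 1.
check_mlflow_env() {
  missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY MLFLOW_S3_ENDPOINT_URL; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}
```

Call `check_mlflow_env` in the same shell right before the `mlflow server` command from step 7; a non-zero exit status means the exports need to be repeated in that terminal.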
