AWS: Uploading Files, Deleting Files, and Image Recognition

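Contents

  • Uploading and deleting S3 files and recognizing text in images with AWS
    • Preparation
      • Installing the AWS CLI
      • Initial AWS CLI configuration
      • S3 bucket access
      • Image text recognition access
      • The AWS SDK
    • Uploading files
      • Method 1
      • Method 2
    • Deleting files
    • Recognizing text in images
      • Recognizing key-value documents such as invoices and bills
      • Plain text recognition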
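Uploading and deleting S3 files and recognizing text in images with AWS

Preparation

Installing the AWS CLI

Download and run the installer for your operating system. The installation is straightforward, so it is not covered here.

Once the installation is complete, run the following two commands to verify that the AWS CLI was installed successfully. On macOS, open the Terminal app; on Windows, open cmd.
  • where aws (Windows) / which aws (macOS): shows the AWS CLI installation path
  • aws --version: shows the AWS CLI version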
Example output on macOS:

    zonghan@MacBook-Pro ~ % aws --version
    aws-cli/2.0.30 Python/3.7.4 Darwin/21.6.0 botocore/2.0.0dev34
    zonghan@MacBook-Pro ~ % which aws
    /usr/local/bin/aws

Initial AWS CLI configuration

Before using the AWS CLI, run the aws configure command to complete the initial configuration.
    zonghan@MacBook-Pro ~ % aws configure
    AWS Access Key ID [None]: AKIA3GRZL6WIQEXAMPLE
    AWS Secret Access Key [None]: k+ci5r+hAcM3x61w1example
    Default region name [None]: ap-east-1
    Default output format [None]: json
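  • The AWS Access Key ID and AWS Secret Access Key can be obtained from the AWS Management Console. The AWS CLI uses them, much like a username and password, to connect to AWS services.
    Click your username in the upper-right corner of the AWS Management Console, then select Security Credentials.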

  • Click Create New Access Key to create an Access Key ID and Secret Access Key pair, and save it. The secret key can only be saved at creation time.

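  • Default region name specifies the code of the AWS region to connect to. The code for each AWS region can be looked up in the AWS list of regions.
  • Default output format specifies the format of the command-line output. JSON is the default for all output; any of the following formats can be used:
    • JSON (JavaScript Object Notation)
    • YAML (available only in AWS CLI v2)
    • Text
    • Table
For more detailed configuration options, see this article.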
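The values entered in aws configure are written to plain-text files in the user's home directory: the access keys to ~/.aws/credentials, and the region and output format to ~/.aws/config. As a minimal sketch, assuming the default file location and the default profile, the configured region can be read back with the standard library alone. The helper name read_default_region is hypothetical and not part of any AWS tool.

```python
import configparser
from pathlib import Path


def read_default_region(config_path=None):
    """Return the region of the [default] profile, or None if it is not set.

    config_path is an optional override (an assumption for testing);
    by default the standard location ~/.aws/config is used.
    """
    path = Path(config_path) if config_path else Path.home() / ".aws" / "config"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored
    if parser.has_option("default", "region"):
        return parser.get("default", "region")
    return None


if __name__ == "__main__":
    print(read_default_region())
```

If this prints None, aws configure has probably not been run for the default profile, or the configuration lives in a non-default location.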
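S3 bucket access

The user whose credentials are configured on this computer needs permission to access an S3 bucket. This access is usually granted by your administrator.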
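Whether the configured credentials can actually reach the bucket can be checked before any upload is attempted. The following is a minimal sketch, not the original author's code: it assumes an S3 client created with boto3.client("s3") is passed in, and can_access_bucket is a hypothetical helper name. It relies on the HeadBucket operation, which fails with a ClientError when the bucket does not exist or the caller is not allowed to access it.

```python
import logging

logger = logging.getLogger(__name__)


def can_access_bucket(s3_client, bucket):
    """Return True if the client's credentials can reach the bucket.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    try:
        # HeadBucket succeeds only if the bucket exists and the caller may access it
        s3_client.head_bucket(Bucket=bucket)
    except s3_client.exceptions.ClientError as err:
        # Typical codes are "403" (no permission) and "404" (no such bucket)
        code = err.response.get("Error", {}).get("Code")
        logger.warning("Cannot access bucket %s (error code %s)", bucket, code)
        return False
    return True
```

Typical usage would be can_access_bucket(boto3.client("s3"), "my-bucket"), where my-bucket is a placeholder for the bucket your administrator set up.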
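Image text recognition access

The user whose credentials are configured on this computer also needs permission to use Amazon Textract. This access is likewise usually granted by your administrator.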
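The AWS SDK

    import boto3
    from botocore.exceptions import ClientError, BotoCoreError

Install the boto3 module used above (for example with pip install boto3). The botocore module is normally installed along with it as a dependency.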
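To confirm that both packages are present in the current environment, their installed versions can be queried from the package metadata. This is a minimal sketch using only the standard library (Python 3.8 or later); installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("boto3", "botocore"):
        print(name, installed_version(name) or "not installed")
```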
Uploading files

Method 1

Use the upload_file method to upload a file.
    import logging
    import os

    import boto3
    from boto3.exceptions import S3UploadFailedError
    from botocore.exceptions import ClientError


    def upload_file(file_path, bucket, file_name=None):
        """Upload a file to an S3 bucket.

        :param file_path: Path of the local file to upload
        :param bucket: Bucket to upload to
        :param file_name: S3 object name. If not specified, the base name of file_path is used
        :return: True if the file was uploaded, else False
        """
        # If the S3 object name was not specified, use the file's base name
        if file_name is None:
            file_name = os.path.basename(file_path)

        # Upload the file
        s3_client = boto3.client('s3')
        # s3 = boto3.resource('s3')
        try:
            # upload_file returns None, so there is no response to keep
            s3_client.upload_file(file_path, bucket, file_name)
            # s3.Bucket(bucket).upload_file(file_path, file_name)
        except (ClientError, S3UploadFailedError) as e:
            # The transfer layer reports failed uploads as S3UploadFailedError
            logging.error(e)
            return False
        return True

Method 2

Use PutObject to upload a file.
    import logging
    import os

    import boto3
    from botocore.exceptions import ClientError, BotoCoreError

    logger = logging.getLogger(__name__)


    def upload_file_to_aws(file_path, bucket, file_name=None):
        """Upload a file to an S3 bucket.

        :param file_path: Path of the local file to upload
        :param bucket: Bucket to upload to
        :param file_name: S3 object name. If not specified, the base name of file_path is used
        :return: True if the file was uploaded, else False
        """
        # If the S3 object name was not specified, use the file's base name
        if file_name is None:
            file_name = os.path.basename(file_path)

        # Upload the file
        s3 = boto3.resource('s3')
        try:
            with open(file_path, 'rb') as f:
                data = f.read()
            obj = s3.Object(bucket, file_name)
            obj.put(Body=data)
        except (ClientError, BotoCoreError) as e:
            logger.error(e)
            return False
        return True

Deleting files

    def delete_aws_file(file_name, bucket):
        try:
            s3_client = boto3.client("s3")
            s3_client.delete_object(Bucket=bucket, Key=file_name)
        except Exception as e:
            logger.error(e)

Recognizing text in images

Recognizing key-value documents such as invoices and bills

    def get_labels_and_values(result, field):
        # Append a {label: value} pair when the field has both a label and a value
        if "LabelDetection" in field:
            key = field["LabelDetection"].get("Text")
            value = field.get("ValueDetection", {}).get("Text")
            if key and value:
                # Drop a trailing colon from labels such as "Total:"
                if key.endswith(":"):
                    key = key[:-1]
                result.append({key: value})


    def process_text_detection(bucket, document):
        try:
            client = boto3.client("textract", region_name="ap-south-1")
            response = client.analyze_expense(
                Document={"S3Object": {"Bucket": bucket, "Name": document}}
            )
        except Exception as e:
            logger.error(e)
            raise RuntimeError("An unknown error occurred on the aws service") from e

        # A list, because get_labels_and_values appends one {label: value} dict per field
        result = []
        for expense_doc in response["ExpenseDocuments"]:
            for line_item_group in expense_doc["LineItemGroups"]:
                for line_items in line_item_group["LineItems"]:
                    for expense_fields in line_items["LineItemExpenseFields"]:
                        get_labels_and_values(result, expense_fields)
            for summary_field in expense_doc["SummaryFields"]:
                get_labels_and_values(result, summary_field)
        return result


    def get_extract_info(bucket, document):
        return process_text_detection(bucket, document)
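For plain text recognition, as opposed to the key-value extraction above, Textract also offers the detect_document_text operation, whose response contains a list of Blocks of type PAGE, LINE, and WORD. The following is a minimal sketch rather than the original author's code: extract_lines and detect_text are hypothetical helper names, and the Textract client (for example boto3.client("textract", region_name="ap-south-1")) is passed in as an argument so the parsing logic can be exercised without calling AWS.

```python
def extract_lines(response):
    """Collect the text of every LINE block from a detect_document_text response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def detect_text(textract_client, bucket, document):
    """Run plain text detection on an image stored in S3 and return its lines.

    textract_client is expected to be a boto3 Textract client.
    """
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": document}}
    )
    return extract_lines(response)
```

Only LINE blocks are kept here because each one already carries the joined text of its WORD blocks; keeping both would duplicate the output.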
