What is the algorithm to compute the Amazon S3 ETag for a file larger than 5GB?

radiobox 2020. 11. 24. 07:50

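Files uploaded to Amazon S3 that are smaller than 5GB have an ETag that is simply the MD5 hash of the file, which makes it easy to check whether your local files are the same as what you put on S3.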

But if your file is larger than 5GB, Amazon computes the ETag differently.

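For example, I did a multipart upload of a 5,970,150,664-byte file in 380 parts. S3 now shows it with an ETag of 6bcf86bed8807b8e78f0fc6e0a53079d-380, while my local file has an MD5 hash of 702242d3703818ddefe6bf7da2bed757. I think the number after the dash is the number of parts in the multipart upload.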

I also suspect that the new ETag (before the dash) is still an MD5 hash, but with some metadata from the multipart upload mixed in along the way.

Does anyone know how to compute the ETag using the same algorithm as Amazon S3?

Just verified one. Hats off to Amazon for making it simple enough to be guessable.

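Say you uploaded a 14MB file and your part size is 5MB. Calculate 3 MD5 checksums, one per part: the checksum of the first 5MB, of the second 5MB, and of the last 4MB. Then take the checksum of their concatenation. Since MD5 checksums are hex representations of binary data, make sure you take the MD5 of the decoded binary concatenation, not of the ASCII or UTF-8 encoded concatenation. When that's done, append a hyphen and the number of parts to get the ETag.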
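To make the hex-versus-binary point concrete, here is a small Python sketch of the same 14MB example. It uses 14 MiB of zero bytes in memory instead of a real file, and the function name multipart_etag is illustrative, not from any library. It shows the 5/5/4 MiB split and that hashing the hex text instead of the decoded bytes gives a different (wrong) result.

```python
import hashlib

MIB = 1024 * 1024


def multipart_etag(data: bytes, part_size: int) -> str:
    """S3-style multipart ETag for `data` split into `part_size` chunks."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    # The raw 16-byte digests, not their hex strings, are concatenated
    joined = b"".join(hashlib.md5(p).digest() for p in parts)
    return "{}-{}".format(hashlib.md5(joined).hexdigest(), len(parts))


data = bytes(14 * MIB)  # 14 MiB of zero bytes as stand-in content
part_size = 5 * MIB
print([len(data[i:i + part_size]) // MIB for i in range(0, len(data), part_size)])
# [5, 5, 4]

etag = multipart_etag(data, part_size)

# The common mistake: hashing the ASCII hex text instead of the decoded bytes
hex_text = "".join(
    hashlib.md5(data[i:i + part_size]).hexdigest()
    for i in range(0, len(data), part_size)
)
wrong = hashlib.md5(hex_text.encode("ascii")).hexdigest() + "-3"
print(etag.endswith("-3"), etag != wrong)
# True True
```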

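Here are the commands I ran on Mac OS X from the console (the file in this transcript is somewhat smaller than 14MB, so its last part is 2,599,812 bytes):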

$ dd bs=1m count=5 skip=0 if=someFile | md5 >>checksums.txt
5+0 records in
5+0 records out
5242880 bytes transferred in 0.019611 secs (267345449 bytes/sec)
$ dd bs=1m count=5 skip=5 if=someFile | md5 >>checksums.txt
5+0 records in
5+0 records out
5242880 bytes transferred in 0.019182 secs (273323380 bytes/sec)
$ dd bs=1m count=5 skip=10 if=someFile | md5 >>checksums.txt
2+1 records in
2+1 records out
2599812 bytes transferred in 0.011112 secs (233964895 bytes/sec)

At this point all the checksums are in checksums.txt. To concatenate them, decode the hex, and get the MD5 checksum of the whole lot, just use:

$ xxd -r -p checksums.txt | md5

Now append "-3" to get the ETag, since there were three parts.

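It's worth noting that md5 on Mac OS X writes out only the checksum, but md5sum on Linux also outputs the filename (or "-" when reading from a pipe). You'll need to strip that, for example by piping md5sum's output through cut -d' ' -f1. You don't need to worry about whitespace, because xxd will ignore it.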
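If you would rather not juggle md5 versus md5sum output formats, the final xxd/md5 step can also be done in Python. This sketch assumes a checksums file with one hex MD5 per line, optionally followed by whitespace and a file name or "-" as md5sum prints it; the function name etag_from_checksum_lines is hypothetical.

```python
import hashlib


def etag_from_checksum_lines(lines):
    """Combine per-part hex MD5s (one per line) into an S3 multipart ETag.

    Accepts both `md5` output ("<hex>") and `md5sum` output ("<hex>  -" or
    "<hex>  filename"); only the first whitespace-separated token is used,
    and blank lines are skipped.
    """
    hexes = [line.split()[0] for line in lines if line.strip()]
    binary = b"".join(bytes.fromhex(h) for h in hexes)
    return "{}-{}".format(hashlib.md5(binary).hexdigest(), len(hexes))
```

For example, with open("checksums.txt") as f: print(etag_from_checksum_lines(f)) prints the complete ETag, including the "-N" suffix, so there is no part count to append by hand.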

Note: if you uploaded with aws-cli via aws s3 cp, then you most likely have an 8MB chunk size. According to the docs, that is the default.
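To predict how many parts (and therefore which "-N" suffix) an aws s3 cp upload will produce, you can compute the part count from the file size. This is a sketch under two assumptions: the 8 MiB default mentioned above, and my understanding that the CLI doubles the chunk size whenever a file would otherwise need more than S3's 10,000-part maximum. If you changed multipart_chunksize in your CLI configuration, pass that value instead. The function name cli_part_layout is hypothetical.

```python
MIB = 1024 * 1024
MAX_PARTS = 10000  # S3's limit on the number of parts in one multipart upload


def cli_part_layout(file_size: int, chunk_size: int = 8 * MIB):
    """Return (chunk_size, part_count) for a multipart upload of file_size bytes.

    Assumes the chunk size is doubled until the upload fits in MAX_PARTS parts.
    """
    parts = -(-file_size // chunk_size)  # ceiling division
    while parts > MAX_PARTS:
        chunk_size *= 2
        parts = -(-file_size // chunk_size)
    return chunk_size, parts


print(cli_part_layout(5970150664))  # the 5,970,150,664-byte file from the question
# (8388608, 712)
print(cli_part_layout(100 * 1024 * MIB))  # 100 GiB
# (16777216, 6400)
```

For the question's file this predicts 712 parts of 8 MiB, so the 380-part upload described there evidently used a larger part size than the CLI default.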

Update: I was told about an implementation of this at https://github.com/Teachnova/s3md5, which doesn't work on OS X. Here's a Gist I wrote with a working script for OS X.

bash implementation

python implementation

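The algorithm, literally (copied from the readme of the python implementation):

md5 the chunks
glob the md5 strings together
convert the glob to binary
md5 the binary of the globbed chunk md5s
append "-Number_of_chunks" to the end of the md5 string of the binary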
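The five readme steps map almost one-to-one onto Python. This sketch takes an iterable of byte chunks (how you read them from a file is up to you) and keeps the readme's wording in the comments; etag_from_chunks is a hypothetical name.

```python
import hashlib


def etag_from_chunks(chunks):
    """Follow the five readme steps for an iterable of bytes chunks."""
    # 1. md5 the chunks (as hex strings)
    chunk_md5s = [hashlib.md5(c).hexdigest() for c in chunks]
    # 2. glob the md5 strings together
    glob = "".join(chunk_md5s)
    # 3. convert the glob to binary
    binary = bytes.fromhex(glob)
    # 4. md5 the binary of the globbed chunk md5s
    final_md5 = hashlib.md5(binary).hexdigest()
    # 5. append "-Number_of_chunks"
    return "{}-{}".format(final_md5, len(chunk_md5s))


with_three_chunks = etag_from_chunks([b"x" * 5, b"y" * 5, b"z" * 4])
print(with_three_chunks.endswith("-3"))
# True
```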

Same algorithm, Java version (BaseEncoding, Hasher, Hashing, etc. come from the guava library):

import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;
import com.google.common.io.BaseEncoding;
import java.util.List;

/**
 * Generate the checksum for an object that came from a multipart upload.
 * <p>
 * AWS S3 spec: Entity tag that identifies the newly created object's data. Objects with different object data will have different entity tags. The entity tag is an opaque string. The entity tag may or may not be an MD5 digest of the object data. If the entity tag is not an MD5 digest of the object data, it will contain one or more nonhexadecimal characters and/or will consist of less than 32 or more than 32 hexadecimal digits.
 * <p>
 * Algorithm follows the AWS S3 implementation, see https://github.com/Teachnova/s3md5
 *
 * @param md5s hex MD5 of each part, in upload order
 * @return the multipart ETag (without surrounding quotes)
 */
private static String calculateChecksumForMultipartUpload(List<String> md5s) {
    // Concatenate the hex MD5s of all parts
    StringBuilder stringBuilder = new StringBuilder();
    for (String md5 : md5s) {
        stringBuilder.append(md5);
    }

    // Decode the hex back to raw bytes (Guava's base16() expects upper case), then MD5 those bytes
    String hex = stringBuilder.toString();
    byte[] raw = BaseEncoding.base16().decode(hex.toUpperCase());
    Hasher hasher = Hashing.md5().newHasher();
    hasher.putBytes(raw);
    String digest = hasher.hash().toString();

    return digest + "-" + md5s.size();
}

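Not sure if it helps:

We're currently doing an ugly (but so far useful) hack to fix those wrong ETags on multipart-uploaded files: we apply a change to the file in the bucket, which triggers an MD5 recalculation by Amazon that changes the ETag to match the actual MD5 signature.

In our case:

File: bucket/Foo.mpg.gpg

ETag obtained: "3f92dffef0a11d175e60fb8b958b4e6e-2"
Do something to the file (rename it, add metadata such as a fake header, among other things)
ETag obtained: "c1d903ca1bb6dc68778ef21e74cc15b0"

We don't know the algorithm, but since we can "fix" the ETag, we don't need to worry about it either.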
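A sketch of that in-place "touch" using boto3's copy_object, under the assumption that the object is small enough for a single CopyObject call (AWS documents a 5 GB limit for it; larger objects have to be copied with multipart part copies, which per the AWS documentation quoted further down in this thread produce a non-MD5 ETag again). MetadataDirective="REPLACE" is what lets S3 accept a copy onto the same key without other changes; with REPLACE, S3 takes the metadata only from the request, so the sketch passes the user metadata and Content-Type from head_object back in. Other headers such as CacheControl are not carried over here. The helper names are my own.

```python
def build_touch_request(bucket, key, head):
    """kwargs for an S3 copy_object call that copies bucket/key onto itself.

    `head` is the head_object response for the same object; its user metadata
    and Content-Type are passed back because REPLACE takes metadata only from
    the request.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        # Required to copy an object onto itself without other changes
        "MetadataDirective": "REPLACE",
        "Metadata": head.get("Metadata", {}),
        "ContentType": head.get("ContentType", "binary/octet-stream"),
    }


def touch_s3_object(client, bucket, key):
    """Copy an object (up to 5 GB) onto itself and return the new ETag."""
    head = client.head_object(Bucket=bucket, Key=key)
    response = client.copy_object(**build_touch_request(bucket, key, head))
    return response["CopyObjectResult"]["ETag"]
```

With a boto3 client this would be called as touch_s3_object(boto3.client("s3"), "my-bucket", "Foo.mpg.gpg"), where the bucket name is a placeholder.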

Based on the answers here, I wrote a Python implementation that correctly calculates both multipart and single-part file ETags.

import hashlib


def calculate_s3_etag(file_path, chunk_size=8 * 1024 * 1024):
    md5s = []

    with open(file_path, 'rb') as fp:
        while True:
            data = fp.read(chunk_size)
            if not data:
                break
            md5s.append(hashlib.md5(data))

    # An empty file or a single chunk is uploaded with one PUT: plain MD5
    if len(md5s) < 2:
        return '"{}"'.format((md5s[0] if md5s else hashlib.md5()).hexdigest())

    # Multipart: MD5 of the concatenated binary digests, plus the part count
    digests = b''.join(m.digest() for m in md5s)
    digests_md5 = hashlib.md5(digests)
    return '"{}-{}"'.format(digests_md5.hexdigest(), len(md5s))

The default chunk_size is 8MB, which is what the official aws cli tool uses, and the tool does a multipart upload for 2 or more chunks. It works under both Python 2 and 3.

In an answer above, someone asked whether there is a way to get the MD5 for files larger than 5G.

The answer I can give for getting the MD5 value (for files larger than 5G) is to either add it to the metadata manually or use an upload program that adds that information for you.

For example, I used s3cmd to upload a file, and it added the following metadata:

$ aws s3api head-object --bucket xxxxxxx --key noarch/epel-release-6-8.noarch.rpm 
{
  "AcceptRanges": "bytes", 
  "ContentType": "binary/octet-stream", 
  "LastModified": "Sat, 19 Sep 2015 03:27:25 GMT", 
  "ContentLength": 14540, 
  "ETag": "\"2cd0ae668a585a14e07c2ea4f264d79b\"", 
  "Metadata": {
    "s3cmd-attrs": "uid:502/gname:staff/uname:xxxxxx/gid:20/mode:33188/mtime:1352129496/atime:1441758431/md5:2cd0ae668a585a14e07c2ea4f264d79b/ctime:1441385182"
  }
}

It isn't a direct solution using the ETag, but it is a way to populate the metadata you want (the MD5) in a way you can access it. It will still fail if someone uploads the file without that metadata.
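If your uploads go through s3cmd, the stored MD5 can be pulled out of the Metadata dictionary that head-object (or boto3's head_object) returns. This sketch assumes the s3cmd-attrs format shown above, i.e. "/"-separated "name:value" pairs; md5_from_s3cmd_attrs is a hypothetical name.

```python
def md5_from_s3cmd_attrs(metadata):
    """Extract the md5 field from s3cmd's "s3cmd-attrs" user metadata, or None."""
    attrs = metadata.get("s3cmd-attrs", "")
    # The value is a "/"-separated list of "name:value" pairs
    fields = dict(pair.split(":", 1) for pair in attrs.split("/") if ":" in pair)
    return fields.get("md5")


meta = {
    "s3cmd-attrs": "uid:502/gname:staff/uname:xxxxxx/gid:20/mode:33188"
    "/mtime:1352129496/atime:1441758431"
    "/md5:2cd0ae668a585a14e07c2ea4f264d79b/ctime:1441385182"
}
print(md5_from_s3cmd_attrs(meta))
# 2cd0ae668a585a14e07c2ea4f264d79b
```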

According to the AWS documentation, the ETag isn't an MD5 hash for multipart uploads or for objects encrypted with SSE-C or SSE-KMS: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html

Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and encrypted by SSE-S3 or stored as plaintext, have ETags that are an MD5 digest of their object data.

Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and encrypted by SSE-C or SSE-KMS, have ETags that are not an MD5 digest of their object data.

If an object is created by either the Multipart Upload or Part Copy operation, the ETag is not an MD5 digest, regardless of the method of encryption.
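Following those rules, the shape of an ETag alone already tells you something: a "-N" suffix means a multipart or part-copy object, while 32 hex digits without a suffix may or may not be an MD5 (it is not one under SSE-C or SSE-KMS, which you would have to check separately, for example from the object's encryption headers). A small sketch; the labels it returns are my own.

```python
import re

_PLAIN = re.compile(r"^[0-9a-f]{32}$")
_MULTIPART = re.compile(r"^[0-9a-f]{32}-(\d+)$")


def describe_etag(etag):
    """Classify an S3 ETag string by its shape (surrounding quotes are stripped)."""
    etag = etag.strip('"').lower()
    m = _MULTIPART.match(etag)
    if m:
        return "multipart", int(m.group(1))
    if _PLAIN.match(etag):
        # Still not an MD5 if the object uses SSE-C or SSE-KMS
        return "maybe-md5", None
    return "unknown", None


print(describe_etag('"6bcf86bed8807b8e78f0fc6e0a53079d-380"'))
# ('multipart', 380)
```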

Here's the algorithm in Ruby:

require 'digest'

# PART_SIZE should match the chosen part size of the multipart upload
# Set here as 10MB
PART_SIZE = 1024*1024*10 

class File
  def each_part(part_size = PART_SIZE)
    yield read(part_size) until eof?
  end
end

file = File.new('<path_to_file>', 'rb') # binary mode so no newline translation alters the bytes

hashes = []

file.each_part do |part|
  hashes << Digest::MD5.hexdigest(part)
end

multipart_hash = Digest::MD5.hexdigest([hashes.join].pack('H*'))
multipart_etag = "#{multipart_hash}-#{hashes.count}"

Thanks to Shortest Hex2Bin in Ruby and Multipart Uploads to S3 ...

Here's a PHP version that calculates the ETag:

function calculate_aws_etag($filename, $chunksize) {
    /*
    DESCRIPTION:
    - calculate Amazon AWS ETag used on the S3 service
    INPUT:
    - $filename : path to file to check
    - $chunksize : chunk size in Megabytes
    OUTPUT:
    - ETag (string)
    */
    $chunkbytes = $chunksize*1024*1024;
    if (filesize($filename) < $chunkbytes) {
        return md5_file($filename);
    } else {
        $md5s = array();
        $handle = fopen($filename, 'rb');
        if ($handle === false) {
            return false;
        }
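        while (!feof($handle)) {
            $buffer = fread($handle, $chunkbytes);
            if ($buffer === false || $buffer === '') {
                break; // no empty trailing chunk when the size is an exact multiple of the chunk size
            }
            $md5s[] = md5($buffer);
            unset($buffer);
        }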
        fclose($handle);

        $concat = '';
        foreach ($md5s as $indx => $md5) {
            $concat .= hex2bin($md5);
        }
        return md5($concat) .'-'. count($md5s);
    }
}

$etag = calculate_aws_etag('path/to/myfile.ext', 8);

And here's an enhanced version that can verify against an expected ETag, and even guess the chunk size if you don't know it!

function calculate_etag($filename, $chunksize, $expected = false) {
    /*
    DESCRIPTION:
    - calculate Amazon AWS ETag used on the S3 service
    INPUT:
    - $filename : path to file to check
    - $chunksize : chunk size in Megabytes
    - $expected : verify calculated etag against this specified etag and return true or false instead
        - if you make chunksize negative (eg. -8 instead of 8) the function will guess the chunksize by checking all possible sizes given the number of parts mentioned in $expected
    OUTPUT:
    - ETag (string)
    - or boolean true|false if $expected is set
    */
    if ($chunksize < 0) {
        $do_guess = true;
        $chunksize = 0 - $chunksize;
    } else {
        $do_guess = false;
    }

    $chunkbytes = $chunksize*1024*1024;
    $filesize = filesize($filename);
    if ($filesize < $chunkbytes && (!$expected || !preg_match("/^\\w{32}-\\w+$/", $expected))) {
        $return = md5_file($filename);
        if ($expected) {
            $expected = strtolower($expected);
            return ($expected === $return ? true : false);
        } else {
            return $return;
        }
    } else {
        $md5s = array();
        $handle = fopen($filename, 'rb');
        if ($handle === false) {
            return false;
        }
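        while (!feof($handle)) {
            $buffer = fread($handle, $chunkbytes);
            if ($buffer === false || $buffer === '') {
                break; // no empty trailing chunk when the size is an exact multiple of the chunk size
            }
            $md5s[] = md5($buffer);
            unset($buffer);
        }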
        fclose($handle);

        $concat = '';
        foreach ($md5s as $indx => $md5) {
            $concat .= hex2bin($md5);
        }
        $return = md5($concat) .'-'. count($md5s);
        if ($expected) {
            $expected = strtolower($expected);
            $matches = ($expected === $return ? true : false);
            if ($matches || $do_guess == false || strlen($expected) == 32) {
                return $matches;
            } else {
                // Guess the chunk size
                preg_match("/-(\\d+)$/", $expected, $match);
                $parts = $match[1];
                $min_chunk = ceil($filesize / $parts /1024/1024);
                $max_chunk =  floor($filesize / ($parts-1) /1024/1024);
                $found_match = false;
                for ($i = $min_chunk; $i <= $max_chunk; $i++) {
                    if (calculate_aws_etag($filename, $i) === $expected) {
                        $found_match = true;
                        break;
                    }
                }
                return $found_match;
            }
        } else {
            return $return;
        }
    }
}
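The guessing step in calculate_etag boils down to one question: which whole-MiB part sizes split this file into exactly N parts? Here is a Python sketch of just that arithmetic, assuming (as the PHP code does) that the part size was a whole number of MiB; the function name is hypothetical. Applied to the question's 5,970,150,664-byte, 380-part upload it leaves a single candidate, 15 MiB, which you would then confirm by actually computing the ETag with that part size.

```python
MIB = 1024 * 1024


def candidate_part_sizes_mib(file_size: int, parts: int):
    """Whole-MiB part sizes that split file_size bytes into exactly `parts` parts."""
    if parts < 2 or file_size < 1:
        return []  # a single part says nothing useful about the part size
    low = -(-file_size // (parts * MIB))  # smallest size that needs at most `parts` parts
    high = (file_size - 1) // ((parts - 1) * MIB)  # largest size that needs more than parts-1
    return [m for m in range(max(low, 1), high + 1)
            if -(-file_size // (m * MIB)) == parts]


print(candidate_part_sizes_mib(5970150664, 380))
# [15]
```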

The short answer is that you take the 128-bit binary MD5 digest of each part, concatenate them into a document, and hash that document. The algorithm presented in this answer is accurate.
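In formula form, as I read the algorithm described in the answers here (a restatement, not an official AWS definition): split the object into parts p_1 to p_n at the upload's part size, let MD5 return the raw 16-byte digest, let the double bar denote byte concatenation, and let hex denote lowercase hex encoding; the literal "-" and the decimal part count n are then appended as text.

```latex
\mathrm{ETag} = \mathrm{hex}\!\left(\mathrm{MD5}\!\left(\mathrm{MD5}(p_1)\,\Vert\,\mathrm{MD5}(p_2)\,\Vert\,\cdots\,\Vert\,\mathrm{MD5}(p_n)\right)\right)\texttt{-}n
```

For a single-part, unencrypted or SSE-S3 upload the ETag is instead just the hex MD5 of the whole object, with no suffix.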

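Note: the multipart ETag form with the hyphen changes to the form without the hyphen if you "touch" the blob, even if you don't modify its content. That is, if you copy your completed multipart-upload object, or do an in-place copy of it (a.k.a. PUT-COPY), S3 recomputes the ETag with the simple version of the algorithm, i.e. the destination object will have an ETag without a hyphen.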

You've probably considered this already, but if your files are less than 5GB, you already know their MD5s, and upload parallelization provides little to no benefit (e.g. you are streaming the upload from a slow network, or uploading from a slow disk), then you may also consider using a simple PUT instead of a multipart PUT and passing your known Content-MD5 in your request headers; Amazon will fail the upload if they don't match. Keep in mind that you get charged for each UploadPart.

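Also, with some clients, passing the known MD5 as an input to the PUT operation keeps the client from recomputing the MD5 during the transfer. In boto3 (python), for example, you would use the ContentMD5 parameter of the client.put_object() method. If you omit the parameter even though you already know the MD5, the client wastes cycles computing it again before the transfer.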
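One detail worth knowing when you pass a known MD5 this way: the Content-MD5 header (and boto3's ContentMD5 parameter) expects the base64 encoding of the raw 16-byte digest, not the 32-character hex string that md5 or md5sum print. A sketch of the conversion; the helper names are hypothetical, and the commented put_object call is only an illustration with placeholder bucket and key names.

```python
import base64
import hashlib


def content_md5_from_hex(hex_md5: str) -> str:
    """Convert a hex MD5 (as printed by md5/md5sum) to a Content-MD5 value."""
    return base64.b64encode(bytes.fromhex(hex_md5)).decode("ascii")


def content_md5_of(data: bytes) -> str:
    """Content-MD5 value for an in-memory payload."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


print(content_md5_from_hex("d41d8cd98f00b204e9800998ecf8427e"))  # MD5 of b""
# 1B2M2Y8AsgTpgAmY7PhCfg==

# Illustration only, with a boto3 S3 client:
# client.put_object(Bucket="my-bucket", Key="my-key", Body=data,
#                   ContentMD5=content_md5_of(data))
```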

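I have a solution for iOS and macOS that doesn't use external helpers like dd and xxd. I've only just found it, so I'm reporting it as is, and I plan to improve it later. For the moment it relies on both Objective-C and Swift code. First, create this helper class in Objective-C: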

AWS3MD5Hash.h

#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface AWS3MD5Hash : NSObject

- (NSData *)dataFromFile:(FILE *)theFile startingOnByte:(UInt64)startByte length:(UInt64)length filePath:(NSString *)path singlePartSize:(NSUInteger)partSizeInMb;

- (NSData *)dataFromBigData:(NSData *)theData startingOnByte:(UInt64)startByte length:(UInt64)length;

- (NSData *)dataFromHexString:(NSString *)sourceString;

@end

NS_ASSUME_NONNULL_END

AWS3MD5Hash.m

#import "AWS3MD5Hash.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 256

@implementation AWS3MD5Hash


- (NSData *)dataFromFile:(FILE *)theFile startingOnByte:(UInt64)startByte length:(UInt64)length filePath:(NSString *)path singlePartSize:(NSUInteger)partSizeInMb {


   char *buffer = malloc(length);


   NSURL *fileURL = [NSURL fileURLWithPath:path];
   NSNumber *fileSizeValue = nil;
   NSError *fileSizeError = nil;
   [fileURL getResourceValue:&fileSizeValue
                           forKey:NSURLFileSizeKey
                            error:&fileSizeError];

   NSInteger __unused result = fseek(theFile,startByte,SEEK_SET);

   if (result != 0) {
      free(buffer);
      return nil;
   }

   NSInteger result2 = fread(buffer, length, 1, theFile);

   NSUInteger difference = fileSizeValue.integerValue - startByte;

   NSData *toReturn;

   if (result2 == 0) {
       toReturn = [NSData dataWithBytes:buffer length:difference];
    } else {
       toReturn = [NSData dataWithBytes:buffer length:result2 * length];
    }

     free(buffer);

     return toReturn;
 }

 - (NSData *)dataFromBigData:(NSData *)theData startingOnByte:  (UInt64)startByte length:(UInt64)length {

   NSUInteger fileSizeValue = theData.length;
   NSData *subData;

   if (startByte + length > fileSizeValue) {
        subData = [theData subdataWithRange:NSMakeRange(startByte, fileSizeValue - startByte)];
    } else {
       subData = [theData subdataWithRange:NSMakeRange(startByte, length)];
    }

        return subData;
    }

- (NSData *)dataFromHexString:(NSString *)string {
    string = [string lowercaseString];
    NSMutableData *data= [NSMutableData new];
    unsigned char whole_byte;
    char byte_chars[3] = {'\0','\0','\0'};
    NSInteger i = 0;
    NSInteger length = string.length;
    while (i < length-1) {
       char c = [string characterAtIndex:i++];
       if (c < '0' || (c > '9' && c < 'a') || c > 'f')
           continue;
       byte_chars[0] = c;
       byte_chars[1] = [string characterAtIndex:i++];
       whole_byte = strtol(byte_chars, NULL, 16);
       [data appendBytes:&whole_byte length:1];
    }

        return data;
}


@end

Now create a plain Swift file:

AWS Extensions.swift

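import Foundation // Foundation instead of UIKit so the file also builds on macOS
import CommonCrypto

// Logging flag referenced below; assumed to be defined elsewhere in the original project
let mainLogToggling = false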

extension URL {

func calculateAWSS3MD5Hash(_ numberOfParts: UInt64) -> String? {


    do {

        var fileSize: UInt64!
        var calculatedPartSize: UInt64!

        let attr:NSDictionary? = try FileManager.default.attributesOfItem(atPath: self.path) as NSDictionary
        if let _attr = attr {
            fileSize = _attr.fileSize();
            if numberOfParts != 0 {



                let partSize = Double(fileSize / numberOfParts)

                var partSizeInMegabytes = Double(partSize / (1024.0 * 1024.0))



                partSizeInMegabytes = ceil(partSizeInMegabytes)

                calculatedPartSize = UInt64(partSizeInMegabytes)

                if calculatedPartSize % 2 != 0 {
                    calculatedPartSize += 1
                }

                if numberOfParts == 2 || numberOfParts == 3 { // Very important when there are 2 or 3 parts, in the majority of times
                                                              // the calculatedPartSize is already 8. In the remaining cases we force it.
                    calculatedPartSize = 8
                }


                if mainLogToggling {
                    print("The calculated part size is \(calculatedPartSize!) Megabytes")
                }

            }

        }

        if numberOfParts == 0 {

            let string = self.memoryFriendlyMd5Hash()
            return string

        }




        let hasher = AWS3MD5Hash.init()
        let file = fopen(self.path, "r")
        defer { let result = fclose(file)}


        var index: UInt64 = 0
        var bigString: String! = ""
        var data: Data!

        while autoreleasepool(invoking: {

                if index == (numberOfParts-1) {
                    if mainLogToggling {
                        //print("Siamo all'ultima linea.")
                    }
                }

                data = hasher.data(from: file!, startingOnByte: index * calculatedPartSize * 1024 * 1024, length: calculatedPartSize * 1024 * 1024, filePath: self.path, singlePartSize: UInt(calculatedPartSize))

                bigString = bigString + MD5.get(data: data) + "\n"

                index += 1

                if index == numberOfParts {
                    return false
                }
                return true

        }) {}

        let final = MD5.get(data :hasher.data(fromHexString: bigString)) + "-\(numberOfParts)"

        return final

    } catch {

    }

    return nil
}

   func memoryFriendlyMd5Hash() -> String? {

    let bufferSize = 1024 * 1024

    do {
        // Open file for reading:
        let file = try FileHandle(forReadingFrom: self)
        defer {
            file.closeFile()
        }

        // Create and initialize MD5 context:
        var context = CC_MD5_CTX()
        CC_MD5_Init(&context)

        // Read up to `bufferSize` bytes, until EOF is reached, and update MD5 context:
        while autoreleasepool(invoking: {
            let data = file.readData(ofLength: bufferSize)
            if data.count > 0 {
                data.withUnsafeBytes {
                    _ = CC_MD5_Update(&context, $0, numericCast(data.count))
                }
                return true // Continue
            } else {
                return false // End of file
            }
        }) { }

        // Compute the MD5 digest:
        var digest = Data(count: Int(CC_MD5_DIGEST_LENGTH))
        digest.withUnsafeMutableBytes {
            _ = CC_MD5_Final($0, &context)
        }
        let hexDigest = digest.map { String(format: "%02hhx", $0) }.joined()
        return hexDigest

    } catch {
        print("Cannot open file:", error.localizedDescription)
        return nil
    }
}

struct MD5 {

    static func get(data: Data) -> String {
        var digest = [UInt8](repeating: 0, count: Int(CC_MD5_DIGEST_LENGTH))

        let _ = data.withUnsafeBytes { bytes in
            CC_MD5(bytes, CC_LONG(data.count), &digest)
        }
        var digestHex = ""
        for index in 0..<Int(CC_MD5_DIGEST_LENGTH) {
            digestHex += String(format: "%02x", digest[index])
        }

        return digestHex
    }
    // The following is a memory friendly version
    static func get2(data: Data) -> String {

    var currentIndex = 0
    let bufferSize = 1024 * 1024
    //var digest = [UInt8](repeating: 0, count: Int(CC_MD5_DIGEST_LENGTH))

    // Create and initialize MD5 context:
    var context = CC_MD5_CTX()
    CC_MD5_Init(&context)


    while autoreleasepool(invoking: {
        var subData: Data!
        if (currentIndex + bufferSize) < data.count {
            subData = data.subdata(in: Range.init(NSMakeRange(currentIndex, bufferSize))!)
            currentIndex = currentIndex + bufferSize
        } else {
            subData = data.subdata(in: Range.init(NSMakeRange(currentIndex, data.count - currentIndex))!)
            currentIndex = currentIndex + (data.count - currentIndex)
        }
        if subData.count > 0 {
            subData.withUnsafeBytes {
                _ = CC_MD5_Update(&context, $0, numericCast(subData.count))
            }
            return true
        } else {
            return false
        }

    }) { }

    // Compute the MD5 digest:
    var digest = Data(count: Int(CC_MD5_DIGEST_LENGTH))
    digest.withUnsafeMutableBytes {
        _ = CC_MD5_Final($0, &context)
    }

    var digestHex = ""
    for index in 0..<Int(CC_MD5_DIGEST_LENGTH) {
        digestHex += String(format: "%02x", digest[index])
    }

    return digestHex

}
}

Now add

#import "AWS3MD5Hash.h"

to your Objective-C bridging header. You should be OK with this setup.

Example usage

To test this setup, you can call the following method inside the object that handles your AWS connections:

func getMd5HashForFile() {


    let credentialProvider = AWSCognitoCredentialsProvider(regionType: AWSRegionType.USEast2, identityPoolId: "<INSERT_POOL_ID>")
    let configuration = AWSServiceConfiguration(region: AWSRegionType.APSoutheast2, credentialsProvider: credentialProvider)
    configuration?.timeoutIntervalForRequest = 3.0
    configuration?.timeoutIntervalForResource = 3.0

    AWSServiceManager.default().defaultServiceConfiguration = configuration

    AWSS3.register(with: configuration!, forKey: "defaultKey")
    let s3 = AWSS3.s3(forKey: "defaultKey")


    let headObjectRequest = AWSS3HeadObjectRequest()!
    headObjectRequest.bucket = "<NAME_OF_YOUR_BUCKET>"
    headObjectRequest.key = self.latestMapOnServer.key




    let _: AWSTask? = s3.headObject(headObjectRequest).continueOnSuccessWith { (awstask) -> Any? in

        let headObjectOutput: AWSS3HeadObjectOutput? = awstask.result

        var ETag = headObjectOutput?.eTag!
        // Here you should parse the returned Etag and extract the number of parts to provide to the helper function. Etags end with a "-" followed by the number of parts. If you don't see this format, then pass 0 as the number of parts.
        ETag = ETag!.replacingOccurrences(of: "\"", with: "")

        print("headObjectOutput.ETag \(ETag!)")

        let mapOnDiskUrl = self.getMapsDirectory().appendingPathComponent(self.latestMapOnDisk!)

        let hash = mapOnDiskUrl.calculateAWSS3MD5Hash(<Take the number of parts from the ETag returned by the server>)

        if hash == ETag {
            print("They are the same.")
        }

        print ("\(hash!)")

        return nil
    }



}

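If the ETag returned by the server doesn't end with "-" followed by a number, just pass 0 to calculateAWSS3MD5Hash. Please comment if you run into any problems. I'm working on a Swift-only solution and will update this answer as soon as it's done. Thanks.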

node.js implementation:

const fs = require('fs');
const crypto = require('crypto');

const chunk = 1024 * 1024 * 5; // 5MB, must match the part size used for the upload

const md5 = data => crypto.createHash('md5').update(data).digest('hex');

const getEtagOfFile = (filePath) => {
  // readFileSync loads the whole file into memory
  const buffer = fs.readFileSync(filePath);
  if (buffer.length < chunk) {
    return md5(buffer);
  }
  const md5Chunks = [];
  const chunksNumber = Math.ceil(buffer.length / chunk);
  for (let i = 0; i < chunksNumber; i++) {
    const chunkBuffer = buffer.subarray(i * chunk, (i + 1) * chunk);
    md5Chunks.push(md5(chunkBuffer));
  }

  // MD5 of the decoded (binary) per-part digests, plus the part count
  return `${md5(Buffer.from(md5Chunks.join(''), 'hex'))}-${chunksNumber}`;
};

No.

So far there is no solution for matching a normal file ETag and a multipart file ETag against the MD5 of a local file.

Source: https://stackoverflow.com/questions/12186993/what-is-the-algorithm-to-compute-the-amazon-s3-etag-for-a-file-larger-than-5gb
