Huntington Bank: Redacting sensitive data from 400M+ documents with AWS

https://aws.amazon.com/blogs/machine-learning/huntington-bank-redacting-sensitive-data-from-400m-documents-with-aws/

Publish Date: 2026-06-24 14:24:00

Source Domain: aws.amazon.com

  • Large-Scale Document Processing Initiative: Huntington National Bank needed to redact sensitive customer data from a repository of more than 400 million documents, and aimed to finish the work systematically in a few months rather than years.

  • Technical Solution:

    • The initiative used a scalable workflow that combined several AWS services, including Amazon Textract, Amazon SageMaker, AWS Step Functions, and AWS Lambda, to process and redact documents.
    • Critical compliance and security requirements included data encryption, stringent access controls, PCI DSS compliance, and replication to on-premises storage.
  • Data Transfer and Security: The more than 400 million documents were moved from on-premises storage to Amazon S3 using AWS DataSync over AWS Direct Connect, and were kept encrypted in transit and at rest, with AWS KMS managing the encryption keys.

  • Scalable Detection and Processing:

    • Amazon Textract, orchestrated by AWS Step Functions, reduced manual review and improved accuracy while processing millions of documents per day. The Step Functions Map state provided the parallel fan-out needed to scale, and Amazon CloudWatch monitored the workload.
    • Service quotas and throttling were managed to maintain high throughput.
  • Redaction and Verification:

    • Automated workflows orchestrated by AWS Step Functions redacted the detected sensitive fields at high volume with accuracy above 95%, and each document's metadata was updated after redaction.
    • Redacted documents were then synchronized back to on-premises storage using AWS DataSync.
  • Conclusion and Future Plans:

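    • Huntington processed roughly 10 million documents per day, sharply reducing both timeline and cost while maintaining high compliance and security standards. At that rate, 400 million documents take about 40 days of processing, in line with the goal of months rather than years.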
    • Huntington plans to reuse the framework for future high-volume redaction requirements.
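The Scalable Detection and Processing bullet credits the Step Functions Map state with the fan-out. The source does not publish its state machine, so the following is a minimal sketch, assuming the distributed mode of the Map state reads work items from an S3 listing and invokes one Lambda function per document. The bucket, prefix, function ARN, concurrency cap, failure tolerance, and retry settings are hypothetical placeholders; the definition is built as a Python dict so it can be serialized to Amazon States Language JSON.

```python
import json


def build_state_machine(bucket: str, prefix: str, function_arn: str,
                        max_concurrency: int = 1000) -> dict:
    """Illustrative Amazon States Language definition (hypothetical names throughout).

    A distributed Map state lists objects under an S3 prefix and runs one child
    workflow per object, each invoking a redaction Lambda function.
    """
    return {
        "Comment": "Fan out one redaction task per S3 object (illustrative sketch)",
        "StartAt": "RedactAllDocuments",
        "States": {
            "RedactAllDocuments": {
                "Type": "Map",
                # Read work items directly from an S3 object listing.
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:listObjectsV2",
                    "Parameters": {"Bucket": bucket, "Prefix": prefix},
                },
                # Pass each child the bucket plus the listed object's key.
                "ItemSelector": {"Bucket": bucket, "Key.$": "$$.Map.Item.Value.Key"},
                # Cap parallel child executions to stay inside service quotas.
                "MaxConcurrency": max_concurrency,
                # Let the run complete even if a small share of items fail.
                "ToleratedFailurePercentage": 1,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
                    "StartAt": "RedactDocument",
                    "States": {
                        "RedactDocument": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {"FunctionName": function_arn, "Payload.$": "$"},
                            # Back off and retry when Lambda itself is throttled.
                            "Retry": [{
                                "ErrorEquals": ["Lambda.TooManyRequestsException"],
                                "IntervalSeconds": 2,
                                "MaxAttempts": 6,
                                "BackoffRate": 2.0,
                            }],
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    definition = build_state_machine(
        "example-redaction-input", "scanned/",
        "arn:aws:lambda:us-east-1:123456789012:function:redact-document",
    )
    print(json.dumps(definition, indent=2))
```

Capping MaxConcurrency is one lever for staying within Lambda and Textract quotas, and the Retry policy lets Step Functions absorb Lambda throttling without failing the item.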
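The summary says service quotas and throttling were managed but does not say how. A common pattern for calling rate-limited APIs such as Amazon Textract from application code is capped exponential backoff with full jitter, sketched below under that assumption. ThrottledError is a hypothetical stand-in for whatever throttling exception your SDK surfaces, and the delay parameters are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a service throttling error
    (for example, Textract's ThrottlingException as surfaced by an SDK)."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 8,
                      base_delay: float = 0.5, max_delay: float = 20.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on ThrottledError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except ThrottledError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of retries: surface the throttle to the caller
            # Full jitter: wait a random time in [0, min(cap, base * 2**(attempt - 1))].
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

Randomizing the whole delay spreads retries from many concurrent workers so they do not hit the service again in lockstep. AWS SDKs such as boto3 also offer configurable retry modes, so a wrapper like this is mainly useful around custom or batched calls.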
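The Redaction and Verification bullet does not describe how sensitive fields were detected beyond the use of Textract and SageMaker. As a minimal sketch, the example below swaps in simple rules (a U.S. Social Security number pattern and Luhn-valid card numbers, the latter relevant to PCI DSS) in place of a trained model, and turns matching WORD blocks from a Textract-style response into pixel rectangles to black out. The rules, the padding, and the function names are assumptions; the coordinates are scaled because Textract reports each BoundingBox as fractions of the page width and height.

```python
import math
import re
from typing import Dict, List, Tuple

# Hypothetical detection rules standing in for a trained model.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")  # e.g. 123-45-6789
CARD_PATTERN = re.compile(r"^\d{13,19}$")         # typical payment card lengths


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_sensitive(text: str) -> bool:
    """Flag a single word as an SSN or a Luhn-valid card number (hyphens ignored for cards)."""
    token = text.strip()
    if SSN_PATTERN.match(token):
        return True
    compact = token.replace("-", "")
    return bool(CARD_PATTERN.match(compact)) and luhn_valid(compact)


def redaction_boxes(response: Dict, page_width: int, page_height: int,
                    pad: int = 2) -> List[Tuple[int, int, int, int]]:
    """Return pixel rectangles (left, top, right, bottom) covering sensitive WORD blocks.

    Edges are rounded outward and padded so the redaction fully covers the glyphs,
    then clamped to the page.
    """
    boxes = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD" or not is_sensitive(block.get("Text", "")):
            continue
        bb = block["Geometry"]["BoundingBox"]
        left = max(0, math.floor(bb["Left"] * page_width) - pad)
        top = max(0, math.floor(bb["Top"] * page_height) - pad)
        right = min(page_width, math.ceil((bb["Left"] + bb["Width"]) * page_width) + pad)
        bottom = min(page_height, math.ceil((bb["Top"] + bb["Height"]) * page_height) + pad)
        boxes.append((left, top, right, bottom))
    return boxes
```

Because Textract splits text on whitespace into separate WORD blocks, a card number printed in groups such as 4111 1111 1111 1111 would not match here; matching LINE blocks or using a model would cover such cases. Drawing the rectangles onto the page image is left out.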