
AWS Data Pipeline

Moves data between AWS services and on-premises sources on a schedule.

Made by Amazon Web Services (AWS)

    What is AWS Data Pipeline?

    AWS Data Pipeline is a web service that enables users to process and move data between a variety of AWS compute and storage services, as well as on-premises data sources, on a scheduled basis. It provides a centralized management system for defining and executing data-driven workflows.

    Highlights

    • Supports cross-service data movement: Facilitates the transfer of data between different AWS services, including compute (e.g., Amazon EMR) and storage (e.g., Amazon S3) offerings
    • Handles on-premises data integration: Allows the inclusion of on-premises data sources within the defined workflows
    • Workflow automation: Enables the creation of scheduled data processing jobs, such as running periodic Amazon EMR analyses on Amazon S3 data and loading the results into a relational database
    • Flexible activity types: Supports a variety of activity types, including Amazon EMR jobs, SQL queries, and custom scripts, to accommodate diverse data processing needs
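
    As a concrete illustration of the scheduling and activity concepts above, a pipeline definition (in the JSON format accepted by the AWS Data Pipeline API and CLI) might look like the following sketch. It copies data between two Amazon S3 locations once a day; the bucket paths, start date, and object IDs are placeholders, not values from this page:

```json
{
  "objects": [
    {
      "id": "DailySchedule",
      "type": "Schedule",
      "period": "1 day",
      "startDateTime": "2015-01-01T00:00:00"
    },
    {
      "id": "InputData",
      "type": "S3DataNode",
      "schedule": { "ref": "DailySchedule" },
      "directoryPath": "s3://example-bucket/input/"
    },
    {
      "id": "OutputData",
      "type": "S3DataNode",
      "schedule": { "ref": "DailySchedule" },
      "directoryPath": "s3://example-bucket/output/"
    },
    {
      "id": "CopyResource",
      "type": "Ec2Resource",
      "schedule": { "ref": "DailySchedule" },
      "instanceType": "t1.micro",
      "terminateAfter": "30 minutes"
    },
    {
      "id": "DailyCopy",
      "type": "CopyActivity",
      "schedule": { "ref": "DailySchedule" },
      "input": { "ref": "InputData" },
      "output": { "ref": "OutputData" },
      "runsOn": { "ref": "CopyResource" }
    }
  ]
}
```

    A definition like this would typically be registered with `put-pipeline-definition` and started with `activate-pipeline`; activities such as EmrActivity or SqlActivity can be swapped in for the CopyActivity to run Amazon EMR jobs or SQL queries on the same schedule.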

    Features

      • Daily replication of Amazon DynamoDB data to Amazon S3

      • Hourly analysis of Amazon S3-based log data

      • You can find (and use) a variety of popular AWS Data Pipeline templates in the AWS Management Console

      • Periodic replication of on-premises JDBC databases