What is AWS Data Pipeline?
AWS Data Pipeline is a web service that enables users to process and move data between a variety of AWS compute and storage services, as well as on-premises data sources, on a scheduled basis. The service provides centralized management for defining and executing data-driven workflows.
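As a concrete illustration, the basic pipeline lifecycle (register a pipeline, attach a definition, activate it) can be driven from the AWS SDK. The minimal sketch below uses Python and boto3; the pipeline name, worker group, schedule, and shell command are illustrative placeholders, not values taken from this document.

```python
# Minimal sketch of the AWS Data Pipeline lifecycle with boto3:
# register a pipeline, attach a definition, then activate it.
# Assumes AWS credentials and a region are configured; all names are placeholders.
import boto3

client = boto3.client("datapipeline")

# 1. Register an empty pipeline; uniqueId is a caller-chosen idempotency token.
created = client.create_pipeline(name="demo-pipeline", uniqueId="demo-pipeline-001")
pipeline_id = created["pipelineId"]

# 2. Attach a definition: a daily schedule plus one shell-command activity
#    that runs on Task Runners registered under a (hypothetical) worker group.
client.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        {
            "id": "Default",
            "name": "Default",
            "fields": [
                {"key": "scheduleType", "stringValue": "cron"},
                {"key": "schedule", "refValue": "DailySchedule"},
            ],
        },
        {
            "id": "DailySchedule",
            "name": "DailySchedule",
            "fields": [
                {"key": "type", "stringValue": "Schedule"},
                {"key": "period", "stringValue": "1 day"},
                {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
            ],
        },
        {
            "id": "EchoActivity",
            "name": "EchoActivity",
            "fields": [
                {"key": "type", "stringValue": "ShellCommandActivity"},
                {"key": "command", "stringValue": "echo 'pipeline ran'"},
                {"key": "workerGroup", "stringValue": "demo-workers"},
            ],
        },
    ],
)

# 3. Start scheduled execution.
client.activate_pipeline(pipelineId=pipeline_id)
```

Note that put_pipeline_definition reports validation errors and warnings in its response rather than raising, so a production script would normally check the returned errored flag before activating.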
Highlights
- Supports cross-service data movement: Facilitates the transfer of data between different AWS services, including compute (e.g., Amazon EMR) and storage (e.g., Amazon S3) offerings.
- Handles on-premises data integration: Allows the inclusion of on-premises data sources within the defined workflows.
- Workflow automation: Enables the creation of scheduled data processing jobs, such as running periodic Amazon EMR analyses on Amazon S3 data and loading the results into a relational database (sketched after this list).
- Flexible activity types: Supports a variety of activity types, including Amazon EMR jobs, SQL queries, and custom scripts, to accommodate diverse data processing needs.
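To make the EMR example above concrete, the sketch below shows the kind of objects such a definition might add alongside the schedule from the earlier lifecycle sketch: a transient EMR cluster and an EmrActivity whose step points at a JAR in Amazon S3. The bucket, JAR path, instance sizes, and cluster lifetime are assumptions for illustration; the list would be passed to put_pipeline_definition as shown earlier.

```python
# Sketch of pipeline objects for a scheduled Amazon EMR analysis of Amazon S3 data.
# Bucket names, JAR path, and instance sizes are illustrative placeholders.
emr_pipeline_objects = [
    {
        # Transient cluster that the pipeline launches for the run and later terminates.
        "id": "AnalysisCluster",
        "name": "AnalysisCluster",
        "fields": [
            {"key": "type", "stringValue": "EmrCluster"},
            {"key": "masterInstanceType", "stringValue": "m5.xlarge"},
            {"key": "coreInstanceType", "stringValue": "m5.xlarge"},
            {"key": "coreInstanceCount", "stringValue": "2"},
            {"key": "terminateAfter", "stringValue": "2 hours"},
        ],
    },
    {
        # EMR step: a JAR plus comma-separated arguments (input and output S3 paths).
        "id": "LogAnalysis",
        "name": "LogAnalysis",
        "fields": [
            {"key": "type", "stringValue": "EmrActivity"},
            {"key": "runsOn", "refValue": "AnalysisCluster"},
            {
                "key": "step",
                "stringValue": "s3://example-bucket/jars/log-analysis.jar,"
                               "s3://example-bucket/logs/,s3://example-bucket/results/",
            },
        ],
    },
]
```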
Features
Typical pipelines include:
- Daily replication of Amazon DynamoDB data to Amazon S3
- Hourly analysis of Amazon S3-based log data
- Periodic replication of data from an on-premises JDBC database (see the sketch below)

You can find (and use) a variety of popular AWS Data Pipeline templates in the AWS Management Console.
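As a rough sketch of the last use case, a periodic copy from an on-premises JDBC database can be expressed as a CopyActivity between a SqlDataNode (backed by a JdbcDatabase) and an S3DataNode, executed by a Task Runner registered under an on-premises worker group. The connection string, table, bucket, and worker-group name below are illustrative assumptions; as before, the objects would be attached with put_pipeline_definition and paired with a Schedule.

```python
# Sketch of pipeline objects replicating an on-premises JDBC table to Amazon S3.
# Connection details, table, bucket, and worker group are illustrative placeholders.
jdbc_replication_objects = [
    {
        # JDBC connection to the on-premises database.
        "id": "OnPremDatabase",
        "name": "OnPremDatabase",
        "fields": [
            {"key": "type", "stringValue": "JdbcDatabase"},
            {"key": "connectionString", "stringValue": "jdbc:mysql://db.example.internal:3306/sales"},
            {"key": "jdbcDriverClass", "stringValue": "com.mysql.jdbc.Driver"},
            {"key": "username", "stringValue": "pipeline_reader"},
            {"key": "*password", "stringValue": "replace-me"},
        ],
    },
    {
        # Source table exposed to the pipeline as a SQL data node.
        "id": "SourceTable",
        "name": "SourceTable",
        "fields": [
            {"key": "type", "stringValue": "SqlDataNode"},
            {"key": "database", "refValue": "OnPremDatabase"},
            {"key": "table", "stringValue": "orders"},
            {"key": "selectQuery", "stringValue": "select * from orders"},
        ],
    },
    {
        # Destination: a directory in Amazon S3.
        "id": "S3Target",
        "name": "S3Target",
        "fields": [
            {"key": "type", "stringValue": "S3DataNode"},
            {"key": "directoryPath", "stringValue": "s3://example-bucket/exports/orders/"},
        ],
    },
    {
        # The copy runs on a Task Runner installed on-premises, registered to this worker group.
        "id": "ReplicateOrders",
        "name": "ReplicateOrders",
        "fields": [
            {"key": "type", "stringValue": "CopyActivity"},
            {"key": "input", "refValue": "SourceTable"},
            {"key": "output", "refValue": "S3Target"},
            {"key": "workerGroup", "stringValue": "onprem-workers"},
        ],
    },
]
```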