What is AWS Data Pipeline?
AWS Data Pipeline is a web service that enables users to process and move data between a variety of AWS compute and storage services, as well as on-premises data sources, on a scheduled basis. The service provides centralized management for defining and executing data-driven workflows.
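As a concrete illustration, the basic pipeline lifecycle (register a pipeline, attach a definition, activate it) can be driven from the AWS SDK. The minimal sketch below uses Python and boto3; the pipeline name, worker group, schedule, and shell command are illustrative placeholders, not values taken from this document.

```python
# Minimal sketch of the AWS Data Pipeline lifecycle with boto3:
# register a pipeline, attach a definition, then activate it.
# Assumes AWS credentials and a region are configured; all names are placeholders.
import boto3

client = boto3.client("datapipeline")

# 1. Register an empty pipeline; uniqueId is a caller-chosen idempotency token.
created = client.create_pipeline(name="demo-pipeline", uniqueId="demo-pipeline-001")
pipeline_id = created["pipelineId"]

# 2. Attach a definition: a daily schedule plus one shell-command activity
#    that runs on Task Runners registered under a (hypothetical) worker group.
client.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        {
            "id": "Default",
            "name": "Default",
            "fields": [
                {"key": "scheduleType", "stringValue": "cron"},
                {"key": "schedule", "refValue": "DailySchedule"},
            ],
        },
        {
            "id": "DailySchedule",
            "name": "DailySchedule",
            "fields": [
                {"key": "type", "stringValue": "Schedule"},
                {"key": "period", "stringValue": "1 day"},
                {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
            ],
        },
        {
            "id": "EchoActivity",
            "name": "EchoActivity",
            "fields": [
                {"key": "type", "stringValue": "ShellCommandActivity"},
                {"key": "command", "stringValue": "echo 'pipeline ran'"},
                {"key": "workerGroup", "stringValue": "demo-workers"},
            ],
        },
    ],
)

# 3. Start scheduled execution.
client.activate_pipeline(pipelineId=pipeline_id)
```

Note that put_pipeline_definition reports validation errors and warnings in its response rather than raising, so a production script would normally check the returned errored flag before activating.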
Highlights
- Supports cross-service data movement: Facilitates the transfer of data between different AWS services, including compute (e.g., Amazon EMR) and storage (e.g., Amazon S3) offerings.
- Handles on-premises data integration: Allows the inclusion of on-premises data sources within the defined workflows.
- Workflow automation: Enables the creation of scheduled data processing jobs, such as running periodic Amazon EMR analyses on Amazon S3 data and loading the results into a relational database (sketched after this list).
- Flexible activity types: Supports a variety of activity types, including Amazon EMR jobs, SQL queries, and custom scripts, to accommodate diverse data processing needs.
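To make the EMR example above concrete, the sketch below shows the kind of objects such a definition might add alongside the schedule from the earlier lifecycle sketch: a transient EMR cluster and an EmrActivity whose step points at a JAR in Amazon S3. The bucket, JAR path, instance sizes, and cluster lifetime are assumptions for illustration; the list would be passed to put_pipeline_definition as shown earlier.

```python
# Sketch of pipeline objects for a scheduled Amazon EMR analysis of Amazon S3 data.
# Bucket names, JAR path, and instance sizes are illustrative placeholders.
emr_pipeline_objects = [
    {
        # Transient cluster that the pipeline launches for the run and later terminates.
        "id": "AnalysisCluster",
        "name": "AnalysisCluster",
        "fields": [
            {"key": "type", "stringValue": "EmrCluster"},
            {"key": "masterInstanceType", "stringValue": "m5.xlarge"},
            {"key": "coreInstanceType", "stringValue": "m5.xlarge"},
            {"key": "coreInstanceCount", "stringValue": "2"},
            {"key": "terminateAfter", "stringValue": "2 hours"},
        ],
    },
    {
        # EMR step: a JAR plus comma-separated arguments (input and output S3 paths).
        "id": "LogAnalysis",
        "name": "LogAnalysis",
        "fields": [
            {"key": "type", "stringValue": "EmrActivity"},
            {"key": "runsOn", "refValue": "AnalysisCluster"},
            {
                "key": "step",
                "stringValue": "s3://example-bucket/jars/log-analysis.jar,"
                               "s3://example-bucket/logs/,s3://example-bucket/results/",
            },
        ],
    },
]
```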
Features
Typical pipelines include:
- Daily replication of Amazon DynamoDB data to Amazon S3
- Hourly analysis of Amazon S3-based log data
- Periodic replication of data from an on-premises JDBC database (see the sketch below)

You can find (and use) a variety of popular AWS Data Pipeline templates in the AWS Management Console.
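As a rough sketch of the last use case, a periodic copy from an on-premises JDBC database can be expressed as a CopyActivity between a SqlDataNode (backed by a JdbcDatabase) and an S3DataNode, executed by a Task Runner registered under an on-premises worker group. The connection string, table, bucket, and worker-group name below are illustrative assumptions; as before, the objects would be attached with put_pipeline_definition and paired with a Schedule.

```python
# Sketch of pipeline objects replicating an on-premises JDBC table to Amazon S3.
# Connection details, table, bucket, and worker group are illustrative placeholders.
jdbc_replication_objects = [
    {
        # JDBC connection to the on-premises database.
        "id": "OnPremDatabase",
        "name": "OnPremDatabase",
        "fields": [
            {"key": "type", "stringValue": "JdbcDatabase"},
            {"key": "connectionString", "stringValue": "jdbc:mysql://db.example.internal:3306/sales"},
            {"key": "jdbcDriverClass", "stringValue": "com.mysql.jdbc.Driver"},
            {"key": "username", "stringValue": "pipeline_reader"},
            {"key": "*password", "stringValue": "replace-me"},
        ],
    },
    {
        # Source table exposed to the pipeline as a SQL data node.
        "id": "SourceTable",
        "name": "SourceTable",
        "fields": [
            {"key": "type", "stringValue": "SqlDataNode"},
            {"key": "database", "refValue": "OnPremDatabase"},
            {"key": "table", "stringValue": "orders"},
            {"key": "selectQuery", "stringValue": "select * from orders"},
        ],
    },
    {
        # Destination: a directory in Amazon S3.
        "id": "S3Target",
        "name": "S3Target",
        "fields": [
            {"key": "type", "stringValue": "S3DataNode"},
            {"key": "directoryPath", "stringValue": "s3://example-bucket/exports/orders/"},
        ],
    },
    {
        # The copy runs on a Task Runner installed on-premises, registered to this worker group.
        "id": "ReplicateOrders",
        "name": "ReplicateOrders",
        "fields": [
            {"key": "type", "stringValue": "CopyActivity"},
            {"key": "input", "refValue": "SourceTable"},
            {"key": "output", "refValue": "S3Target"},
            {"key": "workerGroup", "stringValue": "onprem-workers"},
        ],
    },
]
```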