Nice article, Amrut.
Also, thanks for the mention!
Thanks Saurabh! Means a lot coming from you👍
Nice article, but I would like to suggest a few more options.
- Glue is great for defining the schema of your data in S3, but it's quite limited in the kinds of processing you can apply to that data. Apache Spark running on EMR, or Lambda functions that consume from Kinesis, are better options for this use case.
- Redshift is great, but it requires an extra pipeline job to ingest the data from S3. Amazon Athena can be a great alternative for querying data directly in S3.
- For scheduling, I would suggest something like Airflow.
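To make the Athena suggestion a bit more concrete, here is a rough sketch of querying data in S3 without a separate load step. It assumes a boto3 Athena client is passed in (for example `boto3.client("athena")`); the database name, SQL, and results bucket are hypothetical placeholders, and error handling is left out.

```python
import time


def build_athena_request(sql, database, output_location):
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location (placeholder value).
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query and poll until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns (query_execution_id, final_state).
    """
    request = build_athena_request(sql, database, output_location)
    query_id = client.start_query_execution(**request)["QueryExecutionId"]

    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

With real credentials this could be called as `run_athena_query(boto3.client("athena"), "SELECT ...", "my_database", "s3://my-results-bucket/athena/")`, where the database and bucket are whatever you have defined in the Glue catalog and S3.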
Great options! 👍