4 Comments

Nice article Amrut.

Also, thanks for the mention!

author

Thanks Saurabh! Means a lot coming from you👍


Nice article, but I would like to suggest a few more options.

- Glue is great for defining the schema of your data in S3, but it's quite limited in the kind of processing you can apply to your data. Apache Spark running on EMR, or Lambda functions that consume from Kinesis, are better options for this use case.

- Redshift is great, but it requires an extra pipeline job to ingest the data from S3. AWS Athena can be a great alternative for querying data directly from S3.

- For scheduling, I would suggest something like Airflow.

author

Great options! 👍
