Morgatz (10) [Avatar] Offline
#1
DynamoDB works great for the purpose of this course and my personal project. The queries are well-defined for the frontend interface and I am able to design my schema based on the limited set of defined queries for each table.

However, now that I want an administration tool that can look up orders, I need the ability to find orders by status, orders that belong to a certain user with the name "Tom", users in the city of New York, etc.

What is the suggestion for this admin service that needs more flexibility in querying data? I've read about using DynamoDB Streams to trigger a Lambda function which pipes the data to Firehose -> S3, then using Athena to query that data. Do you think that's a good solution? The system of record (SoR) will always be the DynamoDB tables belonging to the service that owns the data, but it seems reasonable to pipe data to S3 for better querying capabilities.
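For what it's worth, the stream-to-Firehose Lambda in that pipeline can be a fairly small function. Here's a minimal Python sketch; the delivery stream name `orders-to-s3` and the `unwrap` helper are my own placeholders, not anything from the course:

```python
import json


def unwrap(attr):
    # Unwrap a DynamoDB attribute value ({"S": "x"}, {"N": "1.5"}, ...)
    # into a plain Python value.
    ((tag, value),) = attr.items()
    if tag == "N":
        return float(value) if "." in value else int(value)
    if tag == "M":
        return {k: unwrap(v) for k, v in value.items()}
    if tag == "L":
        return [unwrap(v) for v in value]
    return value  # S, BOOL, NULL, ...


def handler(event, context):
    # Lambda triggered by the DynamoDB stream; forwards new/updated items
    # to a Firehose delivery stream as newline-delimited JSON, so the
    # resulting S3 objects are easy for Glue/Athena to crawl and query.
    import boto3  # available in the Lambda runtime

    records = []
    for rec in event["Records"]:
        image = rec["dynamodb"].get("NewImage")
        if image:  # REMOVE events carry no NewImage
            doc = {k: unwrap(v) for k, v in image.items()}
            records.append({"Data": (json.dumps(doc) + "\n").encode()})
    if records:
        boto3.client("firehose").put_record_batch(
            DeliveryStreamName="orders-to-s3",  # hypothetical stream name
            Records=records,
        )
```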

Or would it be better for this new admin service to have its own rdbms that subscribes to changes from the other services and stores the data as it sees fit for its purpose?


Thoughts?
Yan Cui (62) [Avatar] Offline
#2
I think both approaches can work, but it depends on your budget, the amount of data you're dealing with, etc.

The Athena route is more complicated to set up - there are a lot of resources to provision: Firehose stream, CloudWatch log group, CloudWatch log stream, S3 bucket, AWS Glue crawler, Athena table, etc. - and there are a lot of things you can tinker with, like the file format and compression format, that can have a telling impact on performance. But it's also super scalable and flexible once you have it set up - I've done this at work myself for an admin tool.
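Once the data is in S3 and crawled by Glue, the admin lookups become plain SQL against Athena. A rough Python sketch of driving it from an admin service - the table name, database, output bucket, and the `build_query` helper are all made up for illustration:

```python
def build_query(table, filters):
    # Build a simple Athena SELECT from a dict of exact-match filters,
    # e.g. {"status": "SHIPPED", "user_name": "Tom"}.
    # Naive quoting for the sketch only - real code should validate or
    # parameterise inputs rather than interpolating them.
    where = " AND ".join(
        "{} = '{}'".format(col, str(val).replace("'", "''"))
        for col, val in sorted(filters.items())
    )
    return "SELECT * FROM {}".format(table) + (" WHERE " + where if where else "")


def run_query(sql, database, output_s3):
    # Kick off the query via the Athena API; results land in S3 and can
    # be paged back with get_query_results.
    import boto3

    return boto3.client("athena").start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
```

For example, `build_query("orders", {"status": "SHIPPED", "user_name": "Tom"})` covers the "orders by status for user Tom" lookup from the original question.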

The RDBMS approach is a lot simpler to set up, but it's probably going to cost you more, and it will have challenges down the road if we're talking about tens of TBs of data.
Morgatz (10) [Avatar] Offline
#3
Thanks for the response, Yan.

After some thinking I've decided to push data to Elasticsearch, as it gives administrators/CSRs the ability to easily look up documents like orders, transactions, etc. I'm relatively green with Elasticsearch, but pushing data into it was pretty straightforward. I just need to fine-tune the indices.
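Those exact-match lookups (orders by status, users by city) map onto Elasticsearch `term` queries. A small Python sketch against the plain REST `_search` API, assuming hypothetical `orders`/`users` indices and a placeholder endpoint - the official client works just as well:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # placeholder for your ES endpoint


def term_query(field, value):
    # Exact-match lookup on a keyword field, e.g. orders by status
    # or users by city.
    return {"query": {"term": {field: value}}}


def search(index, body):
    # POST the query body to the index's _search endpoint.
    req = urllib.request.Request(
        "{}/{}/_search".format(ES_URL, index),
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# e.g. search("orders", term_query("status", "SHIPPED"))
#      search("users", term_query("city.keyword", "New York"))
```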

For my setup, I have a function in each of my services subscribed to the DynamoDB stream of the root entity table; it transforms the data as required, then fires it off to an SNS topic. I then have a separate data-indexing service with a queue that subscribes to these SNS topics (orders, payment transactions, etc.). A Lambda function triggered by this queue creates the relevant documents in Elasticsearch based on the queue item being processed.
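For anyone wiring up the same SNS -> SQS -> Lambda indexer, the one gotcha is that SQS delivers the SNS envelope, not the raw message. A minimal sketch of the queue-triggered function - the `{"type", "id", "payload"}` message shape and the ES host are my own assumptions:

```python
import json


def parse_record(sqs_record):
    # SNS -> SQS delivery wraps the published message in an SNS
    # envelope; the payload we actually published is the "Message" field.
    envelope = json.loads(sqs_record["body"])
    return json.loads(envelope["Message"])


def handler(event, context):
    # Lambda triggered by the indexing queue: one Elasticsearch document
    # per queue item.
    for rec in event["Records"]:
        doc = parse_record(rec)
        index_document(doc["type"], doc["id"], doc["payload"])


def index_document(doc_type, doc_id, payload):
    # PUT the document into an index per entity type ("orders", ...).
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:9200/{}/_doc/{}".format(doc_type, doc_id),  # placeholder host
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)
```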

So far this approach has worked well for providing much more flexible querying on the data. There's also a data-analytics side of Elasticsearch which I haven't got into yet, which is an added bonus.