jarias (6) [Avatar] Offline
#1
Hi!

I would like to know your point of view about a pattern I usually implement in my services.

I use DynamoDB heavily in my application: I have a large number of tables, and lambdas writing to them, reading from them and reacting to their streams. The tables are usually shared among different services, so I try to encapsulate common code to keep my codebase DRY and repeatable.

I found myself calling CRUD operations on the same table from different places, so I decided to create a common data layer for my tables. For that I created a shared library encapsulating the models and CRUD-like operations; every lambda that needs access to a model just loads that module and operates on the database.
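To give a concrete idea, here's a minimal sketch of the kind of module I mean (the table name, env var and operations are made up for illustration):

```js
// users-data.js - shared data layer loaded by every lambda
// (illustrative sketch; table and attribute names are made up)
const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();

const TABLE = process.env.USERS_TABLE;

// CRUD-like operations that every lambda reuses
const getUser = async (id) => {
  const res = await dynamo.get({ TableName: TABLE, Key: { id } }).promise();
  return res.Item;
};

const putUser = (user) =>
  dynamo.put({ TableName: TABLE, Item: user }).promise();

module.exports = { getUser, putUser };
```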

After viewing your lessons I started to think that perhaps it would be better to create a service for such data access and publish it as an internal API to be invoked from the other services. I implement both REST and GraphQL APIs in my application, so reusing the same library for both technologies was very convenient (using an internal API would mean 2 HTTP calls).

Also, I find DynamoDB's native SDK very verbose. I come from a long stint on the MEAN stack using MongoDB with Mongoose, and I feel more comfortable using an ODM - do you have a personal preference?

Thanks!
Yan Cui (68) [Avatar] Offline
#2
Hi ya,

Folks from the microservices world would shy away from sharing the same database across multiple "bounded contexts" (as in, an API with multiple Lambda functions is still one bounded context) as it creates implicit coupling between these bounded contexts via the shared schema.

Having a service in front of the shared data gives you protection, and allows the service to make changes without breaking all of its clients, through a number of means, e.g.
* move to another database - maybe DynamoDB is not the right fit anymore and you want to use RDS instead - without breaking existing clients. You keep the API surface consistent, but the service reads from RDS instead, optionally falling back to DynamoDB for data missing from RDS if it's a lengthy migration process (imagine there are TBs of data in DynamoDB that need to be moved without downtime to the service); see the sketch after this list
* change the schema of the DynamoDB table - adding attributes or indices, or simply changing how you map a "find me X" operation to a DynamoDB request. Maybe you decide to turn the table into time series data of all historical values rather than storing only the latest snapshot; then a GET would need to become a QUERY
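To make the first bullet concrete, here's a rough sketch of what the service's read path might look like mid-migration - assuming Postgres on RDS via the pg module; all the names here are made up:

```js
const AWS = require('aws-sdk');
const { Pool } = require('pg'); // assuming Postgres on RDS

const dynamo = new AWS.DynamoDB.DocumentClient();
const pool = new Pool(); // connection settings come from env vars

// the API surface (e.g. GET /users/{id}) stays the same for clients;
// only this internal read path changes during the migration
async function findUser(id) {
  // read from the new primary store first
  const res = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  if (res.rows.length > 0) {
    return res.rows[0];
  }

  // fall back to DynamoDB for records that haven't been migrated yet
  const old = await dynamo
    .get({ TableName: 'users', Key: { id } })
    .promise();
  return old.Item;
}

module.exports = { findUser };
```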

It's impossible to do this type of migration without disruption via a shared library, as it requires a coordinated update and deployment of all the functions that use the library. You also have to take into account the time it takes Lambda to swap in the new code, as well as the fact that, whilst you do that, there are function invocations still executing the old code!

Also, if all these separate APIs use the same DynamoDB table directly, then you have a single point of failure in the system. One of my previous employers, for instance, endured a 6-hour outage when the site's help page made expensive queries against the shared SqlServer database, which took down the entire site...

There are, of course, trade-offs with this approach: as you mentioned already, it adds another HTTP call, which adds latency. You can employ various caching strategies to help mitigate this additional latency overhead:
- in the caller's HTTP client
- in API Gateway layer for the internal API
- in the internal API's Lambda function, so it bypasses calling DynamoDB (see the sketch below)
- in DynamoDB (enabling DAX)
but I'd say the most important thing is to understand what your latency requirement is. Is an extra Xms a fair trade for the extra flexibility, and (potentially) better resilience, you get back?
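As an example of the third option, the cache can be as simple as memoizing reads at module level, since that state survives across invocations in a warm container (a naive sketch; the table name and TTL are made up):

```js
const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();

// module-level state survives across invocations in a warm container
const cache = new Map();
const TTL_MS = 30 * 1000; // made-up number, tune to your staleness budget

async function getUserCached(id) {
  const hit = cache.get(id);
  if (hit && Date.now() - hit.at < TTL_MS) {
    return hit.item; // bypasses the DynamoDB call entirely
  }

  const res = await dynamo
    .get({ TableName: 'users', Key: { id } })
    .promise();
  cache.set(id, { item: res.Item, at: Date.now() });
  return res.Item;
}
```

Bear in mind each concurrent execution gets its own container, hence its own copy of the cache, and it's gone after a cold start - so treat it as a best-effort optimization, not a source of truth.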

I say potentially, as it requires extra engineering effort to get that better resilience - e.g. using short timeouts, circuit breakers and maybe falling back to default values, etc. I find Hystrix's wiki to be a good source for reading up on the patterns Netflix has baked into the library: https://github.com/Netflix/Hystrix/wiki/How-it-Works
Michael Nygard's book is also required reading in this space: https://www.amazon.co.uk/Release-Design-Deploy-Production-Ready-Software/dp/1680502395
Then there is also more advanced material for more specific problems, like reducing the 99.9 percentile latencies when you have a large number of inter-dependent services: https://static.googleusercontent.com/media/research.google.com/en//people/jeff/Berkeley-Latency-Mar2012.pdf
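To sketch what I mean by short timeouts and fallbacks (the endpoint, time budget and default values below are entirely made up):

```js
// reject if the dependency doesn't respond within the time budget
const timeout = (ms) =>
  new Promise((resolve, reject) =>
    setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms));

async function getUserPreferences(id, httpGet) {
  try {
    // httpGet is whatever promise-returning HTTP client you use
    return await Promise.race([
      httpGet(`/internal/users/${id}/preferences`),
      timeout(300), // tight budget so a slow dependency can't drag us down
    ]);
  } catch (err) {
    // degrade gracefully with a default rather than failing the request
    return { theme: 'default', locale: 'en' };
  }
}
```

(Note that Promise.race doesn't cancel the losing request; it just stops waiting for it.)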

I hear what you're saying about DynamoDB's API being verbose... even with the DocumentClient, which is kinda their idea of a lightweight ORM. However, what I find with MongoDB and Mongoose is that, whilst it makes the developer's life really easy and maps a lot of common operations directly onto equivalent database operations (which is great for convenience), it also lets you forget to "design" your system.
The moment you need to scale, you run into terrible performance problems. Technically that's not MongoDB's fault, and many experts would tell you that you should have known better than to do X because it was clearly never going to scale. But MongoDB and Mongoose afford the wrong behaviours and make it easy to do the wrong thing - like those doors with a handle bar on both sides that you can only push from one side: the handle bar on the opposite side affords a pulling action even though it's the wrong thing to do.
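To make the contrast concrete, compare a Mongoose query with a roughly equivalent DocumentClient query (the schema, index and names are all made up; cutoff is an epoch timestamp):

```js
const mongoose = require('mongoose');
const AWS = require('aws-sdk');

const User = mongoose.model('User', new mongoose.Schema({ lastSeen: Date }));
const dynamo = new AWS.DynamoDB.DocumentClient();

// Mongoose: one convenient line, and it looks identical whether or not
// there's an index behind it - the API affords the unscalable query
const viaMongo = (cutoff) => User.find({ lastSeen: { $gt: cutoff } });

// DocumentClient: verbose, but the request shape forces you to decide
// up front how the data is keyed and indexed
const viaDynamo = (cutoff) =>
  dynamo.query({
    TableName: 'users',
    IndexName: 'lastSeen-index', // an index you had to design deliberately
    KeyConditionExpression: 'shard = :s AND lastSeen > :c',
    ExpressionAttributeValues: { ':s': 1, ':c': cutoff },
  }).promise();
```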
jarias (6) [Avatar] Offline
#3
Wow, thanks for the info.

I guess I have to get my thinking straight when it comes to serverless and start focusing on the big picture of the whole system. I usually tend to oversimplify and minimise the number of services I create in order to reduce latency/costs, but as you point out, this additional layer of protection and good engineering could be key to scaling the software up.

I will take a look at the references and try to implement this pattern from now on.
Yan Cui (68) [Avatar] Offline
#4
No probs :)

As with most things, it's a trade-off. I like to think of the different forces I have to deal with - performance, scalability, resilience and simplicity - as a portfolio of currencies, and I budget myself a minimum reserve for each (e.g. 99 percentile latency must be < 1s). Then I look for trades that give me a net gain, so to speak: if I can trade off a little bit of performance (say, 50-100ms, but not enough to push the 99 percentile beyond 1s) in exchange for a lot more resilience, that's a good trade-off; and at the same time I can make another trade, exchanging some simplicity for better performance. All in all, I end up better off than when I started.