Architecture Pitfalls: Don't let your persistence layer bleed into your presentation layer
Often, when developers start building a new Java web application, they begin by defining the set of JPA entities that form their persistence layer (e.g. using an ORM like Hibernate). These entities are then returned in response to various queries, passed up through the service layer, and ultimately reach the presentation layer. When you add something new to your JPA model, it automatically appears in the results returned to the client. At first blush, this seems an appealing feature to many.
The belief that this is a good idea is reinforced by many frameworks demonstrating this problematic pattern in their out-of-the-box examples, because doing it correctly has historically involved lots of boilerplate-y glue code that clutters your samples.
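To make the failure mode concrete, here is a minimal, self-contained sketch. The class names are hypothetical, and the reflective `serialize` method is only a stand-in for what reflection-based serializers such as Jackson do under the hood; the point is that any field added to the entity silently becomes part of every response that returns it.

```java
import java.lang.reflect.Field;

// Hypothetical entity class; in real code this would carry @Entity, @Id, etc.
class Customer {
    Long id = 1L;
    String name = "Ada";
    // Added later for internal bookkeeping, and now visible to every client:
    String internalNotes = "credit risk, do not extend terms";
}

public class LeakyApiDemo {
    // Stand-in for a reflection-based JSON serializer: every field of the
    // object ends up in the payload, whether you meant to expose it or not.
    static String serialize(Object o) {
        StringBuilder sb = new StringBuilder("{");
        try {
            Field[] fields = o.getClass().getDeclaredFields();
            for (int i = 0; i < fields.length; i++) {
                fields[i].setAccessible(true);
                if (i > 0) sb.append(',');
                sb.append('"').append(fields[i].getName()).append("\":\"")
                  .append(fields[i].get(o)).append('"');
            }
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // The "controller" returns the entity directly, so the new field leaks:
        System.out.println(serialize(new Customer()));
    }
}
```

Nobody decided that `internalNotes` should be part of the public API; it just happened.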
However, what appeared to be a good way to get your new project rolling quickly soon becomes a morass, for several reasons:
By letting your data layer spill out to your clients, you are implicitly binding together your data model and your public API. They are now the same thing, and any time you need to change your data model, you are changing your public API — sometimes radically.
Ultimately, the specifics of the persistence layer implementation should be a detail that your consumers don't know about.
You strictly limit your ability to refactor your data model, as any changes will automatically be reflected in the payloads returned to your users. Now if you want to change a datatype, or find that your existing model is inefficient, you can't change it without rewriting your client to match.
This is API brittleness, and imposes significant work on downstream teams as they have to continually refactor their applications in response to data model changes that they should not care about.
It is critical to the long-term maintainability of any software that the persistence layer can be changed without rewriting the clients that depend on it.
Often, JPA data models contain a significant number of entity relationships, allowing you to access associated data conveniently.
By passing relationship-containing entities to your presentation layer, you are typically forcing the entire collection to be fetched so that it can be serialized into the response, irrespective of whether it is appropriate or necessary. I have seen cases where several megabytes of data were inadvertently returned in relationships that were not relevant to the specific response.
There are some workarounds that can keep this anti-pattern limping along for a bit longer, typically involving labelling certain fields with @JsonIgnore or nulling out the field before returning it. A number of other hacks exist which are variations on this theme, but whilst they solve the immediate data bloat issue, they are blunt instruments that affect calls elsewhere in your API which may not want to ignore the relationship — but will now have those objects 'disappear'.
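As a sketch of what this workaround looks like (a fragment with Jackson and JPA annotations on a hypothetical entity, not a complete compilable class), note how blunt the fix is: it applies globally, not per endpoint.

```java
@Entity
public class Order {
    @Id
    private Long id;

    // Silences the payload bloat on one endpoint, but *every* response
    // that serializes an Order now has its line items 'disappear':
    @JsonIgnore
    @OneToMany(mappedBy = "order")
    private List<LineItem> lineItems;
}
```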
Like beheading the proverbial hydra, solving one issue with a hack typically causes two more problems to pop up in its place.
In data models it is typically permitted to have cycles to model bidirectional relationships. However, many serialization frameworks really do not appreciate this, and you may end up with infinite loops as your JSON serializer chases its own tail until your program crashes.
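Here is a hand-rolled, self-contained illustration of the tail-chasing. No Jackson is required; the naive `toJson` methods play the role of a serializer that blindly follows every reference, and the class names are made up.

```java
// Bidirectional relationship, as is common in JPA models: Author <-> Book.
class Author {
    String name = "Mary";
    Book book;
    // Naive serializer: follows every reference, including back-references.
    String toJson() {
        return "{\"name\":\"" + name + "\",\"book\":" + book.toJson() + "}";
    }
}

class Book {
    String title = "Frankenstein";
    Author author;
    String toJson() {
        return "{\"title\":\"" + title + "\",\"author\":" + author.toJson() + "}";
    }
}

public class CycleDemo {
    public static void main(String[] args) {
        Author a = new Author();
        Book b = new Book();
        a.book = b;
        b.author = a;   // the cycle
        try {
            a.toJson(); // chases its own tail until the stack runs out
        } catch (StackOverflowError e) {
            System.out.println("serialization blew the stack");
        }
    }
}
```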
You can mitigate this by using @JsonIgnore or @JsonManagedReference / @JsonBackReference and similar serialization tricks. Indeed, some languages and frameworks support replacing "seen-before" objects with references. But needing these tricks at all is a glaring sign that your abstractions are leaky.
If making changes to the data model causes instability elsewhere in the application, your engineering team will go to extreme lengths to avoid refactoring — for fear of causing an unanticipated incident.
I call this refactoring anxiety. It causes development to slow down and technical debt to pile up as your engineering team avoids change at all costs.
Resistance to refactoring is antithetical to modern agile software design practices that encourage frequent, smaller refactoring, rather than infrequent 'big bang' changes that are more likely to fail.
Unexpected Data Modification
Transactional boundaries become unclear: actions triggered automatically as the presentation layer serializes your entities (lazy loading, for example) might require a transaction, or even start one. Mega-sized transactions are also bad for performance, and you are often forced into a transaction spanning from the presentation layer downwards, even if you don't really want to start it there.
Furthermore, because you are operating directly on data model entities in many different contexts (e.g. business logic layer, presentation layer), it's very easy to accidentally trigger an action that your ORM will end up persisting. This is particularly true when you have cascade relationships. Whoops.
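A sketch of how this bites, using hypothetical Spring/JPA code: `findById` returns a *managed* entity, so any mutation made inside the transaction is flushed by dirty checking at commit, with no explicit save call anywhere.

```java
@Transactional
public OrderView getOrder(long id) {
    Order order = orderRepository.findById(id).orElseThrow();
    // 'Just formatting for display', but order is a managed entity, so
    // dirty checking will persist this change to the database on commit:
    order.setCustomerName(order.getCustomerName().toUpperCase());
    return render(order);
}
```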
How to avoid this?
The moral of the story is: don't mess around with persistence entities unless you really mean to interact with your persistence layer in some manner.
I'll write another blog post (and link it here in future) with advice and patterns covering the variety of approaches to solving this. As always, there are several different ways, and it depends a bit on which architecture you are using; there are lots of different vocabularies, and people often don't really agree on what they mean. But I'd like to provide an abridged version here to get people started:
- Consider creating a representation for consumers of your persistence layer that is independent of your persistence model — i.e. a set of classes that precisely and stably represents what you want those consumers to see.
- Consider using projections (special-purpose representations) instead of complex entity wrangling and logic in the client. There are several great tools for doing this; jOOQ and BlazePersistence are two of my favourites, and I'll cover them in more detail in the next blog.
- Depending on your architecture these representations will have different names, but the concept is similar. Some people reflexively shout "ANTI-PATTERN" when certain terminology is mentioned (e.g. DTO) — but inherently we're talking about abstracting the design of your persistence layer from presentation (and other layers, in most architectures suited for larger applications). Consider it like an API contract that you really want to avoid changing.
- Reduce boilerplate and hand-coding of 'dumb glue code' by using mappers such as MapStruct or ModelMapper (my favourite is MapStruct as it uses code generation rather than reflection). In many cases, you just want to map stuff across in fairly simple patterns. These mapper frameworks allow you to do that.
- I suggest doing this from the start, as it's often difficult to retrofit due to consumers inadvertently depending on behaviour you never intended (e.g. additional data sneaking into responses to certain requests).
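As a minimal, self-contained sketch of the first points (the class names are made up): the DTO exposes exactly what the API promises and nothing else, so the entity underneath is free to change.

```java
import java.util.List;

// Persistence-side class (would be @Entity in real code); free to evolve.
class CustomerEntity {
    Long id = 42L;
    String name = "Ada Lovelace";
    String passwordHash = "hash-not-for-clients"; // must never leak
    List<String> auditTrail = List.of();          // irrelevant to most consumers
}

// The API contract: a precise, stable representation for consumers.
record CustomerDto(long id, String name) {}

public class DtoMappingDemo {
    // Dumb glue code: exactly the sort of mapping a mapper framework
    // can generate for you.
    static CustomerDto toDto(CustomerEntity e) {
        return new CustomerDto(e.id, e.name);
    }

    public static void main(String[] args) {
        // Only id and name cross the boundary; passwordHash stays put.
        System.out.println(toDto(new CustomerEntity()));
    }
}
```

Renaming or restructuring `CustomerEntity` now only requires updating `toDto`, not every client.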
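For the mapper point, a MapStruct sketch (interface only, with hypothetical type names; MapStruct generates the implementation at compile time via annotation processing, so there is no reflection at runtime):

```java
@Mapper
public interface CustomerMapper {
    // Fields with matching names are mapped automatically.
    CustomerDto toDto(CustomerEntity entity);

    // Name mismatches are declared explicitly:
    @Mapping(source = "emailAddress", target = "email")
    ContactDto toContactDto(ContactEntity entity);
}
```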
Vlad Mihalcea is very well known in the Java and JPA community. His book, High-Performance Java Persistence, is a great guide to all things JPA and Hibernate, with many patterns that are applicable across different ORM stacks. If you are having performance problems with your Hibernate/JPA application, it is very helpful.
HPJP is fairly broad in its purview, covering the fundamentals of relational databases, and how to design your application to be sympathetic to the underlying technologies. It's also a good reference when kicking off new projects, helping instil best practices in your engineering and architecture teams. It's much easier to establish beneficial design patterns early on, rather than piling up technical debt that's difficult to undo later.
There's a fair bit of jOOQ content here, too, so if you want a book that covers a bit of everything, then HPJP is a good option.