The Repository Pattern Done Right

The power of immutable collections

💡
This is an old article that I wrote a few years ago in one of my many blogs and that got lost somewhere until someone mentioned it on GitHub. Since it has been useful to others, I decided to republish it here and give it a small update. Enjoy!

The repository pattern is one of the most well-established patterns in Domain-Driven Design. Its origins can be traced back to the early days of object-oriented programming.

Of course, as it happens with almost every pattern or tool, you can use it terribly the first time (or even the second, or the third one). The only way to improve upon that is good literature and seeing other, more appropriate, uses of the pattern/tool. Refining your use of tools and patterns this way is, with almost all certainty, the only way to grow as a developer. Years of experience don’t count much if you have been doing the same thing, the same way, over and over again.

This is why I implement and use repositories very differently now than the first time I started. This is probably because of the experience (both good and bad) that I’ve accumulated over the years. I’ve also read quite a lot on the topic and certainly, I’m not the only one that has experienced issues implementing repositories in my applications.

So, over the years, I’ve come to a definition of repositories, and it is this one:

Repositories are a specific and immutable abstraction over a collection of domain objects.

~ Yours Truly

Let me tell you what I mean by that.

Warning: Active Record Users

Repositories tend to work with ORMs; it’s not a requirement, but it’s very common practice. However, not every kind of ORM can be used with repositories. I think a word of warning is necessary for users of Active Record ORMs (I’m talking about you, Yii and Laravel users). I’ve read several blog posts (like this one, or this other one) that promise an implementation of repositories the Laravel Way™, which is not the repository pattern, but a poorly abstracted interface over Eloquent. Don’t get me wrong: Active Record ORMs are great at what they do (which is, in my opinion, to provide a nice API over records in a database), but unfortunately they just don’t fit the requirements of the repository pattern. Don’t try to use Active Record ORMs for repositories. If you are using Active Record, embrace the fact that you have already coupled your data model to your persistence layer. If you won’t take my word for it, take Jeffrey Way’s.

Repositories are Abstractions

To continue the thread: the main reason Active Record ORMs don’t fit the repository pattern is that repositories are abstractions, and Active Record data models are not. When you create a data model in Laravel, for example, you are not getting a pure data class, but one that drags along a lot of persistence concerns, like database connections, mutators and so on. All of that lives in your data model, and that renders it unusable for the level of abstraction the repository pattern requires.

To be fair to the Eloquent guys, this is also true of Doctrine repositories. If you use Doctrine repositories as they are, you are not abstracting anything away. You are coupled to Doctrine, which is in turn coupled to a relational database engine. That leaves you in the same place as using Eloquent (a bit better, though, because your data model is a pure data class).

In the Symfony world, it’s common to see something like this:

<?php

class SomeController
{
    public function someMethod(Request $request): Response
    {
        // This is the repository class that ships with Doctrine
        $repo = $this->getRepository(User::class);
        $users = $repo->findAll();
        return $this->json($users);
    }
}

If you do this, stop. You are not using a proper abstraction here. It’s true that the Doctrine repository is an abstraction over the EntityManager, QueryBuilder, Connection and a bunch of other stuff, but it is a Doctrine-specific abstraction. You need a domain-specific abstraction: one that is only yours, your own contract.

So what should we do then? We just define an interface:

<?php

class User
{
    // This is your data class
}

interface UserRepository
{
    /**
     * @return iterable|User[]
     */
    public function all(): iterable;

    public function add(User $user): void;

    public function remove(User $user): void;

    public function ofId(string $userId): ?User; 
}

This is a proper abstraction. Your User class is a class that just contains data. Your UserRepository interface is your contract. You can use the Doctrine repository behind it, but it won’t matter this time, because you will type hint the interface in every class that uses it. This way you effectively decouple yourself from any persistence library or engine and get an abstraction you can use all around your codebase.
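To make the decoupling concrete, here is a minimal sketch of a second implementation of the same contract that keeps users in a plain array. The `InMemoryUserRepository` and `countUsers` names, as well as the constructor and `id()` accessor on `User`, are assumptions made for the example; the original `User` class is intentionally left empty.

```php
<?php
declare(strict_types=1);

// Hypothetical data class: the article leaves User empty, so the
// constructor and the id() accessor are assumptions for this sketch.
final class User
{
    private string $id;

    public function __construct(string $id)
    {
        $this->id = $id;
    }

    public function id(): string
    {
        return $this->id;
    }
}

// The contract from the article, repeated so the sketch is self-contained.
interface UserRepository
{
    /**
     * @return iterable|User[]
     */
    public function all(): iterable;

    public function add(User $user): void;

    public function remove(User $user): void;

    public function ofId(string $userId): ?User;
}

// Assumed name: an array-backed implementation, handy for tests.
final class InMemoryUserRepository implements UserRepository
{
    /** @var User[] users indexed by their id */
    private array $users = [];

    public function all(): iterable
    {
        return array_values($this->users);
    }

    public function add(User $user): void
    {
        $this->users[$user->id()] = $user;
    }

    public function remove(User $user): void
    {
        unset($this->users[$user->id()]);
    }

    public function ofId(string $userId): ?User
    {
        return $this->users[$userId] ?? null;
    }
}

// Client code only type hints the interface, never an implementation.
function countUsers(UserRepository $users): int
{
    $count = 0;
    foreach ($users->all() as $_) {
        $count++;
    }
    return $count;
}
```

Because `countUsers()` depends only on the interface, it would behave the same with a Doctrine-backed implementation injected instead.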

Repositories are Specific

Note how the UserRepository we defined is model specific. A lot of people like to save work by creating a generic repository that becomes no more than a query abstraction over the persistence library used. Just don’t do this:

<?php

interface Repository
{
    /**
     * @return iterable|object[]
     */
    public function all(string $repositoryClass): iterable;
}

Remember one of the principles of DDD: clear language intent. One repository interface for each model conveys more meaning to that specific repository/model than a generic one. For example, only users can be filtered by email, not buildings.

Besides, with one generic repository for everything, you won’t be able to use your concrete model classes as return or argument types. A repository per model is the longer route, but it is the most convenient and flexible.

UPDATE: I still maintain this is not a great idea, but the typing argument is probably not as valid as it used to be since the PHP tooling ecosystem has embraced generics annotations so well that this solution is workable from a type-system point of view. Just a friendly reminder that because you can do something, it does not mean that you should.
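For reference, this is roughly what the generics annotations look like. It is only a sketch: the `@template` tags are docblock conventions read by static analysers such as PHPStan or Psalm and are ignored by PHP at runtime, and `InMemoryRepository` and `Building` are hypothetical names used for illustration.

```php
<?php
declare(strict_types=1);

/**
 * A generic contract. The @template tag is only read by static analysers;
 * PHP itself ignores it at runtime.
 *
 * @template T of object
 */
interface Repository
{
    /**
     * @return iterable<T>
     */
    public function all(): iterable;

    /**
     * @param T $object
     */
    public function add(object $object): void;
}

/**
 * Hypothetical array-backed implementation, generic over T as well.
 *
 * @template T of object
 * @implements Repository<T>
 */
final class InMemoryRepository implements Repository
{
    /** @var list<T> */
    private array $objects = [];

    public function all(): iterable
    {
        return $this->objects;
    }

    public function add(object $object): void
    {
        $this->objects[] = $object;
    }
}

// Hypothetical model class.
final class Building
{
}

/** @var InMemoryRepository<Building> $buildings */
$buildings = new InMemoryRepository();
$buildings->add(new Building());

// An analyser can now infer Building for $building, but nothing in the
// contract says what a repository of buildings can be asked for.
foreach ($buildings->all() as $building) {
    echo get_class($building), PHP_EOL;
}
```

The types check out, yet the interface still says nothing about the domain, which is the point made above.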

Repositories are Collections

I would say that the “Aha!” moment in repositories for me was when I realized that they are just an abstraction over a collection of objects. This blew my mind and gave me a new challenge: implementing repositories as if they were an in-memory collection.

For starters, I dumped all methods like all(), allActiveUsers() or allActiveUsersOfThisMonth(). If you have read the two famous posts about taming repositories, first the one of Anne at Easybib and then the one of Benjamin Eberlei in response, you should know that methods like that in a repository can grow wild. Also, the specification pattern is great, but it is quite complex to implement well for this particular use case: we can do better and simpler than that.

Collection APIs have many distinctive features. You can slice collections, filter them, add or remove elements, and find specific elements. But we don’t want a general collection API, remember? We want to implement a specific API for every model, so it conveys meaning.

So, our UserRepository interface could look this way:

<?php

interface UserRepository extends Countable, IteratorAggregate
{
    public function add(User $user): void;

    public function remove(User $user): void;

    public function ofId(string $userId): ?User;

    public function ofEmail(string $email): ?User;

    public function withActiveStatus(): self;

    public function registeredAfter(DateTimeInterface $date): self;

    public function registeredBefore(DateTimeInterface $date): self;

    public function getIterator(): Iterator;

    public function slice(int $start, int $size = 20): self;

    public function count(): int;
}

Pay special attention to the last three methods. These are the only methods that could potentially be in a Repository base interface, because all of them will be sliceable, countable and iterable.

<?php

interface Repository extends IteratorAggregate, Countable
{
    public function getIterator(): Iterator;

    public function slice(int $start, int $size = 20): self;

    public function count(): int;
}

So by doing this, all of your repositories will be sliceable (think pagination), iterable and countable. The idea is that you apply the filtering methods (all the methods that return self) and then iterate to execute the internal query, just like with an in-memory collection. You wouldn’t notice the difference at all if one implementation were swapped for another.

This is good OOP. All the persistence details are completely hidden from us, and the API is composable and fits our needs for a repository. It looks neat, and using it is simple and easy to understand:

<?php

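class SomeService
{
    /**
     * Declared explicitly: dynamic properties are deprecated since PHP 8.2.
     *
     * @var UserRepository
     */
    private $users;

    public function __construct(UserRepository $users)
    {
        $this->users = $users;
    }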

    public function someMethod()
    {
        $users = $this->users
            ->withActiveStatus()
            ->registeredBefore(new DateTime('now'))
            ->registeredAfter(new DateTime('-30days'));

        $count = $users->count();

        return $users;
    }
}

But here’s the question: how do we go about implementing an API like this? If you are a good observer, you might have realized that the filters return an instance of themselves, modifying the internal state of the repository. So in the next query, we will have the filters of the previous query applied, right?

Repositories are Immutable

Well, that would be right if we really were modifying the internal state. But in reality, we clone the repository and apply the filter to the clone, preserving the original so that subsequent queries are not affected accidentally. This is an implementation detail, but a very important one. If we changed, let’s say, the state of the repository instance that lives inside our DI container, we would be in trouble: every other consumer of that instance would inherit our filters, and we could not safely use it again. So the idea is to make it immutable.
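A toy example makes the difference visible. The classes below are hypothetical, array-backed stand-ins for a repository (they are not part of the final API): one applies its filter to the shared instance, the other applies it to a clone.

```php
<?php
declare(strict_types=1);

// Hypothetical base class: holds rows and lazily applies the filters.
abstract class Users implements Countable
{
    /** @var array[] rows shaped like ['name' => string, 'active' => bool] */
    protected array $rows;
    /** @var callable[] */
    protected array $filters = [];

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    abstract public function withActiveStatus(): self;

    public function count(): int
    {
        $rows = $this->rows;
        foreach ($this->filters as $filter) {
            $rows = array_filter($rows, $filter);
        }
        return count($rows);
    }
}

final class MutableUsers extends Users
{
    public function withActiveStatus(): self
    {
        // Mutates $this: every later user of this instance inherits the filter.
        $this->filters[] = static fn (array $row): bool => $row['active'];
        return $this;
    }
}

final class ImmutableUsers extends Users
{
    public function withActiveStatus(): self
    {
        // The filter goes to a clone, so the original stays untouched.
        $cloned = clone $this;
        $cloned->filters[] = static fn (array $row): bool => $row['active'];
        return $cloned;
    }
}

$rows = [
    ['name' => 'Ana', 'active' => true],
    ['name' => 'Ben', 'active' => false],
];

$mutable = new MutableUsers($rows);
$mutable->withActiveStatus();
echo count($mutable), PHP_EOL;   // 1: the shared instance is now filtered

$immutable = new ImmutableUsers($rows);
$immutable->withActiveStatus();
echo count($immutable), PHP_EOL; // 2: the original still sees every row
```

Since the filters live in a PHP array, `clone` copies them by value, which is what keeps the two instances independent.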

Let me show you the final API, implemented in Doctrine ORM. I’m going to write some comments and doc blocks in the code explaining some things.

<?php
declare(strict_types=1);

namespace RepositoryExample\Common;

use Doctrine\ORM\EntityManagerInterface;
use Doctrine\ORM\QueryBuilder;
use Doctrine\ORM\Tools\Pagination\Paginator;
use Iterator;

/**
 * Class DoctrineORMRepository
 * 
 * This is a custom abstract Doctrine ORM repository. It is meant to be extended by
 * every Doctrine ORM repository existing in your project.
 * 
 * The main features and differences with the EntityRepository provided by Doctrine is
 * that this one implements our repository contract in an immutable way.
 * 
 */
abstract class DoctrineORMRepository implements Repository
{
    /**
     * This is Doctrine's Entity Manager. It's fine to expose it to the child class.
     * 
     * @var EntityManagerInterface
     */
    protected $manager;
    /**
     * We don't want to expose the query builder to child classes.
     * This is so we are sure the original reference is not modified.
     * 
     * We control the query builder state by providing clones with the `query`
     * method and by cloning it with the `filter` method.
     *
     * @var QueryBuilder
     */
    private $queryBuilder;

    /**
     * DoctrineORMRepository constructor.
     * @param EntityManagerInterface $manager
     * @param string $entityClass
     * @param string $alias
     */
    public function __construct(EntityManagerInterface $manager, string $entityClass, string $alias)
    {
        $this->manager = $manager;
        $this->queryBuilder = $this->manager->createQueryBuilder()
            ->select($alias)
            ->from($entityClass, $alias);
    }

    /**
     * @inheritDoc
     */
    public function getIterator(): Iterator
    {
        yield from new Paginator($this->queryBuilder->getQuery());
    }

    /**
     * @inheritDoc
     */
    public function slice(int $start, int $size = 20): Repository
    {
        return $this->filter(static function (QueryBuilder $qb) use ($start, $size) {
            $qb->setFirstResult($start)->setMaxResults($size);
        });
    }

    /**
     * @inheritDoc
     */
    public function count(): int
    {
        $paginator = new Paginator($this->queryBuilder->getQuery());
        return $paginator->count();
    }

    /**
     * Filters the repository using the query builder
     *
     * It clones it and returns a new instance with the modified
     * query builder, so the original reference is preserved.
     *
     * @param callable $filter
     * @return $this
     */
    protected function filter(callable $filter): self
    {
        $cloned = clone $this;
        $filter($cloned->queryBuilder);
        return $cloned;
    }

    /**
     * Returns a cloned instance of the query builder
     *
     * Use this to perform single result queries.
     *
     * @return QueryBuilder
     */
    protected function query(): QueryBuilder
    {
        return clone $this->queryBuilder;
    }

    /**
     * We allow cloning only from this scope.
     * Also, we clone the query builder always.
     */
    protected function __clone()
    {
        $this->queryBuilder = clone $this->queryBuilder;
    }
}

This API can be improved of course, but the main principle is the immutability of it. Note how we don’t expose the QueryBuilder. This is because it’s dangerous: an inexperienced developer could apply filters to it and mutate the original reference, causing a massive bug. Instead, we provide two convenience methods for child classes, filter and query. The first one takes a callable which in turn takes a cloned instance of the QueryBuilder as an argument. The second one just returns a cloned QueryBuilder so the child class can query anything.

Then, we use that API in our UserRepository and implement the remaining methods.

<?php
declare(strict_types=1);

namespace RepositoryExample\User;

use DateTime;
use Doctrine\DBAL\Types\Types;
use Doctrine\ORM\EntityManagerInterface;
use Doctrine\ORM\NonUniqueResultException;
use Doctrine\ORM\NoResultException;
use Doctrine\ORM\QueryBuilder;
use DomainException;
use RepositoryExample\Common\DoctrineORMRepository;

/**
 * Class DoctrineORMUserRepository
 * @package RepositoryExample\User
 */
final class DoctrineORMUserRepository extends DoctrineORMRepository implements UserRepository
{
    protected const ENTITY_CLASS = User::class;
    protected const ALIAS = 'user';

    /**
     * DoctrineORMUserRepository constructor.
     * @param EntityManagerInterface $manager
     */
    public function __construct(EntityManagerInterface $manager)
    {
        parent::__construct($manager, self::ENTITY_CLASS, self::ALIAS);
    }

    public function add(User $user): void
    {
        $this->manager->persist($user);
        // I usually implement flushing in a Command Bus middleware.
        // But you can flush immediately if you like.
    }

    public function remove(User $user): void
    {
        $this->manager->remove($user);
        // I usually implement flushing in a Command Bus middleware.
        // But you can flush immediately if you like.
    }

    public function ofId(string $id): ?User
    {
        $object = $this->manager->find(self::ENTITY_CLASS, $id);
        if ($object instanceof User) {
            return $object;
        }
        return null;
    }

    /**
     * @param string $email
     * @return User|null
     */
    public function ofEmail(string $email): ?User
    {
        try {
            $object = $this->query()
                ->where('user.email = :email')
                ->setParameter('email', $email)
                ->getQuery()->getSingleResult();
        } catch (NoResultException $e) {
            return null;
        } catch (NonUniqueResultException $e) {
            throw new DomainException('More than one result found');
        }
        return $object;
    }

    /**
     * Narrows the inherited return type: UserRepository::slice() promises a
     * UserRepository, while the parent class only declares Repository.
     */
    public function slice(int $start, int $size = 20): self
    {
        return parent::slice($start, $size);
    }

    public function withActiveStatus(): UserRepository
    {
        return $this->filter(static function (QueryBuilder $qb) {
            // andWhere() keeps the conditions added by previous filters;
            // where() would silently replace them.
            $qb->andWhere('user.active = true');
        });
    }

    // The parameter type matches the DateTimeInterface declared in UserRepository.
    public function registeredBefore(\DateTimeInterface $time): UserRepository
    {
        return $this->filter(static function (QueryBuilder $qb) use ($time) {
            // No explicit type: Doctrine infers it from the value.
            $qb->andWhere('user.registeredAt < :before')
                ->setParameter('before', $time);
        });
    }

    public function registeredAfter(\DateTimeInterface $time): UserRepository
    {
        return $this->filter(static function (QueryBuilder $qb) use ($time) {
            $qb->andWhere('user.registeredAt > :after')
                ->setParameter('after', $time);
        });
    }
}

The result is nice to work with. I’ve taken this approach in several projects so far and it feels great. The method names convey meaning and work well. Creating different implementations, such as Doctrine Mongo ODM, filesystem or in-memory ones, is trivial. Implementors just need to take the immutability aspect into account, but that’s really all.
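As an illustration, here is a partial sketch of what an in-memory implementation could look like. It covers only part of the contract (`remove`, `ofEmail` and `registeredBefore` would follow the same shape), so it does not declare `implements UserRepository`. The public properties on `User` are an assumption, since the article never shows them, and sharing the storage between clones through an `ArrayObject` is one possible design, not the only one.

```php
<?php
declare(strict_types=1);

// Assumed shape of User: the article never shows its properties.
final class User
{
    public string $id;
    public string $email;
    public bool $active;
    public DateTimeImmutable $registeredAt;

    public function __construct(string $id, string $email, bool $active, DateTimeImmutable $registeredAt)
    {
        $this->id = $id;
        $this->email = $email;
        $this->active = $active;
        $this->registeredAt = $registeredAt;
    }
}

final class InMemoryUserRepository implements Countable, IteratorAggregate
{
    /** @var ArrayObject storage shared by every clone, indexed by user id */
    private ArrayObject $users;
    /** @var callable[] filters, copied (not shared) when cloning */
    private array $filters = [];
    private int $offset = 0;
    private ?int $limit = null;

    public function __construct()
    {
        $this->users = new ArrayObject();
    }

    public function add(User $user): void
    {
        $this->users[$user->id] = $user;
    }

    public function ofId(string $userId): ?User
    {
        return $this->users[$userId] ?? null;
    }

    public function withActiveStatus(): self
    {
        return $this->filter(static fn (User $user): bool => $user->active);
    }

    public function registeredAfter(DateTimeInterface $date): self
    {
        return $this->filter(static fn (User $user): bool => $user->registeredAt > $date);
    }

    public function slice(int $start, int $size = 20): self
    {
        $cloned = clone $this;
        $cloned->offset = $start;
        $cloned->limit = $size;
        return $cloned;
    }

    public function getIterator(): Iterator
    {
        yield from array_slice($this->matching(), $this->offset, $this->limit);
    }

    public function count(): int
    {
        // Counts every match and ignores the slice, so it can act as a pagination total.
        return count($this->matching());
    }

    private function filter(callable $filter): self
    {
        $cloned = clone $this;
        $cloned->filters[] = $filter;
        return $cloned;
    }

    /** @return User[] users that pass every filter, reindexed from zero */
    private function matching(): array
    {
        $users = array_values($this->users->getArrayCopy());
        foreach ($this->filters as $filter) {
            $users = array_values(array_filter($users, $filter));
        }
        return $users;
    }
}
```

Because `clone` is shallow, every filtered copy points at the same `ArrayObject`, so a user added through the original repository is visible to copies created earlier, while the filter list stays private to each copy.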

A Posteriori Advice and Warnings

If I'm writing this section now after three years, that is a good thing. First, it means that I still use this pattern all the time. Secondly, it means that I use it better than before. So, here are some nuances that I've come to consider over time.

Limit these APIs to what your Command Handlers need

I've come to exercise more reluctance to use my repository interfaces in every place in the codebase that I need to query for something. I think they are great for using them in my command handlers, because of the expressiveness and the richness of the API of my models. The whole point of them is to make state mutations as clear and close to the domain language as they can be.

However, for most read operations that will end up sending some data over the wire as JSON, I think using the repository pattern as presented here is overkill. The REST API is not a domain concern. It seems pointless to fetch primitives from a database, hydrate them into those rich domain objects (and pay the price for that in performance) to just discard all that and pass them through a serializer that will convert them into primitives again.

My preferred approach now for sending data over the wire as JSON representations is to have a single service that knows about the Entity Manager and does all that. Usually, that service turns off hydration and knows certain things, like which properties should not be exposed.
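A rough sketch of such a read-side service follows. It deliberately swaps the Entity Manager for plain PDO with an in-memory SQLite database so that it runs without any library; with Doctrine, something like array hydration (`getArrayResult()`) would play the same role. The `UserListing` class, the `users` table and its columns are all made up for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical read-side service. The article's version talks to the
// Doctrine Entity Manager; PDO is used here only to keep the sketch runnable.
final class UserListing
{
    // Columns that are safe to expose; anything else never leaves the database.
    private const EXPOSED_COLUMNS = ['id', 'email', 'registered_at'];

    private PDO $connection;

    public function __construct(PDO $connection)
    {
        $this->connection = $connection;
    }

    /**
     * Returns plain arrays ready for json_encode(); no objects are hydrated.
     *
     * @return array[]
     */
    public function activeUsers(int $start = 0, int $size = 20): array
    {
        $columns = implode(', ', self::EXPOSED_COLUMNS);
        $statement = $this->connection->prepare(
            "SELECT {$columns} FROM users WHERE active = 1 ORDER BY id LIMIT :size OFFSET :start"
        );
        $statement->bindValue(':size', $size, PDO::PARAM_INT);
        $statement->bindValue(':start', $start, PDO::PARAM_INT);
        $statement->execute();

        return $statement->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Usage with an in-memory SQLite database (table and columns are made up).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT, password_hash TEXT, active INTEGER, registered_at TEXT)');
$pdo->exec("INSERT INTO users VALUES ('u1', 'ana@example.com', 'secret', 1, '2021-01-01')");
$pdo->exec("INSERT INTO users VALUES ('u2', 'ben@example.com', 'secret', 0, '2021-02-01')");

echo json_encode((new UserListing($pdo))->activeUsers()), PHP_EOL;
```

The rows go from the database to the serializer as primitives, and the list of exposed columns lives in one place instead of being spread across domain objects.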

Maybe I do this because I've realized that querying is not a Domain concern if the results of that query are not going to be used in a business action. Again, returning records for a JSON API is not a business action, so I don't pass that through domain objects or constructs like my models and my repositories. This might be the reason why Matthias Noback considers query buses unnecessary, and I very much agree with his reasoning.

However, when you need to filter and query specific things to implement a state mutation, that is when you need all the power and expressiveness that a rich domain and repositories implemented this way give you.

Did you find this article valuable?

Support Matías Navarro-Carter by becoming a sponsor. Any amount is appreciated!