To Array is Human

To be dogmatic should not be.

I was just browsing LinkedIn this morning (because yeah, that's what I do on holidays apparently) and I came across this post, whose author received a lot of undeserved criticism.

I tried to jump to the author's defense, mainly because he was explaining a pattern that is valid in a variety of circumstances, and people were criticizing it by pointing at all the circumstances where the pattern is not valid. Following the metaphor of tooling in software, this is analogous to saying "Don't use a hammer, because you can't turn screws with it".

One user in particular went on to write a blog post about this "anti-pattern". Because I also like to write, and I'm in the mood to rant a little (I have had too much relaxation time these holidays already), I decided to compose a response to that blog post. If you want full context, you might want to read both the LinkedIn post and the blog post written in response.

Identifying the Real Issue

The blog post is extensive, but its main idea is that, quote, "using array as a type hint is (usually) an anti-pattern". It says the main problem with using arrays as type hints, quote, "is that arrays are a very broad type in PHP. They represent multiple, disparate data structures", and because of this, when you type-hint a function's return value as array, quote, "you never know what you are going to get".

Before anything else, a Senior Developer must clarify concepts and point at the exact root of the disagreement. Otherwise, we are doing a disservice to our profession and to the people who heed our advice. So in light of that, I must ask: is anything he said about arrays wrong? Not at all. All the things I quoted from him are facts of life. They are not wrong.

The problem, though, is that the discussion was not about type-hinting arrays in general, but about type-hinting an array in a very specific context and set of circumstances.

I can almost see the reasoning in the critic's mind: "Arrays are bad, we shouldn't use them. This uses arrays, therefore it is wrong". But an experienced person would say: "Okay, I know arrays are bad for these reasons. Given those reasons, is this particular use of array as a type hint bad? Why? What are the pros and cons?" There was none of that. We'll talk about why that happens in the second section.

The blog post's author explains, correctly I must say, that arrays are bad because they are a bag of state that can hold pretty much anything. Now, the fact that something can hold anything is not an issue per se, but it can become one (and most certainly will) when you pass that array around through multiple functions and calls. This is the real problem with arrays. That's when it becomes hard to keep track of the keys that may or may not be in your array, and of the mutations that could happen along the call stack. This is why debugging turns into a nightmare when something fails. DTOs are much more appropriate structures to pass around between application layers.
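To make that contrast concrete, here is a minimal sketch. It is not taken from either post: `normalizeUser`, `normalizeUserData`, and the `UserData` class are hypothetical names invented for illustration, and the `readonly` properties assume PHP 8.1 or later.

```php
<?php

// Hypothetical example: an array used as a "bag of state" and passed around.
// Nothing stops a callee from adding, renaming, or dropping keys on the way.
function normalizeUser(array $user): array
{
    $user['email'] = \strtolower($user['email']);
    unset($user['name']); // a caller further down may still expect this key

    return $user;
}

// The same data as a DTO: the shape is fixed, typed, and immutable.
final class UserData
{
    public function __construct(
        public readonly string $name,
        public readonly string $email,
    ) {
    }

    // Returns a modified copy instead of mutating the instance.
    public function withEmail(string $email): self
    {
        return new self($this->name, $email);
    }
}

function normalizeUserData(UserData $user): UserData
{
    return $user->withEmail(\strtolower($user->email));
}

$bag = normalizeUser(['name' => 'Ada', 'email' => 'ADA@Example.com']);
$hasName = \array_key_exists('name', $bag); // false: the key silently disappeared

$dto = normalizeUserData(new UserData('Ada', 'ADA@Example.com'));
// $dto->name is still 'Ada': a callee cannot drop or rename a property.
```

With the array, the only way to know which keys survive a call is to read every function it passes through. With the DTO, the type itself carries that guarantee.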

However, utility functions like the one demonstrated in the LinkedIn post don't fall into this category. Let's look at one of the examples I like the most. This beautiful, simple, and elegant function to cut a string in two parts.

<?php

/**
 * Cuts a string in two by the first occurrence of substring.
 *
 * The substring is not included in the result
 *
 * @return array{0: string, 1: string, 2: bool}
 *
 * @psalm-pure
 */
function str_cut(string $string, string $substring): array
{
    $len = \strlen($substring);
    $i = \strpos($string, $substring);

    if (!\is_int($i)) {
        return [$string, '', false];
    }

    return [
        \substr($string, 0, $i),
        \substr($string, $i + $len),
        true,
    ];
}

$a = 'foo=bar';
$b = 'foo';

[$left, $right, $ok] = str_cut($a, '=');
// [0: 'foo', 1: 'bar', 2: true]

[$left, $right, $ok] = str_cut($b, '=');
// [0: 'foo', 1: '', 2: false]

The function's return value is type-hinted as an array. But does this use of array as a type hint present the problems with arrays mentioned earlier?

No, for two simple reasons:

  1. The function is pure. This means it has no side effects and given the same input, it will return the same output. In other words, the array is created inside the function, consistently. There is no possibility that this array can be in a state that we don't mean it to be.

  2. The array is destructured immediately (that is the purpose of this API), so it is never passed around. The array type hint, which is not a problem here because the shape is stable, does not propagate to the rest of the codebase. If you have ever used React (const [state, setState] = useState()) then you understand this concept very well.

Some of the pros of this approach:

  1. You can name your variables however you like. This immediately tells other developers reading the code what each value contains, so there is no ambiguity about its contents.

  2. You can benefit from type checkers like Psalm or PHPStan. Even PhpStorm will help you with this syntax. So it's pretty hard to get wrong.

Some of the cons:

  1. If you don't have a modern IDE, you might need to look at the function documentation to be able to use it and see what you will get in return, otherwise, there is a chance you'll get it wrong. But if this is your case, I think you have bigger problems anyway.

Now, I'm not saying this is The Only Right Way™ to implement a function like this. You could use an associative array with named keys or even a DTO. If you want to do that, go ahead. If I were reviewing a PR with code like that, I would certainly not fail it. But I'd probably recommend against both of those approaches, because they result in more verbose client code and have slightly worse performance (by a margin so small that it doesn't matter much).
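For comparison, here is a rough sketch of what those two alternatives could look like. The names `str_cut_assoc`, `str_cut_dto`, and `CutResult` are hypothetical and chosen only for this illustration; the `readonly` properties assume PHP 8.1 or later. The sketch only shows the difference in client code and makes no performance measurement.

```php
<?php

/**
 * Hypothetical variant 1: the same cut, returned as an associative array.
 *
 * @return array{left: string, right: string, found: bool}
 */
function str_cut_assoc(string $string, string $substring): array
{
    $i = \strpos($string, $substring);

    if (!\is_int($i)) {
        return ['left' => $string, 'right' => '', 'found' => false];
    }

    return [
        'left' => \substr($string, 0, $i),
        'right' => \substr($string, $i + \strlen($substring)),
        'found' => true,
    ];
}

// Hypothetical variant 2: the same cut, returned as a small immutable DTO.
final class CutResult
{
    public function __construct(
        public readonly string $left,
        public readonly string $right,
        public readonly bool $found,
    ) {
    }
}

function str_cut_dto(string $string, string $substring): CutResult
{
    $parts = str_cut_assoc($string, $substring);

    return new CutResult($parts['left'], $parts['right'], $parts['found']);
}

// Client code with named keys: every key has to be spelled out to destructure.
['left' => $key, 'right' => $value, 'found' => $ok] = str_cut_assoc('foo=bar', '=');

// Client code with the DTO: one extra variable, then property access.
$result = str_cut_dto('foo=bar', '=');
$dtoKey = $result->left;
$dtoValue = $result->right;
$dtoOk = $result->found;
```

Both variants work and both are easy to type-check. The trade-off is at the call site, where the positional version stays a single short line.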

The Real Issue: Dogmatism

The real issue here is dogmatism. I define dogmatism as the failure to consider the context when applying a rule, a principle, or our own experience. In this case, there is a well-known rule in the PHP world, born from experience, that arrays are tricky data structures to work with, and that if you litter your methods and functions with them, you are almost certainly going to have issues. It is hard to find people nowadays who wouldn't prefer a good old typed DTO over an array as a data transfer mechanism between application layers.

The issue arises when we take this knowledge and turn it into an absolute rule with no context: "You shouldn't use arrays as return types in your methods". I believe this is the "tweet" mentality in action. In every programming debate, context is key. 99% of the answers to all engineering questions are "It depends". Most of our programming wars would be much more educational for the people watching them if we just considered the context instead of assuming the other person is plain wrong.

In this case, the context was completely ignored. People jumped in en masse to attack someone for doing something perfectly fine, with few to no drawbacks. You wouldn't do it the way it was shown in the LinkedIn post? Good for you, but preference doesn't invalidate other approaches. If you want to say something is wrong or an "anti-pattern", you must provide evidence that applies to the particular use case, and that was completely missing from this discussion.

When you hear someone say "You should/shouldn't do this", the two most important questions to ask are, first, "Why?" and, once you have understood the reasons behind it, the even more important second one: "Do the reasons apply in this particular case?".

I mean, that's pretty much all the wisdom you need to navigate the turbid waters of engineering discussions.

Support Matías Navarro-Carter by becoming a sponsor. Any amount is appreciated!