Findes in Java Stream

Java Stream: Find Methods

4 min read · Jan 9, 2020

This is the second post in the Java stream series. Hooray! After reading the previous post that tackled matches (Java stream predicate methods), a friend of mine warned me to be careful about using Java streams:

“Streams are good, but it does not mean that you should plug-in streams in every line of code you have. There is some performance overhead to consider too, even with sequential streams. With parallel streams it is even trickier. Brian Goetz (the Java concurrency and multithreading God) gives a nice sum up”, A.Z.

“To use or not to use Java streams?” is a fascinating question, and it definitely deserves a closer look in the future. That’s why I’ve decided to share A.Z.’s comment with you. However, my present goal is to introduce the stream’s features, not to discuss the stream’s viability.

In this post we deal with two methods: findFirst() and findAny(). I call this group “findes”. It’s easy to confuse these two, so let’s clear up their uses.

First, some basics:

Java Stream

A stream is a sequence of elements supporting sequential and parallel aggregate operations.

Sequential Stream: Any stream operation in Java, unless explicitly specified as parallel, is processed sequentially. The sequential stream works just like a for-each-loop using a single thread to process the pipeline.

Parallel Stream: Parallel streams divide the task into segments and run the operations on different threads, utilizing multiple cores of the computer. This can result in a substantial increase in performance, although it is not guaranteed to. The primary motivation behind parallel streams is to parallelize one piece of processing even when the rest of the program is not parallel.
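To make the difference concrete, here is a minimal sketch that runs the same pipeline once sequentially and once in parallel. The class and method names are made up for this illustration:

```java
import java.util.Arrays;
import java.util.List;

class SequentialVsParallel {

    // Sums the squares on the calling thread, one element after another.
    static int sumOfSquaresSequential(List<Integer> numbers) {
        return numbers.stream()
                .mapToInt(n -> n * n)
                .sum();
    }

    // Same pipeline, but the work may be split across several threads.
    static int sumOfSquaresParallel(List<Integer> numbers) {
        return numbers.parallelStream()
                .mapToInt(n -> n * n)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumOfSquaresSequential(numbers)); // 55
        System.out.println(sumOfSquaresParallel(numbers));   // 55
    }
}
```

Both calls produce the same sum; only the way the work is scheduled differs. For a list this small, the parallel version is unlikely to be any faster.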

Streams documentation tells us:

“Streams may or may not have a defined encounter order. Whether or not a stream has an encounter order depends on the source and the intermediate operations. Certain stream sources (such as List or arrays) are intrinsically ordered, whereas others (such as HashSet) are not.”

These characteristics are very important for understanding “findes”, so keep them in mind.
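One way to check encounter order for yourself is to ask the stream’s spliterator whether it reports the ORDERED characteristic. The sketch below uses a hypothetical helper, hasEncounterOrder(), written just for this illustration:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

class EncounterOrderDemo {

    // Reports whether a stream built from the collection has a defined encounter order.
    static boolean hasEncounterOrder(Collection<?> source) {
        return source.stream()
                .spliterator()
                .hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        Set<String> set = new HashSet<>(list);

        System.out.println(hasEncounterOrder(list)); // true
        System.out.println(hasEncounterOrder(set));  // false
    }
}
```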

.findAny()

The first method in the pair is findAny(). It returns an Optional describing some element of the stream, or Optional.empty() if the stream is empty. So far, simple enough, right?

In a sequential pipeline, the operation will most likely return the first element, but there are no guarantees; findAny() is free to pick any element in the stream. We don’t know for sure whether a value that was returned once will be returned again. It’s like a roulette wheel: theoretically, the ball can land in the same pocket a few times, but in practice, you can only hope.

Now it sounds a little bit curious, doesn’t it? Why does Java, a respectable programming language, have a method that doesn’t guarantee which result you get? These guarantees were dropped to allow for maximal performance in parallel operations: if the implementation doesn’t have to retrieve one specific element, it can return whichever element it reaches first.
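As a rough sketch of what this means in practice, the hypothetical anyEven() helper below runs findAny() on a parallel stream. All we can rely on is that the result, if present, is some even number from the list:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FindAnyDemo {

    // Returns some even number from the list, or Optional.empty() if there is none.
    static Optional<Integer> anyEven(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .findAny();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        // Prints 2, 4, or 6; which one is not specified and may vary between runs.
        anyEven(numbers).ifPresent(System.out::println);

        // No even numbers at all, so the Optional is empty.
        System.out.println(anyEven(Arrays.asList(1, 3, 5)).isPresent()); // false
    }
}
```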

.findFirst()

But what if you really want to get the first element? Use findFirst(). Note that this method doesn’t accept any predicate. It just returns an Optional describing the first element of the stream, or Optional.empty() if the stream is empty.

Remember the preface regarding Java stream order? Here it comes: when there is no encounter order, findFirst() may return any element from the stream. But if an encounter order exists, it always behaves deterministically and returns the first element in that order.

The good news is that, unlike findAny(), the findFirst() method doesn’t change its behavior in the parallel scenario.
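A small sketch of that guarantee, using a hypothetical firstEven() helper on a parallel stream built from an ordered List:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FindFirstDemo {

    // Returns the first even number in encounter order, even though the stream is parallel.
    static Optional<Integer> firstEven(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        // The list is ordered, so every run prints the same element.
        System.out.println(firstEven(numbers)); // Optional[2]
    }
}
```

The price of this guarantee is that a parallel pipeline has to do extra coordination to respect the encounter order, which is the work findAny() is allowed to skip.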

It’s common to use “findes” methods after filtering, to find elements that survived the filter operation.

Let’s use the good old Contact class for examples:
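A minimal sketch of such a class, assuming it only carries a name and an email (both fields and their getters are illustrative assumptions):

```java
class Contact {

    // Assumed fields: just enough to look a contact up by email.
    private final String name;
    private final String email;

    Contact(String name, String email) {
        this.name = name;
        this.email = email;
    }

    String getName() {
        return name;
    }

    String getEmail() {
        return email;
    }

    @Override
    public String toString() {
        return "Contact{name='" + name + "', email='" + email + "'}";
    }

    public static void main(String[] args) {
        // Sample data, made up for the demo.
        System.out.println(new Contact("Ann", "ann@example.com"));
    }
}
```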

To find a contact by email in a given stream, you can use findFirst() or findAny(). The code should look like this:
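Here is a sketch of such a lookup. The findByEmail() helper and the nested Contact class (with assumed name and email fields) are hypothetical and exist only to keep the example self-contained:

```java
import java.util.Optional;
import java.util.stream.Stream;

class ContactLookup {

    // Hypothetical Contact with just the fields this example needs.
    static final class Contact {
        private final String name;
        private final String email;

        Contact(String name, String email) {
            this.name = name;
            this.email = email;
        }

        String getName() { return name; }
        String getEmail() { return email; }
    }

    // Keeps only the contacts with a matching email and returns the first survivor.
    static Optional<Contact> findByEmail(Stream<Contact> contacts, String email) {
        return contacts
                .filter(contact -> contact.getEmail().equals(email))
                .findFirst(); // findAny() would compile just as well here
    }

    public static void main(String[] args) {
        Stream<Contact> contacts = Stream.of(
                new Contact("Ann", "ann@example.com"),
                new Contact("Bob", "bob@example.com"));

        findByEmail(contacts, "bob@example.com")
                .map(Contact::getName)
                .ifPresent(System.out::println); // Bob
    }
}
```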

This code is built on the implicit assumption that there can only be one contact with any given email.

Now, maybe this is a known invariant, protected by dedicated parts of the system (database uniqueness constraint, for instance). In that case, it’s a very reasonable assumption, and it’s totally fine to use “findes” expecting to get only a single result.

But maybe the contacts were just loaded from an external source that makes no guarantees about the uniqueness of their emails. Or maybe the search term unexpectedly matches several contacts. In either case, this code contains a potential bug. Bug 😱!

I don’t want to deter you from using “findes”; there is nothing inherently wrong with findFirst() and findAny(). Just remember that it’s easy to use them in a way that leads to bugs within the modeled domain logic. Unless we have this scenario in mind and enforce it with tests or other tools, we might simply overlook it until it manifests in production.
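One possible way to make the uniqueness assumption explicit is to fail loudly when a second match shows up. The sketch below is only one option among several; findUniqueByEmail() and the nested Contact class are hypothetical names used for illustration:

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UniqueContactLookup {

    // Hypothetical Contact with just the fields this example needs.
    static final class Contact {
        private final String name;
        private final String email;

        Contact(String name, String email) {
            this.name = name;
            this.email = email;
        }

        String getName() { return name; }
        String getEmail() { return email; }
    }

    // Returns the single match, empty if there is none, and throws if there are several.
    static Optional<Contact> findUniqueByEmail(Stream<Contact> contacts, String email) {
        List<Contact> matches = contacts
                .filter(contact -> contact.getEmail().equals(email))
                .limit(2) // two matches are enough to prove the assumption wrong
                .collect(Collectors.toList());
        if (matches.size() > 1) {
            throw new IllegalStateException("More than one contact with email " + email);
        }
        return matches.stream().findFirst();
    }

    public static void main(String[] args) {
        Stream<Contact> duplicates = Stream.of(
                new Contact("Ann", "shared@example.com"),
                new Contact("Bob", "shared@example.com"));
        try {
            findUniqueByEmail(duplicates, "shared@example.com");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Here a broken assumption surfaces as an exception at the lookup instead of as a silently chosen contact.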

Conclusion

findAny() and findFirst() are short-circuiting terminal operations of the Stream interface that pick one element (usually the first) from whatever survives the intermediate operations. Both methods can be combined with intermediate stream operations, such as filter().

findFirst() guarantees the same result every time it runs on the same ordered source, in sequential and parallel pipelines alike. findAny() is the more flexible alternative: it leaves the implementation free to return whichever element it reaches first, which helps in parallel streams, and it’s useful when you are not interested in one specific element. In any case, before you use a parallel stream at all, consider whether it is worth the effort and the added code complexity.

If you are using “findes” to select the filter survivor, keep in mind that the intermediate operations can let more than one element through, and make sure that your assumption about a given stream is correct.