Streams API in Java
Process collections declaratively with the Streams API — map, filter, reduce, collect, flatMap, sorted, distinct, and parallel streams.
What is a Stream?
A Stream is a sequence of elements that supports a pipeline of operations. It does not store data — it processes data from a source (like a List, array, or file) through a chain of intermediate operations and ends with a terminal operation. Streams are lazy: intermediate operations do nothing until a terminal operation is called.
1. Stream vs Collection
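A Collection stores data; a Stream describes a computation on that data. A stream can be consumed only once. After its terminal operation has run, any further operation on the same stream object throws `IllegalStateException`, so you create a new stream from the source each time you need one.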
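A minimal sketch of the single-use rule, assuming Java 9+ (for `List.of`). The class and method names are illustrative. The `List` can produce any number of fresh streams, but a `Stream` object that has already run a terminal operation rejects a second one.

```java
import java.util.List;
import java.util.stream.Stream;

public class StreamReuseDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        Stream<String> stream = List.of("Ana", "Ben", "Cleo").stream();
        stream.forEach(name -> { });      // first terminal operation consumes the stream
        try {
            stream.count();               // second terminal operation on the same object
            return false;
        } catch (IllegalStateException e) {
            return true;                  // "stream has already been operated upon or closed"
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("Ana", "Ben", "Cleo");
        // The collection keeps its data, so each call creates a new, independent stream.
        System.out.println(names.stream().count()); // 3
        System.out.println(names.stream().count()); // 3
        System.out.println(secondUseFails());       // true
    }
}
```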
2. Creating Streams
Streams can be created from collections, arrays, individual values, ranges, and infinite generators.
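One way to create each kind of stream, sketched under the assumption of Java 9+ (for `List.of`). The method names are illustrative. Note that `Stream.iterate` and `Stream.generate` produce infinite streams, so they need a short-circuiting operation such as `limit` before a terminal operation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class CreatingStreams {
    // From a collection
    static List<String> fromCollection() {
        return List.of("a", "b", "c").stream().collect(Collectors.toList());
    }

    // From an array
    static int fromArray() {
        return Arrays.stream(new int[] {1, 2, 3}).sum();
    }

    // From individual values
    static long fromValues() {
        return Stream.of("x", "y", "z").count();
    }

    // From a range: the upper bound of IntStream.range is exclusive
    static List<Integer> fromRange() {
        return IntStream.range(0, 4).boxed().collect(Collectors.toList());
    }

    // From infinite generators: limit() keeps them finite
    static List<Integer> fromIterate() {
        return Stream.iterate(1, n -> n * 2).limit(5).collect(Collectors.toList());
    }

    static List<String> fromGenerate() {
        return Stream.generate(() -> "hi").limit(3).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromCollection()); // [a, b, c]
        System.out.println(fromArray());      // 6
        System.out.println(fromValues());     // 3
        System.out.println(fromRange());      // [0, 1, 2, 3]
        System.out.println(fromIterate());    // [1, 2, 4, 8, 16]
        System.out.println(fromGenerate());   // [hi, hi, hi]
    }
}
```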
3. The Stream Pipeline
A stream pipeline has three parts: a source, zero or more intermediate operations, and exactly one terminal operation.
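The three parts can be seen in a single chain. This is a small sketch assuming Java 9+; the method name `squaresOfEvens` is illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PipelineDemo {
    static List<Integer> squaresOfEvens(List<Integer> numbers) {
        return numbers.stream()                  // 1. source
                .filter(n -> n % 2 == 0)          // 2. intermediate: keep even numbers
                .map(n -> n * n)                  // 2. intermediate: square them
                .collect(Collectors.toList());    // 3. terminal: run the pipeline, build a List
    }

    public static void main(String[] args) {
        System.out.println(squaresOfEvens(List.of(1, 2, 3, 4, 5, 6))); // [4, 16, 36]
    }
}
```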
Intermediate Operations
Intermediate operations return a new Stream and are lazy — they do not execute until a terminal operation is called. They can be chained in any order. Common ones are: `filter`, `map`, `flatMap`, `sorted`, `distinct`, `limit`, `skip`, and `peek`.
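Laziness is easiest to see by logging when each step runs. In this sketch (class and method names are illustrative) the `filter` lambda is attached first, yet the log shows that nothing is filtered until `forEach`, the terminal operation, starts pulling elements through. Writing to an external list from a lambda is done here only to make the order visible; it is not a pattern for production pipelines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class LazinessDemo {
    // Records the order in which things happen.
    static List<String> eventLog() {
        List<String> log = new ArrayList<>();
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .filter(n -> { log.add("filter " + n); return n > 1; });
        log.add("pipeline built");                  // no element has been filtered yet
        pipeline.forEach(n -> log.add("got " + n)); // terminal operation triggers the work
        return log;
    }

    public static void main(String[] args) {
        eventLog().forEach(System.out::println);
        // pipeline built, then filter 1, filter 2, got 2, filter 3, got 3
    }
}
```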
1. filter and map
`filter` keeps elements that match a predicate. `map` transforms each element into something else.
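A short sketch combining both, assuming Java 9+; `shortNamesUpper` is an illustrative name. `filter` takes a `Predicate` and `map` takes a `Function`.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilterMapDemo {
    static List<String> shortNamesUpper(List<String> names) {
        return names.stream()
                .filter(name -> name.length() <= 4)  // predicate: keep names with at most 4 chars
                .map(String::toUpperCase)             // function: transform each kept name
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(shortNamesUpper(List.of("Ana", "Benedict", "Cleo", "Dmitri")));
        // [ANA, CLEO]
    }
}
```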
2. flatMap
`flatMap` transforms each element into a stream of values and flattens all those streams into a single stream.
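Two common uses of `flatMap`, sketched with illustrative method names (Java 9+ assumed). With plain `map`, the first example would yield a `Stream<String[]>`, one array per line; `flatMap` merges those per-line streams into a single `Stream<String>`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapDemo {
    // Each line becomes a stream of words; flatMap merges them into one stream.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // Flattening a list of lists.
    static List<Integer> flatten(List<List<Integer>> nested) {
        return nested.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words(List.of("hello world", "java streams")));
        // [hello, world, java, streams]
        System.out.println(flatten(List.of(List.of(1, 2), List.of(3), List.of(4, 5))));
        // [1, 2, 3, 4, 5]
    }
}
```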
3. sorted, distinct, limit, skip
`sorted` orders elements, either in natural order or by a `Comparator`. `distinct` removes duplicates (as determined by `equals`). `limit(n)` keeps at most n elements. `skip(n)` discards the first n elements.
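A sketch chaining all four, assuming Java 9+; method names are illustrative. The comments trace the sample input `[5, 3, 8, 3, 1, 9, 5, 2]` through each step. `skip` followed by `limit` is a common way to take one "page" of results.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortDistinctDemo {
    static List<Integer> distinctSortedSlice(List<Integer> nums) {
        return nums.stream()
                .distinct()       // 5, 3, 8, 1, 9, 2
                .sorted()         // natural order: 1, 2, 3, 5, 8, 9
                .skip(1)          // drop the first element: 2, 3, 5, 8, 9
                .limit(3)         // keep at most three: 2, 3, 5
                .collect(Collectors.toList());
    }

    // sorted() also accepts a Comparator
    static List<String> byLength(List<String> words) {
        return words.stream()
                .sorted(Comparator.comparingInt(String::length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctSortedSlice(List.of(5, 3, 8, 3, 1, 9, 5, 2))); // [2, 3, 5]
        System.out.println(byLength(List.of("pear", "apple", "fig")));           // [fig, pear, apple]
    }
}
```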
4. peek
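`peek` lets you inspect elements as they flow through the pipeline, primarily for debugging. It passes each element along unchanged. Because streams are lazy, the `peek` action runs only for elements that the terminal operation actually pulls through the pipeline.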
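Placing `peek` before and after a `filter` shows how a sequential stream moves each element through the whole chain before starting on the next one. This sketch (illustrative names, Java 9+ assumed) writes to a list only so the trace can be checked; in real code `peek` is typically used for logging, not for side effects the program depends on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PeekDemo {
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List<Integer> evens = Stream.of(1, 2, 3, 4)
                .peek(n -> log.add("saw " + n))       // every element, before filtering
                .filter(n -> n % 2 == 0)
                .peek(n -> log.add("kept " + n))      // only elements that passed the filter
                .collect(Collectors.toList());
        log.add("result " + evens);
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
        // saw 1, saw 2, kept 2, saw 3, saw 4, kept 4, result [2, 4]
    }
}
```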
Terminal Operations
Terminal operations consume the stream and produce a result or side effect. Once a terminal operation has run, the stream is spent and cannot be reused. Common terminals: `forEach`, `collect`, `reduce`, `count`, `min`, `max`, `findFirst`, `findAny`, `anyMatch`, `allMatch`, `noneMatch`, plus `sum` on the numeric streams (`IntStream`, `LongStream`, `DoubleStream`).
1. forEach, count, sum
`forEach` performs an action on each element. `count` returns the number of elements as a `long`. `sum` and `average` are available only on numeric streams such as `IntStream`.
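A sketch of these terminals, assuming Java 9+; method names are illustrative. Because `sum` and `average` live on `IntStream`, the `Stream<String>` is first converted with `mapToInt`. `average()` returns an `OptionalDouble`, so a default is supplied for the empty case.

```java
import java.util.List;

public class CountSumDemo {
    static long countLongerThan3(List<String> words) {
        return words.stream().filter(w -> w.length() > 3).count();
    }

    static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();   // sum() is on IntStream
    }

    static double averageLength(List<String> words) {
        return words.stream().mapToInt(String::length).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<String> words = List.of("map", "filter", "reduce", "sum");
        words.stream().forEach(w -> System.out.println("word: " + w));
        System.out.println(countLongerThan3(words)); // 2
        System.out.println(totalLength(words));      // 18
        System.out.println(averageLength(words));    // 4.5
    }
}
```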
2. min, max, findFirst, findAny
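`min` and `max` return the smallest/largest element according to a `Comparator`. `findFirst` returns the first element in encounter order, while `findAny` may return any element, which gives parallel streams more freedom. All four return `Optional` because the stream might be empty.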
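A sketch of the four operations, assuming Java 9+; method names are illustrative. `findAny` is shown on a parallel stream, where it can return whichever match a worker thread finds first, so the code only relies on the result starting with the prefix.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class MinMaxFindDemo {
    static Optional<Integer> smallest(List<Integer> nums) {
        return nums.stream().min(Comparator.naturalOrder());
    }

    static Optional<String> longest(List<String> words) {
        return words.stream().max(Comparator.comparingInt(String::length));
    }

    static Optional<String> firstWithPrefix(List<String> words, String prefix) {
        return words.stream().filter(w -> w.startsWith(prefix)).findFirst();
    }

    // Any match is acceptable; in parallel, which one is returned may vary.
    static Optional<String> anyWithPrefix(List<String> words, String prefix) {
        return words.parallelStream().filter(w -> w.startsWith(prefix)).findAny();
    }

    public static void main(String[] args) {
        List<String> fruit = List.of("apple", "avocado", "banana");
        System.out.println(smallest(List.of(7, 2, 9)));      // Optional[2]
        System.out.println(smallest(List.of()));             // Optional.empty
        System.out.println(longest(fruit));                  // Optional[avocado]
        System.out.println(firstWithPrefix(fruit, "a"));     // Optional[apple]
        System.out.println(anyWithPrefix(fruit, "a").isPresent()); // true
    }
}
```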
3. anyMatch, allMatch, noneMatch
These short-circuit terminal operations test elements against a predicate and return a boolean.
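A sketch of the three match operations, assuming Java 9+; names are illustrative. `inspectedBeforeMatch` counts how many elements `anyMatch` examines, showing that it stops as soon as the answer is known. On an empty stream, `anyMatch` returns `false` while `allMatch` and `noneMatch` return `true`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class MatchDemo {
    static boolean anyTop(List<Integer> scores)   { return scores.stream().anyMatch(s -> s >= 90); }
    static boolean allPass(List<Integer> scores)  { return scores.stream().allMatch(s -> s >= 60); }
    static boolean noneFail(List<Integer> scores) { return scores.stream().noneMatch(s -> s < 50); }

    // Counts how many elements anyMatch inspects before it short-circuits.
    static int inspectedBeforeMatch(List<Integer> scores) {
        AtomicInteger inspected = new AtomicInteger();
        scores.stream().anyMatch(s -> { inspected.incrementAndGet(); return s >= 90; });
        return inspected.get();
    }

    public static void main(String[] args) {
        List<Integer> scores = List.of(72, 88, 95, 61);
        System.out.println(anyTop(scores));               // true (95)
        System.out.println(allPass(scores));              // true
        System.out.println(noneFail(scores));             // true
        System.out.println(inspectedBeforeMatch(scores)); // 3: stops at 95, never checks 61
    }
}
```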
4. reduce
`reduce` combines all elements into a single result by repeatedly applying a binary operator.
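The three overloads of `reduce`, sketched with illustrative method names (Java 9+ assumed). With an identity value the result is a plain value (the identity itself for an empty stream); without one it is an `Optional`. The three-argument form lets the result type differ from the element type, with the combiner merging partial results when the stream runs in parallel.

```java
import java.util.List;
import java.util.Optional;

public class ReduceDemo {
    // identity + accumulator: returns a plain value
    static int sum(List<Integer> nums) {
        return nums.stream().reduce(0, Integer::sum);
    }

    // accumulator only: returns Optional because the stream may be empty
    static Optional<Integer> product(List<Integer> nums) {
        return nums.stream().reduce((a, b) -> a * b);
    }

    // identity + accumulator + combiner: String elements reduced to an int
    static int totalLength(List<String> words) {
        return words.stream().reduce(0, (acc, w) -> acc + w.length(), Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3, 4)));        // 10
        System.out.println(product(List.of(1, 2, 3, 4)));    // Optional[24]
        System.out.println(product(List.of()));              // Optional.empty
        System.out.println(totalLength(List.of("ab", "cde"))); // 5
    }
}
```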
collect and Collectors
`collect` is the most powerful terminal operation. It works with `Collectors` to gather stream elements into collections, maps, strings, and more. `Collectors.toList()`, `toSet()`, `toMap()`, `groupingBy()`, `joining()`, and `counting()` are the most common.
1. Collecting to List and Set
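`Collectors.toList()` gathers elements into a `List`, and `Collectors.toSet()` gathers them into a `Set`, which removes duplicates. Current JDKs happen to return an `ArrayList` and a `HashSet`, but the specification does not guarantee the concrete type, mutability, or thread-safety; use `Collectors.toCollection(...)` when you need a specific implementation.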
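A sketch contrasting the collectors, assuming Java 9+; method names are illustrative. `toCollection(TreeSet::new)` is used when a sorted set is required. On Java 16 and later, `Stream.toList()` is a shorter alternative that returns an unmodifiable list.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToListSetDemo {
    static List<String> asList() {
        return Stream.of("b", "a", "b").collect(Collectors.toList());   // keeps order and duplicates
    }

    static Set<String> asSet() {
        return Stream.of("b", "a", "b").collect(Collectors.toSet());    // duplicates removed, order not promised
    }

    static TreeSet<String> asSortedSet() {
        return Stream.of("b", "a", "b").collect(Collectors.toCollection(TreeSet::new)); // sorted, no duplicates
    }

    public static void main(String[] args) {
        System.out.println(asList());      // [b, a, b]
        System.out.println(asSet().size()); // 2
        System.out.println(asSortedSet()); // [a, b]
    }
}
```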
2. Collecting to Map
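`Collectors.toMap(keyMapper, valueMapper)` converts stream elements into a Map. If two elements produce the same key, it throws `IllegalStateException`; pass a merge function as a third argument to combine the values instead.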
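A sketch of both `toMap` forms, assuming Java 9+; method names are illustrative. The merge function receives the value already in the map first and the new value second.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapDemo {
    static Map<String, Integer> lengths(List<String> words) {
        return words.stream().collect(Collectors.toMap(w -> w, String::length));
    }

    // A merge function decides what happens when two elements map to the same key.
    static Map<Character, String> byInitial(List<String> words) {
        return words.stream().collect(Collectors.toMap(
                w -> w.charAt(0),                          // key
                w -> w,                                    // value
                (first, second) -> first + "," + second)); // merge on key collision
    }

    // Without a merge function, a duplicate key throws IllegalStateException.
    static boolean duplicateKeyFails() {
        try {
            List.of("a", "a").stream().collect(Collectors.toMap(w -> w, String::length));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengths(List.of("java", "go")));                     // {java=4, go=2} in some order
        System.out.println(byInitial(List.of("apple", "avocado", "banana")));   // a=apple,avocado and b=banana
        System.out.println(duplicateKeyFails());                                // true
    }
}
```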
3. groupingBy
`Collectors.groupingBy` groups elements by a classifier function, producing a `Map<K, List<V>>`.
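A sketch of `groupingBy`, assuming Java 9+; method names are illustrative. The one-argument form collects each group into a `List`; passing a downstream collector such as `Collectors.counting()` changes what is stored per key.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingDemo {
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // Downstream collector: count the elements in each group instead of listing them.
    static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("go", "java", "rust", "c");
        System.out.println(byLength(words));      // {1=[c], 2=[go], 4=[java, rust]}
        System.out.println(countByLength(words)); // {1=1, 2=1, 4=2}
    }
}
```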
4. joining
`Collectors.joining` concatenates String stream elements into a single string, with optional delimiter, prefix, and suffix.
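The three `joining` overloads, sketched with illustrative method names (Java 9+ assumed). With a prefix and suffix, an empty stream still yields the prefix followed by the suffix.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoiningDemo {
    static String plain(List<String> items) {
        return items.stream().collect(Collectors.joining());
    }

    static String withDelimiter(List<String> items) {
        return items.stream().collect(Collectors.joining(", "));
    }

    static String withBrackets(List<String> items) {
        return items.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c");
        System.out.println(plain(items));           // abc
        System.out.println(withDelimiter(items));   // a, b, c
        System.out.println(withBrackets(items));    // [a, b, c]
        System.out.println(withBrackets(List.of())); // []
    }
}
```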
Numeric Streams
Java provides specialised primitive streams (`IntStream`, `LongStream`, and `DoubleStream`) to avoid the boxing overhead of `Stream<Integer>`. They have built-in methods such as `sum()`, `average()`, `min()`, and `max()`, and `IntStream` and `LongStream` also offer the `range()` and `rangeClosed()` factory methods.
1. IntStream Basics
`IntStream` works directly with primitive `int` values. Because values are never boxed into `Integer` objects, it is typically faster and more memory-efficient.
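A sketch of common `IntStream` operations; method names are illustrative. `range` excludes its upper bound and `rangeClosed` includes it. `average()` returns an `OptionalDouble` and `max()` an `OptionalInt`, both empty when the stream has no elements.

```java
import java.util.OptionalDouble;
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class IntStreamDemo {
    static int sumRange(int from, int to)       { return IntStream.range(from, to).sum(); }
    static int sumRangeClosed(int from, int to) { return IntStream.rangeClosed(from, to).sum(); }
    static OptionalDouble average(int... values) { return IntStream.of(values).average(); }
    static OptionalInt max(int... values)        { return IntStream.of(values).max(); }

    public static void main(String[] args) {
        System.out.println(sumRange(1, 5));                     // 1+2+3+4 = 10
        System.out.println(sumRangeClosed(1, 5));               // 1+2+3+4+5 = 15
        System.out.println(average(4, 8, 15).getAsDouble());    // 9.0
        System.out.println(max(4, 8, 15).getAsInt());           // 15
        System.out.println(max().isPresent());                  // false
    }
}
```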
2. mapToInt and boxed
Convert between object streams and primitive streams with `mapToInt`, `mapToObj`, and `boxed`.
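A sketch of the three conversions, assuming Java 9+; method names are illustrative. `boxed()` is needed before `collect(Collectors.toList())` because `IntStream` has no `collect` overload that takes a `Collector`.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConvertStreams {
    // Stream<String> -> IntStream, so sum() becomes available
    static int totalChars(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    // IntStream -> Stream<String>
    static List<String> labels(int n) {
        return IntStream.rangeClosed(1, n).mapToObj(i -> "item-" + i).collect(Collectors.toList());
    }

    // IntStream -> Stream<Integer>, then collect into a List
    static List<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n).map(i -> i * i).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(totalChars(List.of("ab", "cde"))); // 5
        System.out.println(labels(3));                        // [item-1, item-2, item-3]
        System.out.println(squares(4));                       // [1, 4, 9, 16]
    }
}
```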
Parallel Streams
Parallel streams split the data across multiple CPU cores and process chunks simultaneously using the common `ForkJoinPool`. You enable them with `.parallelStream()` instead of `.stream()`, or `.parallel()` on an existing stream. They are best for large, CPU-bound, stateless operations. For small data or I/O-bound work, parallel streams can be slower due to overhead.
1. Using parallelStream
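`parallelStream()` splits work across multiple threads automatically. The order in which elements are processed is not guaranteed, so `forEach` may handle them in any order; `collect` and `forEachOrdered` still respect the source's encounter order.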
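A sketch of `parallelStream()`, assuming Java 9+; method names are illustrative. The sum of squares is a stateless, CPU-bound reduction, so each thread can work on its own chunk. `timesTen` shows that `collect` keeps the list's encounter order even though the mapping runs on several threads; the `forEach` line in `main` may print in a different order on each run.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelDemo {
    static long sumOfSquares(List<Integer> nums) {
        return nums.parallelStream()
                .mapToLong(x -> x.longValue() * x)   // widen to long to avoid int overflow
                .sum();
    }

    static List<Integer> timesTen(List<Integer> nums) {
        return nums.parallelStream()
                .map(x -> x * 10)
                .collect(Collectors.toList());       // result keeps the source order
    }

    public static void main(String[] args) {
        List<Integer> nums = IntStream.rangeClosed(1, 1_000).boxed().collect(Collectors.toList());
        System.out.println(sumOfSquares(nums));                 // 333833500
        System.out.println(timesTen(List.of(1, 2, 3, 4, 5)));   // [10, 20, 30, 40, 50]
        List.of(1, 2, 3, 4, 5).parallelStream().forEach(n -> System.out.print(n + " "));        // any order
        System.out.println();
        List.of(1, 2, 3, 4, 5).parallelStream().forEachOrdered(n -> System.out.print(n + " ")); // 1 2 3 4 5
        System.out.println();
    }
}
```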
2. When Not to Use Parallel Streams
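Parallel streams that update shared mutable state, such as adding to an `ArrayList` from `forEach`, cause data races. Order-dependent operations such as `limit` or `findFirst` on ordered sources, and small datasets, are also poor fits, because preserving order and splitting the work both add overhead.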
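A sketch contrasting the unsafe pattern with safer alternatives, assuming Java 9+; method names are illustrative. The anti-pattern is kept in a comment because its outcome is nondeterministic. The safer versions let the stream framework combine per-thread results, through `collect` or a reduction such as `count`, instead of mutating shared state.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelPitfalls {
    // Anti-pattern, shown only as a comment: adding to a shared ArrayList from a
    // parallel forEach is a data race. Elements can be lost, or add() can throw.
    //
    //   List<Integer> shared = new ArrayList<>();
    //   IntStream.range(0, 10_000).parallel().forEach(shared::add);

    // Safer: collect() builds per-thread containers and merges them.
    static List<Integer> collectInParallel(int n) {
        return IntStream.range(0, n).parallel().boxed().collect(Collectors.toList());
    }

    // Safer: express a "shared counter" as a reduction instead of mutating a variable.
    static long countMultiplesOf3(int n) {
        return IntStream.range(0, n).parallel().filter(i -> i % 3 == 0).count();
    }

    public static void main(String[] args) {
        System.out.println(collectInParallel(10_000).size()); // 10000
        System.out.println(countMultiplesOf3(10_000));        // 3334 (0, 3, ..., 9999)
    }
}
```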
Real-World Stream Examples
The real power of streams comes from chaining multiple operations together to transform data in a clean, readable way. These examples show patterns you will regularly use in backend Java code.
1. Building a Summary Report
Combine filter, map, groupingBy, and joining to produce a formatted report from a list of records.
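A minimal sketch of such a report, assuming Java 16+ (for records). The `Sale` record, its fields, and the sample data are hypothetical, invented for illustration. Rows with a zero amount are filtered out, sales are grouped per region into a `TreeMap` so regions come out alphabetically, each total is mapped to a line of text, and `joining` assembles the report.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SalesReport {
    // Hypothetical record for illustration; amounts are whole currency units.
    record Sale(String region, int amount) {}

    static String report(List<Sale> sales) {
        Map<String, Integer> totals = sales.stream()
                .filter(s -> s.amount() > 0)                      // skip empty rows
                .collect(Collectors.groupingBy(
                        Sale::region,
                        TreeMap::new,                             // regions in alphabetical order
                        Collectors.summingInt(Sale::amount)));    // total per region
        return totals.entrySet().stream()
                .map(e -> e.getKey() + ": " + e.getValue())      // one line per region
                .collect(Collectors.joining("\n", "Sales by region\n", ""));
    }

    // Hypothetical sample data.
    static List<Sale> sampleSales() {
        return List.of(
                new Sale("North", 1200),
                new Sale("South", 800),
                new Sale("North", 700),
                new Sale("South", 0),
                new Sale("East", 450));
    }

    public static void main(String[] args) {
        System.out.println(report(sampleSales()));
        // Sales by region
        // East: 450
        // North: 1900
        // South: 800
    }
}
```

A `TreeMap` was chosen over the default `HashMap` so the report order is stable; if order does not matter, the two-argument `groupingBy` overload is enough.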
2. Search, Transform, and Format
A common backend pattern: search a dataset, transform matched records, and format them for an API response.
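A minimal sketch of this pattern, assuming Java 16+ (for records). The `User` and `UserSummary` records, the sample data, and the hand-built JSON are all hypothetical. The JSON formatting does not escape special characters, so real code would normally hand the summaries to a JSON library instead. Mapping to a smaller summary type is what keeps fields like `email` out of the response.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class UserSearch {
    // Hypothetical domain types for illustration.
    record User(int id, String name, String email, boolean active) {}
    record UserSummary(int id, String name) {}

    // Search active users whose name contains the query (case-insensitive),
    // then transform each match into a smaller response type.
    static List<UserSummary> search(List<User> users, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        return users.stream()
                .filter(User::active)
                .filter(u -> u.name().toLowerCase(Locale.ROOT).contains(q))
                .sorted(Comparator.comparing(User::name))
                .map(u -> new UserSummary(u.id(), u.name()))
                .collect(Collectors.toList());
    }

    // Format as a JSON array string (illustration only: no escaping).
    static String toJson(List<UserSummary> results) {
        return results.stream()
                .map(r -> "{\"id\":" + r.id() + ",\"name\":\"" + r.name() + "\"}")
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Hypothetical sample data.
    static List<User> sampleUsers() {
        return List.of(
                new User(1, "Alice Smith", "alice@example.com", true),
                new User(2, "Bob Stone", "bob@example.com", true),
                new User(3, "Alicia Keys", "alicia@example.com", false),
                new User(4, "Malik Ali", "malik@example.com", true));
    }

    public static void main(String[] args) {
        System.out.println(toJson(search(sampleUsers(), "ali")));
        // [{"id":1,"name":"Alice Smith"},{"id":4,"name":"Malik Ali"}]
    }
}
```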