🌊 Java Streams API — map, filter, reduce and Collectors
Master the Java Streams API — map, filter, reduce, collect, Collectors (groupingBy, joining), lazy evaluation, parallel streams and intermediate vs terminal operations. With examples.
The Streams API (introduced in Java 8) processes collections declaratively: you describe what to compute, not how to loop.
🔧 Pipeline structure
Source → intermediate operations (lazy) → terminal operation (triggers execution):
List<String> result = names.stream()   // source
    .filter(n -> n.length() > 3)       // intermediate (lazy)
    .map(String::toUpperCase)          // intermediate (lazy)
    .sorted()                          // intermediate (lazy)
    .collect(Collectors.toList());     // terminal (runs it)
⚙️ Common operations
- filter — keep matching elements
- map — transform each element
- flatMap — flatten nested streams
- reduce — fold to a single value
- sorted, distinct, limit, skip
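To make the last bullet concrete, here is a minimal sketch that chains distinct, sorted, skip and limit on a small list. The class name IntermediateOpsSketch, the method page and the sample numbers are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class IntermediateOpsSketch {
    // Hypothetical helper: dedupe, sort, drop the smallest value, keep the next three.
    static List<Integer> page(List<Integer> raw) {
        return raw.stream()
                .distinct()   // remove duplicates (keeps first occurrence)
                .sorted()     // natural ascending order
                .skip(1)      // discard the first (smallest) element
                .limit(3)     // keep at most three elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(page(List.of(5, 3, 5, 1, 4, 3, 2))); // [2, 3, 4]
    }
}
```

Order matters here: calling limit before sorted would truncate the unsorted data first and give a different result.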
📦 Terminal operations
- collect — gather into a List/Set/Map
- forEach, count, anyMatch, findFirst
- reduce, min/max, sum (on primitive streams)
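The terminal operations listed above can be tried side by side. This is a small sketch rather than a reference; the class TerminalOpsSketch and its method names are hypothetical, and the sample names are borrowed from the examples further down.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class TerminalOpsSketch {
    static final List<String> NAMES = List.of("Sam", "Alexander", "Bo", "Charlotte");

    // count: how many names are longer than three characters
    static long countLong() {
        return NAMES.stream().filter(n -> n.length() > 3).count();
    }

    // anyMatch: short-circuits as soon as one element matches
    static boolean anyShort() {
        return NAMES.stream().anyMatch(n -> n.length() < 3);
    }

    // findFirst: returns an Optional because the stream may be empty
    static Optional<String> firstLong() {
        return NAMES.stream().filter(n -> n.length() > 3).findFirst();
    }

    // sum is only available on primitive streams, hence mapToInt
    static int totalLength() {
        return NAMES.stream().mapToInt(String::length).sum();
    }

    // min on an object stream takes a Comparator
    static Optional<String> shortest() {
        return NAMES.stream().min(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        System.out.println(countLong());        // 2
        System.out.println(anyShort());         // true
        System.out.println(firstLong().get());  // Alexander
        System.out.println(totalLength());      // 23
        System.out.println(shortest().get());   // Bo
    }
}
```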
🗂️ Collectors — the power tool
Map<Dept, List<Employee>> byDept =
    employees.stream().collect(Collectors.groupingBy(Employee::dept));
String csv = names.stream().collect(Collectors.joining(", "));
⏱️ Lazy & parallel
Intermediate ops are lazy: nothing runs until the terminal op. parallelStream() splits the work across threads (by default, the common fork-join pool), but it only pays off for large, CPU-bound, stateless pipelines.
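One way to see laziness for yourself is to count how often a filter predicate is called before and after the terminal operation. The sketch below assumes a hypothetical class LazyDemo and uses an AtomicInteger purely as a counter.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyDemo {
    // Returns {predicate calls before the terminal op, predicate calls after it}.
    static int[] run() {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline does not invoke the predicate.
        Stream<String> pipeline = List.of("Sam", "Alexander", "Bo").stream()
                .filter(n -> {
                    calls.incrementAndGet();
                    return n.length() > 3;
                });
        int before = calls.get();

        // The terminal operation pulls every element through the filter.
        pipeline.collect(Collectors.toList());
        int after = calls.get();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " calls before, " + r[1] + " calls after"); // 0 calls before, 3 calls after
    }
}
```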
💻 Code Examples
filter + map + collect
List<String> names = List.of("Sam", "Alexander", "Bo", "Charlotte");
List<String> longUpper = names.stream()
    .filter(n -> n.length() > 3)
    .map(String::toUpperCase)
    .toList(); // Stream.toList() needs Java 16+; use collect(Collectors.toList()) on older versions
System.out.println(longUpper);
Output:
[ALEXANDER, CHARLOTTE]
groupingBy
var words = List.of("apple", "banana", "avocado", "cherry");
Map<Character, List<String>> byFirst = words.stream()
    .collect(Collectors.groupingBy(w -> w.charAt(0)));
System.out.println(byFirst);
Output:
{a=[apple, avocado], b=[banana], c=[cherry]}
reduce to a sum
int sum = List.of(1, 2, 3, 4).stream()
    .reduce(0, Integer::sum);
System.out.println(sum);
Output:
10
⚠️ Common Mistakes
- Reusing a stream after a terminal operation — streams are single-use; create a new one.
- Expecting intermediate ops to run on their own — they're lazy; nothing happens without a terminal op.
- Using parallelStream() on small or IO-bound work — overhead makes it slower; benchmark first.
- Mutating shared state inside forEach — breaks in parallel; prefer collect/reduce.
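Two of these mistakes are easy to reproduce. The sketch below (class and method names are hypothetical) shows that a second terminal operation on the same stream throws IllegalStateException, and that collect builds a result safely even on a parallel stream, with no shared list to mutate.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamMistakesSketch {
    // Reusing a stream: the second terminal operation fails.
    static boolean reuseThrows() {
        Stream<String> s = Stream.of("a", "b", "c");
        s.collect(Collectors.toList());       // first terminal op consumes the stream
        try {
            s.collect(Collectors.toList());   // second terminal op on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Instead of forEach(result::add) on a shared ArrayList (unsafe in parallel),
    // let collect assemble the list; encounter order is preserved.
    static List<Integer> squares(List<Integer> input) {
        return input.parallelStream()
                .map(n -> n * n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(reuseThrows());                 // true
        System.out.println(squares(List.of(1, 2, 3, 4)));  // [1, 4, 9, 16]
    }
}
```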
🎯 Interview Questions
Real questions asked at top product and service-based companies.
Q1. What is the difference between intermediate and terminal operations? (Beginner)
Intermediate operations (filter, map, sorted) return a new stream and are lazy — they don't execute until a terminal operation. Terminal operations (collect, forEach, count, reduce) produce a result or side effect and trigger the pipeline.
Q2. What does Collectors.groupingBy do? (Intermediate)
It collects stream elements into a Map, grouping them by a classifier function's result. Each key maps to a list (or a downstream-collected value) of elements sharing that key — like SQL GROUP BY.
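The "downstream-collected value" part is easiest to see in code. As a sketch, the three-argument groupingBy overload below takes a map factory (TreeMap::new, chosen here only to get sorted keys) and a downstream collector; GroupingSketch and its method names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingSketch {
    // Downstream counting(): each key maps to the number of words in its group.
    static Map<Character, Long> countByFirstLetter(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(w -> w.charAt(0), TreeMap::new, Collectors.counting()));
    }

    // Downstream joining(): each key maps to its words concatenated with '+'.
    static Map<Character, String> joinByFirstLetter(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(w -> w.charAt(0), TreeMap::new, Collectors.joining("+")));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "banana", "avocado", "cherry");
        System.out.println(countByFirstLetter(words)); // {a=2, b=1, c=1}
        System.out.println(joinByFirstLetter(words));  // {a=apple+avocado, b=banana, c=cherry}
    }
}
```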
Q3. What's the difference between map and flatMap? (Intermediate)
map transforms each element 1-to-1. flatMap transforms each element into a stream and flattens all of them into one stream — useful for flattening nested collections (List<List<T>> → stream of T).
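A short side-by-side sketch of the two, using made-up names (FlatMapSketch, lengths, flatten): map produces exactly one output per input, while flatMap may produce zero, one or many.

```java
import java.util.List;
import java.util.stream.Collectors;

class FlatMapSketch {
    // map: one output element per input element
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    // flatMap: each inner list becomes a stream, and all of them are concatenated
    static List<Integer> flatten(List<List<Integer>> nested) {
        return nested.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengths(List.of("a", "bb", "ccc")));                             // [1, 2, 3]
        System.out.println(flatten(List.of(List.of(1, 2), List.of(3), List.of(), List.of(4, 5)))); // [1, 2, 3, 4, 5]
    }
}
```

Note that the empty inner list simply contributes nothing to the flattened result.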
Q4. Are streams lazy? What does that mean? (Beginner)
Yes. Intermediate operations build a pipeline but do no work until a terminal operation runs. Laziness enables short-circuiting (e.g., findFirst stops early) and fusing operations into a single pass.
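Short-circuiting and single-pass fusion can be observed by logging each step. In this sketch (ShortCircuitSketch and trace are hypothetical names), a sequential stream pushes elements through filter and map one at a time and stops as soon as findFirst has its answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class ShortCircuitSketch {
    // Records which lambdas ran, in order, for a sequential pipeline.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Optional<Integer> first = Stream.of(1, 2, 3, 4, 5, 6)
                .filter(n -> { log.add("filter " + n); return n % 2 == 0; })
                .map(n -> { log.add("map " + n); return n * 10; })
                .findFirst();   // stops after the first element that survives the filter
        log.add("result " + first.orElse(-1));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [filter 1, filter 2, map 2, result 20]
    }
}
```

Elements 3 through 6 are never touched, and there is no separate "filter everything, then map everything" pass.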
Q5. When should you use parallelStream()? (Advanced)
Only for large datasets with CPU-bound, stateless, independent operations where the splitting/merging overhead is worth it. Avoid for small collections, IO-bound work, or operations with shared mutable state or ordering dependencies.
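As a rough illustration of a pipeline that fits these criteria, the sketch below sums squares over a numeric range with a stateless, independent mapping. It uses parallel() on a LongStream, which is the counterpart of parallelStream() for sources that are not collections. The class name and the range size are arbitrary, and whether the parallel version is actually faster on a given machine is something to benchmark rather than assume.

```java
import java.util.stream.LongStream;

class ParallelSketch {
    // Stateless, CPU-bound work: the result is the same in either mode.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();   // switch the pipeline to parallel execution
        }
        return range.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        long sequential = sumOfSquares(1_000_000, false);
        long parallel = sumOfSquares(1_000_000, true);
        System.out.println(sequential == parallel); // true
    }
}
```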
🧠 Quick Summary
- Streams = declarative pipelines: source → intermediate → terminal.
- Intermediate ops (map/filter/sorted) are lazy; terminal ops run the pipeline.
- Collectors (groupingBy, joining, toMap) gather results powerfully.
- reduce folds a stream to one value; flatMap flattens nested streams.
- Streams are single-use; parallelStream only helps for big CPU-bound work.