
🌊 Java Streams API — map, filter, reduce and Collectors

Master the Java Streams API — map, filter, reduce, collect, Collectors (groupingBy, joining), lazy evaluation, parallel streams and intermediate vs terminal operations. With examples.

The Streams API, introduced in Java 8, processes collections declaratively: you describe what to compute, not how to loop.

🔧 Pipeline structure

Source → intermediate operations (lazy) → terminal operation (triggers execution):

List<String> result = names.stream()    // source
  .filter(n -> n.length() > 3)          // intermediate (lazy)
  .map(String::toUpperCase)             // intermediate (lazy)
  .sorted()                             // intermediate (lazy)
  .collect(Collectors.toList());        // terminal (runs it)

⚙️ Common operations

  • filter — keep matching elements
  • map — transform each element
  • flatMap — flatten nested streams
  • reduce — fold to a single value
  • sorted, distinct, limit, skip
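
As a rough sketch of how the operations listed above combine, the example below flattens nested lists with flatMap and then applies distinct, sorted, skip, and limit. The class name, method name, and sample data are illustrative assumptions, not part of the original tutorial.

```java
import java.util.List;
import java.util.stream.Collectors;

public class IntermediateOpsDemo {

    // Flattens nested lists, removes duplicates, sorts, then pages the result.
    static List<Integer> flattenAndPage(List<List<Integer>> nested) {
        return nested.stream()
            .flatMap(List::stream)   // List<List<Integer>> -> stream of Integer
            .distinct()              // drop repeated values
            .sorted()                // natural ordering
            .skip(1)                 // discard the first (smallest) element
            .limit(3)                // keep at most three elements
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> nested = List.of(
            List.of(5, 3, 1),
            List.of(3, 4),
            List.of(2, 5, 6));
        System.out.println(flattenAndPage(nested)); // prints [2, 3, 4]
    }
}
```

Each call returns a new stream, so the order of the chain matters: placing skip and limit before sorted would page the unsorted data instead.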

📦 Terminal operations

  • collect — gather into a List/Set/Map
  • forEach, count, anyMatch, findFirst
  • reduce, min/max, and sum (sum is available only on primitive streams such as IntStream)
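
The terminal operations above can be sketched in one small class. This is a minimal illustration under assumed sample data; the class, field, and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class TerminalOpsDemo {

    static final List<String> WORDS = List.of("kiwi", "fig", "banana", "plum");

    // count: number of elements that survive the filter
    static long countLongerThan(int n) {
        return WORDS.stream().filter(w -> w.length() > n).count();
    }

    // anyMatch short-circuits: it can stop at the first match
    static boolean anyStartsWith(String prefix) {
        return WORDS.stream().anyMatch(w -> w.startsWith(prefix));
    }

    // findFirst returns an Optional because the stream may be empty
    static Optional<String> firstLongerThan(int n) {
        return WORDS.stream().filter(w -> w.length() > n).findFirst();
    }

    // max on an object stream needs a Comparator
    static Optional<String> longest() {
        return WORDS.stream().max(Comparator.comparingInt(String::length));
    }

    // sum exists only on primitive streams, so convert with mapToInt first
    static int totalLength() {
        return WORDS.stream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        System.out.println(countLongerThan(3));   // prints 3
        System.out.println(anyStartsWith("b"));   // prints true
        System.out.println(firstLongerThan(4));   // prints Optional[banana]
        System.out.println(longest());            // prints Optional[banana]
        System.out.println(totalLength());        // prints 17
    }
}
```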

🗂️ Collectors — the power tool

Map<Dept, List<Employee>> byDept =
  employees.stream().collect(Collectors.groupingBy(Employee::dept));

String csv = names.stream().collect(Collectors.joining(", "));
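
The snippets above assume Employee and Dept types that the tutorial does not define. The sketch below substitutes a hypothetical Employee record (Java 16+) with a String department, and shows three collector variations: groupingBy with a downstream collector, toMap, and joining with a prefix and suffix.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CollectorsDemo {

    // Hypothetical record used only for this illustration.
    record Employee(String name, String dept, int salary) {}

    static final List<Employee> STAFF = List.of(
        new Employee("Ana", "ENG", 120),
        new Employee("Ben", "ENG", 100),
        new Employee("Cy", "OPS", 90));

    // Head count per department; TreeMap keeps keys sorted for stable output.
    static Map<String, Long> headCountByDept() {
        return STAFF.stream().collect(
            Collectors.groupingBy(Employee::dept, TreeMap::new, Collectors.counting()));
    }

    // name -> salary; this two-argument toMap throws IllegalStateException on duplicate keys.
    static Map<String, Integer> salaryByName() {
        return STAFF.stream().collect(
            Collectors.toMap(Employee::name, Employee::salary));
    }

    // joining with a delimiter, a prefix, and a suffix.
    static String namesCsv() {
        return STAFF.stream()
            .map(Employee::name)
            .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(headCountByDept()); // prints {ENG=2, OPS=1}
        System.out.println(namesCsv());        // prints [Ana, Ben, Cy]
    }
}
```

Passing a downstream collector such as counting() is what turns the default "key to list" grouping into an aggregate per key.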

⏱️ Lazy & parallel

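Intermediate ops are lazy: nothing runs until a terminal op is invoked. parallelStream() (or .parallel() on an existing stream) splits work across cores, but it tends to help only for large, CPU-bound pipelines with stateless, independent operations; for small inputs the splitting and merging overhead usually outweighs the gain.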
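
One way to observe laziness is to count how often a lambda actually runs. The sketch below uses an AtomicInteger counter purely for demonstration (side effects inside stream lambdas are normally best avoided); the class and method names are assumptions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazinessDemo {

    // Returns {calls before the terminal op, calls after findFirst}.
    static int[] observeLaziness() {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline does not execute the map lambda.
        Stream<String> pipeline = List.of("a", "b", "c").stream()
            .map(s -> {
                calls.incrementAndGet(); // counts how often map actually runs
                return s.toUpperCase();
            });
        int before = calls.get();        // still 0: no terminal op yet

        pipeline.findFirst();            // terminal op triggers execution
        int after = calls.get();         // sequential findFirst stops after one element

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] r = observeLaziness();
        System.out.println(r[0] + " then " + r[1]); // prints 0 then 1
    }
}
```

The second value is 1 rather than 3 because findFirst short-circuits: elements are pulled through the pipeline one at a time, and the remaining two are never mapped.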

💻 Code Examples

filter + map + collect

List<String> names = List.of("Sam","Alexander","Bo","Charlotte");
List<String> longUpper = names.stream()
  .filter(n -> n.length() > 3)
  .map(String::toUpperCase)
  .toList();   // Stream.toList() needs Java 16+ and returns an unmodifiable list
System.out.println(longUpper);
Output:
[ALEXANDER, CHARLOTTE]

groupingBy

var words = List.of("apple","banana","avocado","cherry");
Map<Character,List<String>> byFirst = words.stream()
  .collect(Collectors.groupingBy(w -> w.charAt(0)));
System.out.println(byFirst);
Output (groupingBy makes no guarantee about the Map type, so key order may differ):
{a=[apple, avocado], b=[banana], c=[cherry]}

reduce to a sum

int sum = List.of(1,2,3,4).stream()
  .reduce(0, Integer::sum);
System.out.println(sum);
Output:
10
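
Beyond the identity-plus-accumulator form shown above, reduce has two other overloads, and primitive streams offer a shortcut for sums. A minimal sketch, with illustrative class and method names:

```java
import java.util.List;
import java.util.Optional;

public class ReduceDemo {

    // Without an identity, reduce returns Optional because the stream may be empty.
    static Optional<Integer> product(List<Integer> xs) {
        return xs.stream().reduce((a, b) -> a * b);
    }

    // A primitive stream avoids boxing and provides sum() directly.
    static int sum(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).sum();
    }

    // Three-argument form: fold elements into a different result type.
    // The combiner (Integer::sum) merges partial results in parallel runs.
    static int totalLength(List<String> words) {
        return words.stream().reduce(0, (acc, w) -> acc + w.length(), Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(product(List.of(1, 2, 3, 4)));    // prints Optional[24]
        System.out.println(sum(List.of(1, 2, 3, 4)));        // prints 10
        System.out.println(totalLength(List.of("ab", "c"))); // prints 3
    }
}
```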

⚠️ Common Mistakes

  • Reusing a stream after a terminal operation — streams are single-use; create a new one.
  • Expecting intermediate ops to run on their own — they're lazy; nothing happens without a terminal op.
  • Using parallelStream() on small or IO-bound work — overhead makes it slower; benchmark first.
  • Mutating shared state inside forEach — breaks in parallel; prefer collect/reduce.
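
Two of the mistakes above can be illustrated in a few lines. The sketch below (class and method names are assumptions) shows that a second terminal operation on the same stream throws IllegalStateException, and that collect is a safe replacement for adding to a shared list inside forEach on a parallel stream.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class MistakesDemo {

    // Returns true if a second terminal operation on the same stream fails.
    static boolean reuseThrows() {
        Stream<String> s = Stream.of("a", "b");
        s.count();                 // first terminal op consumes the stream
        try {
            s.count();             // second use is illegal
            return false;
        } catch (IllegalStateException e) {
            return true;           // "stream has already been operated upon or closed"
        }
    }

    // Safe alternative to mutating a shared list inside forEach:
    // collect builds the result without shared mutable state and
    // preserves encounter order even when the stream is parallel.
    static List<Integer> squaresInParallel(int n) {
        return IntStream.rangeClosed(1, n)
            .parallel()
            .map(i -> i * i)
            .boxed()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(reuseThrows());        // prints true
        System.out.println(squaresInParallel(5)); // prints [1, 4, 9, 16, 25]
    }
}
```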

🎯 Interview Questions

Real questions asked at top product and service-based companies.

Q1. What is the difference between intermediate and terminal operations? (Beginner)
Intermediate operations (filter, map, sorted) return a new stream and are lazy — they don't execute until a terminal operation. Terminal operations (collect, forEach, count, reduce) produce a result or side effect and trigger the pipeline.
Q2. What does Collectors.groupingBy do? (Intermediate)
It collects stream elements into a Map, grouping them by a classifier function's result. Each key maps to a list (or a downstream-collected value) of elements sharing that key — like SQL GROUP BY.
Q3. What's the difference between map and flatMap? (Intermediate)
map transforms each element 1-to-1. flatMap transforms each element into a stream and flattens all of them into one stream — useful for flattening nested collections (List<List<T>> → stream of T).
Q4. Are streams lazy? What does that mean? (Beginner)
Yes. Intermediate operations build a pipeline but do no work until a terminal operation runs. Laziness enables short-circuiting (e.g., findFirst stops early) and fusing operations into a single pass.
Q5. When should you use parallelStream()? (Advanced)
Only for large datasets with CPU-bound, stateless, independent operations where the splitting/merging overhead is worth it. Avoid for small collections, IO-bound work, or operations with shared mutable state or ordering dependencies.
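
As a hedged illustration of a pipeline that fits these criteria, the sketch below computes a sum of squares over a numeric range with a stateless lambda and no shared state, so the sequential and parallel runs produce the same result. The class and method names are hypothetical, and whether the parallel run is actually faster depends on input size and hardware, so it should be measured.

```java
import java.util.stream.LongStream;

public class ParallelDemo {

    // Sums x*x for x in 1..n, optionally in parallel.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel(); // switches the whole pipeline to parallel mode
        }
        return range.map(x -> x * x)  // stateless, independent per element
                    .sum();           // associative reduction, safe to split and merge
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, false)); // prints 333833500
        System.out.println(sumOfSquares(1000, true));  // prints 333833500
    }
}
```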

🧠 Quick Summary

  • Streams = declarative pipelines: source → intermediate → terminal.
  • Intermediate ops (map/filter/sorted) are lazy; terminal ops run the pipeline.
  • Collectors (groupingBy, joining, toMap) gather results into maps, strings, and other containers.
  • reduce folds a stream to one value; flatMap flattens nested streams.
  • Streams are single-use; parallelStream only helps for big CPU-bound work.