[Java] Java 8 - (2) Stream: Data Processing Operations

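Streams are a feature introduced in Java 8. A Stream processes a sequence of data elements and lets you build a data processing pipeline. It is a functional-style API that leaves the original data unchanged and produces the desired result through intermediate and terminal operations.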

The example code is available on GitHub.

1. Definition of a Stream

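A Stream is a sequence of elements extracted from a source that supports data processing operations. Data is processed through intermediate and terminal operations; intermediate operations are lazy, so nothing actually runs until a terminal operation is invoked.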

private List<Coffee> coffeeList = new ArrayList<>();

@BeforeEach
void init() {
    coffeeList = Arrays.asList(
            new Coffee(2000, 680, Brands.MEGA),
            new Coffee(3000, 1000, Brands.MEGA),
            new Coffee(4500, 355, Brands.STARBUCKS),
            new Coffee(5000, 473, Brands.STARBUCKS),
            new Coffee(5500, 592, Brands.STARBUCKS),
            new Coffee(2000, 625, Brands.PAIKDABANG),
            new Coffee(3000, 946, Brands.PAIKDABANG),
            new Coffee(4500, 355, Brands.TWOSOME),
            new Coffee(5000, 414, Brands.TWOSOME),
            new Coffee(3200, 420, Brands.EDIYA),
            new Coffee(4200, 650, Brands.EDIYA)
    );
}

@Test
void stream_test() {
    List<String> brandList = coffeeList.stream()        // data source
            .filter(coffee -> coffee.getPrice() < 3000) // intermediate operation
            .map(Coffee::getBrands)                     // intermediate operation
            .map(Brands::getDesc)                       // intermediate operation
            .limit(3)                                   // intermediate operation
            .collect(Collectors.toList());              // terminal operation

    System.out.println("brandList = " + brandList);
}
// Result
brandList = [메가커피, 빽다방]
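
To make the lazy evaluation described above concrete, here is a minimal sketch. The class name `LazyStreamDemo`, the method `oddNumbers`, and the `visited` list are hypothetical names introduced only for this illustration; the list simply records which elements the filter has actually examined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {

    // Hypothetical log of the elements the filter has examined so far.
    static final List<Integer> visited = new ArrayList<>();

    // Builds a pipeline with one intermediate operation and no terminal operation.
    static Stream<Integer> oddNumbers(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> {
                    visited.add(n);       // side effect used only to observe execution
                    return n % 2 == 1;
                });
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = oddNumbers(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println("visited before collect = " + visited); // nothing has run yet

        List<Integer> odds = pipeline.collect(Collectors.toList()); // terminal operation
        System.out.println("visited after collect = " + visited);
        System.out.println("odds = " + odds);
    }
}
```

The `visited` list stays empty until `collect` is called, which is the point at which the filter actually runs.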

A Stream pipeline is made up of the following parts.

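Data Source
The data the Stream is built from in order to process the original data, such as an array, a collection, or a file.
ex) a collection's 'stream()', Arrays.stream(T[] array), Stream.of(T... values), etc.
Intermediate Operations
Operations that transform or filter the elements of the data source. They can be chained as many times as needed to form a pipeline; they do not run on their own, and each returns a new intermediate Stream.
ex) filter(Predicate<T> predicate), map(Function<T, R> mapper), sorted(), limit(long maxSize), etc.
Terminal Operations
Operations that process the Stream and return or output a result. A terminal operation consumes the Stream's elements, so it can be invoked only once per Stream.
ex) collect(Collector<T, A, R> collector), forEach(Consumer<T> action), count(), etc.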
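
The three parts listed above can be sketched as follows. This is only an illustration: the class `StreamPartsDemo` and its method names are made up for this example. It runs the same pipeline on streams created from a collection, an array, and `Stream.of`, and it also shows that a second terminal operation on an already consumed Stream is rejected with an `IllegalStateException`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamPartsDemo {

    // Runs the same pipeline (filter -> map -> collect) on any source of strings.
    static List<String> upperCaseLongWords(Stream<String> source) {
        return source
                .filter(word -> word.length() > 3)   // intermediate operation
                .map(String::toUpperCase)            // intermediate operation
                .collect(Collectors.toList());       // terminal operation
    }

    // Returns true if a second terminal operation on the same Stream is rejected.
    static boolean secondTerminalCallFails() {
        Stream<String> stream = Stream.of("a", "b");
        stream.count();                 // first terminal operation consumes the Stream
        try {
            stream.count();             // second terminal operation on the same Stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("java", "is", "stream", "api");
        String[] array = {"java", "is", "stream", "api"};

        System.out.println(upperCaseLongWords(words.stream()));          // from a collection
        System.out.println(upperCaseLongWords(Arrays.stream(array)));    // from an array
        System.out.println(upperCaseLongWords(Stream.of("java", "is"))); // from values
        System.out.println("reuse rejected: " + secondTerminalCallFails());
    }
}
```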

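1-1. Data Source

The data source is what a Stream is created from in order to process the original data; it can be a collection, an array, an I/O resource, and so on. A Stream created from ordered data keeps that order. The Stream built on the source also exposes an interface for data processing operations such as 'filter', 'sorted', and 'map'.

The main difference between a Stream and a Collection is when values are computed. A Collection is a data structure that holds every one of its values in memory, so each element must be computed before it can be added to the Collection. A Stream, by contrast, is conceptually a fixed data structure whose elements are computed only on demand.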
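
As a rough illustration of elements being computed only on demand, the sketch below uses `Stream.iterate` to describe an unbounded sequence and then asks for just the first few values. The class name `OnDemandDemo` and the method `firstPowersOfTwo` are hypothetical names chosen for this example; such a sequence could not be stored in a Collection in full.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OnDemandDemo {

    // Stream.iterate describes the unbounded sequence 1, 2, 4, 8, ...
    // Elements are produced only as limit(count) asks for them.
    static List<Integer> firstPowersOfTwo(int count) {
        return Stream.iterate(1, x -> x * 2)
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5)); // [1, 2, 4, 8, 16]
    }
}
```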

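1-2. Data Processing Operations

Streams support operations commonly found in functional programming languages and in databases, such as 'filter', 'map', 'reduce', 'find', 'collect', and 'forEach', and they can run either sequentially or in parallel.

Intermediate operation: a Stream operation, such as transforming or filtering data, that can be chained with others
Terminal operation: an operation that processes the Stream, returns a result, and closes the Stream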

2. Operations Supported by Stream

Operation	Method	Return type	Argument type	Function descriptor
Intermediate	filter	Stream<T>	Predicate<T>	T → boolean
	distinct	Stream<T>
	takeWhile	Stream<T>	Predicate<T>	T → boolean
	dropWhile	Stream<T>	Predicate<T>	T → boolean
	skip	Stream<T>	long
	limit	Stream<T>	long
	map	Stream<R>	Function<T, R>	T → R
	flatMap	Stream<R>	Function<T, Stream<R>>	T → Stream<R>
	sorted	Stream<T>	Comparator<T>	(T, T) → int
Terminal	anyMatch	boolean	Predicate<T>	T → boolean
	noneMatch	boolean	Predicate<T>	T → boolean
	allMatch	boolean	Predicate<T>	T → boolean
	findAny	Optional<T>
	findFirst	Optional<T>
	forEach	void	Consumer<T>	T → void
	collect	R	Collector<T, A, R>
	reduce	Optional<T>	BinaryOperator<T>	(T, T) → T
	count	long

2-1. Filtering

filter

The 'filter' method takes a Predicate as its argument and returns a Stream containing every element that matches it.

List<Coffee> filterList = coffeeList.stream()
        .filter(coffee -> coffee.getBrands().equals(Brands.MEGA))
        .collect(Collectors.toList());

distinct

The 'distinct' method returns a Stream of unique elements. Uniqueness is determined by the hashCode and equals methods of the Stream's elements.

@Test
void stream_distinct() {
    List<String> filterList = coffeeList.stream()
            .filter(coffee -> coffee.getPrice() <= 3000)
            .map(Coffee::getBrands)
            .map(Brands::getDesc)
            .collect(Collectors.toList());
    System.out.println("filterList = " + filterList);

    List<String> distinctList = coffeeList.stream()
            .filter(coffee -> coffee.getPrice() <= 3000)
            .map(Coffee::getBrands)
            .map(Brands::getDesc)
            .distinct()
            .collect(Collectors.toList());
    System.out.println("distinctList = " + distinctList);
}

// Result
filterList = [메가커피, 메가커피, 빽다방, 빽다방]
distinctList = [메가커피, 빽다방]
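
Because 'distinct' relies on equals and hashCode, the outcome depends on how the element class defines them. The following sketch assumes two hypothetical classes, `Menu` (which overrides both methods) and `RawMenu` (which does not), neither of which is part of the original example code.

```java
import java.util.stream.Stream;

class DistinctEqualityDemo {

    // Hypothetical value class: two menus are equal when their names are equal.
    static final class Menu {
        private final String name;

        Menu(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Menu && ((Menu) other).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // Hypothetical class without equals/hashCode: only identical references are equal.
    static final class RawMenu {
        private final String name;

        RawMenu(String name) {
            this.name = name;
        }
    }

    static long countDistinctMenus() {
        return Stream.of(new Menu("latte"), new Menu("latte"), new Menu("mocha"))
                .distinct()
                .count();
    }

    static long countDistinctRawMenus() {
        return Stream.of(new RawMenu("latte"), new RawMenu("latte"), new RawMenu("mocha"))
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctMenus());    // 2
        System.out.println(countDistinctRawMenus()); // 3
    }
}
```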

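2-2. Slicing

takeWhile, dropWhile

These methods, added in Java 9, are similar to 'filter' and select elements of a Stream efficiently. On sorted input they stop iterating as soon as the condition no longer holds, which can improve performance. 'filter' tests every element of the Stream, whereas 'takeWhile' and 'dropWhile' stop testing at the first element that fails the Predicate and return their result without evaluating the remaining elements.

takeWhile - when the Predicate returns false, stops iterating and returns the elements seen so far
dropWhile - when the Predicate returns false, stops testing and returns that element and all remaining elements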

@Test
void stream_slicing() {
    List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 10);
    List<Integer> filterList = list.stream()
            .filter(i -> {
                System.out.println("filter i = " + i);
                return i < 5;
            })
            .collect(Collectors.toList());
    System.out.println("filterList = " + filterList);

    List<Integer> takeWhileList = list.stream()
            .takeWhile(i -> {
                System.out.println("takeWhile i = " + i);
                return i < 5;
            })
            .collect(Collectors.toList());
    System.out.println("takeWhileList = " + takeWhileList);

    List<Integer> dropWhileList = list.stream()
            .dropWhile(i -> {
                System.out.println("dropWhile i = " + i);
                return i < 5;
            })
            .collect(Collectors.toList());
    System.out.println("dropWhileList = " + dropWhileList);
}
// Result
filter i = 1
filter i = 2
filter i = 3
filter i = 4
filter i = 5
filter i = 6
filter i = 7
filter i = 8
filter i = 10
filterList = [1, 2, 3, 4]
takeWhile i = 1
takeWhile i = 2
takeWhile i = 3
takeWhile i = 4
takeWhile i = 5
takeWhileList = [1, 2, 3, 4]
dropWhile i = 1
dropWhile i = 2
dropWhile i = 3
dropWhile i = 4
dropWhile i = 5
dropWhileList = [5, 6, 7, 8, 10]

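Note that these methods assume a sorted Stream; otherwise you get unintended results, because only the leading run of elements that satisfy the Predicate is taken or dropped. Sort the Stream before calling 'takeWhile' or 'dropWhile'.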
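
The effect of unsorted input can be seen in a small sketch like the one below. It requires Java 9 or later, and the class name `TakeWhileOrderDemo` and its methods are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TakeWhileOrderDemo {

    // Takes elements until the first one that is not below 5.
    static List<Integer> takeBelowFive(List<Integer> numbers) {
        return numbers.stream()
                .takeWhile(n -> n < 5)
                .collect(Collectors.toList());
    }

    // Sorts first, so every element below 5 ends up in the leading run.
    static List<Integer> sortedThenTakeBelowFive(List<Integer> numbers) {
        return numbers.stream()
                .sorted()
                .takeWhile(n -> n < 5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> unsorted = Arrays.asList(1, 7, 2, 3);
        System.out.println(takeBelowFive(unsorted));           // [1]
        System.out.println(sortedThenTakeBelowFive(unsorted)); // [1, 2, 3]
    }
}
```

Without sorting, 2 and 3 are lost because iteration stops at 7.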

limit

Returns a new Stream truncated to at most the given number of elements. Once that many elements have been produced, the rest of the iteration is skipped.

@Test
void stream_slicing_limit() {
    List<Coffee> limitList = coffeeList.stream()
            .filter(coffee -> coffee.getPrice() <= 3000)
            .limit(2)
            .collect(Collectors.toList());
}
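
To observe that 'limit' cuts the iteration short, the following sketch records which elements the preceding filter examined. The class `LimitShortCircuitDemo` and the `examined` list are hypothetical helpers for this illustration, and the observation applies to a sequential Stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LimitShortCircuitDemo {

    // Hypothetical log of the elements the filter has examined.
    static final List<Integer> examined = new ArrayList<>();

    // Returns the first two even numbers and records what the filter looked at.
    static List<Integer> firstTwoEven(List<Integer> numbers) {
        examined.clear();
        return numbers.stream()
                .filter(n -> {
                    examined.add(n);
                    return n % 2 == 0;
                })
                .limit(2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> result = firstTwoEven(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        System.out.println("result = " + result);
        System.out.println("examined = " + examined); // shorter than the full input
    }
}
```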

skip

Returns a Stream that discards the first n elements. If the Stream has n or fewer elements, an empty Stream is returned.

@Test
void stream_slicing_skip() {
    List<Integer> filterList = coffeeList.stream()
            .filter(coffee -> coffee.getPrice() <= 4000)
            .map(Coffee::getPrice)
            .collect(Collectors.toList());
    System.out.println("filterList = " + filterList);
    
    List<Integer> limitList = coffeeList.stream()
            .filter(coffee -> coffee.getPrice() <= 4000)
            .map(Coffee::getPrice)
            .limit(3)
            .collect(Collectors.toList());
    System.out.println("limitList = " + limitList);

    List<Integer> skipList = coffeeList.stream()
            .filter(coffee -> coffee.getPrice() <= 4000)
            .map(Coffee::getPrice)
            .skip(3)  // skip the first 3 elements
            .collect(Collectors.toList());
    System.out.println("skipList = " + skipList);
}
// Result
filterList = [2000, 3000, 2000, 3000, 3200]
limitList = [2000, 3000, 2000]
skipList = [3000, 3200]
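
Since 'skip' and 'limit' complement each other, one common way to combine them is simple paging. The sketch below is only an illustration; the class `PagingDemo` and the method `page` are hypothetical names and not part of the original example code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PagingDemo {

    // Returns the page with the given zero-based index: skip the earlier pages, then take one page.
    static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
        return items.stream()
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(2000, 3000, 2000, 3000, 3200);
        System.out.println(page(prices, 0, 2)); // [2000, 3000]
        System.out.println(page(prices, 2, 2)); // [3200]
        System.out.println(page(prices, 3, 2)); // [] because skip went past the end
    }
}
```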

2-3. Mapping

map

Takes a function as its argument, applies it to each element, and returns a Stream of the results. The word mapping is used because the operation is closer to creating a new version of the Stream than to modifying values in place.

@Test
void stream_map() {
    List<String> mapList = coffeeList.stream()
            .map(Coffee::getBrands) // converts each Coffee to its Brands
            .map(Brands::getDesc)   // converts each Brands to a String
            .collect(Collectors.toList());
    System.out.println("mapList = " + mapList);
}
// Result
mapList = [메가커피, 메가커피, 스타벅스, 스타벅스, 스타벅스, 빽다방, 빽다방, 투썸플레이스, 투썸플레이스, 이디야, 이디야]

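flatMap

When the function passed to 'map' produces several values per element, for example an array, each element is mapped to a String[] and the result is a Stream<String[]>; mapping each of those arrays to a Stream in turn gives a Stream<Stream<String>>. 'flatMap' is used to flatten such nested results into a single Stream<String>.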

@Test
void stream_flatMap() {

    List<String> strings = Arrays.asList("Hello", "world");
    // map turns each element into a String[], so the Stream is a Stream<String[]>
    // and the collected result is a List<String[]>.
    List<String[]> mapList = strings.stream()
            .map(s -> s.split(""))
            .distinct()
            .collect(Collectors.toList());
    System.out.println("mapList = " + mapList);

    // After flattening with flatMap,
    // the collected result is a List<String>.
    List<String> flatMapList = strings.stream()
            .map(s -> s.split(""))
            .flatMap(Arrays::stream)
            .distinct()
            .collect(Collectors.toList());
    System.out.println("flatMapList = " + flatMapList);
}
// Result
mapList = [[Ljava.lang.String;@6b81ce95, [Ljava.lang.String;@2a798d51]
flatMapList = [H, e, l, o, w, r, d]

Exercises

1. Given two lists of numbers, return a list of all pairs of numbers.

ex) [1, 2, 3], [3, 4] → [(1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4)]

List<Integer> number1 = Arrays.asList(1, 2, 3);
List<Integer> number2 = Arrays.asList(3, 4);

List<int[]> pairs = number1.stream()
        .flatMap(i -> number2.stream()
                .map(j -> new int[]{i, j}))
        .collect(Collectors.toList());

2. From the pairs in exercise 1, return only those whose sum is divisible by 3.

List<Integer> number1 = Arrays.asList(1, 2, 3);
List<Integer> number2 = Arrays.asList(3, 4);

List<int[]> pairs = number1.stream()
        .flatMap(i -> number2.stream()
                .filter(j -> (i + j) % 3 == 0)
                .map(j -> new int[]{i, j}))
        .collect(Collectors.toList());

2-4. Searching and Matching

anyMatch

Takes a Predicate and checks whether at least one element of the Stream matches it.

// Returns true if coffeeList contains at least one MEGA coffee
boolean isMegaCheck = coffeeList.stream()
        .anyMatch(coffee -> coffee.getBrands().equals(Brands.MEGA));

allMatch

Takes a Predicate and checks whether every element of the Stream matches it.

// Returns true if every price in coffeeList is 2000 won or more
boolean allMatchCheck = coffeeList.stream()
        .allMatch(coffee -> coffee.getPrice() >= 2000);

noneMatch

The opposite of 'allMatch': checks that no element of the Stream matches the given Predicate.

// Returns true if no price in coffeeList is below 2000 won
boolean noneMatchCheck = coffeeList.stream()
        .noneMatch(coffee -> coffee.getPrice() < 2000);
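
The three matching methods can be compared side by side, including on an empty Stream, where 'anyMatch' returns false while 'allMatch' and 'noneMatch' return true. The class `MatchDemo` and its method names below are hypothetical and exist only for this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MatchDemo {

    static boolean anyNegative(List<Integer> numbers) {
        return numbers.stream().anyMatch(n -> n < 0);
    }

    static boolean allPositive(List<Integer> numbers) {
        return numbers.stream().allMatch(n -> n > 0);
    }

    static boolean noneNegative(List<Integer> numbers) {
        return numbers.stream().noneMatch(n -> n < 0);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        List<Integer> empty = Collections.emptyList();

        System.out.println(anyNegative(numbers));  // false
        System.out.println(allPositive(numbers));  // true
        System.out.println(noneNegative(numbers)); // true

        // On an empty Stream there is nothing to match and nothing to violate the Predicate.
        System.out.println(anyNegative(empty));    // false
        System.out.println(allPositive(empty));    // true
        System.out.println(noneNegative(empty));   // true
    }
}
```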

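findAny, findFirst

These methods return an arbitrary element ('findAny') or the first element ('findFirst') of the current Stream. In other words, they return an element if the Stream has one, and because they finish as soon as a result is found (short-circuiting), the pipeline can be optimized. Since there may be no matching element, the result is returned as an Optional.

The two methods look similar, but they differ under parallel execution. In a parallel Stream, finding the first element in encounter order restricts parallelism, so 'findFirst' is more expensive, whereas 'findAny' does not care which element is returned and can therefore be combined with a parallel Stream for better performance.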

Optional<Coffee> coffeeOp = coffeeList.stream()
        .filter(coffee -> coffee.getBrands().equals(Brands.MEGA))
        .findAny(); // finds a matching element regardless of order

Optional<Coffee> coffeeOp2 = coffeeList.stream()
        .filter(coffee -> coffee.getBrands().equals(Brands.MEGA))
        .findFirst(); // finds the first matching element
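
The difference under parallel execution can be sketched with a parallel `IntStream`. The class `FindDemo` and its method names are hypothetical; the example only assumes the documented behavior that 'findFirst' respects encounter order while 'findAny' may return any matching element.

```java
import java.util.stream.IntStream;

class FindDemo {

    // findFirst respects encounter order, so the result is always the first match.
    static int firstMultipleOfSeven() {
        return IntStream.rangeClosed(1, 1_000)
                .parallel()
                .filter(i -> i % 7 == 0)
                .findFirst()
                .getAsInt();
    }

    // findAny may return whichever match a worker thread finds first.
    static int anyMultipleOfSeven() {
        return IntStream.rangeClosed(1, 1_000)
                .parallel()
                .filter(i -> i % 7 == 0)
                .findAny()
                .getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(firstMultipleOfSeven()); // 7
        System.out.println(anyMultipleOfSeven());   // some multiple of 7, may vary between runs
    }
}
```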

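2-5. Reducing

reduce - with an initial value

The 'reduce' method is a terminal operation that combines the elements of a Stream into a single value. This form takes an initial value and a lambda expression that performs the combining operation.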

@Test
void stream_reduce() {
    List<Integer> numberList = Arrays.asList(4, 5, 3, 9);

    // without a stream
    int sum = 0;
    for (Integer number : numberList) {
        sum += number;
    }

    // with a stream
    int reduceSum = numberList.stream().reduce(0, (a, b) -> a + b); // initial value 0 and a lambda
//  int reduceSum = numberList.stream().reduce(0, Integer::sum);
    log.info("sum = {}, reduceSum = {}", sum, reduceSum);
}

// Result
sum = 21, reduceSum = 21

reduce - without an initial value

Without an initial value, 'reduce' takes only the lambda expression. Because there is no initial value, it returns an Optional to cover the case where the Stream has no elements.

@Test
void stream_reduce_no_init() {
    List<Integer> numberList = Arrays.asList(4, 5, 3, 9);

    Optional<Integer> reduceSumOp = numberList.stream().reduce(Integer::sum);
    int reduceSum = reduceSumOp.orElse(0);
    log.info("reduceSum = {}", reduceSum);
}

// Result
reduceSum = 21
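
The reason for the Optional return type becomes visible when the Stream is empty. The sketch below uses a hypothetical class `ReduceOptionalDemo` with a `max` method built on `reduce(Integer::max)`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ReduceOptionalDemo {

    // No initial value is given, so an empty list has no meaningful maximum to return.
    static Optional<Integer> max(List<Integer> numbers) {
        return numbers.stream().reduce(Integer::max);
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(4, 5, 3, 9)));          // Optional[9]
        System.out.println(max(Collections.<Integer>emptyList()));   // Optional.empty
    }
}
```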

3. Numeric Streams

Working with numeric data in a Stream can degrade performance because of costly boxing and unboxing. Numeric Streams exist to handle such data efficiently: they avoid unnecessary boxing and unboxing, which makes them more efficient in both memory use and speed.

Java provides three numeric Streams: IntStream, DoubleStream, and LongStream. They exist purely to avoid the cost of boxing, so they add no other functionality beyond commonly used numeric methods such as 'sum', 'max', and 'min'.

@Test
void primitive_stream() {

    int priceSum = coffeeList.stream()
            .map(Coffee::getPrice)  // boxing cost is incurred!
            .reduce(0, (a, b) -> a + b);

    int priceSum2 = coffeeList.stream()
            .mapToInt(Coffee::getPrice) // returns an IntStream, no boxing cost
            .sum();
}
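
Beyond 'sum', the other numeric helpers can be used in the same way. The following sketch assumes a plain list of prices instead of the `Coffee` class, and the class name `NumericStreamDemo` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;
import java.util.OptionalInt;

class NumericStreamDemo {

    static int total(List<Integer> prices) {
        return prices.stream().mapToInt(Integer::intValue).sum();
    }

    // max and average return Optional types because the Stream may be empty.
    static OptionalInt highest(List<Integer> prices) {
        return prices.stream().mapToInt(Integer::intValue).max();
    }

    static OptionalDouble average(List<Integer> prices) {
        return prices.stream().mapToInt(Integer::intValue).average();
    }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(2000, 3000, 4000);
        System.out.println(total(prices));                  // 9000
        System.out.println(highest(prices).getAsInt());     // 4000
        System.out.println(average(prices).getAsDouble());  // 3000.0
    }
}
```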

3-1. Converting Back to an Object Stream

boxed

Used to convert a numeric Stream back into an object Stream.

@Test
void primitive_stream_boxed() {
    IntStream intStream = coffeeList.stream()
            .mapToInt(Coffee::getPrice);         // convert to a numeric Stream
    Stream<Integer> stream = intStream.boxed();  // convert back to an object Stream
}

3-2. OptionalInt, OptionalDouble, OptionalLong

These classes are the Optional types returned by numeric Streams when there may be no elements. As the names suggest, an int Stream returns OptionalInt, a double Stream returns OptionalDouble, and a long Stream returns OptionalLong.

@Test
void primitive_stream_op() {
    OptionalInt minPriceOp = coffeeList.stream()
            .mapToInt(Coffee::getPrice) // convert to a numeric Stream
            .min();  // the numeric Stream may be empty,
}                    // so min returns an OptionalInt
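
Once an OptionalInt has been obtained, the empty case is usually handled with a default value. This is a minimal sketch under that assumption; the class `OptionalIntDemo` and the method `minOrDefault` are hypothetical names.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

class OptionalIntDemo {

    // Returns the smallest value, or defaultValue when there are no values at all.
    static int minOrDefault(int[] values, int defaultValue) {
        OptionalInt min = IntStream.of(values).min();
        return min.orElse(defaultValue);
    }

    public static void main(String[] args) {
        System.out.println(minOrDefault(new int[]{4500, 2000, 3000}, -1)); // 2000
        System.out.println(minOrDefault(new int[0], -1));                  // -1
    }
}
```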

제이동의 성장 블로그