[Pandas] Handling categorical (category) data / Binning (pd.cut, np.histogram)


all_sound 2022. 10. 4. 23:45

Handling categorical (category) data

Binning

Rather than using continuous data as-is, it is sometimes more effective to split it into fixed intervals (bins) and analyze those. This is called binning, and pandas provides the cut function for it.

▶ Importing libraries

import pandas as pd
import numpy as np

▶ Handling NaN values

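# df is assumed to be an already loaded DataFrame with a numeric horsepower column
# Drop the rows where the horsepower column is NaN
df.dropna(subset=['horsepower'], inplace=True)
# Check how many NaN values remain in the horsepower column
df.horsepower.isna().sum()  # 0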
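
As a minimal sketch of what the subset option does, the toy DataFrame below (hypothetical values, not the data used in this post) shows that only rows with NaN in horsepower are dropped, while NaN in other columns is left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df; the values are made up
toy = pd.DataFrame({
    "horsepower": [130.0, np.nan, 95.0, 46.0],
    "weight": [3504.0, 3693.0, np.nan, 1835.0],
})

# subset limits the NaN check to the horsepower column,
# so the row with NaN only in weight is kept
cleaned = toy.dropna(subset=["horsepower"])

print(len(toy), len(cleaned))              # row counts before and after
print(cleaned["horsepower"].isna().sum())  # NaN left in horsepower
print(cleaned["weight"].isna().sum())      # NaN left in weight
```

This version returns a new DataFrame instead of using inplace=True, which keeps the original available for comparison.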

▶ Splitting the column with np.histogram

# Split the horsepower column into 3 bins with the bins option
# bins: number of bins
count, bin_dividers = np.histogram(df.horsepower, bins=3)

# Number of values that fall into each bin
count
# array([257, 103,  32])

# Boundary values (edges) that separate the bins
bin_dividers
# array([ 46.        , 107.33333333, 168.66666667, 230.        ])

print(df.horsepower.min())  # 46.0  the minimum is the first value of bin_dividers
print(df.horsepower.max())  # 230.0  the maximum is the last value of bin_dividers
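
The edges above are equally spaced between the minimum and the maximum: (230 - 46) / 3 ≈ 61.33, so the edges are 46, 107.33, 168.67 and 230. The sketch below checks this on a small hypothetical array (only its minimum and maximum match the post's data) by comparing the edges from np.histogram with np.linspace.

```python
import numpy as np

# Hypothetical values; only the min (46) and max (230) match the post
values = np.array([46.0, 88.0, 150.0, 230.0])
n_bins = 3

count, edges = np.histogram(values, bins=n_bins)

# Equal-width edges: n_bins + 1 points evenly spaced from min to max
manual = np.linspace(values.min(), values.max(), n_bins + 1)

print(edges)
print(np.allclose(edges, manual))  # do the two sets of edges agree?
print(count)                       # number of values per bin
```

Note that np.histogram treats every bin as half-open, [left, right), except the last one, which also includes its right edge. That is why the maximum value is counted in the last bin.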

▶ Naming the bins

bin_names = ['저출력', '보통출력', '고출력']  # low output, medium output, high output

▶ The pd.cut function

# Assign each value to one of the 3 bins with pd.cut
df['hp_bin'] = pd.cut(x=df.horsepower,
                      bins=bin_dividers,    # bins: bin edges
                      labels=bin_names,     # labels: names for the bins
                      include_lowest=True)  # include_lowest: include the first (lowest) edge

# Check the hp_bin column
df
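
To see why include_lowest matters here, the sketch below runs pd.cut on a small hypothetical Series with English labels (both are assumptions for illustration, not the post's data). By default pd.cut uses right-closed intervals, (left, right], so a value equal to the first edge falls outside every bin and becomes NaN unless include_lowest=True is set.

```python
import numpy as np
import pandas as pd

# Hypothetical values and labels, chosen only for illustration
hp = pd.Series([46.0, 88.0, 150.0, 230.0])
_, edges = np.histogram(hp, bins=3)
labels = ["low", "medium", "high"]

# Default: intervals are (left, right], so the minimum (46.0) is not assigned
without_lowest = pd.cut(hp, bins=edges, labels=labels)

# include_lowest=True closes the first interval on the left as well
with_lowest = pd.cut(hp, bins=edges, labels=labels, include_lowest=True)

print(without_lowest.isna().tolist())
print(with_lowest.tolist())
print(with_lowest.value_counts(sort=False))  # counts per bin, in bin order
```

The result is a column of category dtype, so value_counts reports the bins in their defined order when sort=False is used.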
