[Python] Combining Data (np.concatenate, pd.concat)


Aaron 2019. 2. 14. 21:45

import pandas as pd

import numpy as np

from pandas import Series, DataFrame

#. Combining arrays (np.concatenate)

np.concatenate?

concatenate((a1, a2, ...), axis=0, out=None)

ar1 = np.arange(4).reshape(2,2)

array([[0, 1],

[2, 3]])

np.concatenate([ar1, ar1], axis=1)

array([[0, 1, 0, 1],

[2, 3, 2, 3]])

np.concatenate([ar1, ar1], axis=0)

array([[0, 1],

[2, 3],

[0, 1],

[2, 3]])
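The two calls above work because both inputs have the same shape. As a minimal sketch (the second array `ar2` is an assumption added for illustration, not part of the original post), the following shows which dimension has to match for each axis, and that `np.vstack` / `np.hstack` give the same result on 2-D input:

```python
import numpy as np

ar1 = np.arange(4).reshape(2, 2)      # [[0, 1], [2, 3]]
ar2 = np.arange(6).reshape(3, 2)      # hypothetical: 3 rows, 2 columns

# axis=0 stacks rows, so only the column counts have to match.
rows = np.concatenate([ar1, ar2], axis=0)
print(rows.shape)                      # (5, 2)

# axis=1 joins columns, so the row counts have to match; 2 vs 3 raises ValueError.
try:
    np.concatenate([ar1, ar2], axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)                 # True

# On 2-D arrays, vstack / hstack are shorthands for axis=0 / axis=1.
same_v = np.array_equal(np.vstack([ar1, ar1]), np.concatenate([ar1, ar1], axis=0))
same_h = np.array_equal(np.hstack([ar1, ar1]), np.concatenate([ar1, ar1], axis=1))
print(same_v, same_h)                  # True True
```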

#. Combining DataFrames (pd.concat)

pd.concat?

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)

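# axis : axis to concatenate along (0 = rows, 1 = columns)

# join : how to handle the other axis ('outer' = union, 'inner' = intersection)

# join_axes : explicit index for the other axis (deprecated in pandas 0.25 and removed in 1.0; reindex the result instead)

# keys : labels identifying each source object (adds an outer index level)

# ignore_index : if True, discard the existing labels on the concatenation axis and renumber 0..n-1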

df1 = DataFrame({'a':[1,2,3], 'b':[4,5,6]})

df2 = DataFrame({'a':[10,11], 'b':[40,50]})

df3 = DataFrame({'b':[40,50,60], 'c':[70,80,90], 'd':[10,11,12]})

df1 df2 df3

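pd.concat([df1, df2]) # stack rows

pd.concat([df1, df3], axis=1) # place columns side by side

pd.concat([df1, df2], axis=1) # index labels with no match are filled with NaN

pd.concat([df1, df2], axis=1, join='inner') # keep only index labels present in both

pd.concat([df1, df2], keys=['one', 'two']) # keys: label each source in an outer index level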

pd.concat([df1, df3], axis=1, keys=['one', 'two'])
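Since the outputs of the calls above are not shown, here is a small sketch that rebuilds `df1`, `df2`, and `df3` and prints the parts of each result that the options affect. The variable names `stacked`, `renumbered`, `shared`, `side`, and `labeled` are introduced only for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = pd.DataFrame({'a': [10, 11], 'b': [40, 50]})
df3 = pd.DataFrame({'b': [40, 50, 60], 'c': [70, 80, 90], 'd': [10, 11, 12]})

# Default row-wise concat keeps each frame's own index, so labels repeat.
stacked = pd.concat([df1, df2])
print(list(stacked.index))            # [0, 1, 2, 0, 1]

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(list(renumbered.index))         # [0, 1, 2, 3, 4]

# With axis=0, join applies to the columns: 'inner' keeps only the shared column 'b'.
shared = pd.concat([df1, df3], join='inner', ignore_index=True)
print(list(shared.columns))           # ['b']

# With axis=1, join applies to the index: 'inner' keeps only labels 0 and 1.
side = pd.concat([df1, df2], axis=1, join='inner')
print(side.shape)                     # (2, 4)

# keys adds an outer index level naming the source of each row.
labeled = pd.concat([df1, df2], keys=['one', 'two'])
print(labeled.loc['two'].shape)       # (2, 2)
```

Note that `join` always refers to the axis that is not being concatenated, which is why the same `join='inner'` trims columns in one call and rows in the other.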

#. Combining overlapping data (combine_first)

df1.combine_first?

df1.combine_first(other)

# Where the base dataset has NA, the result takes the value from the reference data at the same key

# Keys that exist only in the reference data are added to the result as new keys

s1 = Series([1,2,3,np.nan], index=['a','b','c','d'])

a 1.0

b 2.0

c 3.0

d NaN

dtype: float64

s2 = Series([10,20,30,40,50], index=['a','b','c','d','e'])

a 10

b 20

c 30

d 40

e 50

dtype: int64

s1.combine_first(s2) # s1 is the base; s2 is consulted only to fill the gaps

a 1.0

b 2.0

c 3.0

d 40.0 # NA in the base dataset filled from the reference data

e 50.0 # key missing from the base dataset, added from the reference data

dtype: float64

# For DataFrames, values are matched column by column (same column name, same index label)

df1 = DataFrame({'a':[1,2,3,4], 'b':[5,6,7,8], 'c':[9,10,11,12]})

df2 = DataFrame({'a':['a','b','c','d'], 'd':['e','f','g','h'], 'e':['i','j','k','l']})

df1 df2

df1.combine_first(df2)
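In the example above `df1` has no missing values, so the overlapping column `a` keeps its `df1` values and only the new columns `d` and `e` come from `df2`. As a minimal sketch with made-up frames (`base` and `other` are hypothetical names and data, not from the original post), the following shows both effects at once: gaps filled from the other frame, and new rows and columns added.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'base' has gaps, 'other' overlaps on label 'y' and column 'a'.
base = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 5.0, 6.0]},
                    index=['x', 'y', 'z'])
other = pd.DataFrame({'a': [10.0, 20.0], 'c': [70.0, 80.0]},
                     index=['y', 'w'])

result = base.combine_first(other)
print(result)

# base's NaN at ('y', 'a') is filled from other.
print(result.loc['y', 'a'])           # 10.0
# A value already present in base is never overwritten.
print(result.loc['x', 'a'])           # 1.0
# Row 'w' and column 'c' exist only in other and are added to the result.
print(result.loc['w', 'c'])           # 80.0
# A gap that other cannot fill stays NaN.
print(pd.isna(result.loc['x', 'b']))  # True
```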

#. Conditional combining (np.where)

np.where?

where(condition, [x, y])

s1 = Series([np.nan,3,np.nan,7,9])

0 NaN

1 3.0

2 NaN

3 7.0

4 9.0

dtype: float64

s2 = Series([1,np.nan,5,np.nan,np.nan])

0 1.0

1 NaN

2 5.0

3 NaN

4 NaN

dtype: float64

np.where(pd.isnull(s1),s2,s1)

array([1., 3., 5., 7., 9.])
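For comparison, the same fill can be written with pandas methods. This is a sketch rather than a recommendation: `np.where` works by position and returns a plain array, while `combine_first` and `fillna` align on index labels and return a Series, so the results only agree when both Series share the same index, as they do here.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, 3, np.nan, 7, 9])
s2 = pd.Series([1, np.nan, 5, np.nan, np.nan])

# np.where picks element by element: s2 where s1 is missing, otherwise s1.
via_where = np.where(pd.isnull(s1), s2, s1)     # plain ndarray, index is dropped

# The same fill expressed with pandas methods keeps the Series index.
via_combine = s1.combine_first(s2)
via_fillna = s1.fillna(s2)

print(type(via_where).__name__)                 # ndarray
print(via_combine.tolist())                     # [1.0, 3.0, 5.0, 7.0, 9.0]
print(via_fillna.tolist())                      # [1.0, 3.0, 5.0, 7.0, 9.0]
```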

#. Q1

# After loading the emp_1, emp_2, and emp_3 files:

e1 = pd.read_excel("emp_1.xlsx")

e2 = pd.read_excel("emp_2.xlsx")

e3 = pd.read_excel("emp_3.xlsx")

e1 e2 e3

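1) Combine the emp_1 and emp_2 files into a single table

emp_x = pd.merge(e1, e2) # when on is omitted, the join uses the shared column(s), here EMPNO

2) Combine the object created above with the emp_3 data into a single table

e5 = pd.concat([emp_x , e3] , ignore_index = True)

3) Create a new table by adding the department name, looked up from the dept table, to the table combined in steps 1 and 2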

dept = get_query('select * from dept') # get_query is a helper that is not defined in this post

dept

final_e = pd.merge(e5 , dept.loc[:,"DEPTNO":"DNAME"] , on = "DEPTNO")
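The Excel files and the `dept` query are not included in the post, so the three steps cannot be run as written. The sketch below replays them on small in-memory frames. All column names other than `EMPNO`, `DEPTNO`, and `DNAME`, and every value, are assumptions made up for illustration. It also passes `on=` and `how='left'` explicitly, which is a design choice of this sketch rather than the original solution: naming the key guards against an accidental join on other shared columns, and a left join keeps employees whose `DEPTNO` has no match in `dept`.

```python
import pandas as pd

# Hypothetical stand-ins for emp_1.xlsx, emp_2.xlsx, emp_3.xlsx and the dept table.
e1 = pd.DataFrame({'EMPNO': [7369, 7499], 'ENAME': ['SMITH', 'ALLEN']})
e2 = pd.DataFrame({'EMPNO': [7369, 7499], 'DEPTNO': [20, 30]})
e3 = pd.DataFrame({'EMPNO': [7839], 'ENAME': ['KING'], 'DEPTNO': [10]})
dept = pd.DataFrame({'DEPTNO': [10, 20, 30, 40],
                     'DNAME': ['ACCOUNTING', 'RESEARCH', 'SALES', 'OPERATIONS'],
                     'LOC': ['NEW YORK', 'DALLAS', 'CHICAGO', 'BOSTON']})

# 1) Column-wise join on the explicitly named key.
emp_x = pd.merge(e1, e2, on='EMPNO')

# 2) Append the extra rows and renumber the index.
e5 = pd.concat([emp_x, e3], ignore_index=True)

# 3) Look up the department name; how='left' keeps employees with no matching dept.
final_e = pd.merge(e5, dept[['DEPTNO', 'DNAME']], on='DEPTNO', how='left')
print(final_e)
```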

Reference: KIC Campus, Machine-Learning-Based Big Data Analysis training course
