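[AI Tech] Week 2, Session 9-1: Feature Importance and Feature Selection

Session 9: Feature Engineering - 2

9-1. Feature Importance and Feature Selection

1. What Is Feature Importance?

1-1. What Is Feature Importance?

Feature importance

: A method of measuring importance by assigning each feature a score according to how useful it is for predicting the target variable.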

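Model-specific vs Model-agnostic

- Model-specific: the feature importance is computed by the machine learning model itself (for example, the importance functions built into tree boosting libraries).

- Model-agnostic: a feature importance method that does not depend on functionality provided by the model and is applied after the model has been trained (for example, permutation importance).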

2. Feature Importance in Boosting Trees

2-1. LightGBM Feature Importance

LightGBM feature importance function

- A trained LightGBM model (the Booster class) provides feature importance through the feature_importance(importance_type) function.

- The importance_type argument accepts 'split' or 'gain'; the default is 'split'.

* split: the number of times the feature is used in the model

* gain: the total gain of the splits that use the feature

2-2. XGBoost Feature Importance

XGBoost feature importance function

- A trained XGBoost model (the Booster class) provides feature importance through the get_score(importance_type) function.

- The importance_type argument takes one of the following values; the default is 'weight'.

* weight: the number of times a feature is used to split the data across all trees.

* gain: the average gain across all splits the feature is used in.

* cover: the average coverage across all splits the feature is used in.

* total_gain: the total gain across all splits the feature is used in.

* total_cover: the total coverage across all splits the feature is used in.

2-3. CatBoost Feature Importance

CatBoost feature importance function

- A trained CatBoost model provides feature importance through the get_feature_importance(type) function.

- The type argument takes one of the following values; the default is FeatureImportance.

* FeatureImportance: Equal to PredictionValuesChange for non-ranking metrics and LossFunctionChange for ranking metrics.

* ShapValues: A vector with contributions of each feature to the prediction for every input object and the expected value of the model prediction for the object.

* Interaction: The value of the feature interaction strength for each pair of features.

* PredictionDiff: A vector with contributions of each feature to the RawFormulaVal difference for each pair of objects.

3. Permutation Feature Importance

3-1. What Is Permutation Feature Importance?

- Measure the importance of a feature by calculating the increase in the model's prediction error after permuting the feature.

- A feature is "important" if shuffling its values increases the model error, because in this case the model relied on the feature for the prediction.

- A feature is "unimportant" if shuffling its values leaves the model error unchanged, because in this case the model ignored the feature for the prediction.

[Figure: pseudo-code of permutation feature importance]
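The procedure described above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the lecture's own pseudo-code: the function name permutation_importance_scores, the least-squares toy model, and the synthetic data are all hypothetical.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def permutation_importance_scores(predict, X, y, error_fn, n_repeats=5, seed=0):
    """Average increase in error after shuffling each column (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    baseline_error = error_fn(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_permuted = X.copy()
            # Shuffle only column j, which breaks its link to the target.
            X_permuted[:, j] = rng.permutation(X_permuted[:, j])
            increases.append(error_fn(y, predict(X_permuted)) - baseline_error)
        importances[j] = np.mean(increases)
    return importances

# Toy setup (an assumption): a least-squares linear model on synthetic data
# where column 0 matters most, column 1 a little, and column 2 not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(data):
    return data @ coef

scores = permutation_importance_scores(predict, X, y, mean_squared_error)
print(scores)
```

Because the model is only queried through predict, the same function works with any trained model, which is what makes the method model-agnostic.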

3-2. Applying Permutation Feature Importance
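One way to apply permutation importance in practice is scikit-learn's permutation_importance function, evaluated on held-out data. This is a sketch assuming scikit-learn is installed; the random forest model and the synthetic data are illustrative choices, not necessarily what the lecture used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): only columns 0 and 1 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=600)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Importance is the drop in the model's score (R^2 for a regressor by default)
# after shuffling each column, averaged over n_repeats shuffles.
result = permutation_importance(model, X_valid, y_valid, n_repeats=10, random_state=0)

for j in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {j}: {result.importances_mean[j]:.3f} +/- {result.importances_std[j]:.3f}")
```

Computing the importance on validation data rather than training data shows which features the model relies on to generalize, not just which ones it memorized.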

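4. What Is Feature Selection?

4-1. What Is Feature Selection?

Feature selection

- The process of choosing which features a machine learning model will use.

- It separates the features that are useful for predicting the target variable from those that are not, and keeps the useful ones.

- By lowering model complexity, feature selection can help prevent overfitting and speed up the model.

- Feature selection methods

* Filter Method

* Wrapper Method

* Embedded Method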

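4-2. Types of Feature Selection

Filter method

- Uses statistical measures to assess the relationships between features.

- Can be applied during preprocessing.

- For example, compute the correlations between features.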
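A correlation-based filter can be sketched with pandas before any model is trained. The column names, the thresholds (0.2 and 0.95), and the synthetic data below are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption): "a" and "b" drive the target,
# "noise" is irrelevant, and "a_copy" nearly duplicates "a".
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "a": rng.normal(size=n),
    "b": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
df["a_copy"] = df["a"] + rng.normal(scale=0.01, size=n)
target = 2.0 * df["a"] - 1.0 * df["b"] + rng.normal(scale=0.1, size=n)

# Step 1: keep features whose absolute correlation with the target is large enough.
target_corr = df.corrwith(target).abs()
relevant = target_corr[target_corr > 0.2].index.tolist()

# Step 2: among those, drop a feature if it is almost a copy of one already kept.
corr_matrix = df[relevant].corr().abs()
selected = []
for col in relevant:
    if all(corr_matrix.loc[col, kept] < 0.95 for kept in selected):
        selected.append(col)

print(selected)
```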

Wrapper method

- Repeatedly tests subsets of features using a predictive model.

- The results show which features are important.
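Greedy forward selection is one example of a wrapper method: it adds one feature at a time and keeps it only if the cross-validated score improves. The sketch below assumes scikit-learn is installed; the linear model, the stopping threshold of 1e-3, and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption): only columns 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=300)

def cv_score(columns):
    """Mean 5-fold R^2 of a linear model trained on the given columns."""
    scores = cross_val_score(LinearRegression(), X[:, columns], y, cv=5, scoring="r2")
    return scores.mean()

selected = []
remaining = list(range(X.shape[1]))
best_score = -np.inf
while remaining:
    # Try adding each remaining feature to the current subset.
    candidate_scores = {j: cv_score(selected + [j]) for j in remaining}
    best_candidate = max(candidate_scores, key=candidate_scores.get)
    # Stop when the best candidate no longer improves the score noticeably.
    if candidate_scores[best_candidate] - best_score < 1e-3:
        break
    selected.append(best_candidate)
    remaining.remove(best_candidate)
    best_score = candidate_scores[best_candidate]

print(selected, round(best_score, 3))
```

Because every candidate subset requires retraining the model, wrapper methods tend to be much more expensive than filter methods.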

Embedded method

- Feature selection is performed by the learning algorithm itself.
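L1-regularized regression (Lasso) is a common example of an embedded method, because training itself drives the coefficients of unhelpful features to exactly zero. The lecture does not name a specific algorithm, so Lasso, the alpha value, and the synthetic data here are assumptions for illustration; scikit-learn is assumed to be installed.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption): only columns 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=300)

# The L1 penalty shrinks coefficients, and weak ones become exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)

# The features with nonzero coefficients are the ones the model selected.
selected = np.flatnonzero(lasso.coef_ != 0)
print("coefficients:", np.round(lasso.coef_, 3))
print("selected columns:", selected.tolist())
```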

License: Attribution, NonCommercial, NoDerivatives


후유카와의 전자공학 이야기
