CH4. Supervised Graph Learning (Feature-Based Methods)

graph
Author

김보람

Published

April 6, 2023

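Feature-based methods

  • Find a function that maps a set of descriptive features to a specific output.

  • Design the features carefully so that they are representative enough of the whole graph to learn the target concept.

  • These methods rely on graph-level measures such as the average degree, the global efficiency, and the characteristic path length.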
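
As a small illustration of the graph-level measures listed above, the following sketch computes them with networkx. The toy graph (a 5-node path) and the variable names are assumptions made for illustration and do not come from the PROTEINS dataset.

```python
import networkx as nx

# Toy graph (assumed for illustration): the path 0-1-2-3-4
G = nx.path_graph(5)

# Average degree: 2 * |E| / |V|
avg_degree = 2 * G.number_of_edges() / G.number_of_nodes()

# Global efficiency: mean of 1 / d(u, v) over all pairs of distinct nodes
global_eff = nx.global_efficiency(G)

# Characteristic path length: mean shortest-path length (requires a connected graph)
char_path_len = nx.average_shortest_path_length(G)

# One fixed-length feature vector describing the whole graph
features = [avg_degree, global_eff, char_path_len]
print(features)
```

Each graph, whatever its size, is reduced to a vector of the same length, which is what allows a standard classifier to be trained on it.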

  1. Load the dataset through stellargraph
from stellargraph import datasets
from IPython.display import display, HTML

dataset = datasets.PROTEINS()
display(HTML(dataset.description))
graphs, graph_labels = dataset.load()
Each graph represents a protein and graph labels represent whether they are are enzymes or non-enzymes. The dataset includes 1113 graphs with 39 nodes and 73 edges on average for each graph. Graph nodes have 4 attributes (including a one-hot encoding of their label), and each graph is labelled as belonging to 1 of 2 classes.
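  2. Convert the graphs from stellargraph format to networkx format
  • First convert each graph from its stellargraph representation to a numpy adjacency matrix.

  • Then use the adjacency matrix to rebuild the graph as a networkx object.

# Convert from stellargraph format to numpy adjacency matrices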
adjs = [graph.to_adjacency_matrix().A for graph in graphs]

# Convert the labels from a pandas.Series to a numpy array
labels = graph_labels.to_numpy(dtype=int)
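
The second half of the conversion (adjacency matrix to networkx) can be sketched on a toy matrix as follows. The matrix is an assumption for illustration. `nx.from_numpy_array` is used here because `nx.from_numpy_matrix` was removed in networkx 3.0.

```python
import numpy as np
import networkx as nx

# Toy adjacency matrix (assumed): a triangle 0-1-2 plus a pendant node 3 attached to node 2
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

# Build an undirected networkx graph from the dense matrix
G = nx.from_numpy_array(adj)

# Converting back should reproduce the original matrix
adj_back = nx.to_numpy_array(G, dtype=int)

print(sorted(G.edges()))
print(np.array_equal(adj, adj_back))
```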
  3. Compute global metrics that describe each graph
  • Here we choose the number of edges, the average clustering coefficient, and the global efficiency.
import numpy as np
import networkx as nx

metrics = []
for adj in adjs:
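  # from_numpy_array replaces from_numpy_matrix, which was removed in networkx 3.0
  G = nx.from_numpy_array(adj)

  # Basic property: number of edges
  num_edges = G.number_of_edges()

  # Clustering measure: average clustering coefficient
  cc = nx.average_clustering(G)

  # Efficiency measure: global efficiency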
  eff = nx.global_efficiency(G)

  metrics.append([num_edges, cc, eff])
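
To make the two less obvious metrics concrete, the sketch below recomputes the global efficiency and the average clustering coefficient by hand on a toy graph and compares them with the networkx results. The graph and the helper `local_clustering` are hypothetical and introduced only for illustration.

```python
import itertools
import networkx as nx

# Toy graph (assumed): a triangle 0-1-2 with a pendant node 3 attached to node 2
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])
n = G.number_of_nodes()

# Global efficiency by hand: average of 1 / shortest-path distance over all node pairs.
# Pairs with no connecting path contribute 0 and are skipped.
dist = dict(nx.all_pairs_shortest_path_length(G))
inv_sum = sum(
    1 / dist[u][v]
    for u, v in itertools.combinations(G, 2)
    if v in dist[u]
)
eff_manual = inv_sum / (n * (n - 1) / 2)


def local_clustering(graph, v):
    """Fraction of pairs of v's neighbours that are themselves connected (hypothetical helper)."""
    k = graph.degree(v)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in itertools.combinations(graph[v], 2) if graph.has_edge(a, b))
    return links / (k * (k - 1) / 2)


# Average clustering by hand: mean of the local coefficients over all nodes
cc_manual = sum(local_clustering(G, v) for v in G) / n

print(eff_manual, nx.global_efficiency(G))
print(cc_manual, nx.average_clustering(G))
```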
  4. Create the training and test sets with scikit-learn utilities

70% of the dataset is used for training and 30% for testing.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(metrics, labels, test_size=0.3, random_state=42)
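
The effect of `test_size` can be checked on synthetic data, as in the sketch below. The data and the `stratify=y` argument are assumptions added for illustration. The split above does not stratify, and whether stratifying helps on PROTEINS is not something this example shows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed): 100 samples, 3 features, 60/40 class balance
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 60 + [0] * 40)

# test_size=0.3 holds out 30% of the samples; stratify=y keeps the class ratio in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```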
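  5. Train the machine learning algorithm

We use the SVC class from scikit-learn's svm module. The features are standardized first, with the scaler fitted on the training set only.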

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
from sklearn import svm
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

clf = svm.SVC()
clf.fit(X_train_scaled, y_train)

y_pred = clf.predict(X_test_scaled)

print('Accuracy', accuracy_score(y_test,y_pred))
print('Precision', precision_score(y_test,y_pred))
print('Recall', recall_score(y_test,y_pred))
print('F1-score', f1_score(y_test,y_pred))
Accuracy 0.7455089820359282
Precision 0.7709251101321586
Recall 0.8413461538461539
F1-score 0.8045977011494253
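
As a sketch of how these four scores relate to each other, the example below chains the scaler and the SVC in a scikit-learn pipeline and recomputes the scores from the confusion matrix. It runs on synthetic data (an assumption, not the PROTEINS features), so its numbers are not comparable to the results above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the graph features and binary labels (assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
noise = rng.normal(scale=0.5, size=300)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# The pipeline fits the scaler on the training data only, then trains the SVC
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# Recompute the scores from the confusion matrix (class 1 is the positive class)
tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy)
print(precision, precision_score(y_te, y_hat))
print(recall, recall_score(y_te, y_hat))
print(f1, f1_score(y_te, y_hat))
```

Wrapping both steps in one estimator means the scaler can never be fitted on test data by accident, which is the same precaution the manual `scaler.fit(X_train)` call takes.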