Feature-based methods

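The goal is to find a function that maps a descriptive set of features to a specific output.
The features must be designed carefully so that they represent the whole graph well enough for the concept to be learned.
Typical choices are global metrics such as average degree, global efficiency, and characteristic path length.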
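To make the idea concrete, the sketch below computes the three metrics named above for a small toy graph and collects them into one feature vector. The helper name `global_feature_vector` and the 4-node cycle graph are illustrative assumptions, not part of the original notebook.

```python
import networkx as nx

def global_feature_vector(G):
    """Return [average degree, global efficiency, characteristic path length].

    Hypothetical helper for illustration. Assumes G is undirected and
    connected, because the characteristic path length is undefined otherwise.
    """
    avg_degree = 2 * G.number_of_edges() / G.number_of_nodes()
    efficiency = nx.global_efficiency(G)
    path_length = nx.average_shortest_path_length(G)
    return [avg_degree, efficiency, path_length]

# Toy graph (an assumption): a cycle on 4 nodes.
toy = nx.cycle_graph(4)
features = global_feature_vector(toy)
print(features)
```

Each graph in a dataset would contribute one such row, and the rows together form the input matrix for a classifier.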

Load the dataset through stellargraph

from stellargraph import datasets
from IPython.display import display, HTML

dataset = datasets.PROTEINS()
display(HTML(dataset.description))
graphs, graph_labels = dataset.load()

Each graph represents a protein and graph labels represent whether they are enzymes or non-enzymes. The dataset includes 1113 graphs with 39 nodes and 73 edges on average for each graph. Graph nodes have 4 attributes (including a one-hot encoding of their label), and each graph is labelled as belonging to 1 of 2 classes.

Convert the graphs from stellargraph format to networkx format

First convert each graph from its stellargraph representation to a numpy adjacency matrix.
Then use the adjacency matrix to build the networkx representation.

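# Convert from stellargraph format to dense numpy adjacency matrices
# (.toarray() is used instead of .A, which newer SciPy versions no longer provide)
adjs = [graph.to_adjacency_matrix().toarray() for graph in graphs]

# Convert the labels (a pandas Series) to a numpy array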
labels = graph_labels.to_numpy(dtype=int)
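The round trip through an adjacency matrix can be checked on a toy example without loading the dataset. This sketch assumes a small hand-written symmetric matrix standing in for the output of `to_adjacency_matrix()`; the variable names are illustrative.

```python
import numpy as np
import networkx as nx

# Toy symmetric adjacency matrix for the path 0 - 1 - 2 (an assumption
# standing in for the adjacency matrix of one protein graph).
toy_adj = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# Build an undirected networkx graph from the dense matrix.
toy_graph = nx.from_numpy_array(toy_adj)
edges = sorted(toy_graph.edges())

# Converting back recovers the original matrix.
recovered = nx.to_numpy_array(toy_graph, dtype=int)
print(edges)
print(recovered)
```

Node attributes are not carried through an adjacency matrix, which is acceptable here because only structural metrics are computed afterwards.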

Compute global metrics to describe each graph

The chosen metrics are the number of edges, the average clustering coefficient, and the global efficiency.

import numpy as np
import networkx as nx

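metrics = []
for adj in adjs:
  # from_numpy_matrix was removed in networkx 3.0; from_numpy_array works on 2.x and 3.x
  G = nx.from_numpy_array(adj)

  # Basic property: number of edges
  num_edges = G.number_of_edges()

  # Clustering measure: average clustering coefficient
  cc = nx.average_clustering(G)

  # Efficiency measure: global efficiency
  eff = nx.global_efficiency(G)

  metrics.append([num_edges, cc, eff])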
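Global efficiency is the average of the inverse shortest-path distances over all ordered node pairs, with unreachable pairs contributing zero. The pure-Python sketch below computes it with breadth-first search on an adjacency-list dict to show what the metric measures. The function name and the toy graph are illustrative; `nx.global_efficiency` remains the call used in the loop above.

```python
from collections import deque

def global_efficiency_bfs(adjacency):
    """Average of 1/d(u, v) over ordered pairs u != v (0 if unreachable)."""
    nodes = list(adjacency)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Nodes never reached are absent from `dist` and add nothing.
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))

# Toy graph (an assumption): the path 0 - 1 - 2 plus an isolated node 3.
toy = {0: [1], 1: [0, 2], 2: [1], 3: []}
eff = global_efficiency_bfs(toy)
print(eff)
```

Because disconnected pairs simply add zero, the metric stays well defined on graphs with several components, unlike the characteristic path length.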

Create the training and test sets with scikit-learn utilities

70% of the dataset is used for training and 30% for testing.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(metrics, labels, test_size=0.3, random_state=42)
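As a quick sanity check on the 70/30 split, the sketch below applies the same call to placeholder arrays with 1113 rows, the dataset size reported in the description. The dummy arrays and variable names are illustrative; only the resulting split sizes matter here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_graphs = 1113  # dataset size from the description above

# Placeholder data (an assumption): 3 metrics per graph, all-zero labels.
dummy_X = np.zeros((n_graphs, 3))
dummy_y = np.zeros(n_graphs, dtype=int)

X_tr, X_te, y_tr, y_te = train_test_split(
    dummy_X, dummy_y, test_size=0.3, random_state=42
)
print(len(X_tr), len(X_te))
```

scikit-learn rounds the test-set size up, so 1113 graphs yield 779 training graphs and 334 test graphs.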

Train the machine learning algorithm

Standardize the features with StandardScaler, then train a classifier with scikit-learn's SVC module.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
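StandardScaler subtracts each feature's training mean and divides by its training standard deviation. The same statistics are then reused for the test set, so no test information leaks into preprocessing. A minimal numpy sketch of that computation, on made-up numbers:

```python
import numpy as np

# Made-up feature rows (an assumption), two features on very different scales.
train = np.array([[10.0, 0.2], [20.0, 0.4], [30.0, 0.6]])
test = np.array([[20.0, 0.8]])

# Statistics come from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std
print(train_scaled)
print(test_scaled)
```

Scaling matters for this feature set because the edge count is far larger in magnitude than the clustering coefficient and the efficiency, both of which lie in [0, 1].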

from sklearn import svm
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

clf = svm.SVC()
clf.fit(X_train_scaled, y_train)

y_pred = clf.predict(X_test_scaled)

print('Accuracy', accuracy_score(y_test,y_pred))
print('Precision', precision_score(y_test,y_pred))
print('Recall', recall_score(y_test,y_pred))
print('F1-score', f1_score(y_test,y_pred))

Accuracy 0.7455089820359282
Precision 0.7709251101321586
Recall 0.8413461538461539
F1-score 0.8045977011494253
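The four scores follow directly from the confusion counts. The counts below are inferred from the printed scores and the 334-graph test set, and they reproduce all four values. The notebook never prints them, so treat them as a reconstruction rather than recorded output.

```python
# Confusion counts inferred from the scores above (an assumption, not
# notebook output). The positive class is label 1, scikit-learn's default.
tp, fp, fn, tn = 175, 52, 33, 74

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print('Accuracy', accuracy)
print('Precision', precision)
print('Recall', recall)
print('F1-score', f1)
```

Recall exceeds precision, so under this reconstruction the classifier misses few graphs of the positive class but labels a fair number of the other class as positive.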