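Required fields
- Something that links transactions together, such as a time difference
- A unique index
- amt (probably present in most datasets?)
- y (whether the transaction is fraudulent)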
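The checklist above can be turned into a quick screening step. The sketch below is a minimal example, assuming a hypothetical `check_required` helper and a hand-written mapping from each role (time, id, amt, y) to the column a given dataset uses for it; neither is part of the original notes, and the toy values are for illustration only.

```python
import pandas as pd

# Hypothetical helper: report which required roles a dataset covers.
# `roles` maps a role name to the column expected to fill it (None if no candidate).
def check_required(df, roles):
    return {role: (col is not None and col in df.columns)
            for role, col in roles.items()}

# Toy frame shaped like dataset 1 (illustrative values, not the full data).
toy = pd.DataFrame({"Time": [0.0, 1.0], "Amount": [149.62, 2.69], "Class": [0, 0]})

result = check_required(
    toy,
    {"time": "Time", "id": None, "amt": "Amount", "y": "Class"},
)
print(result)
```

Running the same helper on each downloaded file would give one row per dataset in a comparison table, which is roughly what the notes below do by hand.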
import pandas as pd
data
- https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
- https://www.kaggle.com/datasets/whenamancodes/fraud-detection // seems to have no y
- https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud
- https://www.kaggle.com/datasets/ealaxi/paysim1
- https://www.kaggle.com/datasets/mishra5001/credit-card
- https://www.kaggle.com/datasets/joebeachcapital/credit-card-fraud // complicated
- https://www.kaggle.com/datasets/nelgiriyewithana/credit-card-fraud-detection-dataset-2023
- https://www.kaggle.com/datasets/jainilcoder/online-payment-fraud-detection
- https://www.kaggle.com/datasets/rupakroy/online-payments-fraud-detection-dataset // no time column
- https://www.kaggle.com/datasets/dermisfit/fraud-transactions-dataset
- https://www.kaggle.com/datasets/kartik2112/fraud-detection
- https://www.kaggle.com/datasets/vardhansiramdasu/fraudulent-transactions-prediction
Types
Type 1 (V1~V28): 1 / 2 / 6 / 7
Type 2: 3
Type 3: 4 / 8 / 9 / 12
Type 4: 5
Type 5 (book): 10 / 11
1. creditcardfraud
https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
credicard = pd.read_csv("~/Desktop/creditcard.csv")
credicard
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
1 | 0.0 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.108300 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.50 | 0 |
4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.206010 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
284802 | 172786.0 | -11.881118 | 10.071785 | -9.834783 | -2.066656 | -5.364473 | -2.606837 | -4.918215 | 7.305334 | 1.914428 | ... | 0.213454 | 0.111864 | 1.014480 | -0.509348 | 1.436807 | 0.250034 | 0.943651 | 0.823731 | 0.77 | 0 |
284803 | 172787.0 | -0.732789 | -0.055080 | 2.035030 | -0.738589 | 0.868229 | 1.058415 | 0.024330 | 0.294869 | 0.584800 | ... | 0.214205 | 0.924384 | 0.012463 | -1.016226 | -0.606624 | -0.395255 | 0.068472 | -0.053527 | 24.79 | 0 |
284804 | 172788.0 | 1.919565 | -0.301254 | -3.249640 | -0.557828 | 2.630515 | 3.031260 | -0.296827 | 0.708417 | 0.432454 | ... | 0.232045 | 0.578229 | -0.037501 | 0.640134 | 0.265745 | -0.087371 | 0.004455 | -0.026561 | 67.88 | 0 |
284805 | 172788.0 | -0.240440 | 0.530483 | 0.702510 | 0.689799 | -0.377961 | 0.623708 | -0.686180 | 0.679145 | 0.392087 | ... | 0.265245 | 0.800049 | -0.163298 | 0.123205 | -0.569159 | 0.546668 | 0.108821 | 0.104533 | 10.00 | 0 |
284806 | 172792.0 | -0.533413 | -0.189733 | 0.703337 | -0.506271 | -0.012546 | -0.649617 | 1.577006 | -0.414650 | 0.486180 | ... | 0.261057 | 0.643078 | 0.376777 | 0.008797 | -0.473649 | -0.818267 | -0.002415 | 0.013649 | 217.00 | 0 |
284807 rows × 31 columns
len(set(credicard.V1))
275663
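V1 looks like it might be some kind of unique identifier, but 9,144 values overlap (284,807 rows minus 275,663 distinct values).
Has time / amt / is_fraud (Time, Amount, Class).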
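The overlap count for V1 can be reproduced directly rather than by subtracting by hand. The following is a minimal sketch on a toy column (made-up values, not the real file); it assumes the column has no NaN values, in which case `nunique()` agrees with `len(set(...))`.

```python
import pandas as pd

# Toy stand-in for the V1 column (made-up values, not the real data).
toy = pd.DataFrame({"V1": [-1.36, 1.19, -1.36, -0.97, 1.19]})

n_rows = len(toy)
n_unique = toy["V1"].nunique()   # same idea as len(set(toy.V1)) when there are no NaN values
n_overlap = n_rows - n_unique    # rows beyond the first occurrence of each value

# pandas can count the same thing: duplicated() marks every repeat after the first.
n_dup_rows = int(toy["V1"].duplicated().sum())

print(n_rows, n_unique, n_overlap, n_dup_rows)
```

On the real data the same two lines applied to `credicard.V1` should give the 9,144 figure noted above, which suggests V1 is not usable as a unique key.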
2. fraud-detection
- Looks like the same data as #1?
3. credit-card-fraud
card_transdata = pd.read_csv("~/Desktop/card_transdata.csv")
card_transdata
| | distance_from_home | distance_from_last_transaction | ratio_to_median_purchase_price | repeat_retailer | used_chip | used_pin_number | online_order | fraud |
---|---|---|---|---|---|---|---|---|
0 | 57.877857 | 0.311140 | 1.945940 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
1 | 10.829943 | 0.175592 | 1.294219 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 5.091079 | 0.805153 | 0.427715 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
3 | 2.247564 | 5.600044 | 0.362663 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
4 | 44.190936 | 0.566486 | 2.222767 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
999995 | 2.207101 | 0.112651 | 1.626798 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
999996 | 19.872726 | 2.683904 | 2.778303 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
999997 | 2.914857 | 1.472687 | 0.218075 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
999998 | 4.258729 | 0.242023 | 0.475822 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
999999 | 58.108125 | 0.318110 | 0.386920 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
1000000 rows × 8 columns
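distance_from_home - distance from home to where the transaction took place.
distance_from_last_transaction - distance from where the last transaction took place.
ratio_to_median_purchase_price - ratio of the transaction's purchase price to the median purchase price.
repeat_retailer - whether the transaction took place at the same retailer.
used_chip - whether the transaction was made with the chip (credit card).
used_pin_number - whether the transaction was made using a PIN number.
online_order - whether the transaction is an online order.
fraud - whether the transaction is fraudulent.
No time column.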
4. paysim1
PS = pd.read_csv("~/Desktop/PS_20174392719_1491204439457_log.csv")
PS
| | step | type | amount | nameOrig | oldbalanceOrg | newbalanceOrig | nameDest | oldbalanceDest | newbalanceDest | isFraud | isFlaggedFraud |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | PAYMENT | 9839.64 | C1231006815 | 170136.00 | 160296.36 | M1979787155 | 0.00 | 0.00 | 0 | 0 |
1 | 1 | PAYMENT | 1864.28 | C1666544295 | 21249.00 | 19384.72 | M2044282225 | 0.00 | 0.00 | 0 | 0 |
2 | 1 | TRANSFER | 181.00 | C1305486145 | 181.00 | 0.00 | C553264065 | 0.00 | 0.00 | 1 | 0 |
3 | 1 | CASH_OUT | 181.00 | C840083671 | 181.00 | 0.00 | C38997010 | 21182.00 | 0.00 | 1 | 0 |
4 | 1 | PAYMENT | 11668.14 | C2048537720 | 41554.00 | 29885.86 | M1230701703 | 0.00 | 0.00 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6362615 | 743 | CASH_OUT | 339682.13 | C786484425 | 339682.13 | 0.00 | C776919290 | 0.00 | 339682.13 | 1 | 0 |
6362616 | 743 | TRANSFER | 6311409.28 | C1529008245 | 6311409.28 | 0.00 | C1881841831 | 0.00 | 0.00 | 1 | 0 |
6362617 | 743 | CASH_OUT | 6311409.28 | C1162922333 | 6311409.28 | 0.00 | C1365125890 | 68488.84 | 6379898.11 | 1 | 0 |
6362618 | 743 | TRANSFER | 850002.52 | C1685995037 | 850002.52 | 0.00 | C2080388513 | 0.00 | 0.00 | 1 | 0 |
6362619 | 743 | CASH_OUT | 850002.52 | C1280323807 | 850002.52 | 0.00 | C873221189 | 6510099.11 | 7360101.63 | 1 | 0 |
6362620 rows × 11 columns
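step - maps a unit of time in the real world. Here 1 step is 1 hour. Total steps: 744 (30-day simulation).
type - CASH_IN, CASH_OUT, DEBIT, PAYMENT and TRANSFER.
amount - amount of the transaction in local currency.
nameOrig - customer who started the transaction.
oldbalanceOrg - initial balance before the transaction.
newbalanceOrig - new balance after the transaction.
nameDest - customer who is the recipient of the transaction.
oldbalanceDest - initial balance of the recipient before the transaction. There is no information for customers whose names start with M (merchants).
newbalanceDest - new balance of the recipient after the transaction. There is no information for customers whose names start with M (merchants).
isFraud - transactions made by the fraudulent agents inside the simulation. In this dataset, the fraudulent agents aim to profit by taking control of customers' accounts, then empty the funds by transferring them to another account and cashing out of the system.
isFlaggedFraud - the business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200,000 in a single transaction.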
len(set(PS.nameOrig))
6353307
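The `step` and `isFlaggedFraud` descriptions above can be sketched in pandas. This is a minimal example on toy rows (made-up values); the `day`, `hour` and `flag_rule` column names are hypothetical, and the step-to-day mapping assumes that step 1 is the first hour of day 0, which the notes do not state.

```python
import pandas as pd

# Toy rows shaped like the PaySim log (made-up values).
toy = pd.DataFrame({
    "step": [1, 24, 25, 743],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "TRANSFER"],
    "amount": [9839.64, 250000.00, 181.00, 850002.52],
})

# Assumption: step 1 is the first hour of day 0, so steps 1..24 fall on day 0.
toy["day"] = (toy["step"] - 1) // 24
toy["hour"] = (toy["step"] - 1) % 24

# Rule as described in the notes: a single TRANSFER of more than 200,000.
toy["flag_rule"] = (toy["type"] == "TRANSFER") & (toy["amount"] > 200_000)

print(toy)
```

The recomputed flag need not match the dataset's own `isFlaggedFraud` column: in the rows shown above, a TRANSFER of 6,311,409.28 has `isFlaggedFraud` = 0, so the description is probably only part of the rule.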
5. credit-card
application_data = pd.read_csv("~/Desktop/application_data.csv")
application_data
| | SK_ID_CURR | TARGET | NAME_CONTRACT_TYPE | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | AMT_CREDIT | AMT_ANNUITY | ... | FLAG_DOCUMENT_18 | FLAG_DOCUMENT_19 | FLAG_DOCUMENT_20 | FLAG_DOCUMENT_21 | AMT_REQ_CREDIT_BUREAU_HOUR | AMT_REQ_CREDIT_BUREAU_DAY | AMT_REQ_CREDIT_BUREAU_WEEK | AMT_REQ_CREDIT_BUREAU_MON | AMT_REQ_CREDIT_BUREAU_QRT | AMT_REQ_CREDIT_BUREAU_YEAR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100002 | 1 | Cash loans | M | N | Y | 0 | 202500.0 | 406597.5 | 24700.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
1 | 100003 | 0 | Cash loans | F | N | N | 0 | 270000.0 | 1293502.5 | 35698.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 100004 | 0 | Revolving loans | M | Y | Y | 0 | 67500.0 | 135000.0 | 6750.0 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 100006 | 0 | Cash loans | F | N | Y | 0 | 135000.0 | 312682.5 | 29686.5 | ... | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 100007 | 0 | Cash loans | M | N | Y | 0 | 121500.0 | 513000.0 | 21865.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
307506 | 456251 | 0 | Cash loans | M | N | N | 0 | 157500.0 | 254700.0 | 27558.0 | ... | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
307507 | 456252 | 0 | Cash loans | F | N | Y | 0 | 72000.0 | 269550.0 | 12001.5 | ... | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
307508 | 456253 | 0 | Cash loans | F | N | Y | 0 | 153000.0 | 677664.0 | 29979.0 | ... | 0 | 0 | 0 | 0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
307509 | 456254 | 1 | Cash loans | F | N | Y | 0 | 171000.0 | 370107.0 | 20205.0 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
307510 | 456255 | 0 | Cash loans | F | N | N | 0 | 157500.0 | 675000.0 | 49117.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 1.0 |
307511 rows × 122 columns
columns_description = pd.read_csv("~/Desktop/columns_description.csv", encoding='cp1252')
columns_description
| | Unnamed: 0 | Table | Row | Description | Special |
---|---|---|---|---|---|
0 | 1 | application_data | SK_ID_CURR | ID of loan in our sample | NaN |
1 | 2 | application_data | TARGET | Target variable (1 - client with payment diffi... | NaN |
2 | 5 | application_data | NAME_CONTRACT_TYPE | Identification if loan is cash or revolving | NaN |
3 | 6 | application_data | CODE_GENDER | Gender of the client | NaN |
4 | 7 | application_data | FLAG_OWN_CAR | Flag if the client owns a car | NaN |
... | ... | ... | ... | ... | ... |
155 | 209 | previous_application.csv | DAYS_FIRST_DUE | Relative to application date of current applic... | time only relative to the application |
156 | 210 | previous_application.csv | DAYS_LAST_DUE_1ST_VERSION | Relative to application date of current applic... | time only relative to the application |
157 | 211 | previous_application.csv | DAYS_LAST_DUE | Relative to application date of current applic... | time only relative to the application |
158 | 212 | previous_application.csv | DAYS_TERMINATION | Relative to application date of current applic... | time only relative to the application |
159 | 213 | previous_application.csv | NFLAG_INSURED_ON_APPROVAL | Did the client requested insurance during the ... | NaN |
160 rows × 5 columns
previous_application = pd.read_csv("~/Desktop/previous_application.csv")
previous_application
| | SK_ID_PREV | SK_ID_CURR | NAME_CONTRACT_TYPE | AMT_ANNUITY | AMT_APPLICATION | AMT_CREDIT | AMT_DOWN_PAYMENT | AMT_GOODS_PRICE | WEEKDAY_APPR_PROCESS_START | HOUR_APPR_PROCESS_START | ... | NAME_SELLER_INDUSTRY | CNT_PAYMENT | NAME_YIELD_GROUP | PRODUCT_COMBINATION | DAYS_FIRST_DRAWING | DAYS_FIRST_DUE | DAYS_LAST_DUE_1ST_VERSION | DAYS_LAST_DUE | DAYS_TERMINATION | NFLAG_INSURED_ON_APPROVAL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2030495 | 271877 | Consumer loans | 1730.430 | 17145.0 | 17145.0 | 0.0 | 17145.0 | SATURDAY | 15 | ... | Connectivity | 12.0 | middle | POS mobile with interest | 365243.0 | -42.0 | 300.0 | -42.0 | -37.0 | 0.0 |
1 | 2802425 | 108129 | Cash loans | 25188.615 | 607500.0 | 679671.0 | NaN | 607500.0 | THURSDAY | 11 | ... | XNA | 36.0 | low_action | Cash X-Sell: low | 365243.0 | -134.0 | 916.0 | 365243.0 | 365243.0 | 1.0 |
2 | 2523466 | 122040 | Cash loans | 15060.735 | 112500.0 | 136444.5 | NaN | 112500.0 | TUESDAY | 11 | ... | XNA | 12.0 | high | Cash X-Sell: high | 365243.0 | -271.0 | 59.0 | 365243.0 | 365243.0 | 1.0 |
3 | 2819243 | 176158 | Cash loans | 47041.335 | 450000.0 | 470790.0 | NaN | 450000.0 | MONDAY | 7 | ... | XNA | 12.0 | middle | Cash X-Sell: middle | 365243.0 | -482.0 | -152.0 | -182.0 | -177.0 | 1.0 |
4 | 1784265 | 202054 | Cash loans | 31924.395 | 337500.0 | 404055.0 | NaN | 337500.0 | THURSDAY | 9 | ... | XNA | 24.0 | high | Cash Street: high | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1670209 | 2300464 | 352015 | Consumer loans | 14704.290 | 267295.5 | 311400.0 | 0.0 | 267295.5 | WEDNESDAY | 12 | ... | Furniture | 30.0 | low_normal | POS industry with interest | 365243.0 | -508.0 | 362.0 | -358.0 | -351.0 | 0.0 |
1670210 | 2357031 | 334635 | Consumer loans | 6622.020 | 87750.0 | 64291.5 | 29250.0 | 87750.0 | TUESDAY | 15 | ... | Furniture | 12.0 | middle | POS industry with interest | 365243.0 | -1604.0 | -1274.0 | -1304.0 | -1297.0 | 0.0 |
1670211 | 2659632 | 249544 | Consumer loans | 11520.855 | 105237.0 | 102523.5 | 10525.5 | 105237.0 | MONDAY | 12 | ... | Consumer electronics | 10.0 | low_normal | POS household with interest | 365243.0 | -1457.0 | -1187.0 | -1187.0 | -1181.0 | 0.0 |
1670212 | 2785582 | 400317 | Cash loans | 18821.520 | 180000.0 | 191880.0 | NaN | 180000.0 | WEDNESDAY | 9 | ... | XNA | 12.0 | low_normal | Cash X-Sell: low | 365243.0 | -1155.0 | -825.0 | -825.0 | -817.0 | 1.0 |
1670213 | 2418762 | 261212 | Cash loans | 16431.300 | 360000.0 | 360000.0 | NaN | 360000.0 | SUNDAY | 10 | ... | XNA | 48.0 | middle | Cash X-Sell: middle | 365243.0 | -1163.0 | 247.0 | -443.0 | -423.0 | 0.0 |
1670214 rows × 37 columns
6. credit-card-fraud
- Looks like the same data as #1?
7. credit-card-fraud-detection-dataset-2023
creditcard_2023 = pd.read_csv("~/Desktop/creditcard_2023.csv")
creditcard_2023
| | id | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | -0.260648 | -0.469648 | 2.496266 | -0.083724 | 0.129681 | 0.732898 | 0.519014 | -0.130006 | 0.727159 | ... | -0.110552 | 0.217606 | -0.134794 | 0.165959 | 0.126280 | -0.434824 | -0.081230 | -0.151045 | 17982.10 | 0.0 |
1 | 1 | 0.985100 | -0.356045 | 0.558056 | -0.429654 | 0.277140 | 0.428605 | 0.406466 | -0.133118 | 0.347452 | ... | -0.194936 | -0.605761 | 0.079469 | -0.577395 | 0.190090 | 0.296503 | -0.248052 | -0.064512 | 6531.37 | 0.0 |
2 | 2 | -0.260272 | -0.949385 | 1.728538 | -0.457986 | 0.074062 | 1.419481 | 0.743511 | -0.095576 | -0.261297 | ... | -0.005020 | 0.702906 | 0.945045 | -1.154666 | -0.605564 | -0.312895 | -0.300258 | -0.244718 | 2513.54 | 0.0 |
3 | 3 | -0.152152 | -0.508959 | 1.746840 | -1.090178 | 0.249486 | 1.143312 | 0.518269 | -0.065130 | -0.205698 | ... | -0.146927 | -0.038212 | -0.214048 | -1.893131 | 1.003963 | -0.515950 | -0.165316 | 0.048424 | 5384.44 | 0.0 |
4 | 4 | -0.206820 | -0.165280 | 1.527053 | -0.448293 | 0.106125 | 0.530549 | 0.658849 | -0.212660 | 1.049921 | ... | -0.106984 | 0.729727 | -0.161666 | 0.312561 | -0.414116 | 1.071126 | 0.023712 | 0.419117 | 14278.97 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
163352 | 163352 | -0.452617 | 0.259592 | -0.006192 | -0.964916 | -0.051024 | -0.378307 | 0.354511 | 0.165575 | 0.616150 | ... | -0.212255 | -0.642926 | 0.260726 | 0.082772 | -0.431475 | 0.319001 | -0.225063 | -0.443086 | 14523.93 | 0.0 |
163353 | 163353 | 1.988546 | -0.676106 | -0.066960 | -1.247678 | 0.278521 | 0.134391 | 0.348558 | -0.234717 | 0.089843 | ... | 0.063047 | 1.204999 | -0.096033 | 0.591697 | 0.462605 | 0.162256 | -0.276319 | -0.242267 | 12033.69 | 0.0 |
163354 | 163354 | 0.156866 | -0.088362 | 0.153504 | -0.178948 | 0.566320 | 0.753835 | 0.224169 | -0.860127 | 0.012889 | ... | -0.263018 | 0.397224 | -0.656675 | 1.578449 | 1.221929 | -0.856416 | 0.051583 | 0.599162 | 15302.77 | 0.0 |
163355 | 163355 | 0.120934 | -0.108950 | 0.578018 | -0.961257 | 0.423398 | -0.222660 | 0.846739 | -0.201579 | 0.545305 | ... | -0.220013 | -0.535015 | 0.061926 | 0.126831 | -0.753613 | 0.330297 | 0.198570 | 0.280656 | 20371.26 | 0.0 |
163356 | 163356 | 1.726261 | -0.440380 | -0.083019 | -0.377200 | 0.253506 | -0.486139 | 0.515510 | -0.225861 | 1.020000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
163357 rows × 31 columns
- Feels like the same as #1, except that Time has been dropped and id has been added, and id is identical to the index.
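Whether `id` really just repeats the row index can be checked in one line instead of by eye. The sketch below uses a toy frame with illustrative values and assumes the default `RangeIndex` that `pd.read_csv` produces when no index column is given.

```python
import pandas as pd

# Toy stand-in for creditcard_2023.csv (illustrative values only).
toy = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "Amount": [17982.10, 6531.37, 2513.54, 5384.44],
})

# True when every id equals its row position in the default RangeIndex.
id_matches_index = bool((toy["id"] == toy.index).all())
print(id_matches_index)
```

If this holds on the real file, `id` carries no information beyond row order and cannot stand in for the missing time column. Separately, the last row shown above (163356) is NaN from V21 onward, so the local copy may be incomplete and worth re-downloading before relying on it.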
8. online-payment-fraud-detection
- Same format as #4; only the order seems to differ.
9. online-payments-fraud-detection-dataset
- Same format as #4.
10. fraud-transactions-dataset
- Same format as the book (downloaded the test file as well, just in case).
fraudTest = pd.read_csv("~/Desktop/fraudTest.csv")
fraudTest
| | Unnamed: 0 | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | ... | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 2020-06-21 12:14:25 | 2291163933867244 | fraud_Kirlin and Sons | personal_care | 2.86 | Jeff | Elliott | M | 351 Darlene Green | ... | 33.9659 | -80.9355 | 333497 | Mechanical engineer | 1968-03-19 | 2da90c7d74bd46a0caf3777415b3ebd3 | 1371816865 | 33.986391 | -81.200714 | 0 |
1 | 1 | 2020-06-21 12:14:33 | 3573030041201292 | fraud_Sporer-Keebler | personal_care | 29.84 | Joanne | Williams | F | 3638 Marsh Union | ... | 40.3207 | -110.4360 | 302 | Sales professional, IT | 1990-01-17 | 324cc204407e99f51b0d6ca0055005e7 | 1371816873 | 39.450498 | -109.960431 | 0 |
2 | 2 | 2020-06-21 12:14:53 | 3598215285024754 | fraud_Swaniawski, Nitzsche and Welch | health_fitness | 41.28 | Ashley | Lopez | F | 9333 Valentine Point | ... | 40.6729 | -73.5365 | 34496 | Librarian, public | 1970-10-21 | c81755dbbbea9d5c77f094348a7579be | 1371816893 | 40.495810 | -74.196111 | 0 |
3 | 3 | 2020-06-21 12:15:15 | 3591919803438423 | fraud_Haley Group | misc_pos | 60.05 | Brian | Williams | M | 32941 Krystal Mill Apt. 552 | ... | 28.5697 | -80.8191 | 54767 | Set designer | 1987-07-25 | 2159175b9efe66dc301f149d3d5abf8c | 1371816915 | 28.812398 | -80.883061 | 0 |
4 | 4 | 2020-06-21 12:15:17 | 3526826139003047 | fraud_Johnston-Casper | travel | 3.19 | Nathan | Massey | M | 5783 Evan Roads Apt. 465 | ... | 44.2529 | -85.0170 | 1126 | Furniture designer | 1955-07-06 | 57ff021bd3f328f8738bb535c302a31b | 1371816917 | 44.959148 | -85.884734 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
555714 | 555714 | 2020-12-31 23:59:07 | 30560609640617 | fraud_Reilly and Sons | health_fitness | 43.77 | Michael | Olson | M | 558 Michael Estates | ... | 40.4931 | -91.8912 | 519 | Town planner | 1966-02-13 | 9b1f753c79894c9f4b71f04581835ada | 1388534347 | 39.946837 | -91.333331 | 0 |
555715 | 555715 | 2020-12-31 23:59:09 | 3556613125071656 | fraud_Hoppe-Parisian | kids_pets | 111.84 | Jose | Vasquez | M | 572 Davis Mountains | ... | 29.0393 | -95.4401 | 28739 | Futures trader | 1999-12-27 | 2090647dac2c89a1d86c514c427f5b91 | 1388534349 | 29.661049 | -96.186633 | 0 |
555716 | 555716 | 2020-12-31 23:59:15 | 6011724471098086 | fraud_Rau-Robel | kids_pets | 86.88 | Ann | Lawson | F | 144 Evans Islands Apt. 683 | ... | 46.1966 | -118.9017 | 3684 | Musician | 1981-11-29 | 6c5b7c8add471975aa0fec023b2e8408 | 1388534355 | 46.658340 | -119.715054 | 0 |
555717 | 555717 | 2020-12-31 23:59:24 | 4079773899158 | fraud_Breitenberg LLC | travel | 7.99 | Eric | Preston | M | 7020 Doyle Stream Apt. 951 | ... | 44.6255 | -116.4493 | 129 | Cartographer | 1965-12-15 | 14392d723bb7737606b2700ac791b7aa | 1388534364 | 44.470525 | -117.080888 | 0 |
555718 | 555718 | 2020-12-31 23:59:34 | 4170689372027579 | fraud_Dare-Marvin | entertainment | 38.13 | Samuel | Frey | M | 830 Myers Plaza Apt. 384 | ... | 35.6665 | -97.4798 | 116001 | Media buyer | 1993-05-10 | 1765bb45b3aa3224b4cdcb6e7a96cee3 | 1388534374 | 36.210097 | -97.036372 | 0 |
555719 rows × 23 columns
fraudTest.is_fraud.mean()
0.0038598644278853163
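This format has everything on the checklist (a card identifier, a timestamp, amt and is_fraud), so the time difference between consecutive transactions on the same card can be derived. The following is a minimal sketch on toy rows with made-up values; the `sec_since_prev` column name is hypothetical.

```python
import pandas as pd

# Toy rows shaped like fraudTest.csv (made-up values).
toy = pd.DataFrame({
    "cc_num": [111, 222, 111, 222, 111],
    "unix_time": [1000, 1010, 1060, 1100, 1300],
    "amt": [2.86, 29.84, 41.28, 60.05, 3.19],
    "is_fraud": [0, 0, 0, 1, 0],
})

# Sort so that diff() compares each transaction with the previous one on the same card.
toy = toy.sort_values(["cc_num", "unix_time"])
toy["sec_since_prev"] = toy.groupby("cc_num")["unix_time"].diff()

fraud_rate = toy["is_fraud"].mean()  # share of rows labelled as fraud
print(toy)
print(fraud_rate)
```

The first transaction of each card has no predecessor, so its `sec_since_prev` is NaN; how to treat those rows is a modelling choice the notes leave open.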
11. fraud-detection
- Same as the book. Same as #10.
12. fraudulent-transactions-prediction
- Same as #4.