#!pip install autogluon.multimodal
14wk-62: NLP with Disaster Tweets (Text) / Data Analysis (AutoGluon)
최규빈
2023-12-01
1. Lecture video
2. Imports
import numpy as np
import pandas as pd
#---#
from autogluon.multimodal import MultiModalPredictor # from autogluon.tabular import TabularPredictor
#---#
import warnings
warnings.filterwarnings('ignore')
3. Data
!kaggle competitions download -c nlp-getting-started
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/coco/.kaggle/kaggle.json'
Downloading nlp-getting-started.zip to /home/coco/Dropbox/Class/STBDA23/posts
100%|████████████████████████████████████████| 593k/593k [00:00<00:00, 2.28MB/s]
!unzip nlp-getting-started.zip -d data
Archive: nlp-getting-started.zip
inflating: data/sample_submission.csv
inflating: data/test.csv
inflating: data/train.csv
df_train = pd.read_csv('data/train.csv')
df_test = pd.read_csv('data/test.csv')
sample_submission = pd.read_csv('data/sample_submission.csv')
!rm -rf data
!rm nlp-getting-started.zip
4. Analysis
df_train.head()
  | id | keyword | location | text | target |
---|---|---|---|---|---|
0 | 1 | NaN | NaN | Our Deeds are the Reason of this #earthquake M... | 1 |
1 | 4 | NaN | NaN | Forest fire near La Ronge Sask. Canada | 1 |
2 | 5 | NaN | NaN | All residents asked to 'shelter in place' are ... | 1 |
3 | 6 | NaN | NaN | 13,000 people receive #wildfires evacuation or... | 1 |
4 | 7 | NaN | NaN | Just got sent this photo from Ruby #Alaska as ... | 1 |
df_test.head()
  | id | keyword | location | text |
---|---|---|---|---|
0 | 0 | NaN | NaN | Just happened a terrible car crash |
1 | 2 | NaN | NaN | Heard about #earthquake is different cities, s... |
2 | 3 | NaN | NaN | there is a forest fire at spot pond, geese are... |
3 | 9 | NaN | NaN | Apocalypse lighting. #Spokane #wildfires |
4 | 11 | NaN | NaN | Typhoon Soudelor kills 28 in China and Taiwan |
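The first rows shown above have NaN in both `keyword` and `location`, so it may be worth checking how much of each column is missing before training. The following is a minimal sketch, not part of the original analysis. `df_demo` is a hypothetical three-row frame copied from the `df_test.head()` output, and the shares it prints describe only those three rows. On the real data the same calls would be applied to `df_train` or `df_test`.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame: three rows copied from the df_test.head() output above.
df_demo = pd.DataFrame({
    "id": [0, 9, 11],
    "keyword": [np.nan, np.nan, np.nan],
    "location": [np.nan, np.nan, np.nan],
    "text": [
        "Just happened a terrible car crash",
        "Apocalypse lighting. #Spokane #wildfires",
        "Typhoon Soudelor kills 28 in China and Taiwan",
    ],
})

# Share of missing values per column (1.0 means every value is NaN).
missing_share = df_demo.isna().mean()
print(missing_share)

# Character length of each tweet, a quick look at the main text feature.
text_length = df_demo["text"].str.len()
print(text_length.describe())
```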
# step1 -- pass
# step2
predictr = MultiModalPredictor(label = 'target')
# step3
predictr.fit(df_train)
# step4
yhat = predictr.predict(df_test)
No path specified. Models will be saved in: "AutogluonModels/ag-20231218_074742/"
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
2 unique label values: [1, 0]
If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Global seed set to 0
AutoMM starts to create your model. ✨
- AutoGluon version is 0.8.2.
- Pytorch version is 1.13.1+cu117.
- Model will be saved to "/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742".
- Validation metric is "roc_auc".
- To track the learning progress, you can open a terminal and launch Tensorboard:
```shell
# Assume you have installed tensorboard
tensorboard --logdir /home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742
```
Enjoy your coffee, and let AutoMM do the job ☕☕☕ Learn more at https://auto.gluon.ai
0 GPUs are detected, and 0 GPUs will be used.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
----------------------------------------------------------
0 | model | MultimodalFusionMLP | 109 M
1 | validation_metric | BinaryAUROC | 0
2 | loss_func | CrossEntropyLoss | 0
----------------------------------------------------------
109 M Trainable params
0 Non-trainable params
109 M Total params
439.134 Total estimated model params size (MB)
Epoch 0, global step 26: 'val_roc_auc' reached 0.81472 (best 0.81472), saving model to '/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742/epoch=0-step=26.ckpt' as top 3
Epoch 0, global step 53: 'val_roc_auc' reached 0.87681 (best 0.87681), saving model to '/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742/epoch=0-step=53.ckpt' as top 3
Epoch 1, global step 80: 'val_roc_auc' reached 0.87866 (best 0.87866), saving model to '/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742/epoch=1-step=80.ckpt' as top 3
Epoch 1, global step 107: 'val_roc_auc' reached 0.89115 (best 0.89115), saving model to '/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742/epoch=1-step=107.ckpt' as top 3
Epoch 2, global step 134: 'val_roc_auc' reached 0.88618 (best 0.89115), saving model to '/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742/epoch=2-step=134.ckpt' as top 3
Epoch 2, global step 161: 'val_roc_auc' reached 0.88654 (best 0.89115), saving model to '/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742/epoch=2-step=161.ckpt' as top 3
Epoch 3, global step 188: 'val_roc_auc' reached 0.89034 (best 0.89115), saving model to '/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742/epoch=3-step=188.ckpt' as top 3
Epoch 3, global step 215: 'val_roc_auc' was not in top 3
Epoch 4, global step 242: 'val_roc_auc' was not in top 3
Epoch 4, global step 269: 'val_roc_auc' reached 0.89090 (best 0.89115), saving model to '/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742/epoch=4-step=269.ckpt' as top 3
Epoch 5, global step 296: 'val_roc_auc' was not in top 3
Epoch 5, global step 323: 'val_roc_auc' was not in top 3
Epoch 6, global step 350: 'val_roc_auc' was not in top 3
Epoch 6, global step 377: 'val_roc_auc' was not in top 3
Start to fuse 3 checkpoints via the greedy soup algorithm.
AutoMM has created your model 🎉🎉🎉
- To load the model, use the code below:
```python
from autogluon.multimodal import MultiModalPredictor
predictor = MultiModalPredictor.load("/home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742")
```
- You can open a terminal and launch Tensorboard to visualize the training log:
```shell
# Assume you have installed tensorboard
tensorboard --logdir /home/coco/Dropbox/Class/STBDA23/posts/AutogluonModels/ag-20231218_074742
```
- If you are not satisfied with the model, try to increase the training time,
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub: https://github.com/autogluon/autogluon
- Training takes a long time (the log above shows that no GPU was detected).
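The training log mentions two ideas without explaining them: the validation metric `roc_auc`, and fusing the top 3 checkpoints "via the greedy soup algorithm". The sketch below illustrates both in plain Python. It is only an illustration of the general idea (average checkpoint weights, and keep an extra checkpoint only if the validation score does not drop), and AutoGluon's actual implementation may differ. The toy validation set, the linear scorer `evaluate`, and the weight vectors in `candidates` are all made up for this example.

```python
from typing import Callable, List, Sequence, Tuple

Weights = List[float]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_weights(models: Sequence[Weights]) -> Weights:
    """Element-wise mean of several weight vectors."""
    return [sum(column) / len(models) for column in zip(*models)]


def greedy_soup(
    checkpoints: Sequence[Tuple[Weights, float]],
    evaluate: Callable[[Weights], float],
) -> Weights:
    """Greedily average checkpoints, best validation score first."""
    ranked = sorted(checkpoints, key=lambda item: item[1], reverse=True)
    ingredients = [ranked[0][0]]
    best = evaluate(average_weights(ingredients))
    for weights, _ in ranked[1:]:
        score = evaluate(average_weights(ingredients + [weights]))
        if score >= best:  # keep the checkpoint only if the soup does not get worse
            ingredients.append(weights)
            best = score
    return average_weights(ingredients)


# Toy validation set (made up): two features per example, binary labels.
val_x = [(1.0, 0.2), (0.9, 0.8), (0.2, 0.1), (0.1, 0.9), (0.8, 0.5), (0.3, 0.6)]
val_y = [1, 1, 0, 0, 1, 0]


def evaluate(weights: Weights) -> float:
    """Validation ROC AUC of a linear scorer with the given weights."""
    scores = [weights[0] * a + weights[1] * b for a, b in val_x]
    return roc_auc(val_y, scores)


# Three made-up "checkpoints", each paired with its own validation score.
candidates = [[1.0, 0.5], [0.2, 1.0], [1.0, -0.2]]
checkpoints = [(w, evaluate(w)) for w in candidates]
soup = greedy_soup(checkpoints, evaluate)
print("single checkpoints:", [round(score, 3) for _, score in checkpoints])
print("soup:", soup, "val_roc_auc:", round(evaluate(soup), 3))
```

Because the best single checkpoint is always the first ingredient and later ones are added only when the score does not decrease, the soup's validation score in this sketch is never below that of the best single checkpoint.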
5. Submission
sample_submission
  | id | target |
---|---|---|
0 | 0 | 0 |
1 | 2 | 0 |
2 | 3 | 0 |
3 | 9 | 0 |
4 | 11 | 0 |
... | ... | ... |
3258 | 10861 | 0 |
3259 | 10865 | 0 |
3260 | 10868 | 0 |
3261 | 10874 | 0 |
3262 | 10875 | 0 |
3263 rows × 2 columns
sample_submission['target'] = yhat
sample_submission.to_csv("submission.csv",index=False)
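Before uploading, it can help to confirm that the file has the same layout as `sample_submission`: an `id` column, a `target` column, and targets that are 0 or 1. The helper below is a hypothetical sketch using only the standard library. The name `check_submission` and the target values in the two demo strings are made up for illustration, and only the ids are taken from the table above.

```python
import csv
import io
from typing import List, Sequence


def check_submission(csv_text: str, expected_ids: Sequence[int]) -> List[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["id", "target"]:
        return [f"unexpected header: {reader.fieldnames}"]
    rows = list(reader)
    problems: List[str] = []
    ids = [int(row["id"]) for row in rows]
    if ids != list(expected_ids):
        problems.append("ids differ from the expected ids or are out of order")
    bad_ids = [row["id"] for row in rows if row["target"] not in {"0", "1"}]
    if bad_ids:
        problems.append(f"non-binary target for ids: {bad_ids}")
    return problems


# The first five ids of sample_submission; the target values are invented.
first_ids = [0, 2, 3, 9, 11]
good_csv = "id,target\n0,1\n2,1\n3,1\n9,0\n11,1\n"
bad_csv = "id,target\n0,1\n2,0.7\n3,1\n9,0\n11,1\n"

print(check_submission(good_csv, first_ids))
print(check_submission(bad_csv, first_ids))
```

For the real file, the text could be read with `open("submission.csv").read()` and the expected ids taken from `df_test`.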
!kaggle competitions submit -c nlp-getting-started -f submission.csv -m "AutoGluon, MultiModalPredictor"
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/coco/.kaggle/kaggle.json'
100%|██████████████████████████████████████| 22.2k/22.2k [00:01<00:00, 12.2kB/s]
Successfully submitted to Natural Language Processing with Disaster Tweets
250/1094
0.22851919561243145
A result around this level is reasonable.