wine <- read.csv("C:\\data\\wine2.csv", stringsAsFactors = T)
wine
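Before splitting, it can help to confirm the structure of the imported data and the distribution of the target variable. A minimal sketch (not in the original code), assuming the class column is named Type as used below:
# Quick sanity checks on the imported data
str(wine)           # column types -- Type should be a factor thanks to stringsAsFactors = T
table(wine$Type)    # class counts for the target variable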
# Split the data 80:20
library(caret)
set.seed(1)
k <- createDataPartition(wine$Type, p=0.8, list=F)
train_data <- wine[k, ]
test_data <- wine[-k, ]
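Because createDataPartition stratifies on the outcome, the class proportions should be similar in both subsets. A quick check (a sketch, not part of the original code):
# Compare class proportions in the full data, training set, and test set
prop.table(table(wine$Type))
prop.table(table(train_data$Type))
prop.table(table(test_data$Type))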
# Leave the test data untouched
# and run cross-validation on the training data using the caret package.
# Set up 10-fold cross-validation
library(caret)
train_control <- trainControl(method = "cv", number = 10)
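trainControl also supports other resampling schemes; for example, repeating the 10-fold cross-validation several times can reduce the variance of the accuracy estimate. A hedged sketch of an alternative setup (not used in the rest of this walkthrough):
# Alternative: 10-fold cross-validation repeated 5 times
train_control_rep <- trainControl(method = "repeatedcv", number = 10, repeats = 5)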
# Fit a random forest model with cross-validation
model <- train(Type ~ ., data = train_data,
               method = 'rf',
               trControl = train_control)
# It seems ntree cannot be tuned through caret's train() function (only mtry is tuned);
# a sketch of passing ntree directly follows the results below.
model
Random Forest

143 samples
 13 predictor
  3 classes: 't1', 't2', 't3'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 128, 130, 130, 128, 128, 128, ...
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa
   2    0.9846154  0.9767857
   7    0.9703297  0.9549107
  13    0.9646886  0.9460365

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.
model$results
  mtry  Accuracy     Kappa AccuracySD    KappaSD
1    2 0.9846154 0.9767857 0.04865043 0.07341002
2    7 0.9703297 0.9549107 0.06260415 0.09510896
3   13 0.9646886 0.9460365 0.05928667 0.09054548
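As noted above, train() only tunes mtry for method = 'rf'; ntree is not a tuning parameter, but extra arguments are passed through to randomForest(). A sketch of supplying an explicit mtry grid and a fixed ntree (the grid values and ntree = 1000 here are illustrative assumptions, not from the original code):
# Tune mtry over a custom grid and fix ntree (passed through to randomForest)
tune_grid <- expand.grid(mtry = c(2, 4, 7, 13))
model2 <- train(Type ~ ., data = train_data,
                method = 'rf',
                trControl = train_control,
                tuneGrid = tune_grid,
                ntree = 1000)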
# Predict on the test set and evaluate
pred <- predict(model, test_data)
confusionMatrix(pred, test_data$Type)
Confusion Matrix and Statistics

          Reference
Prediction t1 t2 t3
        t1 11  0  0
        t2  0 14  0
        t3  0  0  9

Overall Statistics

               Accuracy : 1
                 95% CI : (0.8972, 1)
    No Information Rate : 0.4118
    P-Value [Acc > NIR] : 0.00000000000007908

                  Kappa : 1
All 34 test set samples were classified correctly.
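Beyond overall accuracy, caret's varImp() shows which predictors the random forest relied on most. A brief sketch (not part of the original walkthrough):
# Variable importance from the cross-validated random forest model
varImp(model)
plot(varImp(model))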