import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
from sklearn.preprocessing import StandardScaler  # needed for standardization
# 1. Load the data
df = pd.read_csv("c:\\data\\insurance.csv", engine='python', encoding='CP949')
print(df)
# 2. Create the model
model = smf.ols(formula='expenses ~ age + sex + bmi + children + smoker + region', data=df)
result = model.fit()  # fit the model
print( result.summary() )
Result:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 expenses   R-squared:                     0.751
Model:                              OLS   Adj. R-squared:                0.749
Method:                   Least Squares   F-statistic:                   500.9
Date:                  Wed, 24 Feb 2021   Prob (F-statistic):             0.00
Time:                          12:07:08   Log-Likelihood:              -13548.
No. Observations:                  1338   AIC:                       2.711e+04
Df Residuals:                      1329   BIC:                       2.716e+04
Df Model:                             8
Covariance Type:              nonrobust
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept           -1.194e+04    987.811    -12.089      0.000   -1.39e+04      -1e+04
sex[T.male]          -131.3520    332.935     -0.395      0.693    -784.488     521.784
smoker[T.yes]        2.385e+04    413.139     57.723      0.000     2.3e+04    2.47e+04
region[T.northwest]  -352.7901    476.261     -0.741      0.459   -1287.095     581.515
region[T.southeast] -1035.5957    478.681     -2.163      0.031   -1974.648     -96.544
region[T.southwest]  -959.3058    477.912     -2.007      0.045   -1896.850     -21.762
age                   256.8392     11.899     21.586      0.000     233.497     280.181
bmi                   339.2899     28.598     11.864      0.000     283.187     395.393
children              475.6889    137.800      3.452      0.001     205.360     746.017
==============================================================================
Omnibus:                        300.499   Durbin-Watson:                 2.088
Prob(Omnibus):                    0.000   Jarque-Bera (JB):            719.382
Skew:                             1.212   Prob(JB):                  6.14e-157
Kurtosis:                         5.652   Cond. No.                       311.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
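
To make the table above easier to read, here is a minimal sketch that plugs the printed coefficients into the fitted linear equation by hand and rechecks the adjusted R-squared from R-squared, the number of observations, and Df Model. The helper names `predict_expenses` and `adjusted_r2` are hypothetical and not part of statsmodels. The coefficients are copied as printed, so they are rounded (the intercept to four significant digits), and the baseline levels are assumed to be whichever levels have no `[T.…]` row in the table (presumably female, non-smoker, and northeast).

```python
# Coefficients copied from the summary table above (rounded as printed there).
COEF = {
    "Intercept": -1.194e+04,
    "sex[T.male]": -131.3520,
    "smoker[T.yes]": 2.385e+04,
    "region[T.northwest]": -352.7901,
    "region[T.southeast]": -1035.5957,
    "region[T.southwest]": -959.3058,
    "age": 256.8392,
    "bmi": 339.2899,
    "children": 475.6889,
}


def predict_expenses(age, sex, bmi, children, smoker, region):
    """Hypothetical helper: evaluate the fitted equation by hand."""
    y = COEF["Intercept"]
    # Numeric predictors contribute coefficient * value.
    y += COEF["age"] * age + COEF["bmi"] * bmi + COEF["children"] * children
    # Dummy-coded predictors add their coefficient only for non-baseline
    # levels. Any level without a row in the table is treated as the
    # baseline and adds 0 (an assumption of this sketch).
    y += COEF.get(f"sex[T.{sex}]", 0.0)
    y += COEF.get(f"smoker[T.{smoker}]", 0.0)
    y += COEF.get(f"region[T.{region}]", 0.0)
    return y


def adjusted_r2(r2, n_obs, df_model):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - df_model - 1)


# Same hypothetical person, differing only in smoking status.
non_smoker = predict_expenses(30, "male", 25.0, 1, "no", "northeast")
smoker = predict_expenses(30, "male", 25.0, 1, "yes", "northeast")
print(f"non-smoker: {non_smoker:.2f}, smoker: {smoker:.2f}")
print(f"difference: {smoker - non_smoker:.2f}")

# Recheck of Adj. R-squared using the rounded R-squared from the table.
print(f"adjusted R^2: {adjusted_r2(0.751, 1338, 8):.4f}")
```

With every other predictor held fixed, the gap between the two predictions is exactly the `smoker[T.yes]` coefficient, which is how each dummy row in the table is meant to be read. The recomputed adjusted R-squared should match the printed 0.749 only up to the rounding of the inputs. In a live session the same prediction would normally come from `result.predict(new_df)` on the fitted model rather than from hand-copied numbers.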