Re: Problem 183. [Last problem of the day] Fill the missing age values with the median and check the accuracy of the kNN model

import pandas as pd
import seaborn as sns

# Step 1. Load the data (csv) into a DataFrame
df = sns.load_dataset('titanic')

# Raise the column display limit so that all columns are printed
pd.set_option('display.max_columns', 15)

rdf = df.drop(['deck', 'embark_town'], axis=1)

# Replace the missing age values with the median age
median = rdf['age'].median()
rdf['age'] = rdf['age'].fillna(median)

# Replace the 2 missing embarked values with the most frequent value
most_freq = rdf['embarked'].value_counts(dropna=True).idxmax()
rdf['embarked'] = rdf['embarked'].fillna(most_freq)

ndf = rdf[['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'embarked']]

# Create dummy variables to convert the nominal data to numbers
gender = pd.get_dummies(ndf['sex'])
ndf = pd.concat([ndf, gender], axis=1)

onehot_embarked = pd.get_dummies(ndf['embarked'], prefix='town')
ndf = pd.concat([ndf, onehot_embarked], axis=1)

# Drop the 2 original nominal columns
ndf = ndf.drop(['sex', 'embarked'], axis=1)

# Build the training features X and the label y
X = ndf[['pclass', 'age', 'sibsp', 'parch', 'female', 'male',
         'town_C', 'town_Q', 'town_S']]
y = ndf['survived']  # dependent variable

# 4.2 Standardize the independent variables (StandardScaler: zero mean, unit variance)
from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=10)
# Note: test_size=0.3 splits the data 7:3 into training and test sets, and
# random_state=10 makes the split come out the same on every later run.
print('train data size:', X_train.shape)
print('test data size:', X_test.shape)

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict died / survived for the test data
y_hat = knn.predict(X_test)

# Build the confusion matrix (two-way cross table) to evaluate the model
from sklearn import metrics
knn_matrix = metrics.confusion_matrix(y_test, y_hat)
print(knn_matrix)

# Check the accuracy of the model
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_hat)
print(accuracy)

<pre>
train data size: (623, 9)
test data size: (268, 9)
[[153  21]
 [ 27  67]]
0.8208955223880597
</pre>
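
As a quick check, the accuracy can be recomputed by hand from the confusion matrix in the output. The sketch below hard-codes the four printed counts and assumes scikit-learn's usual layout (rows are the actual class, columns the predicted class, labels ordered 0 = died, 1 = survived). The names `tn`, `fp`, `fn`, `tp` and the extra precision and recall figures are illustrative additions, not part of the original answer.

```python
# Counts copied from the printed confusion matrix.
# Assumed layout: rows = actual class, columns = predicted class,
# labels ordered 0 (died), 1 (survived).
knn_matrix = [[153, 21],
              [27, 67]]

tn, fp = knn_matrix[0]  # actual 0: predicted 0, predicted 1
fn, tp = knn_matrix[1]  # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp          # should equal the 268 test rows
accuracy = (tn + tp) / total       # share of correct predictions
precision = tp / (tp + fp)         # of predicted survivors, how many survived
recall = tp / (tp + fn)            # of actual survivors, how many were found

print(total)                # 268
print(round(accuracy, 4))   # 0.8209
print(round(precision, 4))  # 0.7614
print(round(recall, 4))     # 0.7128
```

The 220 correct predictions out of 268 give the same 0.8209 that `accuracy_score` reports. The lower recall suggests the model misses survivors more often than it misclassifies the passengers who died.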