Regular Expressions, and Using the r Prefix When Writing Them

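Regular expressions
<a href="https://docs.python.org/3/library/re.html?highlight=re#module-re" target="_blank" class="ke-link">https://docs.python.org/3/library/re.html?highlight=re#module-re</a>
<a href="https://soooprmx.com/archives/7718" target="_blank" class="ke-link">https://soooprmx.com/archives/7718</a>
<a href="https://regexr.com/5mhou" target="_blank" class="ke-link">https://regexr.com/5mhou</a>

- A formal language used to describe a set of strings that follow a particular rule.
- Used in programming languages, text editors, and similar tools to search and replace strings.
- Expressing a specific condition on an input string can get fairly complicated with ordinary conditional statements, but a regular expression can often express it very compactly.
- The price of that brevity is readability: a pattern is hard to understand unless you already know the syntax.
- In Python, regular expressions are provided by the re module.

*** Regular expression basics ***
A regular expression uses the pattern expressions below to pick out part of a string.

Expressions
\s          whitespace character (space, tab, etc.)
\*          a literal *
\D          a non-digit character
^           start of the string      ex) /^The/i
$           end of the string        ex) /end$/
\w          letter, digit, or underscore (_)
[^0-9]      any character except a digit
[0-9]       digits only
[A-Za-z]    uppercase and lowercase letters
|           or
s{2}        s repeated twice  {repetition count}

Flags
g           global matching
m           multi-line matching
i           case-insensitive matching
(The /.../i notation and the g flag come from JavaScript-style regex literals. In Python's re module, i and m correspond to re.I and re.M, and there is no g flag because findall() and sub() already process every match.)

Character matching
*           zero or more repetitions
+           one or more repetitions
?           zero or one occurrence
.           any single character (except a newline by default)

To use regular expressions in Python, first load the module with import re.
The r prefix makes a string literal a raw string, in which backslashes (\) are not interpreted as escape sequences, so it is useful for things like regular expressions. Without the r prefix you would have to write re.compile('(\\d+)/(\\d+)/(\\d+)'), doubling every backslash, which is longer and inconvenient. Putting r in front of the string avoids this.

ex) re.compile(r'(\d+)/(\d+)/(\d+)')

[Source] <a href="http://blog.naver.com/dunopiorg/220132429093" target="_blank" class="ke-link">정규표현식 python re r' (raw string)</a> | Author <a href="http://blog.naver.com/dunopiorg" target="_blank" class="ke-link">장현웅</a>

* Main functions of the re module
compile(pattern[, flags])               : creates a regular expression pattern object
search(pattern, string[, flags])        : scans the string and returns the first match, or None
match(pattern, string[, flags])         : matches only at the beginning of the string, or returns None
split(pattern, string[, maxsplit=0])    : splits string at each occurrence of pattern
findall(pattern, string)                : extracts every substring that matches pattern, as a list
sub(pattern, repl, string[, count=0])   : finds pattern and replaces it with repl; count limits the number of replacements

* Match object methods
group()     : returns the matched string
groups()    : returns all captured groups as a tuple of strings
start()     : returns the start position of the matched string
end()       : returns the end position of the matched string (the index just past its last character)
span()      : returns the (start, end) positions of the matched string

Exercise 1)
import re

# Raw strings keep \. and \d from being treated as string escapes
regExp = r'[0-9a-zA-Z][_0-9a-zA-Z-]*@[_0-9a-zA-Z-]+(\.[_0-9a-zA-Z-]+){1,2}$'
print('Email check')
email = 'abc@abc.com'
email2 = '@abc.com'
print(bool(re.search(regExp, email)))     # True
print(bool(re.search(regExp, email2)))    # False: nothing before the @

print('Phone number check')
regExp2 = r'^\d{3}-\d{3,4}-\d{4}'
phone1 = '010-111-1234'
phone2 = '0101111234'
print(bool(re.search(regExp2, phone1)))   # True
print(bool(re.search(regExp2, phone2)))   # False: the hyphens are missing

Exercise 2)
# Modifying a string with a regular expression pattern
import re

bold = re.compile(r'\*{2}(.*?)\*{2}')
text = 'make this **korea**. This **hello**.'
print('text:', text)
print('bold:', bold.sub(r'\1', text))            # \1 is a backreference to group 1
print('bold:', bold.sub(r'\1', text, count=1))   # count limits the number of substitutions
print('bold:', bold.subn(r'\1', text))           # subn() also reports the number of substitutions
print()

# \g<name> can be mixed with numeric references. It is useful because it avoids
# confusion between a group number and digits in the surrounding text.
bold2 = re.compile(r'\*{2}(?P<bold_text>.*?)\*{2}')   # named group
print('text:', text)
print('bold:', bold2.sub(r'\g<bold_text>', text))

Exercise 3)
# One-hot encoding of a string
import re

ss = 'hello'
regex = re.compile('[a-z]')
tokens = regex.findall(ss)      # one token per lowercase letter
print(tokens)

idx_token = list(set(tokens))   # unique tokens; set order is arbitrary
print(idx_token)
token_idx = {token: idx for idx, token in enumerate(idx_token)}   # token -> index
print(token_idx)
one_hot = lambda token: [1 if i == token_idx[token] else 0 for i in range(len(idx_token))]
res = [one_hot(token) for token in tokens]
print(tuple(res))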
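
As a supplement to the exercises, here is a minimal sketch that ties together the raw-string note and the re functions and Match methods listed earlier. The sample text and all variable names are made up for illustration, and the date pattern is the one from the raw-string example. Treat it as one possible way to try the calls, not a definitive recipe.

```python
import re

# Hypothetical sample text, made up for this sketch.
text = "Released 2021/03/15, updated 2021/04/02"

# The raw literal and the doubled-backslash literal spell the same string.
raw_pattern = r'(\d+)/(\d+)/(\d+)'
escaped_pattern = '(\\d+)/(\\d+)/(\\d+)'
same_pattern = (raw_pattern == escaped_pattern)

date_re = re.compile(raw_pattern)

# search() scans the whole string; match() only tries the very beginning.
first = date_re.search(text)
at_start = date_re.match(text)  # None here, since the text starts with "Released"

# Match object methods on the first date that was found.
whole = first.group()                    # the full matched string
parts = first.groups()                   # tuple of the three captured groups
position = (first.start(), first.end())  # same pair that span() returns

# findall() returns one tuple per match when the pattern has several groups.
all_dates = date_re.findall(text)

# split() cuts the string wherever the pattern (a comma plus spaces) occurs.
pieces = re.split(r',\s*', text)

# sub() with backreferences reorders the groups; count=1 changes only the first date.
reordered = date_re.sub(r'\3-\2-\1', text, count=1)

# re.I plays the role of the i flag from the /^The/i notation.
ignore_case = bool(re.search(r'^the', 'The end', re.I))

print(same_pattern, at_start)
print(whole, parts, position)
print(all_dates)
print(pieces)
print(reordered)
print(ignore_case)
```

Compiling the pattern once with re.compile() and reusing date_re keeps the calls short; the module-level functions such as re.split() and re.search() take the pattern as their first argument instead.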