Classification of Common Relationships Based on Short Tandem Repeat Profiles Using Data Mining

Jeong Su-Jin; Lee Hyo-Jung; Lee Soong-Deok; Lee Seung-Hwan; Park Su-Jeong; Kim Jong-Sik; Lee Jae-Won

Korean Journal of Legal Medicine, 2019, Vol. 43, No. 3, pp. 97-105

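Affiliations

Jeong Su-Jin
Department of Statistics, Korea University
Lee Hyo-Jung
Development Division, Dong-A ST
Lee Soong-Deok
Department of Forensic Medicine, Seoul National University College of Medicine
Lee Seung-Hwan
Forensic Science Division II, Supreme Prosecutors' Office
Park Su-Jeong
Forensic Science Division II, Supreme Prosecutors' Office
Kim Jong-Sik
Forensic Science Division II, Supreme Prosecutors' Office
Lee Jae-Won
Department of Statistics, Korea University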

KMID : 0357820190430030097 DOI : 10.7580/kjlm.2019.43.3.97

Abstract

We reviewed past studies on the identification of familial relationships using 22 short tandem repeat markers. With these markers, high discrimination power and a relatively accurate cut-off value can be obtained for parent-child and full-sibling relationships. For uncle-nephew and cousin pairs, however, the likelihood ratio (LR) method has low discrimination power. We therefore compare the LR ranking method with data mining techniques (e.g., logistic regression, linear discriminant analysis, diagonal linear discriminant analysis, diagonal quadratic discriminant analysis, K-nearest neighbor, classification and regression trees, support vector machines, random forest [RF], and penalized multivariate analysis) that can be applied to identify familial relationships, and provide a guideline for choosing the most appropriate model in a given situation. RF was found to be more accurate than the other methods. The accuracy of RF is 99.99% for parent-child, 99.44% for full siblings, 90.34% for uncle-nephew, and 79.69% for first cousins.
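The abstract does not spell out how the LR is computed. The following is a minimal Python sketch of the standard textbook pairwise kinship LR against the "unrelated" hypothesis, not the authors' implementation. It assumes Hardy-Weinberg equilibrium, independent loci, and no mutation or population substructure. The allele frequencies and the function names (`genotype_prob`, `joint_prob_one_ibd`, `locus_lr`, `combined_lr`) are hypothetical and chosen for illustration only.

```python
from itertools import product
from math import prod

# IBD coefficients (k0, k1, k2): probability that a pair shares
# 0, 1, or 2 alleles identical by descent under each relationship.
IBD = {
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "uncle-nephew": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
}

def genotype_prob(g, freqs):
    """P(unordered genotype g) under Hardy-Weinberg, by enumeration."""
    target = tuple(sorted(g))
    return sum(
        freqs[x] * freqs[y]
        for x, y in product(freqs, repeat=2)
        if tuple(sorted((x, y))) == target
    )

def joint_prob_one_ibd(g1, g2, freqs):
    """P(g1, g2 | exactly one allele shared IBD), by enumeration.

    s is the shared allele; x and y are the remaining alleles of
    individual 1 and individual 2, drawn independently.
    """
    t1, t2 = tuple(sorted(g1)), tuple(sorted(g2))
    return sum(
        freqs[s] * freqs[x] * freqs[y]
        for s, x, y in product(freqs, repeat=3)
        if tuple(sorted((s, x))) == t1 and tuple(sorted((s, y))) == t2
    )

def locus_lr(g1, g2, freqs, relationship):
    """Single-locus LR: P(g1, g2 | relationship) / P(g1, g2 | unrelated)."""
    k0, k1, k2 = IBD[relationship]
    p1, p2 = genotype_prob(g1, freqs), genotype_prob(g2, freqs)
    same = tuple(sorted(g1)) == tuple(sorted(g2))
    numerator = (
        k0 * p1 * p2
        + k1 * joint_prob_one_ibd(g1, g2, freqs)
        + k2 * (p1 if same else 0.0)
    )
    return numerator / (p1 * p2)

def combined_lr(loci, relationship):
    """Multiply single-locus LRs, assuming the loci are independent."""
    return prod(locus_lr(g1, g2, f, relationship) for g1, g2, f in loci)

# Hypothetical allele frequencies at one STR locus (illustration only).
freqs = {"12": 0.1, "13": 0.3, "14": 0.6}
pair = (("12", "13"), ("12", "14"))
for rel in IBD:
    print(rel, locus_lr(*pair, freqs, rel))
```

Because the IBD coefficients of uncle-nephew and first-cousin pairs put most of their weight on k0, their single-locus LRs stay close to 1, which is consistent with the low discrimination power the abstract reports for these relationships.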

Keywords

Short tandem repeats; Kinship testing; Relationships; Likelihood ratio; Data mining

Journal indexing

KCI

KoreaMed

KAMS
