
Classification of Common Relationships Based on Short Tandem Repeat Profiles Using Data Mining

Korean Journal of Legal Medicine, 2019, Vol. 43, No. 3, pp. 97-105
Jeong Su-Jin, Lee Hyo-Jung, Lee Soong-Deok, Lee Seung-Hwan, Park Su-Jeong, Kim Jong-Sik, Lee Jae-Won
Affiliations
Jeong Su-Jin
Department of Statistics, Korea University
Lee Hyo-Jung
Development Division, Dong-A ST
Lee Soong-Deok
Department of Forensic Medicine, Seoul National University College of Medicine
Lee Seung-Hwan
Forensic Science Division 2, Supreme Prosecutors' Office
Park Su-Jeong
Forensic Science Division 2, Supreme Prosecutors' Office
Kim Jong-Sik
Forensic Science Division 2, Supreme Prosecutors' Office
Lee Jae-Won
Department of Statistics, Korea University

Abstract


We reviewed past studies on the identification of familial relationships using 22 short tandem repeat (STR) markers. These markers provide high discrimination power and relatively accurate cut-off values for parent-child and full-sibling relationships. For uncle-nephew and first-cousin pairs, however, the likelihood ratio (LR) method has low discrimination power. We therefore compare the LR ranking method with data mining techniques that can be applied to identify familial relationships (logistic regression, linear discriminant analysis, diagonal linear discriminant analysis, diagonal quadratic discriminant analysis, K-nearest neighbor, classification and regression trees, support vector machines, random forest [RF], and penalized multivariate analysis), and we provide a guideline for choosing the most appropriate model in a given situation. Among these techniques, RF was more accurate than the other methods, with accuracies of 99.99% for parent-child, 99.44% for full-sibling, 90.34% for uncle-nephew, and 79.69% for first-cousin pairs.
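The abstract does not spell out how the LR is computed. A minimal sketch, assuming independent loci in linkage equilibrium, no mutation, no inbreeding, and the standard identity-by-descent coefficients (k0, k1, k2), computes a pairwise LR for each candidate relationship against "unrelated" at each locus and multiplies across loci. "Ranking" is assumed here to mean ordering the candidate relationships by LR and picking the largest. The helper names (`locus_lr`, `total_lr`, `rank_relationships`), the locus labels, and the allele frequencies are hypothetical and are not the authors' code or data.

```python
# Sketch under stated assumptions; not the authors' implementation.
from math import prod

# Standard IBD (Cotterman) coefficients (k0, k1, k2): probabilities that two
# non-inbred relatives share 0, 1 or 2 alleles identical by descent at a locus.
KINSHIP = {
    "parent-child": (0.0, 1.0, 0.0),
    "full-sibling": (0.25, 0.5, 0.25),
    "uncle-nephew": (0.5, 0.5, 0.0),
    "first-cousin": (0.75, 0.25, 0.0),
}


def locus_lr(g1, g2, freq, k):
    """LR of relationship k versus 'unrelated' at one locus.

    g1, g2 are genotypes as 2-tuples of allele labels; freq maps each allele
    label to its population frequency. Assumes no mutation and no inbreeding.
    """
    k0, k1, k2 = k
    if sorted(g1) == sorted(g2):
        a, b = g1
        if a == b:  # identical homozygotes, AA/AA
            p = freq[a]
            return k0 + k1 / p + k2 / p**2
        pa, pb = freq[a], freq[b]  # identical heterozygotes, AB/AB
        return k0 + k1 * (pa + pb) / (4 * pa * pb) + k2 / (2 * pa * pb)
    shared = set(g1) & set(g2)
    if not shared:  # no allele in common: only the k0 term survives
        return k0
    (s,) = shared  # non-identical genotypes share at most one allele type
    if g1[0] == g1[1] or g2[0] == g2[1]:  # one homozygote, e.g. AA/AB
        return k0 + k1 / (2 * freq[s])
    return k0 + k1 / (4 * freq[s])  # two heterozygotes, e.g. AB/AC


def total_lr(profile1, profile2, freqs, k):
    """Multiply per-locus LRs, assuming independent (unlinked) loci."""
    return prod(locus_lr(profile1[m], profile2[m], freqs[m], k) for m in profile1)


def rank_relationships(profile1, profile2, freqs):
    """Return (relationship, LR) pairs sorted from largest to smallest LR."""
    scores = {name: total_lr(profile1, profile2, freqs, k) for name, k in KINSHIP.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative two-locus example; locus names and frequencies are made up.
freqs = {
    "locus_A": {"12": 0.2, "13": 0.3, "14": 0.5},
    "locus_B": {"6": 0.1, "7": 0.4, "9": 0.5},
}
person1 = {"locus_A": ("12", "13"), "locus_B": ("6", "7")}
person2 = {"locus_A": ("12", "14"), "locus_B": ("6", "6")}

for name, lr in rank_relationships(person1, person2, freqs):
    print(f"{name}: LR = {lr:.3f}")
# parent-child ranks first (LR = 6.250), then uncle-nephew (3.375)
```

In this sketch, uncle-nephew (0.5, 0.5, 0) and first cousins (0.75, 0.25, 0) have coefficients close to those of unrelated pairs (1, 0, 0), so their per-locus LRs stay near 1. This is consistent with the low LR discrimination power reported above for these relationships. Uncle-nephew also shares its coefficients with half-siblings and grandparent-grandchild pairs, so an LR of this form cannot separate those relationships.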

Keywords

Short tandem repeats; Kinship testing; Relationships; Likelihood ratio; Data mining

Indexed in: KCI, KoreaMed, KAMS