International Journal of Computational Linguistics and Chinese Language Processing (中文計算語言學期刊)
Volume 13, Number 2, 2008


Title:
Multiple Document Summarization Using Principal Component Analysis Incorporating Semantic Vector Space Model

Authors:
Om Vikas, Akhil K Meshram, Girraj Meena, and Amit Gupta

Abstract:
Text summarization is highly effective in relevance assessment tasks. The Multiple Document Summarizer presents a novel approach that selects sentences from documents according to several heuristic features. Summaries are generated by modeling the set of documents as a Semantic Vector Space Model (SVSM) and applying Principal Component Analysis (PCA) to extract topic features. A purely statistical VSM assumes that terms are independent of each other, which may produce inconsistent results. The vector space is therefore enhanced semantically by modifying the weights of the word vector, governed by Appearance and Disappearance (Action class) words. The knowledge base of Action words is maintained by classifying words as Appearance or Disappearance with the help of WordNet. The weights of the Action words are modified in accordance with the Object list, which is compiled from the nouns corresponding to the Action words. The resulting summary is more informative because the semantics of natural language have been taken into consideration.

Keywords: Principal Component Analysis (PCA), Semantic Vector Space Model (SVSM), Summarization, Topic Feature, WordNet
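
The PCA step described in this abstract can be illustrated with a minimal sketch. It assumes a plain term-count vector space and leaves out the semantic reweighting (Action words, Object list, WordNet) that the paper adds. The function names `term_sentence_matrix`, `first_principal_component`, and `summarize` are hypothetical, and a pure-Python power iteration stands in for a library PCA routine.

```python
import math
import random
import re
from collections import Counter


def term_sentence_matrix(sentences):
    """Rows are sentences, columns are vocabulary terms (raw counts)."""
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokens for t in toks})
    rows = []
    for toks in tokens:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return rows, vocab


def first_principal_component(rows, iterations=200, seed=0):
    """Leading eigenvector of the covariance matrix, by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    vec = [rng.random() for _ in range(d)]
    for _ in range(iterations):
        # Multiply by X^T X without forming the d x d matrix explicitly.
        proj = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
        new = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            break
        vec = [x / norm for x in new]
    return vec, centered


def summarize(sentences, k=2):
    """Keep the k sentences with the largest projection on the first component."""
    rows, _ = term_sentence_matrix(sentences)
    component, centered = first_principal_component(rows)
    scores = [abs(sum(c * w for c, w in zip(row, component))) for row in centered]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore document order
```

Scoring by the absolute projection is one assumed choice among several; the abstract does not say how the extracted topic features are mapped back to sentences.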


Title:
A Study of Consistency Checking Methods for Part-of-Speech Tagging in Chinese Corpora

Authors:
Hu Zhang (張虎), Jiaheng Zheng (鄭家恒)

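Abstract:
An important factor limiting the quality of corpus processing is the consistency of part-of-speech (POS) tagging for words that can take more than one tag. By analyzing the POS tagging results for multi-category words in a large-scale corpus, this paper proposes a classification-based method for checking POS tagging consistency. First, the features of POS tag sequences are analyzed and a context vector model for multi-category words is built. Next, the k-nearest-neighbor method is used to classify the context vectors and to decide whether the POS tagging of each multi-category word is consistent, which in turn gives the tagging consistency of each article. Tests on the 1.5-million Peking University corpus show that the method is feasible and effective.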

Keywords:
Multi-category words, consistency checking, part-of-speech tagging, Chinese corpus, classification
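
The two steps named in this abstract (context vectors, then k-nearest-neighbor classification) can be sketched as follows. The abstract does not give the feature design, so this sketch assumes a context is the set of (relative position, POS tag) pairs in a small window, and that similarity is the number of shared features. The names `context_vector`, `knn_tag`, and `is_consistent` and the tag symbols are all hypothetical.

```python
from collections import Counter


def context_vector(tags, index, window=2):
    """Set of (relative position, POS tag) features around tags[index]."""
    feats = set()
    for offset in range(-window, window + 1):
        pos = index + offset
        if offset != 0 and 0 <= pos < len(tags):
            feats.add((offset, tags[pos]))
    return feats


def knn_tag(query, training, k=3):
    """Majority tag among the k training contexts sharing the most features.

    training is a list of (context_vector, tag) pairs from trusted data.
    """
    ranked = sorted(training, key=lambda item: len(query & item[0]), reverse=True)
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]


def is_consistent(query, assigned_tag, training, k=3):
    """Flag a tagging as consistent when it matches the kNN prediction."""
    return knn_tag(query, training, k) == assigned_tag


# Hypothetical training contexts for one multi-category word at index 2.
training = [
    (context_vector(["r", "d", "v", "n"], 2), "v"),
    (context_vector(["n", "d", "v", "u"], 2), "v"),
    (context_vector(["v", "u", "n", "w"], 2), "n"),
    (context_vector(["a", "u", "n", "v"], 2), "n"),
    (context_vector(["r", "u", "n", "w"], 2), "n"),
]
query = context_vector(["r", "d", "v", "n"], 2)
print(is_consistent(query, "v", training))
```

Per-article consistency could then be taken as the share of multi-category word occurrences for which `is_consistent` holds, which is again an assumption about how the aggregate is formed.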


Title:
Constructing a Chinese Corpus Annotated with Temporal Relations Based on Dependency Analysis

Authors:
Yuchang Cheng (鄭育昌), Masayuki Asahara (淺原正幸), Yuji Matsumoto (松本裕治)

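Abstract:
This paper describes a method for constructing a Chinese corpus annotated with temporal relations on the basis of dependency analysis, and explains the content of the annotations. The corpus marks the temporal relations between events in Chinese documents and can be used for research on automatically identifying temporal relations between events. Annotating the temporal relations of every combination of events in a document is extremely costly, and not every temporal relation between events is worth annotating. We therefore use the results of sentence dependency analysis to restrict the annotation targets and so improve annotation efficiency. To verify how well the method covers temporal relations between events, we used it to annotate a corpus that carries sentence dependency structure information and analyzed the coverage on a subset of its documents. We found that annotating temporal relations along the dependencies between events covers 63% of the temporal relations between events. The method is also far more efficient than exhaustively annotating the temporal relations between all events.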

Keywords:
Temporal entities, temporal reasoning, temporal relations, events, event semantics, dependency structure
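
The saving from restricting annotation targets can be illustrated with a small sketch. The abstract does not spell out the exact pairing rule, so the code assumes one: each event is paired only with its nearest ancestor in the dependency tree that is also an event. The names `all_event_pairs` and `dependency_event_pairs` and the toy sentence are hypothetical.

```python
from itertools import combinations


def all_event_pairs(events):
    """Exhaustive annotation: every unordered pair of events."""
    return list(combinations(sorted(events), 2))


def dependency_event_pairs(heads, events):
    """Pair each event with its nearest ancestor that is also an event.

    heads[i] is the index of token i's head, or -1 for the root.
    The heads are assumed to form a valid tree (no cycles).
    """
    event_set = set(events)
    pairs = []
    for ev in sorted(event_set):
        node = heads[ev]
        while node != -1 and node not in event_set:
            node = heads[node]
        if node != -1:
            pairs.append((node, ev))
    return pairs


# Toy sentence of six tokens; token 2 is the root, tokens 0, 2, 4, 5 are events.
heads = [2, 0, -1, 2, 3, 4]
events = [0, 2, 4, 5]
print(len(all_event_pairs(events)), len(dependency_event_pairs(heads, events)))
```

Exhaustive pairing grows quadratically with the number of events, while the dependency-restricted rule yields at most one pair per event, which is the kind of reduction in annotation effort the abstract describes.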


Title:
The Effect of Formal Schema on Chinese Students' Reading Comprehension from the Perspective of Schema Theory: An Experimental Report

Authors:
Xiaoyan Zhang

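Abstract:
This study reports how 45 second-year non-English-major university students read three passages with identical content but different formal schemata, with the aim of exploring the effect of formal schema on reading comprehension. The 45 participants were divided into three groups of comparable English proficiency. After reading one of the passages, each group first recalled and wrote down what they had read and then completed a cloze test adapted from that passage. Qualitative and quantitative analysis of the recall protocols shows that participants recalled significantly more, and with significantly higher quality, from the tightly structured passages than from the loosely structured one. This further supports the schema theory of reading and indicates that understanding and mastering formal schemata can make written communication more effective. English teachers should therefore teach students, as part of writing instruction, how to apply their knowledge of formal schemata effectively, so as to improve students' writing performance and writing ability.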

Keywords:
Formal schema, schema theory, reading comprehension


Title:
A Cross-Linguistic Study of Voice Onset Time in Stop Consonants

Authors:
Kuan-Yi Chao (趙冠儀), Li-mei Chen (陳麗美)

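Abstract:
This study presents a cross-linguistic comparison of voice onset time (VOT) in Chinese and English stops, examining word-initial voiceless stops produced by 11 native speakers of Chinese and 4 native speakers of English. The paper locates Chinese and English stops at their appropriate points on the cross-linguistic VOT continuum and classifies them accordingly. The results show that the voiceless aspirated stops of Chinese and English differ significantly from each other along the VOT continuum and should be assigned to different categories. The results also suggest that the three-way VOT classification widely used in the literature is not fine-grained enough to distinguish Chinese from English voiceless aspirated stops.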

Keywords:
Voice onset time, voiceless stops


Title:
A Data-Driven Approach Integrating Automatic Speech Recognition and Text-to-Phoneme Conversion, Applied to Phonetic Transcription of Spoken Buddhist Sutras

Authors:
Min-Siong Liang (梁敏雄), Ren-Yuan Lyu (呂仁園), Yuang-Chin Chiang (江永進)

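Abstract:
We propose a method that integrates automatic speech recognition with text-to-phoneme conversion for phonetic transcription. The method is used to transcribe Chinese characters into Taiwanese phonetic symbols. By using text that is accompanied by speech, together with a sausage-shaped search network generated from characters with multiple pronunciations, we can reduce the transcription error rate. With a search network built from a multiple-pronunciation lexicon, the experimental error rate reaches 12.74%. For further improvement, the pronunciation lexicon can be adapted with pronunciation variation rules, which can be derived from manually corrected transcriptions. These rules fall into two broad categories: knowledge-based rules and data-driven rules. Integrating the pronunciation variation rules lowers the error rate further, to 10.56%. Although the technique was developed specifically for Taiwanese speech, it should be readily applicable to other languages or dialects of the Chinese family.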

Keywords:
Automatic phonetic transcription, phonetic recognition, text-to-phoneme conversion, pronunciation variation, Chinese characters, Taiwanese (Min-nan), dialect, Buddhist sutras
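
The sausage-shaped search network mentioned in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' system: the lexicon entries are illustrative, the per-slot scores are hypothetical stand-ins for acoustic likelihoods from a speech recognizer, and the names `build_sausage`, `best_path`, `error_rate`, and `relative_reduction` are invented for the example.

```python
def build_sausage(text, lexicon):
    """One slot per character, each holding every listed pronunciation."""
    return [list(lexicon[ch]) for ch in text]


def best_path(sausage, score):
    """Pick the top-scoring candidate in each slot independently.

    score(i, p) is assumed to return the acoustic score of
    pronunciation p at slot i.
    """
    return [max(slot, key=lambda p: score(i, p)) for i, slot in enumerate(sausage)]


def error_rate(hypothesis, reference):
    """Fraction of slots whose chosen pronunciation differs from the reference."""
    wrong = sum(h != r for h, r in zip(hypothesis, reference))
    return wrong / len(reference)


def relative_reduction(before, after):
    """Relative error reduction between two error rates."""
    return (before - after) / before


# Illustrative multiple-pronunciation lexicon (assumed entries).
lexicon = {
    "大": ["tua7", "tai7"],
    "山": ["suann1", "san1"],
    "人": ["lang5", "jin5"],
}
# Hypothetical acoustic scores keyed by (slot index, pronunciation).
scores = {
    (0, "tua7"): 0.9, (0, "tai7"): 0.1,
    (1, "suann1"): 0.8, (1, "san1"): 0.2,
    (2, "lang5"): 0.3, (2, "jin5"): 0.7,
}
sausage = build_sausage("大山人", lexicon)
hypothesis = best_path(sausage, lambda i, p: scores[(i, p)])
print(hypothesis, error_rate(hypothesis, ["tua7", "suann1", "lang5"]))
```

With the two error rates quoted in the abstract, `relative_reduction(12.74, 10.56)` comes to about 0.17, so adding the pronunciation variation rules corresponds to roughly a 17% relative reduction in errors.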