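For Chinese text, word segmentation is the first step of preprocessing, and its accuracy directly determines how useful any downstream analysis can be. Among the many segmentation tools available, the Rwordseg package for R, run in a Linux environment, has become the first choice of many data analysts and researchers because it is efficient and flexible. This article examines Rwordseg's advantages, how to use it, and its value in practical applications, with the aim of giving readers a thorough understanding.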
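
I. Overview of the Advantages of R and Linux

R: As a powerful tool for statistical analysis and data visualization, R has earned its place in data science by being open source, flexible, and widely supported by its community. Beyond its rich set of statistical functions and plotting facilities, R offers thousands of extension packages through CRAN (Comprehensive R Archive Network), covering needs that range from basic statistical analysis to advanced machine learning. For text analysis, R likewise provides a wide range of text-processing tools and packages, such as tm and text2vec, which form a solid ecosystem for Chinese word segmentation.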
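
Linux: As the leading choice of server operating system, Linux is an ideal working environment for data scientists and developers thanks to its stability, efficiency, security, and powerful command-line interface. Running R on Linux makes it possible to take full advantage of multi-core processing and efficient memory management, which pays off especially when handling large datasets. In addition, Linux package managers such as apt and yum make installing and configuring tools and libraries very easy, which greatly simplifies installing Rwordseg and managing its dependencies.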
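
The multi-core point can be sketched with R's bundled `parallel` package, whose `mclapply()` forks worker processes on Linux. In this minimal example `process_doc` is a hypothetical stand-in for a per-document step; it only counts characters so that the sketch runs without extra packages. Whether a particular segmentation engine is safe to call from forked workers is an assumption that would need checking.

```r
library(parallel)

# Hypothetical per-document step (a stand-in for real text processing).
process_doc <- function(doc) nchar(doc)

docs <- c("first document", "second one", "third")

# detectCores() may return NA on some systems, so fall back to one worker.
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L
n_workers <- min(2L, n_cores)

# mclapply() relies on forking, which is available on Linux and macOS.
results <- mclapply(docs, process_doc, mc.cores = n_workers)
doc_lengths <- unlist(results)
print(doc_lengths)
```

For small inputs like this the forking overhead outweighs any gain; the pattern only becomes worthwhile when each document takes noticeable time to process.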
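
II. Rwordseg: Introduction and Advantages

Rwordseg: A Chinese word segmentation package for R. It wraps several popular Chinese segmentation engines (such as jieba, Ansj, and ICTCLAS), making Chinese text segmentation in R simple. Beyond basic segmentation, Rwordseg also offers more advanced features such as keyword extraction and part-of-speech tagging, which considerably broaden the toolkit for Chinese text analysis.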
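
Advantages:
1. Ease of use: By exposing segmentation through an R interface, Rwordseg lowers the technical barrier to Chinese word segmentation, so that data analysts without a specialist NLP background can get started easily.
2. Flexibility: It supports multiple segmentation engines, so users can pick the algorithm that suits their needs and balance accuracy against speed.
3. Extensibility: As an R package, Rwordseg slots easily into R data-processing and analysis workflows and connects with other text-processing and machine-learning packages.
4. Community support: Thanks to R's broad reach, Rwordseg benefits from an active community whose users contribute new segmentation engines and algorithmic optimizations, keeping it up to date.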
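
As a rough illustration of the extensibility point, segmented output can be handed straight to ordinary R data handling. The sketch below assumes the segmenter has already produced one character vector of tokens per document; the `tokens` list is hand-written sample data, not actual Rwordseg output.

```r
# Assumed segmentation result: one token vector per document (sample data).
tokens <- list(
  c("我", "爱", "自然语言", "处理"),
  c("自然语言", "处理", "很", "有趣")
)

# Pool all tokens and count how often each term occurs.
freq <- sort(table(unlist(tokens)), decreasing = TRUE)

# Convert to a plain data frame for use with other packages.
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
names(freq_df) <- c("term", "count")
print(freq_df)
```

A data frame of this shape can then be filtered, joined, or plotted with whatever tools the rest of the analysis already uses.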
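
III. Rwordseg in Practice

Installation and configuration:
On Linux, installing Rwordseg is straightforward. First make sure R is installed (RStudio is optional), then install the package with R's package-management function `install.packages()`:

    install.packages("Rwordseg")

If the package cannot be found in your configured repository, note that Rwordseg has been distributed through R-Forge, so the `repos` argument of `install.packages()` may need to point there. Once installation finishes, load the package:

    library(Rwordseg)

Basic segmentation example:
Simple segmentation using the jieba engine:

    # "I love natural language processing"
    text <- "我爱自然语言处理"
    # Engine selection via `method` follows this article; confirm it
    # against ?segmentCN for the installed version.
    words <- segmentCN(text, method = "jieba")
    print(words)

The output is the list of segmented words.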
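
To give a feel for what a segmenter returns, here is a toy forward-maximum-matching segmenter in base R. It is a deliberately simple dictionary technique, not how Rwordseg or its engines work internally. The `dict` vector and the `fmm_segment` function are hypothetical names introduced for this sketch, and the code assumes R is running in a UTF-8 locale.

```r
# Toy dictionary (hypothetical); real engines ship far larger lexicons.
dict <- c("我", "爱", "自然", "语言", "自然语言", "处理")

fmm_segment <- function(text, dict, max_len = max(nchar(dict))) {
  chars <- strsplit(text, "")[[1]]
  n <- length(chars)
  out <- character(0)
  i <- 1
  while (i <= n) {
    # Try the longest candidate starting at position i, then shrink it.
    for (len in seq(min(max_len, n - i + 1), 1)) {
      cand <- paste(chars[i:(i + len - 1)], collapse = "")
      # A single character is always accepted as a fallback.
      if (cand %in% dict || len == 1) {
        out <- c(out, cand)
        i <- i + len
        break
      }
    }
  }
  out
}

words_toy <- fmm_segment("我爱自然语言处理", dict)
print(words_toy)
```

Because the longest match wins, "自然语言" is kept as one word here even though "自然" and "语言" are also in the dictionary. Production engines add statistical models on top of ideas like this to resolve ambiguous cases.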

Keyword extraction:
Rwordseg also provides keyword extraction based on TF-IDF and similar algorithms, which is very useful for tasks such as text summarization and topic identification.
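
To make the TF-IDF idea concrete, the following base-R sketch scores terms by hand. It assumes the documents are already segmented; `docs` is hand-written sample data, and the formulas used (term count divided by document length, and log of total documents over documents containing the term) are one common variant, not necessarily the one any particular package uses.

```r
# Assumed input: documents already segmented into tokens (sample data).
docs <- list(
  d1 = c("分词", "是", "文本", "分析", "的", "基础"),
  d2 = c("文本", "分析", "需要", "分词"),
  d3 = c("关键词", "提取", "依赖", "分词")
)

vocab <- unique(unlist(docs))

# Term frequency: occurrences of a term divided by document length.
tf <- sapply(docs, function(tokens) {
  counts <- table(factor(tokens, levels = vocab))
  as.numeric(counts) / length(tokens)
})
rownames(tf) <- vocab

# Inverse document frequency: log(N / documents containing the term).
doc_freq <- rowSums(tf > 0)
idf <- log(length(docs) / doc_freq)

# Each row (term) of tf is scaled by that term's idf.
tfidf <- tf * idf

# Highest-scoring terms for one document.
top_terms <- head(sort(tfidf[, "d1"], decreasing = TRUE), 3)
print(top_terms)
```

A term such as "分词" that occurs in every document gets an IDF of zero and therefore never ranks as a keyword, which is the behaviour that makes TF-IDF useful for picking out document-specific vocabulary.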
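
    keywords <- extract_keywords(text, method = "jieba", topN =
    print(keywords)

Part-of-speech tagging:
Part-of-speech tagging helps clarify the role each word plays in a sentence, which is essential for downstream tasks such as sentiment analysis and syntactic parsing.
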
pos <-pos