{"id":1576,"date":"2021-09-10T15:13:15","date_gmt":"2021-09-10T06:13:15","guid":{"rendered":"https:\/\/tech.at-iroha.jp\/?p=1576"},"modified":"2021-09-13T19:42:20","modified_gmt":"2021-09-13T10:42:20","slug":"%e8%87%aa%e7%84%b6%e8%a8%80%e8%aa%9e%e5%87%a6%e7%90%86%ef%bc%9atf-idf%e3%81%ab%e3%82%88%e3%82%8b%e9%87%8d%e8%a6%81%e5%8d%98%e8%aa%9e%e6%8a%bd%e5%87%ba","status":"publish","type":"post","link":"https:\/\/tech.at-iroha.jp\/?p=1576","title":{"rendered":"Natural Language Processing: Extracting Important Words with TF-IDF"},"content":{"rendered":"\n<p>Continuing from the <a href=\"https:\/\/tech.at-iroha.jp\/?p=1530\" data-type=\"URL\" data-id=\"https:\/\/tech.at-iroha.jp\/?p=1530\">previous post<\/a>, I have been surveying various natural language processing (NLP) techniques for a research paper I am writing. This post summarizes how to extract important words with TF-IDF, which is one of the fundamentals of NLP. TF-IDF is a classic technique, but its formula is very simple, so it is still used in a wide range of fields today.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">About TF-IDF<\/h2>\n\n\n\n<p>TF-IDF is a method for evaluating how important each word in a document is. It is calculated with the following formulas.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/tech.at-iroha.jp\/wp-content\/uploads\/2021\/09\/tfidf.png\" alt=\"TF-IDF formulas\" class=\"wp-image-1577\" width=\"311\" height=\"232\"\/><\/figure>\n\n\n\n<p>tf(i, j) = number of occurrences of word i in document j \/ total number of words in document j<br>idf(i) = log(total number of documents \/ number of documents containing word i)<br>tfidf(i, j) = tf(i, j) * idf(i) = importance of word i in document j<\/p>\n\n\n\n<p>tf (Term Frequency): how often a word occurs in a document (for example, the frequency of word x in document A)<br>idf (Inverse Document Frequency): the more documents a word appears in (that is, the more common the word), the lower its idf. Words that appear in only a few documents get a high idf.<\/p>\n\n\n\n<p>Multiplying tf(i, j) by idf(i) gives the importance of a specific word in a specific document. A word scores high when it occurs often in that document but rarely in the other documents.<\/p>\n\n\n\n<p>For example, the importance of each word in the following three documents is calculated as shown below.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Document A: [Bronze, Silver, Silver]<br>Document B: [Bronze, Gold]<br>Document C: [Bronze, Gold, Platinum]<\/pre>\n\n\n\n<p>First, compute the term frequency tf of each word.<br>tf(i, j) = number of occurrences of word i in document j \/ total number of words in document j<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">tf(Bronze, A) = 1\/3 = 0.33\ntf(Silver, A) = 2\/3 = 0.67\n\ntf(Bronze, B) = 1\/2 = 0.5\ntf(Gold, B) = 1\/2 = 0.5\n\ntf(Bronze, C) = 1\/3 = 0.33\ntf(Gold, C) = 1\/3 = 0.33\ntf(Platinum, C) = 1\/3 = 0.33<\/pre>\n\n\n\n<p>Next, compute the inverse document frequency idf. This example uses a base-2 logarithm.<br>idf(i) = log(total number of documents \/ number of documents containing word i)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">idf(Bronze) = log2(3\/3) = 0\nidf(Silver) = log2(3\/1) = 1.58\nidf(Gold) = log2(3\/2) = 0.58\nidf(Platinum) = log2(3\/1) = 1.58<\/pre>\n\n\n\n<p>Finally, compute tfidf. The products are calculated from the unrounded tf and idf values and then rounded to two decimal places. Bronze appears in every document, so its idf is 0, and therefore its tfidf is also 0.<br>tfidf(i, j) = tf(i, j) * idf(i)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">tf(Bronze, A) * idf(Bronze) = 0\ntf(Silver, A) * idf(Silver) = 1.06\n\ntf(Bronze, B) * idf(Bronze) = 0\ntf(Gold, B) * idf(Gold) = 0.29\n\ntf(Bronze, C) * idf(Bronze) = 0\ntf(Gold, C) * idf(Gold) = 0.19\ntf(Platinum, C) * idf(Platinum) = 0.53<\/pre>\n\n\n\n<p>The higher a word's tfidf value, the more important it is considered to be. The most important word in each document is therefore:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Document A: Silver<br>Document B: Gold<br>Document C: Platinum<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Environment Setup<\/h2>\n\n\n\n<p>Next, set up an environment for extracting important words with MeCab and Python.<\/p>\n\n\n\n<p>This time, the environment is built on an AWS EC2 instance running Ubuntu 20.<\/p>\n\n\n\n<p>First, install pip (the package manager for Python).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo apt-get update\nsudo apt install python3-pip<\/code><\/pre>\n\n\n\n<p>Install scikit-learn (a machine learning library for Python).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo pip3 install scikit-learn<\/code><\/pre>\n\n\n\n<p>Install MeCab, its development library, and the UTF-8 IPA dictionary.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo apt install mecab\nsudo apt install libmecab-dev\nsudo apt install mecab-ipadic-utf8<\/code><\/pre>\n\n\n\n<p>Install the MeCab bindings for Python. unidic-lite is a dictionary package that mecab-python3 can use by default. The script below does not rely on it, because it explicitly selects the IPA dictionary installed above with the -d option.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo pip3 install mecab-python3\nsudo pip3 install unidic-lite<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Creating the Script<\/h2>\n\n\n\n<p>Create a script named tfidf_test.py with the following content.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import MeCab\nfrom sklearn.feature_extraction.text import TfidfVectorizer\n\n# Run morphological analysis on the given file and return the extracted words joined by spaces\ndef get_text(input_file_name):\n\twith open(input_file_name, 'r', encoding='utf-8') as f:\n\t\ttext = f.read()\n\n\t# Morphological analysis with MeCab, using the IPA dictionary installed above\n\tmecab = MeCab.Tagger(\"-O chasen -d \/var\/lib\/mecab\/dic\/ipadic-utf8\/\")\n\tnode = mecab.parseToNode(text)\n\twords = &#91;]\n\n\t# Keep only nouns (surface form) and adjectives (base form);\n\t# uncomment the verb branch to include verbs as well\n\twhile node:\n\t\tfeatures = node.feature.split(\",\")\n\t\tif features&#91;0] == \"\u540d\u8a5e\":  # noun\n\t\t\twords.append(node.surface)\n\t\telif features&#91;0] == \"\u5f62\u5bb9\u8a5e\":  # adjective\n\t\t\twords.append(features&#91;6])\n#\t\telif features&#91;0] == \"\u52d5\u8a5e\":  # verb\n#\t\t\twords.append(features&#91;6])\n\t\tnode = node.next\n\n\t# Join the words with spaces so that TfidfVectorizer can split them again\n\treturn ' '.join(words)\n\ndocs = &#91;\n\tget_text('sample1.txt'),\n\tget_text('sample2.txt'),\n\tget_text('sample3.txt'),\n\tget_text('sample4.txt'),\n\tget_text('sample5.txt'),\n]\n\n# Compute TF-IDF, ignoring words that appear in more than 50% of the documents\nvectorizer = TfidfVectorizer(max_df=0.5)\n\n# Get the TF-IDF matrix (one row per document, one column per word)\nmatrix = vectorizer.fit_transform(docs)\n\n# Get the vocabulary (get_feature_names() was removed in scikit-learn 1.2)\nwords = vectorizer.get_feature_names_out().tolist()\nprint(words)\n\nfor doc_no, vec in enumerate(matrix.toarray()):\n\tprint('doc_no:', doc_no)\n\n\t# Print every word of the vocabulary in descending order of TF-IDF score\n\tfor w_id, tfidf in sorted(enumerate(vec), key=lambda x: x&#91;1], reverse=True):\n\t\tword = words&#91;w_id]\n\t\tprint('\\t{0:s}: {1:f}'.format(word, tfidf))<\/code><\/pre>\n\n\n\n<p>As sample documents, I prepared the corporate philosophies of the following five companies. The texts are kept in the original Japanese because they are the input to MeCab. A rough English translation follows each file name.<\/p>\n\n\n\n<p>sample1.txt (Toyota's corporate philosophy; roughly: Through the provision of clean and safe products, we contribute to building an affluent society and aim to be a good corporate citizen trusted by the international community.)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\u30af\u30ea\u30fc\u30f3\u3067\u5b89\u5168\u306a\u5546\u54c1\u306e\u63d0\u4f9b\u3092\u901a\u3058\u3066\u3001\u8c4a\u304b\u306a\u793e\u4f1a\u3065\u304f\u308a\u306b\u8ca2\u732e\u3057\u3001\u56fd\u969b\u793e\u4f1a\u304b\u3089\u4fe1\u983c\u3055\u308c\u308b\u826f\u304d\u4f01\u696d\u5e02\u6c11\u3092\u3081\u3056\u3057\u3066\u3044\u307e\u3059\u3002<\/pre>\n\n\n\n<p>sample2.txt (ANA's corporate philosophy; roughly: On a foundation of security and trust, we contribute to a future full of dreams, with wings of the heart that connect the world.)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\u5b89\u5fc3\u3068\u4fe1\u983c\u3092\u57fa\u790e\u306b\u3001\u4e16\u754c\u3092\u3064\u306a\u3050\u5fc3\u306e\u7ffc\u3067\u5922\u306b\u3042\u3075\u308c\u308b\u672a\u6765\u306b\u8ca2\u732e\u3057\u307e\u3059\u3002<\/pre>\n\n\n\n<p>sample3.txt (Mitsubishi Corporation's corporate philosophy; roughly: Shoki Hoko, corporate responsibility to society: through business, strive to realize a society that is rich both materially and spiritually, while also contributing to the preservation of the irreplaceable global environment.)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\u6240\u671f\u5949\u516c\uff08\u4e8b\u696d\u3092\u901a\u3058\u3001\u7269\u5fc3\u5171\u306b\u8c4a\u304b\u306a\u793e\u4f1a\u306e\u5b9f\u73fe\u306b\u52aa\u529b\u3059\u308b\u3068\u540c\u6642\u306b\u3001\u304b\u3051\u304c\u3048\u306e\u306a\u3044\u5730\u7403\u74b0\u5883\u306e\u7dad\u6301\u306b\u3082\u8ca2\u732e\u3059\u308b\u3002\uff09<\/pre>\n\n\n\n<p>sample4.txt (Amazon's corporate philosophy; roughly: To strive to seek out and discover anything customers want to find online, and to offer it at the lowest possible prices.)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\u304a\u5ba2\u69d8\u304c\u30aa\u30f3\u30e9\u30a4\u30f3\u3067\u6c42\u3081\u308b\u3042\u3089\u3086\u308b\u3082\u306e\u3092\u63a2\u3057\u3066\u767a\u6398\u3057\u3001\u51fa\u6765\u308b\u9650\u308a\u4f4e\u4fa1\u683c\u3067\u3054\u63d0\u4f9b\u3059\u308b\u3088\u3046\u52aa\u3081\u308b\u3053\u3068\u3002<\/pre>\n\n\n\n<p>sample5.txt (Google's corporate philosophy; roughly: To organize the world's information and make it accessible and usable by people all over the world.)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\u4e16\u754c\u4e2d\u306e\u60c5\u5831\u3092\u6574\u7406\u3057\u3001\u4e16\u754c\u4e2d\u306e\u4eba\u304c\u30a2\u30af\u30bb\u30b9\u3067\u304d\u3066\u4f7f\u3048\u308b\u3088\u3046\u306b\u3059\u308b\u3053\u3068\u3067\u3059\u3002<\/pre>\n\n\n\n<p>Run the script as follows. It prints the vocabulary first. Then, for each document, it prints every word sorted by its TF-IDF score.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python3 tfidf_test.py<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">['\u304a\u5ba2\u69d8', '\u304b\u3051\u304c\u3048', '\u3053\u3068', '\u3065\u304f\u308a', '\u306a\u3044', '\u3082\u306e', '\u3088\u3046', '\u30a2\u30af\u30bb\u30b9', '\u30aa\u30f3\u30e9\u30a4\u30f3', '\u30af\u30ea\u30fc\u30f3', '\u4e16\u754c', '\u4e16\u754c\u4e2d', '\u4e8b\u696d', '\u4f01\u696d', '\u4fa1\u683c', '\u4fe1\u983c', '\u52aa\u529b', '\u5546\u54c1', '\u56fd\u969b', '\u5730\u7403', '\u57fa\u790e', '\u5949\u516c', '\u5b89\u5168', '\u5b89\u5fc3', '\u5b9f\u73fe', '\u5e02\u6c11', '\u60c5\u5831', '\u6240\u671f', '\u63d0\u4f9b', '\u6574\u7406', '\u672a\u6765', '\u7269\u5fc3', '\u74b0\u5883', '\u767a\u6398', '\u793e\u4f1a', '\u7dad\u6301', '\u826f\u3044', '\u8c4a\u304b', '\u9650\u308a']\ndoc_no: 0\n        \u793e\u4f1a: 0.455365\n        \u3065\u304f\u308a: 0.282207\n        \u30af\u30ea\u30fc\u30f3: 0.282207\n        \u4f01\u696d: 0.282207\n        \u5546\u54c1: 0.282207\n        \u56fd\u969b: 0.282207\n        \u5b89\u5168: 0.282207\n        \u5e02\u6c11: 0.282207\n        \u826f\u3044: 0.282207\n        \u4fe1\u983c: 0.227683\n        \u63d0\u4f9b: 0.227683\n        \u8c4a\u304b: 0.227683\n        \u304a\u5ba2\u69d8: 0.000000\n        \u304b\u3051\u304c\u3048: 0.000000\n        \u3053\u3068: 0.000000\n        \u306a\u3044: 0.000000\n        \u3082\u306e: 0.000000\n        \u3088\u3046: 0.000000\n        \u30a2\u30af\u30bb\u30b9: 0.000000\n        \u30aa\u30f3\u30e9\u30a4\u30f3: 0.000000\n        \u4e16\u754c: 
0.000000\n        \u4e16\u754c\u4e2d: 0.000000\n        \u4e8b\u696d: 0.000000\n        \u4fa1\u683c: 0.000000\n        \u52aa\u529b: 0.000000\n        \u5730\u7403: 0.000000\n        \u57fa\u790e: 0.000000\n        \u5949\u516c: 0.000000\n        \u5b89\u5fc3: 0.000000\n        \u5b9f\u73fe: 0.000000\n        \u60c5\u5831: 0.000000\n        \u6240\u671f: 0.000000\n        \u6574\u7406: 0.000000\n        \u672a\u6765: 0.000000\n        \u7269\u5fc3: 0.000000\n        \u74b0\u5883: 0.000000\n        \u767a\u6398: 0.000000\n        \u7dad\u6301: 0.000000\n        \u9650\u308a: 0.000000\ndoc_no: 1\n        \u4e16\u754c: 0.463693\n        \u57fa\u790e: 0.463693\n        \u5b89\u5fc3: 0.463693\n        \u672a\u6765: 0.463693\n        \u4fe1\u983c: 0.374105\n        \u304a\u5ba2\u69d8: 0.000000\n        \u304b\u3051\u304c\u3048: 0.000000\n        \u3053\u3068: 0.000000\n        \u3065\u304f\u308a: 0.000000\n        \u306a\u3044: 0.000000\n        \u3082\u306e: 0.000000\n        \u3088\u3046: 0.000000\n        \u30a2\u30af\u30bb\u30b9: 0.000000\n        \u30aa\u30f3\u30e9\u30a4\u30f3: 0.000000\n        \u30af\u30ea\u30fc\u30f3: 0.000000\n        \u4e16\u754c\u4e2d: 0.000000\n        \u4e8b\u696d: 0.000000\n        \u4f01\u696d: 0.000000\n        \u4fa1\u683c: 0.000000\n        \u52aa\u529b: 0.000000\n        \u5546\u54c1: 0.000000\n        \u56fd\u969b: 0.000000\n        \u5730\u7403: 0.000000\n        \u5949\u516c: 0.000000\n        \u5b89\u5168: 0.000000\n        \u5b9f\u73fe: 0.000000\n        \u5e02\u6c11: 0.000000\n        \u60c5\u5831: 0.000000\n        \u6240\u671f: 0.000000\n        \u63d0\u4f9b: 0.000000\n        \u6574\u7406: 0.000000\n        \u7269\u5fc3: 0.000000\n        \u74b0\u5883: 0.000000\n        \u767a\u6398: 0.000000\n        \u793e\u4f1a: 0.000000\n        \u7dad\u6301: 0.000000\n        \u826f\u3044: 0.000000\n        \u8c4a\u304b: 0.000000\n        \u9650\u308a: 0.000000\ndoc_no: 2\n        \u304b\u3051\u304c\u3048: 0.285112\n        \u306a\u3044: 
0.285112\n        \u4e8b\u696d: 0.285112\n        \u52aa\u529b: 0.285112\n        \u5730\u7403: 0.285112\n        \u5949\u516c: 0.285112\n        \u5b9f\u73fe: 0.285112\n        \u6240\u671f: 0.285112\n        \u7269\u5fc3: 0.285112\n        \u74b0\u5883: 0.285112\n        \u7dad\u6301: 0.285112\n        \u793e\u4f1a: 0.230026\n        \u8c4a\u304b: 0.230026\n        \u304a\u5ba2\u69d8: 0.000000\n        \u3053\u3068: 0.000000\n        \u3065\u304f\u308a: 0.000000\n        \u3082\u306e: 0.000000\n        \u3088\u3046: 0.000000\n        \u30a2\u30af\u30bb\u30b9: 0.000000\n        \u30aa\u30f3\u30e9\u30a4\u30f3: 0.000000\n        \u30af\u30ea\u30fc\u30f3: 0.000000\n        \u4e16\u754c: 0.000000\n        \u4e16\u754c\u4e2d: 0.000000\n        \u4f01\u696d: 0.000000\n        \u4fa1\u683c: 0.000000\n        \u4fe1\u983c: 0.000000\n        \u5546\u54c1: 0.000000\n        \u56fd\u969b: 0.000000\n        \u57fa\u790e: 0.000000\n        \u5b89\u5168: 0.000000\n        \u5b89\u5fc3: 0.000000\n        \u5e02\u6c11: 0.000000\n        \u60c5\u5831: 0.000000\n        \u63d0\u4f9b: 0.000000\n        \u6574\u7406: 0.000000\n        \u672a\u6765: 0.000000\n        \u767a\u6398: 0.000000\n        \u826f\u3044: 0.000000\n        \u9650\u308a: 0.000000\ndoc_no: 3\n        \u304a\u5ba2\u69d8: 0.354602\n        \u3082\u306e: 0.354602\n        \u30aa\u30f3\u30e9\u30a4\u30f3: 0.354602\n        \u4fa1\u683c: 0.354602\n        \u767a\u6398: 0.354602\n        \u9650\u308a: 0.354602\n        \u3053\u3068: 0.286091\n        \u3088\u3046: 0.286091\n        \u63d0\u4f9b: 0.286091\n        \u304b\u3051\u304c\u3048: 0.000000\n        \u3065\u304f\u308a: 0.000000\n        \u306a\u3044: 0.000000\n        \u30a2\u30af\u30bb\u30b9: 0.000000\n        \u30af\u30ea\u30fc\u30f3: 0.000000\n        \u4e16\u754c: 0.000000\n        \u4e16\u754c\u4e2d: 0.000000\n        \u4e8b\u696d: 0.000000\n        \u4f01\u696d: 0.000000\n        \u4fe1\u983c: 0.000000\n        \u52aa\u529b: 0.000000\n        \u5546\u54c1: 
0.000000\n        \u56fd\u969b: 0.000000\n        \u5730\u7403: 0.000000\n        \u57fa\u790e: 0.000000\n        \u5949\u516c: 0.000000\n        \u5b89\u5168: 0.000000\n        \u5b89\u5fc3: 0.000000\n        \u5b9f\u73fe: 0.000000\n        \u5e02\u6c11: 0.000000\n        \u60c5\u5831: 0.000000\n        \u6240\u671f: 0.000000\n        \u6574\u7406: 0.000000\n        \u672a\u6765: 0.000000\n        \u7269\u5fc3: 0.000000\n        \u74b0\u5883: 0.000000\n        \u793e\u4f1a: 0.000000\n        \u7dad\u6301: 0.000000\n        \u826f\u3044: 0.000000\n        \u8c4a\u304b: 0.000000\ndoc_no: 4\n        \u4e16\u754c\u4e2d: 0.694134\n        \u30a2\u30af\u30bb\u30b9: 0.347067\n        \u60c5\u5831: 0.347067\n        \u6574\u7406: 0.347067\n        \u3053\u3068: 0.280011\n        \u3088\u3046: 0.280011\n        \u304a\u5ba2\u69d8: 0.000000\n        \u304b\u3051\u304c\u3048: 0.000000\n        \u3065\u304f\u308a: 0.000000\n        \u306a\u3044: 0.000000\n        \u3082\u306e: 0.000000\n        \u30aa\u30f3\u30e9\u30a4\u30f3: 0.000000\n        \u30af\u30ea\u30fc\u30f3: 0.000000\n        \u4e16\u754c: 0.000000\n        \u4e8b\u696d: 0.000000\n        \u4f01\u696d: 0.000000\n        \u4fa1\u683c: 0.000000\n        \u4fe1\u983c: 0.000000\n        \u52aa\u529b: 0.000000\n        \u5546\u54c1: 0.000000\n        \u56fd\u969b: 0.000000\n        \u5730\u7403: 0.000000\n        \u57fa\u790e: 0.000000\n        \u5949\u516c: 0.000000\n        \u5b89\u5168: 0.000000\n        \u5b89\u5fc3: 0.000000\n        \u5b9f\u73fe: 0.000000\n        \u5e02\u6c11: 0.000000\n        \u6240\u671f: 0.000000\n        \u63d0\u4f9b: 0.000000\n        \u672a\u6765: 0.000000\n        \u7269\u5fc3: 0.000000\n        \u74b0\u5883: 0.000000\n        \u767a\u6398: 0.000000\n        \u793e\u4f1a: 0.000000\n        \u7dad\u6301: 0.000000\n        \u826f\u3044: 0.000000\n        \u8c4a\u304b: 0.000000\n        \u9650\u308a: 
0.000000<\/pre>\n\n\n\n<p>The five highest-scoring words for each company are listed below, each with a rough English gloss. Many words are tied; for example, eight of Toyota's words score 0.282207. The script lists tied words in vocabulary order, so when several words tie at the cut-off, which of them make the top five is arbitrary.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>Toyota<\/strong><br>\u793e\u4f1a: 0.455365 (society)<br>\u3065\u304f\u308a: 0.282207 (building, as in building a society)<br>\u30af\u30ea\u30fc\u30f3: 0.282207 (clean)<br>\u4f01\u696d: 0.282207 (company)<br>\u5546\u54c1: 0.282207 (products)<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>ANA<\/strong><br>\u4e16\u754c: 0.463693 (world)<br>\u57fa\u790e: 0.463693 (foundation)<br>\u5b89\u5fc3: 0.463693 (security, peace of mind)<br>\u672a\u6765: 0.463693 (future)<br>\u4fe1\u983c: 0.374105 (trust)<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>Mitsubishi Corporation<\/strong><br>\u304b\u3051\u304c\u3048: 0.285112 (part of \u304b\u3051\u304c\u3048\u306e\u306a\u3044, irreplaceable)<br>\u306a\u3044: 0.285112 (not)<br>\u4e8b\u696d: 0.285112 (business)<br>\u52aa\u529b: 0.285112 (effort)<br>\u5730\u7403: 0.285112 (Earth)<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>Amazon<\/strong><br>\u304a\u5ba2\u69d8: 0.354602 (customers)<br>\u3082\u306e: 0.354602 (things)<br>\u30aa\u30f3\u30e9\u30a4\u30f3: 0.354602 (online)<br>\u4fa1\u683c: 0.354602 (price)<br>\u767a\u6398: 0.354602 (discovery)<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>Google<\/strong><br>\u4e16\u754c\u4e2d: 0.694134 (all over the world)<br>\u30a2\u30af\u30bb\u30b9: 0.347067 (access)<br>\u60c5\u5831: 0.347067 (information)<br>\u6574\u7406: 0.347067 (organizing)<br>\u3053\u3068: 0.280011 (thing; a nominalizer)<\/pre>\n\n\n\n<p>The extracted words are reasonably representative. However, general-purpose words such as \u3082\u306e (things), \u3053\u3068 (thing) and \u306a\u3044 (not) also appear in the top five. Improving accuracy would probably require more sample documents and filtering out such common words, for example by passing a stop-word list to the stop_words parameter of TfidfVectorizer.<\/p>\n\n\n","protected":false},"excerpt":{"rendered":"<p>Continuing from the previous post, I have been surveying various natural language processing (NLP) techniques for a research paper I am writing. This post summarizes how to extract important words with TF-IDF, which is one of the fundamentals of NLP. TF-IDF is a classic technique, but its formula is very simple, so it is still 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1577,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[],"class_list":["post-1576","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-nlp"],"_links":{"self":[{"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=\/wp\/v2\/posts\/1576","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1576"}],"version-history":[{"count":12,"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=\/wp\/v2\/posts\/1576\/revisions"}],"predecessor-version":[{"id":1595,"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=\/wp\/v2\/posts\/1576\/revisions\/1595"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=\/wp\/v2\/media\/1577"}],"wp:attachment":[{"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1576"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1576"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tech.at-iroha.jp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1576"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
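The hand calculation for documents A, B and C can be reproduced with a few lines of Python. The following is a minimal sketch that uses only the standard library. It assumes the same definitions as the worked example: tf is the share of the document's words, and idf uses a base-2 logarithm. The English labels stand in for the example's bronze, silver, gold and platinum words. This is an illustration only, not part of the tfidf_test.py script.

```python
import math
from collections import Counter

# The three toy documents from the worked example
docs = {
    "A": ["Bronze", "Silver", "Silver"],
    "B": ["Bronze", "Gold"],
    "C": ["Bronze", "Gold", "Platinum"],
}


def tf(term, doc):
    # Occurrences of the term divided by the total number of words in the document
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    # log2(total documents / documents containing the term), as in the example
    df = sum(1 for words in corpus.values() if term in words)
    return math.log2(len(corpus) / df)


def tfidf(corpus):
    # Score every distinct word of every document
    return {
        name: {term: tf(term, words) * idf(term, corpus) for term in set(words)}
        for name, words in corpus.items()
    }


scores = tfidf(docs)
for name, word_scores in scores.items():
    best = max(word_scores, key=word_scores.get)
    print(f"Document {name}: {best} ({word_scores[best]:.2f})")
```

Running the script prints the following:

- Document A: Silver (1.06)
- Document B: Gold (0.29)
- Document C: Platinum (0.53)

Bronze scores exactly 0 in every document because log2(3/3) = 0.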
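The scores printed by tfidf_test.py follow a different weighting from the hand calculation. According to the scikit-learn documentation, TfidfVectorizer's defaults (smooth_idf=True, norm='l2') work as follows:

1. Multiply the raw count of a word by a smoothed, natural-log idf, idf = ln((1 + n) / (1 + df)) + 1.
2. Rescale each document vector to unit length.

The sketch below re-implements only that default weighting in plain Python for pre-tokenized documents, using the toy example. The function name sklearn_style_tfidf is hypothetical. This is not scikit-learn's own code, and it ignores tokenization and options such as max_df.

```python
import math
from collections import Counter


def sklearn_style_tfidf(docs):
    """Hypothetical helper: TfidfVectorizer's default weighting for pre-tokenized documents."""
    n = len(docs)
    vocab = sorted({w for doc in docs for w in doc})
    # Document frequency of each word
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1, so a word found in every document keeps a weight of 1
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        # Raw count times idf (tf is not divided by the document length here)
        row = {w: counts[w] * idf[w] for w in counts}
        # L2 normalization: each document vector gets Euclidean length 1
        norm = math.sqrt(sum(v * v for v in row.values()))
        rows.append({w: v / norm for w, v in row.items()})
    return rows


docs = [
    ["Bronze", "Silver", "Silver"],
    ["Bronze", "Gold"],
    ["Bronze", "Gold", "Platinum"],
]
for i, row in enumerate(sklearn_style_tfidf(docs)):
    print(i, {w: round(v, 3) for w, v in sorted(row.items(), key=lambda x: -x[1])})
```

With this weighting, Bronze no longer scores 0, because the +1 keeps its idf at 1. Silver, Gold and Platinum still rank first in their documents. Passing smooth_idf=False and norm=None to TfidfVectorizer removes the smoothing and the normalization. However, its idf then becomes ln(n / df) + 1, which still differs from the log2 formula in the worked example.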
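The script passes space-joined MeCab output to TfidfVectorizer, which has a side effect worth knowing about. The default token_pattern, r"(?u)\b\w\w+\b", keeps only tokens of two or more characters. This is presumably why single-character nouns do not appear in the vocabulary, such as 心, 翼 and 夢 in sample2.txt and 人 in sample5.txt.

The snippet below applies the same regular expressions with Python's re module to a hypothetical string that assumes get_text() returned these tokens for part of the ANA text.

```python
import re

# TfidfVectorizer's default token_pattern: two or more word characters
default_pattern = re.compile(r"(?u)\b\w\w+\b")
# An alternative pattern that also keeps single-character tokens
single_char_pattern = re.compile(r"(?u)\b\w+\b")

# Hypothetical space-joined output of get_text() for part of sample2.txt
text = "安心 信頼 基礎 世界 心 翼 夢 未来"

print(default_pattern.findall(text))      # single-character nouns are dropped
print(single_char_pattern.findall(text))  # all eight tokens are kept
```

One way to keep such words is to pass token_pattern=r"(?u)\b\w+\b" to TfidfVectorizer. This would add them to the vocabulary and therefore change the scores shown in the output.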