MySQL单词搜索相关度排名
一个单词搜索的相关度排名,这个例子演示了一个单词搜索的相关度排名计算。
mysql> CREATE TABLE articles (
-> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
-> title VARCHAR(200),
-> body TEXT,
-> FULLTEXT (title,body)
-> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.56 sec)
mysql> INSERT INTO articles (title,body) VALUES
-> (‘MySQL Tutorial’,‘This database tutorial …’),
-> (“How To Use MySQL”,‘After you went through a …’),
-> (‘Optimizing Your Database’,‘In this database tutorial …’),
-> (‘MySQL vs. YourSQL’,‘When comparing databases …’),
-> (‘MySQL Security’,‘When configured properly, MySQL …’),
-> (‘Database, Database, Database’,‘database database database’),
-> (‘1001 MySQL Tricks’,‘1. Never run mysqld as root. 2. …’),
-> (‘MySQL Full-Text Indexes’, ‘MySQL fulltext indexes use a …’);
Query OK, 8 rows affected (0.02 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql> SELECT id, title, body, MATCH (title,body) AGAINST (‘database’ IN BOOLEAN MODE) AS score FROM articles ORDER BY score DESC;
±—±-----------------------------±------------------------------------±--------------------+
| id | title | body | score |
±—±-----------------------------±------------------------------------±--------------------+
| 6 | Database, Database, Database | database database database | 1.0886961221694946 |
| 3 | Optimizing Your Database | In this database tutorial … | 0.36289870738983154 |
| 1 | MySQL Tutorial | This database tutorial … | 0.18144935369491577 |
| 2 | How To Use MySQL | After you went through a … | 0 |
| 4 | MySQL vs. YourSQL | When comparing databases … | 0 |
| 5 | MySQL Security | When configured properly, MySQL … | 0 |
| 7 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. … | 0 |
| 8 | MySQL Full-Text Indexes | MySQL fulltext indexes use a … | 0 |
±—±-----------------------------±------------------------------------±--------------------+
8 rows in set (0.00 sec)
总共有8条记录,其中3条与数据库搜索词匹配。第一条记录(id 6)包含搜索词6次,相关度排名为1.0886961221694946。这个排名值计算使用TF的价值6(数据库搜索词出现6次记录id 6)和IDF值为0.42596873216370745,计算如下(8是记录的总数和3是包含搜索词的记录数量)
${IDF} = log10( 8 / 3 ) = 0.42596873216370745
mysql> select log10( 8 / 3 ) ;
±--------------------+
| log10( 8 / 3 ) |
±--------------------+
| 0.42596873216370745 |
±--------------------+
1 row in set (0.00 sec)
然后将TF和IDF值输入到排名公式中
${rank} = ${TF} * ${IDF} * ${IDF}
在MySQL命令行客户端执行计算将返回一个排名值1.088696164686938。
mysql> SELECT 6*log10(8/3)log10(8/3);
±------------------------+
| 6log10(8/3)*log10(8/3) |
±------------------------+
| 1.088696164686938 |
±------------------------+
1 row in set (0.00 sec)
您可能会注意到SELECT … MATCH … AGAINST语句和MySQL命令行客户端所计算的排名值有差别(1.0886961221694946对1.088696164686938)。区别在于InnoDB内部是如何执行整数和浮点数/双精度类型转换的(以及相关的精度和四舍五的决定),以及它们在其他地方是如何执行的,比如在MySQL命令行客户端或其他类型的计算器中。
多词搜索的相关度排名
这个示例演示了基于前面示例中使用的articles表和数据计算多单词全文搜索的相关度排名。
如果你搜索的是一个以上的单词,那么相关度排名值就是每个单词相关度排名值的总和,如下公式所示:
${rank} = ${TF} * ${IDF} * ${IDF} + ${TF} * ${IDF} * ${IDF}
执行两个搜索词(‘mysql tutorial’)搜索将返回以下结果:
mysql> SELECT id, title, body, MATCH (title,body) AGAINST (‘mysql tutorial’ IN BOOLEAN MODE) AS score FROM articles ORDER BY score DESC;
±—±-----------------------------±------------------------------------±---------------------+
| id | title | body | score |
±—±-----------------------------±------------------------------------±---------------------+
| 1 | MySQL Tutorial | This database tutorial … | 0.7405621409416199 |
| 3 | Optimizing Your Database | In this database tutorial … | 0.3624762296676636 |
| 5 | MySQL Security | When configured properly, MySQL … | 0.031219376251101494 |
| 8 | MySQL Full-Text Indexes | MySQL fulltext indexes use a … | 0.031219376251101494 |
| 2 | How To Use MySQL | After you went through a … | 0.015609688125550747 |
| 4 | MySQL vs. YourSQL | When comparing databases … | 0.015609688125550747 |
| 7 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. … | 0.015609688125550747 |
| 6 | Database, Database, Database | database database database | 0 |
±—±-----------------------------±------------------------------------±---------------------+
8 rows in set (0.00 sec)
在第一条记录(id 1)中,“mysql”出现一次,“tutorial”出现两次。“mysql”有六条匹配记录,“tutorial”有两条匹配记录。当将这些值插入到用于多个单词搜索的排名公式中时,MySQL命令行客户端返回预期的排名值
mysql> SELECT (1*log10(8/6)log10(8/6)) + (2log10(8/2)log10(8/2));
±------------------------------------------------------+
| (1log10(8/6)log10(8/6)) + (2log10(8/2)*log10(8/2)) |
±------------------------------------------------------+
| 0.7405621541938003 |
±------------------------------------------------------+
1 row in set (0.00 sec)
与单个单词搜索一样,使用select … match … against语句和MySQL命令行工具执行的结果有差别。
- 点赞
- 收藏
- 关注作者
评论(0)