Control Information in Freebase

蜉蝣与海, posted 2021/07/27 17:20:13

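1. Resource representations in Freebase: ID, Key, and MID

As mentioned in the previous post, some triples in Freebase act as definitions that govern how other triples behave. For example: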

<http://rdf.freebase.com/ns/award.award_winner> <http://rdf.freebase.com/ns/type.object.type>   <http://rdf.freebase.com/ns/type.type>  .
<http://rdf.freebase.com/ns/m.04kr> <http://rdf.freebase.com/ns/type.object.type> <http://rdf.freebase.com/ns/type.type> .
<http://rdf.freebase.com/ns/m.04kr> <http://rdf.freebase.com/ns/type.object.id> "/people/person" .

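The IRIs here take two forms: those with a suffix such as award.award_winner or type.object.type, and those with a suffix such as m.04kr.

These two forms correspond to two of Freebase's resource representations, the ID and the MID. An ID is usually made up of English words and is meant to be human-readable, whereas a MID (machine-generated ID) is produced automatically by the database behind Freebase; it starts with m and is a string in a Base32-like encoding. The following two IRIs therefore both denote the resource <people.person>.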

<http://rdf.freebase.com/ns/m.04kr>
<http://rdf.freebase.com/ns/people.person>
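
The relationship between the IRI suffix and the slash-separated ID can be sketched in a few lines of Python. This is only a minimal sketch: it assumes every IRI uses the `http://rdf.freebase.com/ns/` prefix seen above, that dots in the suffix correspond to slashes in the ID, and that only MIDs start with `m.`. The function names are hypothetical and not part of any Freebase tooling.

```python
NS = "http://rdf.freebase.com/ns/"

def local_name(iri: str) -> str:
    """Strip angle brackets and the Freebase namespace prefix from an IRI."""
    iri = iri.strip("<>")
    return iri[len(NS):] if iri.startswith(NS) else iri

def is_mid(name: str) -> bool:
    """Assumption: only machine-generated IDs start with 'm.'."""
    return name.startswith("m.")

def to_freebase_id(name: str) -> str:
    """Turn a dotted suffix such as 'people.person' into '/people/person'."""
    return "/" + name.replace(".", "/")

for iri in ("<http://rdf.freebase.com/ns/m.04kr>",
            "<http://rdf.freebase.com/ns/people.person>"):
    name = local_name(iri)
    print(name, "MID" if is_mid(name) else "ID", to_freebase_id(name))
```

Note that the suffix alone does not tell you that m.04kr and people.person are the same resource; that link comes from the type.object.id triple shown earlier.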

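Both an ID and a MID map to exactly one resource, and at any given moment a resource has only one ID and only one MID. In an open knowledge graph, however, a resource usually has many representations, and Freebase handles this with the concept of a Key. For example, the MID for Yao Ming carries the following entries (excerpt):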

<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/en/yao_ming"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/authority/tvrage/person/92058"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/authority/imdb/name/nm1495244"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/authority/netflix/role/30007513"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/source/videosurf/11512"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/authority/twitter/yaoming"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/fa/$064A$0627$0626$0648_$0645$064A$0646$06AF"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/fr/Yao_Ming"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/sv/Yao_Ming"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/authority/facebook/Yao"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/zh-cn_id/10813"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/no/Yao_Ming"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/fi/Yao_Ming"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/ja_id/168510"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/zh-tw/$59DA$660E"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/et_id/138463"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/wikipedia/et_title/Yao_Ming"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key> "/source/nytimes/top$002Freference$002Ftimestopics$002Fpeople$002Fy$002Fyao_ming"	.
<http://rdf.freebase.com/ns/m.01jzhl>	<http://rdf.freebase.com/ns/type.object.key>	"/authority/netflix/api/http$003A$002F$002Fapi$002Enetflix$002Ecom$002Fcatalog$002Fpeople$002F30007513"	.

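Besides a key such as /en/yao_ming, there are keys pointing to his representations on IMDb, Netflix, Twitter, and Wikipedia. In Freebase a resource can have many keys, but any given key points to exactly one resource.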
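
Several of the keys above contain `$` followed by four hexadecimal digits. Judging from the data, these look like escaped Unicode code points (`$002F` would be `/`, and `$59DA$660E` would be 姚明). The sketch below decodes them under that assumption; `decode_key` is a hypothetical helper name.

```python
import re

def decode_key(key: str) -> str:
    """Replace each $XXXX escape (four hex digits) with the character it encodes."""
    return re.sub(r"\$([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)),
                  key)

print(decode_key("/wikipedia/zh-tw/$59DA$660E"))
print(decode_key("/authority/netflix/api/http$003A$002F$002Fapi$002Enetflix"
                 "$002Ecom$002Fcatalog$002Fpeople$002F30007513"))
```

Keys without escapes, such as /en/yao_ming, pass through unchanged.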

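2. Triples that control other triples

As just mentioned, <award.award_winner>, <people.person>, and the like are types, and these types themselves belong to the type <type.type>. This suggests that the Data Dump file contains definitions describing them.

Searching the data dump for the MID that stands for "person" turns up some interesting triples.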

<m.04kr> <type.object.key> "/people/person" .
<m.04kr> <type.object.id> "/people/person" .
<m.04kr> <freebase.object_hints.best_hrid> "/people/person" .
<m.04kr> <freebase.type_hints.mediator> "false" .
<m.04kr> <freebase.type_profile.instance_count> "2225136" .
<m.04kr> <freebase.type_profile.property_count> "4127013" .
<m.04kr> <type.object.type> <type.type> .
<m.04kr> <type.object.type> <freebase.type_profile> .
<m.04kr> <type.type.domain> <m.01z0kpp> .
<m.04kr> <type.type.properties> <m.025d7wc> .
<m.04kr> <type.type.properties> <m.025d7w3> .
<m.04kr> <type.type.properties> <m.04m8> .
<m.04kr> <type.type.properties> <m.04nt> . 
<m.04kr> <type.type.expected_by> <m.028xmhx> .
<m.04kr> <type.type.expected_by> <m.01z0kvv> .
<m.04kr> <freebase.type_hints.included_types> <m.01c5> .
<m.04kr> <freebase.type_profile.strict_included_types> <m.01c5> .
<m.04kr> <freebase.type_profile.published> <m.02hqglv> .
<m.04kr> <freebase.type_profile.equivalent_topic> <m.01g317> .
<m.04kr> <freebase.type_profile.strict_included_types> <m.0rhbwmv> .
<m.04kr> <freebase.type_profile.strict_included_types> <m.0rhbs7t> .
<m.04kr> <freebase.type_profile.strict_included_types> <m.0rhbqq0> .
<m.04kr> <base.ontologies.ontology_class.equivalent_classes> <m.04gf613> .
<m.04kr> <freebase.type_profile.kind> <m.06whrm1> .

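Besides stating the types of m.04kr, these triples use a number of predicates. My reading of some of them is as follows:

<type.type.properties> the entities under this predicate are the properties (kinds of edges) that this type can have
<freebase.type_hints.mediator> whether this type is a mediator (explained in a later post)
<freebase.type_profile.strict_included_types> the types that this type includes (an instance of this type is presumably also an instance of each of them)
<type.type.domain> the Domain that this type belongs to

Searching for m.04nt among these yields the following entries: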
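
To inspect such definitions programmatically, one option is to group the objects of one subject by predicate. The following is a minimal sketch over a few of the abbreviated triples shown above; the parser is deliberately naive (it assumes one triple per line and does not handle language tags such as `@en`), and the function names are hypothetical.

```python
from collections import defaultdict

SAMPLE = """\
<m.04kr> <type.object.id> "/people/person" .
<m.04kr> <freebase.type_hints.mediator> "false" .
<m.04kr> <type.type.domain> <m.01z0kpp> .
<m.04kr> <type.type.properties> <m.04m8> .
<m.04kr> <type.type.properties> <m.04nt> .
"""

def parse_line(line: str):
    """Split one abbreviated triple into (subject, predicate, object)."""
    body = line.strip().rstrip(".").strip()
    subject, predicate, obj = body.split(None, 2)
    return subject.strip("<>"), predicate.strip("<>"), obj.strip('<>"')

def describe(text: str, subject: str):
    """Collect predicate -> [objects] for a single subject."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        s, p, o = parse_line(line)
        if s == subject:
            grouped[p].append(o)
    return dict(grouped)

info = describe(SAMPLE, "m.04kr")
print(info["type.type.properties"])
print(info["freebase.type_hints.mediator"])
```

For the full dump, a streaming reader and a proper N-Triples parser would be a better fit than this line-splitting approach.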

<m.04nt>    <type.object.type>    <freebase.property_hints>    .
<m.04nt>    <type.object.type>    <type.property>    .
<m.04nt>    <type.object.id>    "/people/person/nationality"    .
<m.04nt> <type.property.unique> "false" .
<m.04nt> <type.property.schema> <m.04kr> .
<m.04nt> <http://www.w3.org/2000/01/rdf-schema#domain> <m.04kr> .
<m.04nt> <type.property.expected_type> <m.01mp> .
<m.04nt> <http://www.w3.org/2000/01/rdf-schema#range> <m.01mp> .
<m.04nt> <freebase.property_hints.disambiguator> "true" .
<m.04nt> <freebase.property_hints.display_none> "false" .
<m.04nt> <freebase.property_hints.deprecated> "false" .
<m.04nt> <freebase.property_hints.display_orientation> "horizontal"@en .
<m.04nt> <freebase.property_hints.inverse_description> "{name}: Nationality"@en .

This predicate clearly denotes nationality. The same search also turns up the following:

<people.person.nationality> <type.property.unique> "false" .
<people.person.nationality> <type.property.expected_type> <location.country> .
<people.person.nationality> <http://www.w3.org/2000/01/rdf-schema#range> <location.country> .
<people.person.nationality> <type.property.schema> <people.person> .
<people.person.nationality> <http://www.w3.org/2000/01/rdf-schema#domain> <people.person> .
<people.person.nationality>    <type.object.type>    <type.property>    .

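As we can see, the dump file defines in detail whether a predicate may take multiple values (type.property.unique), which schema (that is, which type) the predicate belongs to, and which type its values are expected to have.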

<m.01jzhl>	<people.person.nationality>	<m.0d05w3>	.

For the triple above, at least the following constraints apply:

<people.person.nationality> <type.property.unique> "false" .
<people.person.nationality> <type.property.expected_type> <location.country> .
<people.person.nationality> <type.property.schema> <people.person> .
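
One way to read these three constraints is as checks applied to a candidate triple. The sketch below illustrates that reading only and makes no claim about how Freebase itself enforced them. The property definition is copied from the triples above, while the type assertions in `TYPES` are hypothetical, since the article does not list the types of m.01jzhl or m.0d05w3.

```python
# Property definition, copied from the triples above.
PROPERTIES = {
    "people.person.nationality": {
        "unique": False,
        "expected_type": "location.country",
        "schema": "people.person",
    }
}

# Hypothetical type assertions, assumed for illustration only.
TYPES = {
    "m.01jzhl": {"people.person"},
    "m.0d05w3": {"location.country"},
}

def check_triple(subject, predicate, obj, existing=()):
    """Return a list of constraint violations for one triple (empty means OK)."""
    prop = PROPERTIES[predicate]
    problems = []
    # schema: the subject should be an instance of the type owning the property
    if prop["schema"] not in TYPES.get(subject, set()):
        problems.append(f"{subject} is not typed as {prop['schema']}")
    # expected_type: the object should be an instance of the expected type
    if prop["expected_type"] not in TYPES.get(obj, set()):
        problems.append(f"{obj} is not typed as {prop['expected_type']}")
    # unique: a unique property allows at most one value per subject
    if prop["unique"] and any(s == subject and p == predicate
                              for s, p, _ in existing):
        problems.append(f"{predicate} is unique but {subject} already has a value")
    return problems

print(check_triple("m.01jzhl", "people.person.nationality", "m.0d05w3"))
```

Swapping subject and object in the call makes both the schema check and the expected-type check fail, which is a quick way to see the two constraints at work.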

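3. The concept of a Domain

Because Freebase has so many types, they are officially organized into groups, and each group is called a domain. Note that this is a different notion from rdfs:domain, which in the dump mirrors type.property.schema and names the type a property belongs to. For example: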

<m.04kr> <type.object.id> "/people/person" .
<m.04kr> <type.type.domain> <m.01z0kpp> .
<m.01z0kpp> <type.object.id> "/people" .
<m.03bqmw0>	<type.type.domain>	<m.01z0kpp>	.
<m.02h65n3>	<type.type.domain>	<m.01z0kpp>	.
<m.0kps54>	<type.type.domain>	<m.01z0kpp>	.
<m.04p7ysz>	<type.type.domain>	<m.01z0kpp>	.
<m.04nr73n>	<type.type.domain>	<m.01z0kpp>	.
<m.05n4rsg>	<type.type.domain>	<m.01z0kpp>	.

As we can see, besides <people.person> many other types also belong to the "/people" domain. Together with the concepts of type and topic, this forms the domain, type, topic hierarchy.
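
The grouping of types into domains can be recovered from type.type.domain triples, as in the following minimal sketch. It uses a handful of the triples listed above, already split into tuples, and falls back to the MID wherever no type.object.id is known. The function name `types_by_domain` is hypothetical.

```python
from collections import defaultdict

TRIPLES = [
    ("m.04kr", "type.object.id", "/people/person"),
    ("m.04kr", "type.type.domain", "m.01z0kpp"),
    ("m.01z0kpp", "type.object.id", "/people"),
    ("m.03bqmw0", "type.type.domain", "m.01z0kpp"),
    ("m.02h65n3", "type.type.domain", "m.01z0kpp"),
]

def types_by_domain(triples):
    """Map each domain to its member types, using readable IDs when available."""
    ids = {s: o for s, p, o in triples if p == "type.object.id"}
    domains = defaultdict(list)
    for s, p, o in triples:
        if p == "type.type.domain":
            domains[ids.get(o, o)].append(ids.get(s, s))
    return dict(domains)

print(types_by_domain(TRIPLES))
```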

In addition, Freebase keys carry different prefixes, such as "/en" and "/wikipedia". The official site calls these namespaces, and they serve to categorize keys.
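
As a rough illustration, the keys listed for Yao Ming can be bucketed by their leading prefix. This sketch assumes the full namespace is everything before the final slash and the top-level namespace is the first path segment; both helper names are hypothetical.

```python
from collections import Counter

def split_key(key: str):
    """Split a fully qualified key into (namespace, local key) at the last slash."""
    namespace, _, local = key.rpartition("/")
    return namespace, local

def top_level_namespace(key: str) -> str:
    """Return the first path segment, e.g. '/wikipedia' for '/wikipedia/fr/Yao_Ming'."""
    return "/" + key.strip("/").split("/", 1)[0]

KEYS = [
    "/en/yao_ming",
    "/authority/imdb/name/nm1495244",
    "/authority/twitter/yaoming",
    "/wikipedia/fr/Yao_Ming",
    "/wikipedia/sv/Yao_Ming",
    "/source/videosurf/11512",
]

print(split_key("/wikipedia/fr/Yao_Ming"))
print(Counter(top_level_namespace(k) for k in KEYS))
```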

