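Control Information in Freebase
1. Resource Identifiers in Freebase: ID, Key, and MID
As mentioned earlier, Freebase contains some triples that act as definitions: they define how other triples behave. For example: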
<http://rdf.freebase.com/ns/award.award_winner> <http://rdf.freebase.com/ns/type.object.type> <http://rdf.freebase.com/ns/type.type> .
<http://rdf.freebase.com/ns/m.04kr> <http://rdf.freebase.com/ns/type.object.type> <http://rdf.freebase.com/ns/type.type> .
<http://rdf.freebase.com/ns/m.04kr> <http://rdf.freebase.com/ns/type.object.id> "/people/person" .
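The IRIs here take two forms: those whose suffix looks like award.award_winner or type.object.type, and those whose suffix looks like m.04kr.
These correspond to two kinds of resource identifier in Freebase, the ID and the MID. An ID usually consists of several English words and is meant to be human-readable. A MID (machine-generated ID) is produced automatically by the database behind Freebase; it starts with m, followed by a string in a Base32-like encoding. The two IRIs below therefore both denote the resource <people.person>.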
<http://rdf.freebase.com/ns/m.04kr>
<http://rdf.freebase.com/ns/people.person>
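The mapping between the slash-style identifier stored in type.object.id (such as "/people/person") and the dotted IRI suffix used in the dump can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are made up for this post, the slash-style form "/m/04kr" for a MID follows the usual Freebase convention rather than anything shown above, and the code assumes that identifiers themselves never contain a dot.

```python
NS = "http://rdf.freebase.com/ns/"  # namespace prefix used by the dump IRIs above


def id_to_iri(freebase_id: str) -> str:
    """Turn a slash-style id such as '/people/person' into a dump IRI."""
    if not freebase_id.startswith("/"):
        raise ValueError(f"expected a slash-style id, got {freebase_id!r}")
    return NS + freebase_id[1:].replace("/", ".")


def iri_to_id(iri: str) -> str:
    """Inverse of id_to_iri; accepts the IRI with or without angle brackets."""
    bare = iri.strip("<>")
    if not bare.startswith(NS):
        raise ValueError(f"not a Freebase dump IRI: {iri!r}")
    # Assumption: the local part uses '.' only as a path separator.
    return "/" + bare[len(NS):].replace(".", "/")


def is_mid(iri: str) -> bool:
    """True if the IRI suffix has the machine-generated 'm.' form."""
    bare = iri.strip("<>")
    return bare.startswith(NS + "m.")


print(id_to_iri("/people/person"))
print(iri_to_id("<http://rdf.freebase.com/ns/m.04kr>"))
print(is_mid("<http://rdf.freebase.com/ns/people.person>"))
```

Note that this only converts between spellings of one identifier. Going from the MID m.04kr to the ID people.person still requires looking up the type.object.id triple in the dump.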
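Both an ID and a MID map to exactly one resource, and at any given moment a resource has only one ID and only one MID. In an open knowledge graph, however, a single resource usually has many representations elsewhere, and Freebase handles this with the notion of a Key. For example, the MID for Yao Ming has the following entries (excerpt):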
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/en/yao_ming" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/authority/tvrage/person/92058" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/authority/imdb/name/nm1495244" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/authority/netflix/role/30007513" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/source/videosurf/11512" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/authority/twitter/yaoming" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/fa/$064A$0627$0626$0648_$0645$064A$0646$06AF" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/fr/Yao_Ming" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/sv/Yao_Ming" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/authority/facebook/Yao" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/zh-cn_id/10813" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/no/Yao_Ming" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/fi/Yao_Ming" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/ja_id/168510" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/zh-tw/$59DA$660E" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/et_id/138463" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/wikipedia/et_title/Yao_Ming" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/source/nytimes/top$002Freference$002Ftimestopics$002Fpeople$002Fy$002Fyao_ming" .
<http://rdf.freebase.com/ns/m.01jzhl> <http://rdf.freebase.com/ns/type.object.key> "/authority/netflix/api/http$003A$002F$002Fapi$002Enetflix$002Ecom$002Fcatalog$002Fpeople$002F30007513" .
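Besides a key like /en/yao_ming, there are keys pointing to the corresponding entries on IMDb, Netflix, Twitter, and Wikipedia. In Freebase one resource can have many keys, but any given key points to exactly one resource.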
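Several of the keys above contain sequences such as $59DA or $002F. These look like a dollar sign followed by four hexadecimal digits giving a Unicode code point, so they can be decoded back into readable text. The sketch below assumes exactly that escape format; `decode_key` and `build_key_index` are hypothetical helper names, and the index simply enforces the rule that one key points to one resource.

```python
import re

# Assumption: an escape is '$' followed by exactly four hex digits.
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def decode_key(key: str) -> str:
    """Replace every $XXXX escape with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)


def build_key_index(pairs):
    """Map key -> MID, refusing a key that would point to two resources."""
    index = {}
    for mid, key in pairs:
        if index.setdefault(key, mid) != mid:
            raise ValueError(f"key {key!r} maps to more than one resource")
    return index


pairs = [
    ("m.01jzhl", "/en/yao_ming"),
    ("m.01jzhl", "/wikipedia/zh-tw/$59DA$660E"),
    ("m.01jzhl", "/authority/imdb/name/nm1495244"),
]
index = build_key_index(pairs)
for key in index:
    print(index[key], decode_key(key))
```

The dictionary goes from key to MID because that is the direction in which the mapping is unique. The reverse direction would need a list of keys per MID.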
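2. Triples That Control Other Triples
The <award.award_winner> and <people.person> mentioned above are both types, and these types in turn belong to the type <type.type>. In other words, a type is itself a resource, so the data dump also contains triples that describe it.
Searching the data dump for the MID that stands for "person" turns up some interesting triples.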
<m.04kr> <type.object.key> "/people/person" .
<m.04kr> <type.object.id> "/people/person" .
<m.04kr> <freebase.object_hints.best_hrid> "/people/person" .
<m.04kr> <freebase.type_hints.mediator> "false" .
<m.04kr> <freebase.type_profile.instance_count> "2225136" .
<m.04kr> <freebase.type_profile.property_count> "4127013" .
<m.04kr> <type.object.type> <type.type> .
<m.04kr> <type.object.type> <freebase.type_profile> .
<m.04kr> <type.type.domain> <m.01z0kpp> .
<m.04kr> <type.type.properties> <m.025d7wc> .
<m.04kr> <type.type.properties> <m.025d7w3> .
<m.04kr> <type.type.properties> <m.04m8> .
<m.04kr> <type.type.properties> <m.04nt> .
<m.04kr> <type.type.expected_by> <m.028xmhx> .
<m.04kr> <type.type.expected_by> <m.01z0kvv> .
<m.04kr> <freebase.type_hints.included_types> <m.01c5> .
<m.04kr> <freebase.type_profile.strict_included_types> <m.01c5> .
<m.04kr> <freebase.type_profile.published> <m.02hqglv> .
<m.04kr> <freebase.type_profile.equivalent_topic> <m.01g317> .
<m.04kr> <freebase.type_profile.strict_included_types> <m.0rhbwmv> .
<m.04kr> <freebase.type_profile.strict_included_types> <m.0rhbs7t> .
<m.04kr> <freebase.type_profile.strict_included_types> <m.0rhbqq0> .
<m.04kr> <base.ontologies.ontology_class.equivalent_classes> <m.04gf613> .
<m.04kr> <freebase.type_profile.kind> <m.06whrm1> .
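Besides stating the types of m.04kr, these triples use a number of other predicates. My own reading of them is as follows:
<type.type.properties> the properties (kinds of edges) that this type can carry
<freebase.type_hints.mediator> whether this type is a mediator (explained later)
<freebase.type_profile.strict_included_types> the types that this type includes, that is, types an instance of this type is expected to have as well
<type.type.domain> the domain this type belongs to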
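To look at all the triples about one subject grouped by predicate, as done by hand above, a small parser is enough. The following is a minimal sketch, not a full N-Triples parser: it only handles lines of the shape shown in this post (an IRI subject, an IRI predicate, and either an IRI or a plain or language-tagged literal as object), and the names `parse_line` and `describe` are invented for illustration.

```python
import re
from collections import defaultdict

# <subject> <predicate> then either <object-iri> or "literal" with optional @lang
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+'
    r'(?:<([^>]+)>|"((?:[^"\\]|\\.)*)"(?:@[\w-]+)?)\s*\.\s*$'
)


def parse_line(line):
    """Return (subject, predicate, object) or None if the line does not match."""
    m = TRIPLE.match(line.strip())
    if m is None:
        return None
    subject, predicate, obj_iri, obj_literal = m.groups()
    return subject, predicate, obj_iri if obj_iri is not None else obj_literal


def describe(lines, subject):
    """Group the objects of every triple about `subject` by predicate."""
    grouped = defaultdict(list)
    for line in lines:
        triple = parse_line(line)
        if triple is not None and triple[0] == subject:
            grouped[triple[1]].append(triple[2])
    return dict(grouped)


sample = [
    '<m.04kr> <type.object.id> "/people/person" .',
    '<m.04kr> <freebase.type_hints.mediator> "false" .',
    '<m.04kr> <type.type.properties> <m.04m8> .',
    '<m.04kr> <type.type.properties> <m.04nt> .',
    '<m.04nt> <type.object.id> "/people/person/nationality" .',
]
for predicate, objects in describe(sample, "m.04kr").items():
    print(predicate, objects)
```

On the real dump, `lines` would be an iterator over the file, and the subject would be the full IRI rather than the abbreviated form used in this section.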
Searching for m.04nt, one of the <type.type.properties> values listed above, yields the following entries:
<m.04nt> <type.object.type> <freebase.property_hints> .
<m.04nt> <type.object.type> <type.property> .
<m.04nt> <type.object.id> "/people/person/nationality" .
<m.04nt> <type.property.unique> "false" .
<m.04nt> <type.property.schema> <m.04kr> .
<m.04nt> <http://www.w3.org/2000/01/rdf-schema#domain> <m.04kr> .
<m.04nt> <type.property.expected_type> <m.01mp> .
<m.04nt> <http://www.w3.org/2000/01/rdf-schema#range> <m.01mp> .
<m.04nt> <freebase.property_hints.disambiguator> "true" .
<m.04nt> <freebase.property_hints.display_none> "false" .
<m.04nt> <freebase.property_hints.deprecated> "false" .
<m.04nt> <freebase.property_hints.display_orientation> "horizontal"@en .
<m.04nt> <freebase.property_hints.inverse_description> "{name}: Nationality"@en .
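This predicate clearly denotes nationality. The same search also turns up the following triples, stated with human-readable IDs: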
<people.person.nationality> <type.property.unique> "false" .
<people.person.nationality> <type.property.expected_type> <location.country> .
<people.person.nationality> <http://www.w3.org/2000/01/rdf-schema#range> <location.country> .
<people.person.nationality> <type.property.schema> <people.person> .
<people.person.nationality> <http://www.w3.org/2000/01/rdf-schema#domain> <people.person> .
<people.person.nationality> <type.object.type> <type.property> .
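So the dump spells out in detail whether a predicate is restricted to a single value (type.property.unique), which schema it belongs to (that is, which type owns it), and what type of value it expects. Take one concrete data triple: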
<m.01jzhl> <people.person.nationality> <m.0d05w3> .
For the triple above, at least the following constraints apply:
<people.person.nationality> <type.property.unique> "false" .
<people.person.nationality> <type.property.expected_type> <location.country> .
<people.person.nationality> <type.property.schema> <people.person> .
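Read as constraints, the three schema triples suggest a simple check on a data triple: the subject should carry the schema type, the object should carry the expected type, and a unique property should not receive a second value. The sketch below illustrates that reading only; it does not claim to describe how Freebase itself validated data. `PropertySchema` and `check_triple` are hypothetical names, and the type assignments in `TYPES_OF` are assumed for the example rather than taken from the dump excerpts in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySchema:
    unique: bool        # from type.property.unique
    schema: str         # from type.property.schema
    expected_type: str  # from type.property.expected_type


# Values copied from the three constraint triples above.
NATIONALITY = PropertySchema(
    unique=False, schema="people.person", expected_type="location.country"
)

# ASSUMPTION: illustrative type assignments, not quoted from the dump.
TYPES_OF = {
    "m.01jzhl": {"people.person"},
    "m.0d05w3": {"location.country"},
}


def check_triple(subject, obj, prop, types_of, existing_objects=()):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if prop.schema not in types_of.get(subject, set()):
        problems.append(f"{subject} is not typed as {prop.schema}")
    if prop.expected_type not in types_of.get(obj, set()):
        problems.append(f"{obj} is not typed as {prop.expected_type}")
    if prop.unique and any(o != obj for o in existing_objects):
        problems.append("property is unique but another value already exists")
    return problems


print(check_triple("m.01jzhl", "m.0d05w3", NATIONALITY, TYPES_OF))
```

Because unique is "false" for nationality, a person may have several nationality triples, and the third check never fires for this property.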
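3. The Concept of a Domain
Because Freebase has a large number of types, they are officially organized into groups, and each group is called a domain (not to be confused with rdfs:domain, which in the dump mirrors type.property.schema and names the type that owns a property). For example: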
<m.04kr> <type.object.id> "/people/person" .
<m.04kr> <type.type.domain> <m.01z0kpp> .
<m.01z0kpp> <type.object.id> "/people" .
<m.03bqmw0> <type.type.domain> <m.01z0kpp> .
<m.02h65n3> <type.type.domain> <m.01z0kpp> .
<m.0kps54> <type.type.domain> <m.01z0kpp> .
<m.04p7ysz> <type.type.domain> <m.01z0kpp> .
<m.04nr73n> <type.type.domain> <m.01z0kpp> .
<m.05n4rsg> <type.type.domain> <m.01z0kpp> .
As shown, besides <people.person>, many other types also belong to the "/people" domain. Together with the notions of type and topic, this gives the domain, type, topic hierarchy.
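Grouping types by domain amounts to inverting the type.type.domain triples. Here is a minimal sketch, assuming the triples have already been parsed into (subject, predicate, object) tuples; `types_by_domain` is a made-up helper name, and the data is a subset of the triples listed above.

```python
from collections import defaultdict


def types_by_domain(triples):
    """Map each domain to the list of types that point to it."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "type.type.domain":
            groups[obj].append(subject)
    return dict(groups)


triples = [
    ("m.04kr", "type.object.id", "/people/person"),
    ("m.04kr", "type.type.domain", "m.01z0kpp"),
    ("m.01z0kpp", "type.object.id", "/people"),
    ("m.03bqmw0", "type.type.domain", "m.01z0kpp"),
    ("m.02h65n3", "type.type.domain", "m.01z0kpp"),
]

# Human-readable ids, where the excerpt provides them.
ids = {s: o for s, p, o in triples if p == "type.object.id"}

for domain, types in types_by_domain(triples).items():
    print(ids.get(domain, domain), [ids.get(t, t) for t in types])
```

Types whose type.object.id is not in the excerpt are printed by their MID.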
In addition, Freebase keys carry different prefixes, such as "/en" and "/wikipedia". The official documentation calls these namespaces, and they serve to categorize keys.
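As a quick illustration of how the prefix categorizes keys, the sketch below counts keys by their first path segment. It follows the simplified view taken here, where the prefix is the leading segment such as "/en" or "/wikipedia"; `top_level_prefix` is a hypothetical helper, and the keys are a subset of the Yao Ming keys listed earlier.

```python
from collections import Counter


def top_level_prefix(key: str) -> str:
    """Return the first path segment of a key, e.g. '/wikipedia'."""
    if not key.startswith("/"):
        raise ValueError(f"expected a key starting with '/', got {key!r}")
    return "/" + key.split("/")[1]


keys = [
    "/en/yao_ming",
    "/authority/imdb/name/nm1495244",
    "/authority/twitter/yaoming",
    "/wikipedia/fr/Yao_Ming",
    "/wikipedia/sv/Yao_Ming",
    "/wikipedia/ja_id/168510",
    "/source/videosurf/11512",
]

counts = Counter(top_level_prefix(k) for k in keys)
for prefix, n in counts.most_common():
    print(prefix, n)
```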
References:
[1] Machine ID - Freebase: http://wiki.freebase.com/wiki/Machine_ID (cached page)
[2] Basic Concepts: https://developers.google.com/freebase/guide/basic_concepts