Knowledge-base Extractor

This is a Node.js application that extracts the knowledge represented in Google infoboxes (also known as Google Knowledge Graph panels).

The implemented algorithm is the following:

Notes

How to run?

The application automatically creates all the required cache folders.

Run the application from the console; the output is written to cache/result.json.

Crawling Configuration

You can change the following options in the file options.json:

cache_dbpedia_concepts       : true,
limit_dbpedia_concepts       : true,
limit_dbpedia_instances      : true,
limit_dbpedia_concepts_value : 10,
limit_dbpedia_instances_value: 10,
proxy                        : null

For our experiment, the parameters are:

cache_dbpedia_concepts       : true,
limit_dbpedia_concepts       : false,
limit_dbpedia_instances      : true,
limit_dbpedia_concepts_value : null,
limit_dbpedia_instances_value: 100,
proxy                        : null
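
As an illustration of how these options could be interpreted, here is a minimal sketch that merges overrides onto the default values listed above and works out the effective limits. The function name resolveOptions and the fields conceptLimit and instanceLimit are hypothetical and not part of the application; treating a disabled limit as "no cap" is an assumption.

```javascript
// Defaults copied from the option list above.
const DEFAULT_OPTIONS = {
  cache_dbpedia_concepts: true,
  limit_dbpedia_concepts: true,
  limit_dbpedia_instances: true,
  limit_dbpedia_concepts_value: 10,
  limit_dbpedia_instances_value: 10,
  proxy: null
};

// Hypothetical helper: merge overrides onto the defaults and derive the
// effective limits. Assumption: a disabled limit (or a missing value)
// means "no cap", represented here as Infinity.
function resolveOptions(overrides = {}) {
  const options = { ...DEFAULT_OPTIONS, ...overrides };
  const effectiveLimit = (enabled, value) =>
    enabled && Number.isInteger(value) && value > 0 ? value : Infinity;
  return {
    ...options,
    conceptLimit: effectiveLimit(
      options.limit_dbpedia_concepts,
      options.limit_dbpedia_concepts_value
    ),
    instanceLimit: effectiveLimit(
      options.limit_dbpedia_instances,
      options.limit_dbpedia_instances_value
    )
  };
}

// The experiment parameters: all concepts, at most 100 instances each.
const experiment = resolveOptions({
  limit_dbpedia_concepts: false,
  limit_dbpedia_concepts_value: null,
  limit_dbpedia_instances_value: 100
});
console.log(experiment.conceptLimit, experiment.instanceLimit); // Infinity 100
```

Deriving the limits in one place keeps the enabled flag and its value from drifting apart when only one of them is edited.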

Moreover, you can check the CSS selectors used for the Google Knowledge Panel and edit them if needed in the same options.json file.

Currently the CSS selectors are:

"knowledgeBox"                : "#kno-result",
"knowledgeBox_disambiguate"   : ".kp-blk",
"property"                    : "._Nl",
"property_value"              : ".kno-fv",
"label"                       : ".kno-ecr-pt",
"description"                 : ".kno-rdesc",
"type"                        : "._kx",
"images"                      : ".bicc",
"special_property"            : ".kno-sh",
"special_property_value"      : "._Zh",
"special_property_value_link" : "a._dt"
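
To show how such selectors might be used, here is a rough sketch that builds a panel record from a subset of them. It is not the application's actual extraction code: queryTexts is a hypothetical callback standing in for a real DOM or HTML query, the pairing of property names with values by position is an assumption, and the stub page data is invented for illustration.

```javascript
// Selector values copied from the list above (subset).
const selectors = {
  label: ".kno-ecr-pt",
  description: ".kno-rdesc",
  type: "._kx",
  property: "._Nl",
  property_value: ".kno-fv"
};

// `queryTexts` is a hypothetical callback: given a CSS selector, it returns
// the text of every matching element in the panel, in document order.
function extractPanel(queryTexts, sel = selectors) {
  const first = (selector) => {
    const texts = queryTexts(selector);
    return texts.length > 0 ? texts[0] : null;
  };
  const names = queryTexts(sel.property);
  const values = queryTexts(sel.property_value);
  const properties = {};
  // Assumption: the i-th property name belongs to the i-th property value.
  names.forEach((name, i) => {
    const key = name.replace(/:\s*$/, ""); // drop a trailing colon, if any
    properties[key] = i < values.length ? values[i] : null;
  });
  return {
    label: first(sel.label),
    description: first(sel.description),
    type: first(sel.type),
    properties
  };
}

// Invented stub data standing in for a parsed results page.
const fakePage = {
  ".kno-ecr-pt": ["Example Band"],
  ".kno-rdesc": ["An example description."],
  "._kx": ["Band"],
  "._Nl": ["Origin:", "Members:"],
  ".kno-fv": ["Exampletown", "Alice, Bob"]
};
const panel = extractPanel((selector) => fakePage[selector] || []);
console.log(panel);
```

Keeping the selectors in one object means that when Google changes its class names, only the configuration needs to be edited, not the extraction logic.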

Updates

Sample Result

  "Band": {
  	"summary": {
  		"label": {
  			"uri": "http://dbpedia.org/property/label",
  			"count": 100
  		},
  		"description": {
  			"uri": "http://purl.org/dc/elements/1.1/description",
  			"count": 100
  		},
  		"type": {
  			"uri": "http://dbpedia.org/property/type",
  			"count": 100
  		},
  		"origin": {
  			"uri": "http://dbpedia.org/property/origin",
  			"count": 88.17204301075269
  		},
  		"members": {
  			"uri": "http://dbpedia.org/property/members",
  			"count": 88.17204301075269
  		},
  		"albums": {
  			"uri": "http://dbpedia.org/property/albums",
  			"count": 87.09677419354838
  		},
  		"leadSingers": {
  			"uri": "http://dbpedia.org/property/leadSingers",
  			"count": 6.451612903225806
  		},
  		"recordLabel": {
  			"uri": "http://dbpedia.org/property/recordLabel",
  			"count": 12.903225806451612
  		},
  		"awards": {
  			"uri": "http://dbpedia.org/property/awards",
  			"count": 13.978494623655912
  		},
  		"nominations": {
  			"uri": "http://dbpedia.org/property/nominations",
  			"count": 7.526881720430108
  		},
  		"born": {
  			"uri": "http://dbpedia.org/property/born",
  			"count": 2.1505376344086025
  		},
  		"nationality": {
  			"uri": "http://dbpedia.org/property/nationality",
  			"count": 2.1505376344086025
  		},
  		"height": {
  			"uri": "http://dbpedia.org/property/height",
  			"count": 1.0752688172043012
  		}
  	},
  	"infoboxless": [
  		"!Action Pact!",
  		"Allele (band)",
  		"Anti-Pasti",
  		"Armageddon (A&M band)",
  		"Banket (band)",
  		"Battlelore",
  		"Ben Folds Five"
  	],
  	"Unmapped_Properties": {
  		"leadSinger": 1,
  		"recordLabels": 1,
  		"songs": 1,
  		"upcomingEvents": 1,
  		"peopleAlsoSearchFor": 1,
  		"activeFrom": 1,
  		"filmMusicCredits": 1,
  		"activeUntil": 1,
  		"moviesAndTvShows": 1
  	}
  }
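
One way to consume this output is to list the properties that occur most often for a concept. The sketch below is only an illustration: frequentProperties is a hypothetical helper, and it assumes that the count values are comparable frequencies (the sample values look like percentages, but that is an inference). The band object is an abbreviated copy of the entry above.

```javascript
// Hypothetical helper: return the property names whose `count` is at least
// `threshold`, ordered from most to least frequent.
function frequentProperties(conceptResult, threshold) {
  return Object.entries(conceptResult.summary)
    .filter(([, info]) => info.count >= threshold)
    .sort(([, a], [, b]) => b.count - a.count)
    .map(([name]) => name);
}

// Abbreviated copy of the "Band" entry from the sample result.
const band = {
  summary: {
    label: { uri: "http://dbpedia.org/property/label", count: 100 },
    origin: { uri: "http://dbpedia.org/property/origin", count: 88.17204301075269 },
    albums: { uri: "http://dbpedia.org/property/albums", count: 87.09677419354838 },
    awards: { uri: "http://dbpedia.org/property/awards", count: 13.978494623655912 },
    height: { uri: "http://dbpedia.org/property/height", count: 1.0752688172043012 }
  },
  infoboxless: ["!Action Pact!", "Allele (band)"],
  Unmapped_Properties: { songs: 1 }
};

console.log(frequentProperties(band, 50)); // [ 'label', 'origin', 'albums' ]
```

To run this against real output, the object could be obtained with JSON.parse(fs.readFileSync("cache/result.json", "utf8")), assuming the file is keyed by concept name as the sample suggests.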