Hello folks, I hope you're all doing well. Today we're going to do something cool (at least in my opinion 😅): we'll see how you can use Wikidata to fetch publicly available data. Wikipedia also uses it as a source of structured data. Let's first go over a few terms that matter here.
- Wikidata – A free and open knowledge base that gives you information about an entity in a structured format. Because it's open, you can use it anywhere.
- Wikipedia – The world's largest free online encyclopedia. It also uses Wikidata to pull structured information about a particular entity.
- Wikibase SDK – An NPM package (wikibase-sdk) that makes it easy to query information from Wikidata.
In this article, our main focus is on Wikidata and Wikibase SDK, since we'll use Wikibase SDK to query data from Wikidata. But first, I'd like to clear up a few things that I think are important to know before you jump into the tutorial part.
Brief guide of Wikidata
In Wikidata, there are two main things: Entity and Property. An Entity can be any person, place, animal, thing, etc., e.g. Tom Cruise, Earth, Naruto or World War. Properties are the traits or data values of that entity, e.g. place of birth, directed by, etc.
In Wikidata, every entity has its own QID, e.g. Earth (Q2), Tom Cruise (Q37079), etc. On the other side, every property has its own PID, e.g. occupation (P106), place of birth (P19), etc. All the information is then stored in a statement-like structure:
- Entity – Property – Value (which could be an Entity, Literal, Number, etc.), e.g. Tom Cruise (Q37079) – instance of (P31) – human (Q5).
For a better understanding, you can visit the Wikidata page of any entity, e.g. https://www.wikidata.org/wiki/Q37079
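To make the Entity – Property – Value idea concrete, here is a minimal sketch that models one statement as a plain JavaScript object. The object layout and the describeStatement helper are made up for illustration; this is not how Wikidata or wikibase-sdk represent statements internally.

```javascript
// One statement as a plain object: entity - property - value.
// The shape is an assumption for illustration, not Wikidata's own JSON format.
const statement = {
  entity: { id: 'Q37079', label: 'Tom Cruise' },
  property: { id: 'P31', label: 'instance of' },
  value: { type: 'entity', id: 'Q5', label: 'human' }
}

// Hypothetical helper: render a statement as a readable line.
function describeStatement (s) {
  const value = s.value.type === 'entity'
    ? `${s.value.label} (${s.value.id})` // the value is another entity
    : String(s.value.value) // the value is a literal (string, number, ...)
  return `${s.entity.label} (${s.entity.id}) -> ${s.property.label} (${s.property.id}) -> ${value}`
}

console.log(describeStatement(statement))
// Tom Cruise (Q37079) -> instance of (P31) -> human (Q5)
```

Notice that the value side is itself a QID here, which is why you can hop from one entity to the next.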
Now that you understand how information is stored inside Wikidata, we're ready to jump into the tutorial part, where I'll show you how to query data from this awesome knowledge base using Wikibase SDK.
Querying data from Wikidata using Wikibase SDK
But before that, let me tell you that there are two different ways to query data from Wikidata:
- The first one is through Wikibase SDK
- The second one is through a SPARQL query (see the SPARQL docs)
SPARQL lets you query more selective or specific information from Wikidata. It gives you more flexibility when it comes to querying information, but for that you need an understanding of the SPARQL language.
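Just to give you a taste of the SPARQL route, here is a rough sketch that builds a request URL for the public query endpoint (the same sparqlEndpoint used in the SDK config later in this article). The query asks for Tom Cruise's place of birth (P19). The buildSparqlUrl helper is a made-up name, and the sketch only builds the URL; it doesn't send anything.

```javascript
// Public Wikidata SPARQL endpoint
const SPARQL_ENDPOINT = 'https://query.wikidata.org/sparql'

// Place of birth (P19) of Tom Cruise (Q37079), with an English label
const query = `
SELECT ?place ?placeLabel WHERE {
  wd:Q37079 wdt:P19 ?place .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}`.trim()

// Hypothetical helper: attach the query to the endpoint as URL parameters
function buildSparqlUrl (endpoint, sparql) {
  const url = new URL(endpoint)
  url.searchParams.set('query', sparql) // URL-encodes the query for us
  url.searchParams.set('format', 'json') // ask for JSON results
  return url.toString()
}

const sparqlUrl = buildSparqlUrl(SPARQL_ENDPOINT, query)
console.log(sparqlUrl)
```

You could send a GET request to that URL the same way as to any other URL in this tutorial.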
For now, in this article we're going to use Wikibase SDK to query info about a particular entity. So, let's jump into the tutorial.
First of all, you have to install two npm packages: wikibase-sdk and axios.
npm install wikibase-sdk axios
Once you've added these packages to your Node project, we're ready to go. Next, you have to create an instance of wikibase-sdk:
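// CommonJS import; newer ESM-only releases of wikibase-sdk may instead need: import { WBK } from 'wikibase-sdk'
const WBK = require('wikibase-sdk')
const wbk = WBK({
  instance: 'https://www.wikidata.org',
  sparqlEndpoint: 'https://query.wikidata.org/sparql'
})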
1. Search Entities
The first API we're going to see is Wikibase entity search. For this, call the searchEntities method on the wbk instance and pass your search term to it:
const url = wbk.searchEntities('Tom Cruise')
// url = https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Tom%20Cruise&language=en&limit=20&continue=0&format=json&uselang=en&type=item
searchEntities doesn't fetch anything by itself; it builds the URL that lists all the entity pages matching your keyword. Now you can send a GET request to the URL above to see the result:
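const axios = require('axios')
const response = await axios.get(url) // run inside an async function; the payload is in response.data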
Since it's a GET request, you can also open the URL above in your browser to see the response. It will fetch all the entities related to that specific word.
In the results, you can see that we have Tom Cruise (Q37079), Tom Cruise filmography (Q3467556), etc.
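Once you have the response, you'll usually only want the ID and label of each hit. Here is a small sketch of how that could look. The sample object is hand-written in the shape of a search response (a search array with id, label and description fields), with placeholder descriptions; it is not real API output. The listSearchResults helper is a made-up name.

```javascript
// Hand-written sample in the shape of a search response (not real API output).
// With axios, this object would be response.data.
const sampleSearchBody = {
  search: [
    { id: 'Q37079', label: 'Tom Cruise', description: 'placeholder description' },
    { id: 'Q3467556', label: 'Tom Cruise filmography', description: 'placeholder description' }
  ]
}

// Hypothetical helper: keep only the fields we care about
function listSearchResults (body) {
  return (body.search || []).map(({ id, label, description }) => ({
    id,
    label,
    description: description || '' // some entities have no description
  }))
}

for (const result of listSearchResults(sampleSearchBody)) {
  console.log(`${result.label} (${result.id})`)
}
// Tom Cruise (Q37079)
// Tom Cruise filmography (Q3467556)
```

The QIDs you get back here are what you'd feed into the ID-based lookup.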
2. Search with Wikipedia Title:
You can also fetch the info of a particular entity based on its Wikipedia title. This is more useful when you already know exactly which entity you want.
const url = wbk.getEntitiesFromSitelinks([ 'Hamburg', 'London', 'Lisbon' ])
The URL above will give you the labels and descriptions of these entities.
By default, this gives you the response in all available languages. You can narrow down your results by adding additional params, e.g.:
// all of these params are optional, so use only what you need
const url = wbk.getEntitiesFromSitelinks({
  titles: 'Hamburg', // string or array of titles
  sites: 'enwiki',
  languages: [ 'en' ],
  props: [ 'info', 'labels', 'descriptions', 'claims' ], // things you want to fetch from the entity
  format: 'json', // defaults to json
  redirections: false // defaults to true
})
props is the important one, as it lets you define what type of data you want from an entity:
- labels: to fetch the labels
- descriptions: to fetch the descriptions
- claims: to fetch all statements of that particular entity, grouped by property (PID)
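When you ask for claims, the values sit a few levels deep, under mainsnak.datavalue.value. Here is a sketch of how you might dig them out. The sample entity is hand-written and heavily trimmed (not real API output); P569 is the date of birth property, and the claimValues helper is a made-up name, not part of wikibase-sdk.

```javascript
// Hand-written, trimmed sample in the shape of an entity with claims (not real API output)
const sampleEntity = {
  id: 'Q37079',
  claims: {
    P31: [ // instance of
      {
        mainsnak: {
          snaktype: 'value',
          property: 'P31',
          datavalue: { type: 'wikibase-entityid', value: { 'entity-type': 'item', id: 'Q5' } }
        }
      }
    ],
    P569: [ // date of birth
      {
        mainsnak: {
          snaktype: 'value',
          property: 'P569',
          datavalue: { type: 'time', value: { time: '+1962-07-03T00:00:00Z' } }
        }
      }
    ]
  }
}

// Hypothetical helper: collect the raw values stored under one property
function claimValues (entity, pid) {
  const claims = (entity.claims && entity.claims[pid]) || []
  return claims
    .filter(claim => claim.mainsnak && claim.mainsnak.snaktype === 'value') // skip "no value" / "unknown value" snaks
    .map(claim => claim.mainsnak.datavalue.value)
}

console.log(claimValues(sampleEntity, 'P31').map(value => value.id)) // [ 'Q5' ]
console.log(claimValues(sampleEntity, 'P569').map(value => value.time)) // [ '+1962-07-03T00:00:00Z' ]
```

The shape of value depends on the property's data type, so an entity reference gives you an id while a date gives you a time string.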
3. By using ID(QID):
Last but not least: as I said, every entity has its own ID (QID), so you can fetch data from Wikibase using QIDs.
const url = wbk.getEntities({
  ids: [ 'Q1', 'Q5', 'Q571' ],
  languages: [ 'en' ], // returns all languages if not specified
  props: [ 'info', 'labels', 'descriptions' ], // returns all data if not specified
  format: 'json', // defaults to json
  redirections: false // defaults to true
})
Sending a GET request to the URL above gives you the English labels and descriptions (plus basic info) of these three entities, keyed by QID.
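Here is a rough sketch of reading labels and descriptions out of such a response. The sample body is hand-written with only one entity and a placeholder description, so treat it as an illustration of the shape and not as real API output. The summarizeEntities helper is a made-up name.

```javascript
// Hand-written sample in the shape of a getEntities response (not real API output).
// With axios, this object would be response.data.
const sampleEntitiesBody = {
  entities: {
    Q5: {
      id: 'Q5',
      labels: { en: { language: 'en', value: 'human' } },
      descriptions: { en: { language: 'en', value: 'placeholder description' } }
    }
  }
}

// Hypothetical helper: one flat object per entity for the given language
function summarizeEntities (body, language) {
  return Object.values(body.entities || {}).map(entity => ({
    id: entity.id,
    label: entity.labels?.[language]?.value ?? null, // null when no label in that language
    description: entity.descriptions?.[language]?.value ?? null
  }))
}

console.log(summarizeEntities(sampleEntitiesBody, 'en'))
// [ { id: 'Q5', label: 'human', description: 'placeholder description' } ]
```

Asking for a language that isn't in the response simply gives you null for that field.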
Before we end this guide, I'd like to show you some helper methods as well, which could save you time and energy.
Wikibase Helper Methods:
1. Getting Image url
Inside your claim (the image property, P18) there is no image URL available. Instead, you only get the name of the file, which makes it hard to build the URL yourself. As I remember, I used to do an Indian jugaad 😅 just to build the image URL. I wouldn't have done that if I had known about this method back then.
const imageUrl = wbk.getImageUrl('Hubble ultra deep field.jpg') // the file name comes from mainsnak.datavalue.value
// https://upload.wikimedia.org/wikipedia/commons/2/2f/Hubble_ultra_deep_field.jpg
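If you'd rather not depend on the helper, one alternative is the Special:FilePath page on Wikimedia Commons, which redirects to the actual file. Here is a minimal sketch of that approach; the commonsFilePathUrl function is a made-up name for this example and is not part of wikibase-sdk.

```javascript
// Hypothetical helper: build a Special:FilePath URL from a Commons file name.
// Commons redirects this URL to the real file location.
function commonsFilePathUrl (fileName, width) {
  const normalized = fileName.trim().replace(/ /g, '_') // spaces become underscores in page titles
  const base = 'https://commons.wikimedia.org/wiki/Special:FilePath/' + encodeURIComponent(normalized)
  return width ? `${base}?width=${width}` : base // optional width (in pixels) for a resized version
}

console.log(commonsFilePathUrl('Hubble ultra deep field.jpg'))
// https://commons.wikimedia.org/wiki/Special:FilePath/Hubble_ultra_deep_field.jpg
console.log(commonsFilePathUrl('Hubble ultra deep field.jpg', 300))
// https://commons.wikimedia.org/wiki/Special:FilePath/Hubble_ultra_deep_field.jpg?width=300
```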
2. Converting date and time from ISO to Human readable format
It's not that hard to convert an ISO date and time into a human-readable format, as you can use packages like moment to do that. But first you have to remove the leading + from the time string that Wikidata gives you. For that, you can use a Wikibase helper function as well:
const moment = require('moment')
const ISOString = wbk.wikibaseTimeToISOString('+1962-07-03T00:00:00Z')
console.log(moment(ISOString).utc().format('MMMM Do YYYY'))
// July 3rd 1962
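If you don't want to pull in moment just for this, here is a rough sketch of the same idea with only built-in JavaScript. The wikibaseTimeToDate function is a made-up name. It assumes a full-precision date; Wikidata can also return lower-precision times with 00 as the month or day (e.g. when only the year is known), and this sketch does not handle those.

```javascript
// Hypothetical helper: turn a Wikidata time string into a JavaScript Date.
// Assumes a full-precision value such as '+1962-07-03T00:00:00Z'.
function wikibaseTimeToDate (wikibaseTime) {
  const iso = wikibaseTime.replace(/^\+/, '') // drop the leading + sign
  const date = new Date(iso)
  if (Number.isNaN(date.getTime())) {
    throw new RangeError(`Cannot parse time: ${wikibaseTime}`)
  }
  return date
}

const born = wikibaseTimeToDate('+1962-07-03T00:00:00Z')
console.log(born.toISOString())
// 1962-07-03T00:00:00.000Z

// Format in UTC so the day doesn't shift with your local timezone
const formatter = new Intl.DateTimeFormat('en-US', { dateStyle: 'long', timeZone: 'UTC' })
console.log(formatter.format(born))
```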
So, that's it for now. If you've made it this far, I won't stop you from exploring: if you're still interested, you can read the official docs of Wikibase. This article was only intended to give you an understanding of how things work inside Wikidata.
In an upcoming article, I'll also explain the SPARQL way to fetch info from Wikidata. As I said, it's more flexible. For that, subscribe so you don't miss it, and don't forget to upvote this article if you liked it. And also: stay hungry and stay foolish.
Thanks for visiting 🙂