Saving the world by feeding the AI Dragon with unbiased open data
Talk given by Kaj Arnö on 4 February 2025 at the State of Open Conference 25 (SOOCon) in London.
- Webinar recording on YouTube: https://youtu.be/b1uJsIhAk-4
- Slides: Google Slides
Moderator intro
We have an amazing talk from a fantastic speaker coming up. This is Saving the World by Feeding the AI Dragon with Unbiased Open Data, a subject which I am sure we’re all incredibly interested in. Please give a very warm round of applause to Kaj.
Introduction
Indeed, my goal is to save the world by feeding the AI dragon with unbiased open data. I’m Kaj Arnö, and I’m the CEO of MariaDB Foundation. And I’ve worked with free and open source software all of this century, and in that process I’ve become convinced that openness is a powerful force for good, also when it comes to open data.
Background: Finland, a country with two languages and cultures
I’m a dual citizen of both my native Finland, and Germany, and over the years, I’ve noted that Finland hasn’t properly been represented in open data. Finland is a country with two languages, and I’ve seen that Swedish Finland hasn’t been sufficiently described.
You might know Swedish Finland by two of its representatives, Linus Torvalds, the father of Linux, and Alexander Stubb, our current president. And I founded a project to fix this in 2017 called Projekt Fredrika.
And that has given the background for the much larger scope of not just saving little Swedish Finland, but saving the world.
Background: What is MariaDB, the relational database?
As for MariaDB, it’s an open source relational database management system, and that’s the reason why I am at OpenUK. But for the purposes of this presentation, there are only three things I’d like you to know about MariaDB.
- First, I think it’s fair to portray MariaDB as the future of MySQL.
- Second, it has the infrastructure needed for AI, in the form of vectors. The most prolific use case there is to create RAG (retrieval-augmented generation) applications, which means that the large language model answers questions based not on its general training data, but on the specific data you’ve provided as the creator of that application.
- Third, MariaDB is the database used by Wikipedia.
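The RAG idea from the second point can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three-dimensional vectors are hand-made toy values rather than real embeddings, the names `DOCUMENTS`, `retrieve` and `build_prompt` are hypothetical, and nothing here is MariaDB's actual vector API. In a real application the vectors would come from an embedding model, the similarity search would run inside the database, and the resulting prompt would be sent to a large language model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: (text, hand-made "embedding") pairs.
# Real embeddings have hundreds of dimensions and come from a model.
DOCUMENTS = [
    ("MariaDB is an open source relational database.", [0.9, 0.1, 0.0]),
    ("Wikipedia aims for a neutral point of view.", [0.1, 0.9, 0.1]),
    ("Vectors let a database search text by meaning.", [0.7, 0.2, 0.6]),
]


def retrieve(query_vector, documents, top_k=2):
    """Return the top_k texts whose vectors are most similar to the query."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, passages):
    """Combine the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


question = "What is MariaDB?"
query_vector = [1.0, 0.0, 0.1]  # assumed embedding of the question
passages = retrieve(query_vector, DOCUMENTS)
prompt = build_prompt(question, passages)
print(prompt)
```

The point of the sketch is the division of labour: the model only sees the passages that were retrieved, so the quality of the answer depends on the quality of the data that was stored.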
But now for saving the world.
The premise: The war on Ukraine should have been no surprise
The 24th of February, three years ago, took the world by surprise.
It shouldn’t have.
With hindsight, we had all the foresight needed to have prevented that happening in Ukraine. And this foresight we had had for centuries, but it’s not as if there hadn’t been extra alarm bells ringing in the last decades.
There were recent alarm bells – but they went unheard
One of the first alarm bells was in 2008, when Georgia was attacked by Russia. Our reaction: “Well, that’s sort of far away. And have you been to Georgia? I don’t know where it is. Perhaps it’s not important.” There were few exceptions to the general attitude, which was, “ah, these alarm bells are not important”. And one of those exceptions happened to be Finland’s then foreign secretary, Alexander Stubb, now our president.
There were other alarm bells in 2013 in Ukraine. Euromaidan should have been a lesson for all of us who didn’t do our homework when learning about the history of Ukraine to see that they consider themselves to be part of Europe. We didn’t do our homework. And then in 2014, when little green men came to Crimea, the alarm bells still stayed too silent, and Angela Merkel played the role of Neville Chamberlain, trying to appease dictatorship by presuming that common good would be a priority for them. It wasn’t.
Finally in 2022, alarms went off, but only slowly
So in 2022, the alarm bells then really did go off, but very late and very slowly and very partially. The reason for this is our lack of understanding of history through three centuries of successful portrayal of Ukraine as a non-country, non-language, non-culture. “Ukraine is probably a part of Russia, isn’t it?” Ukraine was outside the general education in Western Europe.
In hindsight, we had all ingredients for foresight
My two central claims around this are that, in hindsight, had the West reacted properly when the alarm bells should have gone off in 2008 and 2014, there wouldn’t have been a full-out war in 2022. And this is not much of a hindsight. It is not a question of “had we known then what we know now”. It is a question of had we used then what we really should have known already. Had the past been known, and understood as part of the general education, we would have reacted properly.
Now, the cost of lack of general education is hundreds of billions €/$/£
With the end outcome that military expenditure by the West is now in the hundreds of billions of whatever currency units, not to speak of a million lost human lives. But, tell you what, history isn’t over yet. This isn’t the last time when such things can happen. Foresight is still needed.
History is written by the victors
History is always written by the victors, and those of you with A, good eyesight, and B, very good knowledge of geography, will notice that this picture, which is uploaded to Wikimedia, is portraying Crimea as a part of Russia. Let’s hope that victory doesn’t happen.
Contrasting Finnish and Ukrainian nationalism portrayal
It’s a very different situation if you look at how Finland and Ukraine are portrayed on Wikipedia. So if you look at nationalism, in Finland it’s something where we sort of pat ourselves on the back, and that is reflected on Wikipedia regardless of which language we’re portrayed in.
In Finland, nationalism in the 1800s is national awakening. It’s the Russian eagle that tries to rob us of rule of law. And we have great poets and great composers, painters, and we did honorable resistance against Russia in the 19th century.
In Ukraine, well, according to the spirit of many Wikipedia articles on Ukraine in different languages, they had nationalism too, but that was pogroms, and it was the root of Nazism and violence and lots of dissonance.
Yet it’s the same national awakening that happened in Ukraine and in Finland: great poets, painters, resistance and all. Only we in Finland didn’t lose, they in Ukraine did.
On top, most Ukrainians of the 1800s are classified as Russians in Wikipedia. I am happy to note that no Finns of that era are classified as Russians.
Correcting the portrayal is a form of cultural self defense
The way to correct this is cultural self-defense, at less than 0.1% of the cost of the military expense now used to fix stuff that we screwed up. And I still remind you that history is not over yet. And this is Projekt Kateryna, named after a painting by Taras Shevchenko, which is trying to rectify things by, and this is an expression, decolonization of Ukrainian history in Wikipedia.
So examples of this. Wikipedia has partial truths. It’s not blatant lies. And those of you very interested in history might know that there’s a guy called Ivan Mazepa who sided with “us”, the Nordics, the Scandinavians, in the Battle of Poltava, 1709. Well, it’s still being fought over in Ukraine, whether Mazepa betrayed Russia – or whether his colleagues betrayed Ukraine – by siding with Sweden. In 1709. Strong unresolved feelings need facts to be portrayed properly. Now, there’s a lack of nuance in how facts related to the Ukrainian nation are portrayed.
Here’s a picture, also from Wikimedia Foundation, about a Viking named Rurik, who was the person who in A.D. 870 founded … well, what exactly did he found? Was it Russia? Was it Kiev? Was it Ukraine?
Currently, the portrayal of Ukrainian history is frequently a form of Russian propaganda by a thousand cuts. It’s not a question of fixing one Wikipedia article, it’s all of the articles where Ukraine, in this case, is portrayed on Wikipedia.
The final consequence of misrepresented history can be: War
The consequence of not portraying it properly, by copying how the Russians have depicted Ukraine over the centuries, is what happened in 2008 and 2014. It’s the lazy truth.
Those who really should be the most educated people in politics, in journalism, they go to the place where they believe that facts are most credibly represented, which would then have been Wikipedia, and now frequently also AI. And if Wikipedia mainly provides a partial truth, how can neutral point of view prevail?
The solution: Portraying neutral facts on open data
The solution, though – and there is a solution – is described in the title of this presentation, To Feed the AI Dragon. The good news is that we can influence open data, Wikipedia, Wikidata. And they have a goal of a neutral point of view. The value here is the credibility of Wikipedia and Wikidata, and their accessibility. There’s a platform for it. And it is one of the best sources of training data for AI. So just go fix it.
We have nearly all necessary tools
We have nearly all the infrastructure. We have the universities and scholars providing this truth, or neutral point of view. Here, the very good authority on this is Timothy Snyder, a professor at Yale.
And the technology exists. So Wikipedia exists. Vector databases exist. AI, retrieval augmented generation. We have Python scripts.
We miss the institution to supply the Neutral Point of View
So what is missing? Well, what’s missing is that the collective West, to use an expression favored in Russia, is missing an institution with the task to feed this AI dragon with neutral point of view. And to provide a vaccine against the propaganda that exists, all across the partial truth, by curating neutral point of view data into Wikipedia.
I think such an institution should be named after Denis Diderot, who’s the guy who originated the concept of an encyclopedia.
The question then would be, who should form this organisation? Where should it be based? Should it be in your wonderful country here in the UK? I think you have a good track record of influencing world history, or at least attempting to – you were one of the first to realise what was happening in Kiev in Ukraine in 2022.
I think the EU very much has it as its interest for this to happen.
First they came …
If we don’t do anything about this, the old adage will apply: “First they came for the communists. I’m not one, so I didn’t do anything. Then they came for the homosexuals. Well, I’m not one, so I didn’t do anything about it. Then they came for the Jews. Well, I’m not a Jew, so I didn’t do anything about it. Now they came for me, and there is no-one left to defend me.”
That will happen again. It will not stop at Ukraine.
The US hopefully should have this as their self-interest. And even because this is a question of cultural self-defense, I would imagine that even an organisation of the type of NATO should have it as its interest.
My call to arms: Help me navigate the powers of The Collective West!
So my call to arms, my call to action for you is: please help me navigate the collective West to find and to found and to fund the appropriate institution for this.
Thank you.
Audience questions
We have time for questions.
What should we do when the AI dragon gives us results we do not like?
Q: What should we do when the AI dragon gives us results we do not like?
A: We cannot control how these AI dragons are exactly being fed. But what we can do is exactly what I’m proposing here: feed it with proper open data.
I think most organisations that train their dragons are only thankful for the credibility and the curation done by Wikimedia organisations. So we can exactly do what I’m proposing here, feed the AI dragon with knowledge, non-propaganda.
How can anybody in this room help in making this happen?
Q: What are the practical steps that anybody in this room can do to help me organise this?
A: It is to identify the people who have either plentiful money or connections to the organisations in whose interest this really should be (governments of UK, EU, US, NATO), and to connect us with them. I think this is an interest of society at large; at least the freedom-loving, free liberal West should be interested in this.
And I think it should be self-evident. I am searching for the right people to talk to. So if you know whom I should be talking to, please use the email address and my LinkedIn address on the slides there and help by connecting.
Who defines Neutral Point of View?
Q: The question is: how do we define neutral point of view?
A: My happy answer is: we don’t have to. Wikimedia Foundation has already defined NPOV and Wikipedia already has a mechanism for mediating between how things are being described.
There is a mechanism that has been in place for describing Israel and Palestine for years and the equilibrium has been found. So that’s a problem that is already solved and outside the scope of the problem of feeding the AI dragon.
So we just need to feed the neutral point of view information. And if we feed it with something that isn’t neutral, we will be reprimanded and it will not stay in Wikipedia. So there is this control mechanism.
Moderator conclusion
Thank you. That was an absolutely fantastic talk.