In March 2025, I was approached by Prof. Tarcisio Della Senta, former president of the UNDL Foundation, to make the UNL Programme collection available again to the international community. The material had been hosted on UNLweb, a portal that brought together the linguistic and computational resources developed for the project. However, the site had not been maintained since the Foundation ceased operations in 2015 and had been experiencing frequent instability, with long periods of downtime.
The initial idea was simply to mirror the services on a new, more stable server. The task, however, proved far more difficult than planned. UNLweb had been developed mainly between 2009 and 2012, as part of the UNL+3 project, on the technology stack of its time: PHP 5.3 and Java 1.6, together with the web applications Joomla 1.5, MediaWiki 1.16 and phpBB 3.0, plus assorted third-party libraries, plugins, APIs and modules. Ten years after the last update, the software simply no longer ran on modern servers. From 2015 onward, PHP deprecated or removed functions on which the code depended; new versions of Joomla and MediaWiki underwent transformations so radical that direct migration was impossible; and most external dependencies were broken or incompatible. In short, the mirroring strategy proved impractical: it would have required rewriting thousands of lines of code across all the systems, far exceeding the budget (which was zero), my available time, and the actual objective of the proposal, which was not to revive an old structure that would not be continued, but to make the collection available for documentation, research, and future exploration.
The solution, then, was to reorganize the collection, which has been divided here into six sections: linguistic resources, computational resources, documentation, education, history of the UNL Programme, and history of the UNDL Foundation. I have maintained, wherever possible, access to the project's primary data and resources; however, several functions have been removed.
This work of compilation and reorganization compelled me to revisit my entire history with the UNL Programme, a history spanning 18 years. I joined the project in 1997, at the beginning of my PhD, as a member of the Brazilian Language Centre, based at the University of São Paulo. I defended my doctoral thesis on the basis of that experience and, in 2002, joined the UNL Centre, following its history closely and participating episodically in several of its initiatives. In 2009, I joined the UNDL Foundation at its headquarters in Petit Lancy, Geneva, where I remained until 2014, when the financial resources were exhausted.
Throughout this period, I had experiences that defined the researcher and linguist I am today. I had the opportunity to meet people from vastly different places and, above all, speakers of vastly different languages. I lived through the hope of a model of language that could be represented in machines, and the continual frustrations that followed each successive adaptation of that model. The expectations surrounding the proposal were many, and often fed irresponsibly. The complexity of human language, however, was always far greater, and this sphinx devoured us all.
We were, above all, overtaken by the revolution in natural language processing that took hold in the 2010s. The resurgence of neural networks from the late 2000s, and in particular the advent of transformers in the second half of the 2010s, represented a paradigmatic rupture that rendered obsolete the symbolic approaches we had been exploring since the 1990s.
Computational linguists today experience a feeling that, I believe, cannot be very different from that of alchemists confronting the advances of modern chemistry, or of Newtonian physicists confronting quantum physics. There now exists another way, much simpler and more effective, of making machines speak, one that does not depend on the categories we believed consolidated in traditional linguistics. LLMs do not even respect word boundaries: they work with sublexical units that seem absurdly arbitrary to us but which, despite this, produce better results than we ever achieved.
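To see how far these sublexical units stray from the categories of traditional linguistics, one need only run a standard byte-pair-encoding tokenizer over a few words. The minimal sketch below assumes the tiktoken library and its "gpt2" encoding; any BPE tokenizer would make the same point.

```python
# A minimal sketch of BPE subword tokenization, assuming the tiktoken
# library (pip install tiktoken); any BPE tokenizer illustrates the point.
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # the byte-pair encoding used by GPT-2

for word in ["internationalization", "undreamt", "UNLweb"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(word, "->", pieces)

# The resulting pieces are frequency-driven artifacts of the training
# corpus, not morphemes: they rarely align with the prefixes, roots,
# and suffixes a linguist would posit.
```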
It is the vindication of Marvin Minsky's famous aphorism that airplanes do not flap their wings. In other words, the best alternative was to abandon deductive modeling in favor of an inductive perspective, which is precisely the now-hegemonic vision in the field of AI.
When Prof. Della Senta asked me to make the UNL Programme products available to the public, the key question was precisely the relevance of keeping this data accessible to the research and development community. Does this material have any utility today? Is there still room for approaches like UNL in a world of LLMs? Is there a place for dictionaries and grammars in this new vector space where advances in natural language processing now unfold? Do Linguistics and linguists still have anything to teach machines?
The immediate answer seems to be no.
However, John Searle's famous Chinese Room argument still seems valid. The thought experiment, formulated in 1980 as a critique of the Turing Test, imagines a man who knows no Chinese locked in a room with a rulebook, written in English, for manipulating Chinese symbols. Questions in Chinese are passed in to him; by mechanically applying the rules at his disposal, he passes back answers that, from the perspective of results, are indistinguishable from those of a competent speaker. The man, however, operates without understanding: he manipulates symbols but does not truly comprehend their meaning, and he himself could not judge (nor explain) the quality of his answers. In sum: intelligence, in the strong sense, would not be merely similarity of results, but also awareness of processes.
The question is to what extent processes are effectively constitutive of results; that is, to what extent "understanding" actually matters for the execution of linguistic tasks. The success of generative AI systems raises even more radical questions: to what extent does "meaning", like "consciousness" itself, actually exist as an input to linguistic processing? Is semantics a material cause of linguistic processing, or an epiphenomenal consequence of it: an emergent property, a product of a posteriori rationalization, of a process whose basis is strictly neurochemical?
Any answer, at this point, is merely probable. But the inability of current AI systems to criticize their own output, particularly where hallucinations are concerned, seems to indicate that meaning is not dead and does indeed play a defining role in linguistic processing. There are limits to machine learning, and these limits seem to involve precisely the capacity to step outside one's own processing: a form of metareasoning that becomes possible only when a system can take its own operations as an object of scrutiny, that is, when it achieves a commitment to explicability through metalinguistic reflection. And it is precisely through this fissure that one may glimpse the light of a future for approaches like UNL.
Whatever the case, here, tentatively organized, are the results of the UNL Programme's history, perhaps awaiting someone with greater skill and artistry. If you also have materials from your experience with the UNL Programme that you would like to make available on this portal, it would be a pleasure to include them and preserve them for posterity.