
The Evolution of Search Engines: Past, Present & Future

Apr 15, 2025  |  Reading Time: 4 minutes

This is the second post in our ongoing series “Understanding Semantic Search”. Don’t miss the previous article, “Understanding Semantic Search: What it is and How to Use it”.

There’s no question that semantic search is a valuable tool, but how did we get here? Let’s look at the recent history of search technology and the major milestones that have brought it to where it is today.

Glossary of Relevant Terms

  • Embeddings: Mathematical representations that convey meaning in the form of a vector (a list) of numeric values. Embeddings are also called semantic vectors.
  • Generative AI (GenAI): A category of AI that produces content (text, images, audio, code) that mimics human creativity. It learns patterns from large datasets and generates new content in response to prompts (instructions).
  • Natural Language Processing (NLP): The use of algorithms and models to process natural language rather than computer language. This allows humans to interact with computers using natural sentences.
  • Recurrent Neural Networks (RNNs): A category of artificial neural networks designed for sequential or time-series data. These deep learning models are often used for language translation, NLP, speech recognition, and image captioning.
  • Tokenization: The process of splitting written text into small sequences of characters. For keyword search, one token is one word; for similarity search and embedding creation, a token is simply a sequence of characters that can be part of a word or span two words.
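The difference between the two tokenization styles in the glossary can be sketched in a few lines. This is an illustrative simplification (real subword tokenizers such as BPE learn their vocabulary from data; here we just use fixed-size character n-grams):

```python
import re

def keyword_tokenize(text):
    # For keyword search: one token is one word.
    return re.findall(r"[a-z0-9]+", text.lower())

def subword_tokenize(text, size=4):
    # For similarity search: a token is a short sequence of characters
    # that can sit inside a word or span a word boundary.
    flat = re.sub(r"\s+", " ", text.lower()).strip()
    return [flat[i:i + size] for i in range(len(flat) - size + 1)]

print(keyword_tokenize("Search engines evolve"))  # -> ['search', 'engines', 'evolve']
print(subword_tokenize("search", size=3))         # -> ['sea', 'ear', 'arc', 'rch']
```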

A Timeline of Search Technology Advancements

As time passes and technology evolves, search engines have evolved to improve result relevancy and reflect changes in user search methods. Each new evolution in search technology brought valuable new capabilities. Read on to uncover the specificities of each advancement.

From Bag of Words to TF-IDF

At first, the standard way to solve Natural Language Processing (NLP) challenges was the “bag of words” model. This information retrieval model discards word order and simply counts how many times each word appears in a document: the higher the count, the more important the word. This keyword search model was most notably useful for document classification.
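The bag-of-words idea fits in a few lines; this toy sketch shows how word order disappears and only counts remain:

```python
from collections import Counter

def bag_of_words(text):
    # Word order is discarded; only per-word counts remain.
    return Counter(text.lower().split())

doc = "the cat sat on the mat the cat"
bow = bag_of_words(doc)
# Under this model, a higher count means a more important word.
print(bow.most_common(2))  # -> [('the', 3), ('cat', 2)]
```

Note that the top word here is “the”, which carries almost no meaning. That weakness is exactly what TF-IDF, described next, was designed to correct.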

Imagined in the 1960s and implemented at scale in the 1980s, the next step was term frequency–inverse document frequency (TF-IDF), a more refined version of bag of words. TF-IDF weighs each term by how frequent it is within a document relative to how common it is across the whole corpus, so that frequently used words with little meaning (the, of, or, it…) don’t skew results. Essentially, this method corrects results for sampling bias.
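A minimal TF-IDF sketch makes the correction concrete (there are several standard variants of the idf formula; this uses the plain logarithmic one):

```python
import math

def tf_idf(term, doc_tokens, corpus):
    # tf: relative frequency of the term within this document
    tf = doc_tokens.count(term) / len(doc_tokens)
    # idf: terms that appear in fewer documents get a higher weight,
    # so ubiquitous words like "the" contribute little or nothing
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "quantum computing is the future".split(),
]
doc = corpus[0]
print(tf_idf("the", doc, corpus))  # 0.0: "the" appears in every document
print(tf_idf("mat", doc, corpus))  # positive: "mat" is distinctive
```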

The Rise of Language Models

Eventually, TF-IDF started showing its limitations in relevance, and in the 1990s researchers conceived a new approach with the introduction of probabilistic evaluation: modeling the probabilistic distribution of words in the content and the probability that a document can generate the query (i.e. that the language model derived from the document can generate the query). The most well-known probabilistic model is Okapi BM25, which was a real improvement in relevance compared to TF-IDF. However, it was still limited and could not easily cope with new personalization challenges, because its joint-probability approach left no room for additional non-lexical parameters in the ranking.
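The Okapi BM25 formula can be sketched as follows. This is a simplified, self-contained illustration of the standard formula (term-frequency saturation via k1, document-length normalization via b), not any product’s implementation:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    # Okapi BM25: term frequency saturates (controlled by k1) and scores
    # are normalized by document length relative to the corpus average (b).
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = doc.count(term)
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf * norm
    return score

corpus = [
    "semantic search improves result relevance".split(),
    "keyword search counts matching words".split(),
    "neural networks learn dense representations".split(),
]
print(bm25_score(["search", "relevance"], corpus[0], corpus))  # highest: both terms match
print(bm25_score(["search", "relevance"], corpus[2], corpus))  # 0.0: no terms match
```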

In the 2000s, a breakthrough was introduced with Bayesian models: generative language models that rely on conditional probability and that allow applications to gracefully include any parameter in the search equation at query time to make the search more specific and personalized. Fluid Topics’ Taruqa search engine uses this probabilistic generative keyword search engine technology.
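To illustrate the conditional-probability idea, here is a toy query-likelihood model with Jelinek-Mercer smoothing. The `boosts` parameter is hypothetical, included only to show how non-lexical signals (e.g. the user’s product or role) can be folded into the score at query time; this is not Taruqa’s actual algorithm:

```python
import math

def query_likelihood(query, doc, corpus_tokens, lam=0.5):
    # log P(q|d): probability that the document's language model generates
    # the query, smoothed against corpus-wide word frequencies so that a
    # single unseen term does not zero out the whole score.
    log_p = 0.0
    for term in query:
        p_doc = doc.count(term) / len(doc)
        p_corpus = corpus_tokens.count(term) / len(corpus_tokens)
        log_p += math.log(lam * p_doc + (1 - lam) * p_corpus + 1e-12)
    return log_p

def personalized_score(query, doc, corpus_tokens, boosts):
    # Conditional-probability framing: extra (hypothetical) non-lexical
    # factors multiply into the probability, i.e. add in log space.
    return query_likelihood(query, doc, corpus_tokens) + sum(math.log(b) for b in boosts)

d1 = "semantic search finds meaning by comparing representations".split()
d2 = "cats enjoy long afternoon naps".split()
corpus_tokens = d1 + d2
query = ["semantic", "search"]
# d1 contains the query terms, so it scores higher than d2; a boost > 1
# raises a document's score further without touching the lexical part.
```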

(Diagram: explanation of Taruqa probabilistic search)

Embeddings and the Emergence of Neural Networks

After this, language model technology gained traction and we moved to semantic modeling through vectors and, eventually, embeddings, as companies began to build libraries for dense word representations. This was complemented by the emergence of neural network frameworks like Keras, TensorFlow, and PyTorch that enabled the computation of larger models. These allowed programs to process data more like humans do, improve continuously, and solve complex problems. In the context of search, embeddings made it possible to understand the meaning of a sequence of words rather than just one word at a time.
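Once text is represented as embeddings, “meaning” becomes geometry: vectors that point in similar directions have similar meanings, typically measured by cosine similarity. The three-dimensional vectors below are made up for illustration (real embeddings have hundreds of dimensions and are produced by a trained model):

```python
import math

def cosine_similarity(u, v):
    # 1.0 means identical direction (same meaning); values near 0 mean unrelated.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical toy embeddings for illustration only.
king = [0.9, 0.7, 0.1]
queen = [0.85, 0.75, 0.15]
banana = [0.1, 0.2, 0.95]
print(cosine_similarity(king, queen))   # close to 1: related meanings
print(cosine_similarity(king, banana))  # much lower: unrelated meanings
```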

The Rise of Transformers

Finally, around 2018, when the famous “Attention Is All You Need” paper was released, we moved to transformers. With embeddings we could compute the first layer of a neural network and use it to train further layers. Transformers refined this process by computing the entire model without needing to relearn the data every time they had to complete a new task.

Additionally, transformers produce better embeddings due to self-attention mechanisms which help models concentrate on and weigh the importance of input tokens when producing outputs. In practice this enhances reading comprehension and is essential for tasks where the model needs to understand context. Compared to the earlier models that used Recurrent Neural Networks, these attention-based models are much more effective at capturing long-range dependencies. Concretely, this is essential for language modeling when a model needs to understand a sentence that depends heavily on the words and context that appeared much earlier in the text.
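The self-attention mechanism described above can be sketched in plain Python. This is a bare-bones scaled dot-product attention where queries, keys, and values are all the token embeddings themselves (real transformers apply learned projection matrices and multiple attention heads):

```python
import math

def softmax(xs):
    # Convert raw scores into attention weights that sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    # For each token, score it against every token in the sequence
    # (scaled dot product), then output a weighted mix of all tokens.
    # Distant tokens can receive high weights, which is why attention
    # captures long-range dependencies better than RNNs.
    d = len(embeddings[0])
    out = []
    for q in embeddings:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, embeddings))
                    for i in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token embeddings
contextual = self_attention(tokens)  # each output token now reflects its context
```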

Since 2022 there has been a massive acceleration of new Large Language Models launching. The timeline of AI is ramping up faster than ever, and with new capabilities, search technology will also evolve.

Conclusion

The world has made great progress in language models, relevance, and probability in recent decades, driving major advances in search engine technology. Don’t miss the next article, “3 Must-Have Semantic Search Use Cases”, where we move from theory to the essential business use cases where semantic search improves employee productivity and enhances the user experience.

About The Authors

Fabrice Lacroix

Fabrice is Fluid Topics’ visionary thinker. By tirelessly meeting clients, prospects, and partners, he senses the needs of the market and fuels his creativity to invent the features that make Fluid Topics the market-leading solution for dynamic delivery of technical content.

Kelly Dell
