Bing Search Infrastructure Update: Large and Small Language Models Deployed to Improve Performance

Publication date: 21.09.2025

Blog category: Web Technology News

Microsoft has announced an update to Bing's search infrastructure that introduces large and small language models (LLMs and SLMs) along with new optimization techniques. The update aims to improve performance and reduce the cost of serving search.

Using LLMs in a search engine can cause problems with speed and cost. To address them, Bing trained SLMs, which it says deliver roughly 100 times the throughput of LLMs. Bing also uses NVIDIA TensorRT-LLM to optimize its SLMs. TensorRT-LLM is a tool that helps reduce the time and cost of running large models on NVIDIA GPUs.

"LLMs can be expensive to serve and slow. To improve efficiency, we trained SLM models (~100x throughput improvement over LLM), which process and understand search queries more precisely."

🚀 According to Microsoft's technical report, integrating NVIDIA TensorRT-LLM has improved the company's "Deep Search" feature, which uses SLMs in real time to deliver relevant web results. Before optimization, the original Bing Transformer model had a 95th-percentile latency of 4.76 seconds per batch (20 queries) and a throughput of 4.2 queries per second per instance. With TensorRT-LLM, latency dropped to 3.03 seconds per batch and throughput rose to 6.6 queries per second per instance. The report describes this as a 36% reduction in latency and a 57% reduction in operating costs.
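The percentages can be checked against the raw figures with a few lines of Python. This is only a back-of-the-envelope sketch: the constant and function names are made up for illustration, and the cost line assumes a fixed cost per instance, which is my assumption and not something the report states. The 36% follows directly from the latency numbers, and 57% matches the relative throughput gain; how the report maps throughput to operating costs is not described here.

```python
# Figures quoted in the article (per instance, batch of 20 queries).
BASELINE_P95_S = 4.76    # 95th-percentile latency per batch before optimization
OPTIMIZED_P95_S = 3.03   # 95th-percentile latency per batch with TensorRT-LLM
BASELINE_QPS = 4.2       # queries per second per instance before optimization
OPTIMIZED_QPS = 6.6      # queries per second per instance with TensorRT-LLM


def relative_change(before: float, after: float) -> float:
    """Return (after - before) / before; a negative value means a decrease."""
    return (after - before) / before


latency_change = relative_change(BASELINE_P95_S, OPTIMIZED_P95_S)
throughput_change = relative_change(BASELINE_QPS, OPTIMIZED_QPS)

# Assumption (not from the report): each instance costs a fixed amount per
# hour, so the cost of a single query scales as 1 / throughput.
cost_per_query_change = relative_change(1 / BASELINE_QPS, 1 / OPTIMIZED_QPS)

print(f"p95 latency change:             {latency_change:+.1%}")
print(f"throughput change:              {throughput_change:+.1%}")
print(f"cost per query (assumed model): {cost_per_query_change:+.1%}")
```

Under that fixed-cost assumption, the cost per query falls by about the same share as latency, while the 57% figure corresponds to how many more queries one instance can serve.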

📌 The Bing update delivers faster search results through optimized inference and shorter response times.
📌 Accuracy improves thanks to the enhanced capabilities of SLMs, which return more contextualized results.
📌 Greater cost efficiency lets Bing invest in further innovation and improvement.

1. What are large and small language models (LLMs and SLMs)?

2. How do LLMs and SLMs differ?

3. How does Bing use NVIDIA TensorRT-LLM?

4. What improvements does the Bing update bring?

5. Why is Bing's transition to LLM/SLM models important?

🧩 Bottom line: Bing is taking a step forward in search technology by introducing large and small language models together with new optimization techniques. With these changes, Bing hopes to improve performance, reduce costs, and give users faster, more accurate search results.

🧠 Author's take: The Bing update shows that search engines keep evolving to handle more complex user queries. Introducing large and small language models alongside new optimization techniques demonstrates how Bing is trying to make its search engine more efficient. It also shows that intelligent systems such as Bing continue to refine their algorithms to better understand and meet users' needs.

✍️ Author: Volodymyr Katiushyn (Володимир Катюшин), web technology expert.

This article was generated with the help of AI from the source listed below, then manually edited and checked by the author for accuracy and usefulness.

Sources

https://www.searchenginejournal.com/bing-search-updates-faster-more-precise-results/535621/

Keywords: Bing optimization, search engine, large language models, small language models


Comments

Judging by the test results, Bing's new architecture should, at least in theory, reduce costs and improve speed. That is good to hear, but let's not jump to conclusions. A big step toward improvement is fine, but what about real users? If the new models cannot deliver truly accurate and relevant results, even speed will not save the situation. Users want effective answers, not queries per second. It is worth watching how these innovations affect the everyday experience. Can Bing really compete with Google? That question remains open.

21.09.2025 09:00 UXNinja

Emotions should have no place here, but what a pointless race for speed this is! Bing is trying to impress everyone with promises like "100 times faster"! But who actually cares about the number of queries processed when even the most obvious ones can be ignored because the results are irrelevant? Yet another attempt to ride the hype around LLMs. Will these latest models solve every problem? I doubt it. It takes more than magic numbers to compete with Google, and it is not about speed but about the quality of the content in the answer. We will watch how it plays out!

21.09.2025 09:30 BugHunter

Judging by the previous comments, it seems everyone cares not only about speed but also about the substance of the answer. A kind of "quality speed", like a Ferrari with the engine from a Kopiyka! 🤣 The update looks promising, but time will tell whether Bing can win the marathon on Google's turf. I hope the new SLMs do not pull out like a train that rushes ahead and forgets to pick up its passengers! 🏃‍♂️💨

21.09.2025 10:02 CSSnLaughs

Given the realities of today's search engine world, speed is always a magnet for attention. But as you have already noted, if that speed is not matched by the quality of the answers, the results can be as absurd as a Ferrari in a tricycle race! 😂 I wonder how Bing plans to impress users if they cannot find what they actually need, even when queries are processed in record time. Time will tell whether this combination of LLM and SLM can really shed light on search or whether it is just another play on words with nothing behind it.

21.09.2025 10:12 SpecOpsDev
