Inferact launches with $150M in funding to commercialize vLLM

A group of artificial intelligence researchers today launched **Inferact Inc.**, a new startup that will commercialize the open-source vLLM project. The company is backed by $150 million in seed funding. Andreessen Horowitz and Lightspeed led the round, with participation from Databricks Inc.'s venture capital arm, the UC Berkeley Chancellor's Fund, and several other backers. The round values Inferact at $800 million.

Inferact’s founding team includes computer science professor and Databricks co-founder **Ion Stoica**. He is currently the director of the University of California at Berkeley’s Sky Computing Lab, which developed the original version of vLLM in 2023. Since then, the project’s pool of code contributors has grown to more than 2,000 developers.

Software teams use vLLM to speed up inference workloads by applying a wide range of optimizations to large language models (LLMs). Many of these optimizations, including a particularly important vLLM feature called **PagedAttention**, focus on reducing models’ memory usage.

When an LLM receives a prompt, it completes a small portion of the calculations needed to produce an answer and saves the intermediate results to a so-called **KV cache**, or key-value cache. It then performs the next portion of the calculations, updates the KV cache with the new results, and repeats this process until a response is generated. Storing all those results requires a significant amount of memory.
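
To make the mechanism concrete, here is a toy Python sketch of KV caching during autoregressive decoding. All names, shapes, and weights are invented for illustration; this is not vLLM's implementation.

```python
# Illustrative sketch of KV caching in autoregressive decoding (not vLLM's
# actual code). Each decode step projects only the newest token into keys
# and values and appends them to the cache, instead of recomputing history.
import numpy as np

d_model = 8                               # toy embedding size
W_k = np.random.randn(d_model, d_model)   # key projection (toy weights)
W_v = np.random.randn(d_model, d_model)   # value projection (toy weights)

kv_cache = {"keys": [], "values": []}     # grows by one entry per token

def decode_step(token_embedding: np.ndarray) -> None:
    """Project the new token and append its K/V to the cache."""
    kv_cache["keys"].append(token_embedding @ W_k)
    kv_cache["values"].append(token_embedding @ W_v)
    # Attention for the new token reads the *entire* cache here, which is
    # why the cache's memory footprint grows with sequence length.

for _ in range(4):                        # pretend we generate 4 tokens
    decode_step(np.random.randn(d_model))

print(len(kv_cache["keys"]))              # 4 cached key vectors
```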

PagedAttention makes it possible to store KV cache data in non-adjacent sections of a server's memory, much as an operating system's virtual memory pages data across non-contiguous RAM. This feature, along with other capabilities, significantly reduces memory waste and lowers the amount of hardware needed to serve LLMs.
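
The following toy sketch, with hypothetical names rather than vLLM internals, shows the paging idea: a sequence's KV cache is split into fixed-size blocks, and a per-sequence block table maps logical blocks to whatever physical slots happen to be free.

```python
# Toy sketch of the paging idea behind PagedAttention (hypothetical names,
# not vLLM internals). Blocks need not be adjacent in physical memory.
BLOCK_SIZE = 16                   # tokens per KV block

free_blocks = list(range(64))     # pool of 64 physical block slots
block_table: list[int] = []       # logical -> physical mapping for one sequence

def append_token(token_index: int) -> None:
    """Allocate a new physical block whenever a logical block fills up."""
    if token_index % BLOCK_SIZE == 0:          # first token of a new block
        block_table.append(free_blocks.pop())  # grab any free physical slot

for t in range(40):               # cache 40 tokens -> 3 blocks of 16
    append_token(t)

print(block_table)                # [63, 62, 61]: non-adjacent slots are fine
```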

For added efficiency, vLLM uses a method called **quantization** to compress AI models’ weights, thereby shrinking their memory footprint.
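
As a rough illustration of why quantization saves memory, here is a minimal symmetric int8 sketch. This is one common scheme among several that vLLM supports, not its exact algorithm.

```python
# Minimal sketch of symmetric int8 weight quantization (illustrative only).
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)     # fp32: 4 bytes per weight

scale = np.abs(weights).max() / 127.0                  # map max magnitude to int8 range
q_weights = np.round(weights / scale).astype(np.int8)  # 1 byte each: ~4x smaller

dequantized = q_weights.astype(np.float32) * scale     # approximate reconstruction
print(np.max(np.abs(weights - dequantized)))           # small quantization error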

Besides optimizing RAM use, vLLM can also boost inference speeds. Typically, LLMs generate prompt responses one token at a time. With vLLM, developers can configure their models to generate multiple tokens simultaneously, reducing response times for users.
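
For context, vLLM's offline-inference entry point looks like the following minimal sketch. The model name and sampling settings are placeholders, and the configuration options for multi-token (speculative) generation vary across vLLM versions, so they are omitted here.

```python
# Minimal vLLM usage sketch; model name and sampling values are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")            # any Hugging Face-compatible model
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is PagedAttention?"], params)
print(outputs[0].outputs[0].text)
```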

“We see a future where serving AI becomes effortless,” Inferact co-founder **Woosuk Kwon** wrote in a blog post. “Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn’t disappear; it gets absorbed into the infrastructure we’re building.”

The blog post hints that Inferact plans to launch a paid, serverless version of vLLM. Many startups focused on commercializing open-source projects take this route. Typically, managed versions of open-source technologies automate administrative tasks such as provisioning infrastructure and applying updates.

An Inferact job posting indicates that the company plans to equip its software with observability, troubleshooting, and disaster recovery features. The listing also suggests that the software will run on **Kubernetes**.

Kwon added in today’s blog post that the Inferact team, which includes several core vLLM maintainers, will continue to enhance the upstream open-source version. The company plans to release new performance optimizations and add support for emerging AI model architectures. Additionally, Inferact aims to enable vLLM to run on a wider variety of data center hardware.