Exploring the Llama 4 Herd: What Problem Does It Solve?

Hold onto your hats, folks, because the world of Artificial Intelligence has just been given a significant shake-up. Meta has unveiled their latest marvels: the Llama 4 herd, marking what they’re calling “the beginning of a new era of natively multimodal AI innovation”. This isn’t just another incremental update; it’s a leap forward that promises […]

Getting Started with NVIDIA Dynamo: A Powerful Framework for Distributed LLM Inference

In the rapidly evolving landscape of generative AI, efficiently serving large language models (LLMs) at scale remains a significant challenge. Enter NVIDIA Dynamo, an open-source inference framework specifically designed to address the complexities of serving generative AI models in distributed environments. In this blog post, we’ll explore what makes Dynamo special and provide a practical […]

Running DeepSeek R1 on Azure Kubernetes Service (AKS) using Ollama

Introduction: DeepSeek R1 is an advanced open-source large language model (LLM) that has gained significant popularity in the developer community. By pairing it with Ollama, an easy-to-use framework for running and managing LLMs locally, and deploying it on Azure Kubernetes Service (AKS), we can create a powerful, scalable, and cost-effective environment for AI applications. This blog post walks […]
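As a rough sketch of what such a deployment might look like, a minimal Kubernetes manifest for running Ollama on a GPU-enabled AKS node pool could be as follows (the resource names, image tag, and GPU request are illustrative assumptions, not the post's exact manifest; Ollama's API listens on port 11434 by default):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama          # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest   # official Ollama container image
        ports:
        - containerPort: 11434        # Ollama's default API port
        resources:
          limits:
            nvidia.com/gpu: 1         # assumes a GPU node pool with the NVIDIA device plugin
```

Once the pod is running, a model such as DeepSeek R1 can be pulled through Ollama's API (for example via `kubectl exec` or a Service exposing port 11434).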

How to Upgrade JetPack 5.x to 6.x on NVIDIA Jetson Orin Nano Super

Recently, I upgraded my Jetson Orin Nano from JetPack 5.x to the latest JetPack 6.2. This is a significant update, moving from Ubuntu 20.04 to Ubuntu 22.04 as the base OS and bringing numerous performance improvements. I’ve documented the entire process to help others make this transition smoothly. Why Upgrade? JetPack 6.2 offers several compelling […]

How to Run DeepSeek-V3 Locally on Ubuntu with Python 3.11: A Step-by-Step Guide

Quantizing DeepSeek-V3 for Smaller GPUs: Large language models (LLMs) like DeepSeek-V3 offer incredible capabilities, but their size often makes them challenging to run on consumer hardware. One technique to address this is quantization, which reduces the precision of the model’s weights, allowing it to fit into smaller GPUs. This blog post demonstrates how to load […]
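The core idea behind weight quantization can be sketched in a few lines of NumPy. The snippet below is my own illustration (not the post's code): it rounds float32 weights to int8 using a symmetric per-tensor scale, cutting storage by 4x at the cost of a small, bounded rounding error.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

ratio = w.nbytes / q.nbytes            # int8 is 4x smaller than float32
max_err = float(np.abs(w - w_hat).max())  # bounded by half the quantization step
```

Production schemes (such as the 4-bit methods used for very large models) refine this with per-channel or per-block scales, but the precision-for-memory trade-off is the same.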

Install JetPack Without NVIDIA SDK Manager

Setting up SDK Manager is painful. Even though it’s just one single RPM or DEB package, the problem is that you need a separate system for it. The SDK Manager is meant to be installed on your PC (host), specifically on an Ubuntu 20.04 system if you want to flash an NVIDIA Jetson Nano (target). Thereby, […]

Deploying NVIDIA NIM for Generative AI Applications

NVIDIA NIM (NVIDIA Inference Microservices) gives developers an efficient way to deploy optimized AI models from various sources, including community partners and NVIDIA itself. As part of the NVIDIA AI Enterprise suite, NIM offers a streamlined path to quickly iterate on and build innovative generative AI solutions. With NIM, you can easily deploy a microservice […]
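A deployed NIM container exposes an OpenAI-compatible HTTP API, so querying it looks like querying any OpenAI-style endpoint. The sketch below builds such a chat-completion request; the endpoint URL and model name are illustrative assumptions, not values from the post.

```python
import json
import urllib.request

# Assumed endpoint of a locally deployed NIM container (adjust to your deployment).
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("meta/llama3-8b-instruct", "What is NIM?")

# Sending the request only works against a running NIM instance:
# req = urllib.request.Request(
#     NIM_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the API surface matches OpenAI's, existing OpenAI client libraries can usually be pointed at a NIM endpoint by changing only the base URL.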

Is NPU better than GPU?

When discussing hardware acceleration for AI workloads, both Neural Processing Units (NPUs) and Graphics Processing Units (GPUs) are leading technologies. However, the question of whether an NPU is better than a GPU depends on several factors, such as the specific workload, power efficiency, and the use case. Let’s explore the key differences and advantages of […]

Exploring the Revolutionary Nemotron-4-340B-Instruct: Enhanced Instruction Following and Mathematical Reasoning

Model Overview: Nemotron-4-340B-Instruct is a large language model developed by NVIDIA, designed for English-language single- and multi-turn chat applications. It has been fine-tuned for improved instruction-following capabilities and mathematical reasoning. Key points:

- Based on the Nemotron-4 architecture
- Supports a context length of 4,096 tokens
- Pre-trained on a corpus of 9 trillion tokens
- Fine-tuned using Supervised Fine-tuning […]