Exploring the Llama 4 Herd and what problem does it solve?

By: Adesoji Alu
April 8, 2025 at 14:10
Hold onto your hats, folks, because the world of Artificial Intelligence has just been given a significant shake-up. Meta has unveiled their latest marvels: the Llama 4 herd, marking what they’re calling “the beginning of a new era of natively multimodal AI innovation”. This isn’t just another incremental update; it’s a leap forward that promises […]

Running Docker Desktop on NVIDIA Jetson Orin Nano Super for the first time

By: Ajeet Raina
March 26, 2025 at 06:48
I’ve been eyeing the NVIDIA Jetson lineup for ages, and when the Orin Nano Super was released, I knew I had to get my hands on one. After weeks of hunting—and honestly, some desperate emails to NVIDIA—I finally scored this tiny yet mighty AI computer. My first thought? “Let’s get Docker Desktop running on this […]

Getting Started with NVIDIA Dynamo: A Powerful Framework for Distributed LLM Inference

March 19, 2025 at 03:04
In the rapidly evolving landscape of generative AI, efficiently serving large language models (LLMs) at scale remains a significant challenge. Enter NVIDIA Dynamo, an open-source inference framework specifically designed to address the complexities of serving generative AI models in distributed environments. In this blog post, we’ll explore what makes Dynamo special and provide a practical […]

Running DeepSeek R1 on Azure Kubernetes Service (AKS) using Ollama

By: Adesoji Alu
March 11, 2025 at 12:28
Introduction: DeepSeek is an advanced open-source large language model (LLM) that has gained significant popularity in the developer community. When paired with Ollama, an easy-to-use framework for running and managing LLMs locally, and deployed on Azure Kubernetes Service (AKS), we can create a powerful, scalable, and cost-effective environment for AI applications. This blog post walks […]

How to upgrade Jetpack 5.X to 6.X on NVIDIA Jetson Orin Nano Super

By: Ajeet Raina
March 9, 2025 at 15:25
Recently, I upgraded my Jetson Orin Nano from JetPack 5.X to the latest JetPack 6.2. This represents a significant update, moving from Ubuntu 20.04 to Ubuntu 22.04 as the base OS and bringing numerous performance improvements. I’ve documented the entire process to help others make this transition smoothly. Why Upgrade? JetPack 6.1 offers several compelling […]

How to Run DeepSeek-V3 Locally on Ubuntu with Python 3.11: A Step-by-Step Guide

By: Adesoji Alu
January 29, 2025 at 15:32
Quantizing DeepSeek-V3 for Smaller GPUs: Large language models (LLMs) like DeepSeek-V3 offer incredible capabilities, but their size often makes them challenging to run on consumer hardware. One technique to address this is quantization, which reduces the precision of the model’s weights, allowing it to fit into smaller GPUs. This blog post demonstrates how to load […]

Deploying NVIDIA NIM for Generative AI Applications

February 27, 2025 at 07:56
NVIDIA NIM (NVIDIA Inference Microservices) gives developers an efficient way to deploy optimized AI models from various sources, including community partners and NVIDIA itself. As part of the NVIDIA AI Enterprise suite, NIM offers a streamlined path to quickly iterate on and build innovative generative AI solutions. With NIM, you can easily deploy a microservice […]

Is NPU better than GPU?

By: Adesoji Alu
October 2, 2024 at 20:41
When discussing hardware acceleration for AI workloads, both Neural Processing Units (NPUs) and Graphics Processing Units (GPUs) are leading technologies. However, the question of whether an NPU is better than a GPU depends on several factors, such as the specific workload, power efficiency, and the use case. Let’s explore the key differences and advantages of […]

Exploring the Revolutionary Nemotron-4-340B-Instruct: Enhanced Instruction Following and Mathematical Reasoning

By: Adesoji Alu
October 2, 2024 at 09:03
Model Overview: Nemotron-4-340B-Instruct is a large language model developed by NVIDIA, designed for English-based single and multi-turn chat applications. It has been fine-tuned for improved instruction-following capabilities and mathematical reasoning. Key points: based on the Nemotron-4 architecture; supports a context length of 4,096 tokens; pre-trained on a corpus of 9 trillion tokens; fine-tuned using Supervised Fine-tuning […]