Original Title: "IOSG Weekly Brief|The Holy Grail of Crypto AI: Exploring the Frontiers of Decentralized Training #280"
Original Author: Jacob Zhao (X @0xjacobzhao), IOSG Ventures
Within the full value chain of AI, model training is the most resource-intensive and technically demanding phase, directly determining the ceiling of a model's capabilities and its real-world effectiveness. Compared with the lightweight invocation of the inference stage, training requires sustained large-scale compute, intricate data-processing pipelines, and intensive optimization algorithms; it is the true "heavy industry" of AI system building. From an architectural standpoint, training methodologies fall into four categories: centralized training, distributed training, federated learning, and the primary focus of this article, decentralized training.
Centralized training is the most common traditional approach, with a single entity completing the entire training process locally within high-performance clusters. From hardware (e.g., NVIDIA GPUs), low-level software (CUDA, cuDNN), cluster orchestration systems (e.g., Kubernetes), to training frameworks (e.g., PyTorch with NCCL backend), all components are orchestrated by a unified control system. This deeply integrated system architecture enables optimal efficiency in memory sharing, gradient synchronization, and fault tolerance mechanisms. It is well-suited for training large-scale models such as GPT and Gemini, offering high efficiency and controllable resources. However, it also raises issues like data monopolization, resource barriers, energy consumption, and single point of failure risks.
Distributed Training is currently the mainstream approach for large-model training. Its core is to decompose the training task and distribute it across multiple machines for collaborative execution, overcoming the compute and memory bottlenecks of a single machine. Although it is "distributed" in a physical sense, scheduling and synchronization remain centrally controlled, typically within a high-speed local network over interconnects such as NVLink, with a master node coordinating all sub-tasks. Mainstream methods include:
· Data Parallelism: Each node trains on a different shard of data while holding a full model replica; the model weights must be kept synchronized (see the sketch after this list).
· Model Parallelism: Different parts of the model are deployed on different nodes to achieve strong scalability.
· Pipeline Parallelism: Execution is divided into sequential stages to improve throughput.
· Tensor Parallelism: Individual matrix operations are partitioned at a fine granularity to increase parallelism.
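To make the first of these concrete, here is a minimal data-parallel training step in PyTorch, assuming a process group has already been initialized (e.g., via `torchrun`); this is a generic illustration of the technique, not the internals of any specific framework:

```python
# Minimal sketch of data parallelism: every rank holds a full model replica,
# trains on a different shard of data, and averages gradients each step.
# Assumes torch.distributed has been initialized (e.g. via torchrun).
import torch
import torch.distributed as dist

def data_parallel_step(model, loss_fn, batch, optimizer):
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Average gradients across all ranks so every replica applies
    # the same update and the weights stay synchronized.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    return loss.item()
```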
Distributed training is a combination of "centralized control + distributed execution," analogous to a single boss remotely coordinating multiple "offices" where employees collaborate to complete tasks. Almost all mainstream large models (e.g., GPT-4, Gemini, LLaMA) are trained using this approach.
Decentralized Training represents a more open and censorship-resistant future path. Its core feature involves multiple untrusted nodes (which could be home computers, cloud GPUs, or edge devices) collaborating on training tasks without a central coordinator. Tasks are typically distributed and coordinated through protocols, with cryptographic incentive mechanisms ensuring the integrity of contributions. The main challenges of this model include:
· Device heterogeneity and partitioning difficulties: Coordinating heterogeneous devices is challenging, and task partitioning efficiency is low;
· Communication efficiency bottlenecks: Network instability leads to significant bottlenecks in gradient synchronization;
· Lack of trusted execution: Absence of a trusted execution environment makes it difficult to verify if nodes are genuinely participating in computation;
· Lack of unified coordination: Without a central scheduler, task distribution and exception rollback mechanisms are complex.
Decentralized training can be understood as a group of global volunteers each contributing computing power to collaboratively train models. However, "truly feasible large-scale decentralized training" remains a systemic engineering challenge, involving various layers such as system architecture, communication protocols, cryptographic security, economic mechanisms, and model validation. Whether it can achieve "effective collaboration + honest incentives + correct results" is still in the early stage of prototype exploration.
Federated Learning is a transitional form between the distributed and decentralized paradigms. It keeps data local while aggregating model parameters centrally, making it suitable for privacy-sensitive scenarios (e.g., healthcare, finance). It combines the engineering structure of distributed training with the data-dispersion advantage of decentralized training, but it still depends on a trusted coordinator and lacks full openness and censorship resistance. It can be viewed as "controlled decentralization" tailored to privacy-compliant use cases. With relatively mild requirements on tasks, trust structures, and communication mechanisms, it is better suited as a transitional deployment architecture in industry.
From the perspective of training paradigms, decentralized training is not suitable for all types of tasks. In certain scenarios, due to the complexity of task structures, high resource demands, or collaboration difficulties, it is inherently inefficient to complete such tasks across heterogeneous, trustless nodes. For instance, large model training often relies on high VRAM, low latency, and high-speed bandwidth, making it challenging to efficiently partition and synchronize within an open network environment. Tasks that are constrained by strong data privacy and sovereignty restrictions (e.g., medical, financial, or sensitive data) are bound by legal compliance and ethical limitations, making open sharing unfeasible. Similarly, tasks lacking a foundational incentive for collaboration (e.g., enterprise closed-source models or internal prototype training) lack motivation for external participation. These boundaries collectively define the current practical limitations of decentralized training.
However, this does not imply that decentralized training is a false proposition. In fact, for task types that are lightweight in structure, easily parallelizable, and incentivizable, decentralized training demonstrates clear application potential. These include but are not limited to: LoRA fine-tuning, behavior alignment post-training tasks (e.g., RLHF, DPO), crowd-sourced data training and annotation tasks, resource-controllable small foundational model training, and collaborative training scenarios involving edge devices. Such tasks are typically characterized by high parallelism, low coupling, and tolerance for heterogeneous computing power, making them highly suitable for collaborative training via P2P networks, Swarm protocols, distributed optimizers, and similar approaches.
Currently, in the frontier of decentralized training and federated learning, notable blockchain projects mainly include Prime Intellect, Pluralis.ai, Gensyn, Nous Research, and Flock.io. From the perspective of technical innovation and engineering complexity, Prime Intellect, Nous Research, and Pluralis.ai have made significant original contributions in system architecture and algorithm design, representing the cutting edge of theoretical research. On the other hand, Gensyn and Flock.io follow relatively clear implementation paths and have showcased initial progress in engineering. This article will analyze the core technologies and engineering architectures behind these five projects and further explore their differences and complementary aspects within the context of decentralized AI training frameworks.
Prime Intellect is committed to building a trustless AI training network where anyone can participate in training and receive credible rewards for their computational contributions. Through its three core modules—PRIME-RL, TOPLOC, and SHARDCAST—Prime Intellect aims to establish a decentralized AI training system that is verifiable, open, and equipped with a complete incentivization mechanism.
PRIME-RL: Decoupled Asynchronous Reinforcement Learning Task Architecture
PRIME-RL is a task modeling and execution framework tailored by Prime Intellect for decentralized training scenarios, designed specifically for heterogeneous networks and asynchronous participation. It prioritizes reinforcement learning as its main compatibility focus, structurally decoupling the training, inference, and weight uploading processes. This allows each training node to independently complete task loops locally, while collaborating with others via standardized interfaces and verification and aggregation mechanisms. Compared to traditional supervised learning workflows, PRIME-RL is more suited for achieving elastic training in environments without centralized scheduling. It reduces system complexity and lays the foundation for supporting multi-task parallelism and policy evolution.
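Prime Intellect's code is not reproduced here, but the decoupling described above can be loosely illustrated as follows; every name and mechanism in this sketch is a hypothetical stand-in, and the toy "update" is a placeholder for a real policy-gradient step:

```python
# Illustrative-only sketch (all names and mechanics are hypothetical
# stand-ins, not Prime Intellect's API) of the decoupling PRIME-RL
# describes: rollout generation and policy updates run as independent
# loops that meet only at a bounded queue, so neither stage ever waits
# on a global synchronization barrier.
import queue
import random
import threading

rollouts = queue.Queue(maxsize=16)
stop = threading.Event()

def rollout_loop():
    # "Inference" stage: keep producing trajectories with the current
    # local policy; a list of random rewards stands in for a rollout.
    while not stop.is_set():
        rollouts.put([random.random() for _ in range(8)])

def learner_loop(total_steps=100, sync_every=20):
    # "Training" stage: consume whatever is available, update locally, and
    # only periodically hand weights to an asynchronous upload/verify path.
    weights = 0.0
    for step in range(1, total_steps + 1):
        batch = rollouts.get()
        weights += 0.01 * (sum(batch) / len(batch) - weights)  # toy update
        if step % sync_every == 0:
            print(f"step {step}: submit weights {weights:.4f}")  # upload hook
    stop.set()

threading.Thread(target=rollout_loop, daemon=True).start()
learner_loop()
```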
TOPLOC: Lightweight Training Behavior Verification Mechanism
TOPLOC (Trusted Observation & Policy-Locality Check) is the core mechanism proposed by Prime Intellect for the verifiability of training, used to determine whether a node has effectively learned policies based on observed data. Unlike heavy solutions such as ZKML, TOPLOC does not rely on full model recomputation. Instead, it analyzes the localized consistency trajectory between "observation sequences ↔ policy updates" to complete lightweight structural verification. TOPLOC is the first mechanism to transform behavioral trajectories in the training process into verifiable entities, a key innovation that enables trustless distribution of training rewards and provides a feasible pathway for building auditable and incentivizable decentralized collaborative training networks.
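TOPLOC's actual consistency analysis is more subtle than can be shown here; purely to illustrate the shape of lightweight verification (checking sampled steps of a submitted trajectory instead of recomputing the whole run), consider this toy spot-check, in which all interfaces are hypothetical:

```python
# Toy spot-check illustrating lightweight training verification (not
# TOPLOC's actual algorithm): a validator re-executes a few randomly
# sampled steps of a node's submitted trajectory instead of recomputing
# the entire training run.
import torch

def spot_check(trajectory, recompute_step, samples=3, tol=1e-5, seed=0):
    """trajectory: list of (weights_before, batch, claimed_weights_after).
    recompute_step: deterministic single-step training function."""
    g = torch.Generator().manual_seed(seed)          # challenge randomness
    picks = torch.randperm(len(trajectory), generator=g)[:samples]
    for i in picks.tolist():
        w_before, batch, w_after_claimed = trajectory[i]
        w_after = recompute_step(w_before, batch)    # one step, not the run
        if (w_after - w_after_claimed).abs().max() > tol:
            return False                             # inconsistent: reject
    return True                                      # accept, release reward
```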
SHARDCAST: Asynchronous Weight Aggregation and Propagation Protocol
SHARDCAST is the weight propagation and aggregation protocol designed by Prime Intellect, optimized for real-world network environments characterized by asynchrony, limited bandwidth, and dynamic node states. Combining the gossip dissemination mechanism with localized synchronization strategies, it allows multiple nodes to continuously submit partial updates even in unsynchronized states, enabling progressive convergence of weights and multi-version evolution. Compared to centralized or synchronous AllReduce methods, SHARDCAST significantly enhances the scalability and fault tolerance of decentralized training. It serves as the core foundation for building stable weight consensus and iterative training in dynamic decentralized systems.
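As a rough illustration of gossip-style progressive convergence (mechanics assumed from the description above, not the SHARDCAST wire protocol):

```python
# Gossip-style partial aggregation toy: each node repeatedly averages with
# a few random neighbors, so weights converge progressively without any
# global synchronization barrier; nodes may hold different weight versions.
import numpy as np

def gossip_round(weights, rng, fanout=2):
    """weights: list of per-node parameter vectors, updated in place."""
    n = len(weights)
    for i in range(n):
        peers = rng.choice([j for j in range(n) if j != i],
                           size=fanout, replace=False)
        # Merge whatever versions the neighbors currently hold.
        group = [weights[i]] + [weights[j] for j in peers]
        weights[i] = np.mean(group, axis=0)

rng = np.random.default_rng(0)
nodes = [rng.normal(size=128) for _ in range(10)]
for _ in range(20):
    gossip_round(nodes, rng)
spread = max(np.linalg.norm(w - nodes[0]) for w in nodes)
print("max disagreement after 20 rounds:", spread)  # shrinks toward 0
```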
OpenDiLoCo: Sparse Asynchronous Communication Framework
OpenDiLoCo is a communication optimization framework independently developed and open-sourced by the Prime Intellect team, inspired by DeepMind's DiLoCo concept. It is specifically designed to tackle challenges commonly encountered in decentralized training, such as bandwidth limitations, device heterogeneity, and node instability. The framework's architecture is based on data parallelism and leverages sparse topological structures like Ring, Expander, and Small-World to avoid the high communication overhead of global synchronization. Instead, it relies solely on local neighbor nodes to achieve collaborative model training. With a combination of asynchronous updates and checkpoint fault-tolerance mechanisms, OpenDiLoCo enables consumer-grade GPUs and edge devices to stably participate in training tasks, significantly enhancing the accessibility of global collaborative training. It serves as a key communication infrastructure for building decentralized training networks.
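The DiLoCo schedule that OpenDiLoCo builds on follows an inner/outer structure, sketched below from the published DiLoCo recipe; the hyperparameters and the plain outer step are simplifications (the paper uses Nesterov momentum in the outer optimizer):

```python
# Sketch of the DiLoCo-style inner/outer schedule: many cheap local steps
# with zero communication, then one rare outer synchronization of
# "pseudo-gradients". Hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def diloco_round(workers, global_model, inner_steps=500, outer_lr=0.7):
    global_state = {k: v.clone() for k, v in global_model.state_dict().items()}
    deltas = []
    for model, opt, data_iter in workers:
        model.load_state_dict(global_state)          # start from global point
        for _ in range(inner_steps):                 # H local steps, no comms
            x, y = next(data_iter)
            opt.zero_grad()
            F.mse_loss(model(x), y).backward()
            opt.step()
        # Pseudo-gradient: how far this worker drifted from the global point.
        deltas.append({k: global_state[k] - v
                       for k, v in model.state_dict().items()})
    # Outer update: step the global weights along the averaged pseudo-gradient.
    new_state = {k: global_state[k] - outer_lr *
                 sum(d[k] for d in deltas) / len(deltas)
                 for k in global_state}
    global_model.load_state_dict(new_state)
```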
PCCL: Collective Communication Library
PCCL (Prime Collective Communication Library) is a lightweight communication library specifically tailored by Prime Intellect for decentralized AI training environments. It addresses the adaptation bottlenecks of traditional communication libraries (e.g., NCCL, Gloo) under heterogeneous devices and low-bandwidth networks. PCCL supports sparse topology, gradient compression, low-precision synchronization, and checkpoint recovery. It can operate on consumer-grade GPUs and unstable nodes, serving as the foundational component that powers the asynchronous communication capabilities of the OpenDiLoCo protocol. By significantly improving bandwidth tolerance and device compatibility within the training network, PCCL bridges the "last mile" of communication for building truly open, trustless collaborative training networks.
Prime Intellect has constructed a permissionless, verifiable training network with economic incentives, allowing anyone to participate in tasks and earn rewards based on genuine contributions. The protocol operates through three core roles:
· Task Initiators: Define the training environment, initial models, reward functions, and validation criteria.
· Training Nodes: Perform local training, submit weight updates, and provide observation trajectories.
· Validation Nodes: Verify the authenticity of training behaviors using the TOPLOC mechanism and participate in reward calculation and strategy aggregation.
The core protocol workflow includes task publishing, node training, trajectory verification, weight aggregation (SHARDCAST), and reward distribution, forming an incentive loop centered around "genuine training behavior."
In May 2025, Prime Intellect released INTELLECT-2, the world's first reinforcement learning large model trained through asynchronous, trustless decentralized node collaboration, with a parameter size of 32B. The INTELLECT-2 model was collaboratively trained by over 100 heterogeneous GPU nodes spread across three continents, utilizing a fully asynchronous architecture. The training lasted over 400 hours, demonstrating the viability and stability of asynchronous collaborative networks. This model not only represents a breakthrough in performance but also marks the first systemic implementation of the "Training-as-Consensus" paradigm proposed by Prime Intellect. INTELLECT-2 integrates core protocol modules such as PRIME-RL (asynchronous training structure), TOPLOC (training behavior verification), and SHARDCAST (asynchronous weight aggregation), signifying the inaugural realization of an open, verifiable, and economically incentivized decentralized training network.
In terms of performance, INTELLECT-2 is based on QwQ-32B and incorporates specialized RL (reinforcement learning) training at both the code and mathematical levels, representing the cutting-edge of current open-source RL fine-tuned models. While it has yet to surpass closed-source models like GPT-4 or Gemini, its true significance lies in being the world's first decentralized model with a fully reproducible, verifiable, and auditable training process. Prime Intellect not only open-sourced the model but, more importantly, open-sourced the entire training process itself — including training data, strategy update trajectories, verification procedures, and aggregation logic, thus building a prototype of a decentralized training network that is accessible to all, trustworthy in collaboration, and conducive to shared rewards.
Pluralis is a Web3 AI project focused on "trustworthy collaborative training networks," with its core objective being to promote a decentralized, open, and long-term incentive-driven model training paradigm. Distinct from the current mainstream centralized or closed training approaches, Pluralis introduces an entirely new concept called Protocol Learning: protocolizing the model training process to build an open training system by leveraging verifiable collaboration mechanisms and mapping model ownership. This system is designed with an inherent incentive loop to encourage broad participation and collaboration.
Pluralis introduces Protocol Learning, which is built upon three key pillars:
1. Unmaterializable Models
Models are distributed in fragments across multiple nodes, making it impossible for any single node to reconstruct the full set of weights, ensuring the model remains closed-source. This design inherently turns the model into a "protocol-native asset," enabling access credential control, leakage prevention, and binding of revenue attribution.
2. Model-Parallel Training over the Internet
Through an asynchronous pipeline mechanism (SWARM architecture), different nodes hold only fragments of the model's weights and collaborate over low-bandwidth networks to complete training or inference tasks.
3. Partial Ownership for Incentives
All participating nodes are granted partial ownership of the model based on their contribution to training, thus entitling them to future revenue sharing and protocol governance rights.
Unmaterializable Models
First systematically introduced in "A Third Path: Protocol Learning," this mechanism ensures model weights are distributed in fragments, guaranteeing that "model assets" can only operate within the SWARM network. It also ensures both access and revenue are controlled by the protocol. This mechanism is foundational for enabling a sustainable incentive structure for decentralized training.
Asynchronous Model-Parallel Training
In "SWARM Parallel with Asynchronous Updates," Pluralis developed an asynchronous pipeline-based model-parallel training architecture and demonstrated it on LLaMA-3. The core innovation lies in introducing the Nesterov Accelerated Gradient (NAG) mechanism, which effectively addresses gradient drift and convergence instability issues during asynchronous updates. This makes training across heterogeneous devices feasible even in low-bandwidth environments.
Column-Space Sparsification
In "Beyond Top-K," a structure-aware column-space compression method was introduced to replace traditional Top-K, avoiding the destruction of semantic paths. This mechanism balances model accuracy and communication efficiency. Testing in asynchronous model parallel environments shows that it can reduce communication data by over 90%, making it a key breakthrough in achieving structure-aware efficient communication.
Pluralis clearly positions "asynchronous model parallelism" as its core focus, emphasizing the following advantages over data parallelism:
· Supports low-bandwidth networks and heterogeneous nodes;
· Adapts to device heterogeneity, enabling participation by consumer-grade GPUs;
· Naturally supports elastic scheduling, allowing nodes to frequently join or leave;
· Achieves three major breakthroughs: structural compression + asynchronous updates + weight unextractability.
Pluralis has so far published six technical blog posts on its official website, which fall into three main threads:
1. Philosophy and Vision: "A Third Path: Protocol Learning," "Why Decentralized Training Matters"
2. Technical Mechanism Details: "SWARM Parallel," "Beyond Top-K," "Asynchronous Updates"
3. Exploration of Institutional Innovation: "Unmaterializable Models," "Partial Ownership Protocols"
As of now, Pluralis has not launched a product, testnet, or opened its codebase. The reason lies in its chosen technical path, which is extremely challenging: fundamental issues like underlying system architecture, communication protocols, and ensuring weight irreversibility must be resolved before product services can be built on top of them.
In a new paper released by Pluralis Research in June 2025, the decentralized training framework is extended from model pretraining to the model fine-tuning stage. It supports asynchronous updates, sparse communication, and partial weight aggregation. Compared to its previous focus on theoretical developments and pretraining, this work emphasizes practical feasibility, marking further maturity in its full training lifecycle architecture.
Gensyn is a Web3 AI project focused on the "trusted execution of deep learning training tasks." Its core is not about reconstructing model architectures or training paradigms but about building a verifiable distributed training execution network that supports "task distribution + training execution + result verification + fair incentives" throughout the entire process. By employing an off-chain training and on-chain verification architecture, Gensyn establishes an efficient, open, incentive-compatible global training marketplace, turning the concept of "training as mining" into a reality.
Gensyn is not about "how to train" but about "who trains, how to verify, and how to distribute rewards." Essentially, it is a verifiable computation protocol for training tasks, aiming to solve core challenges such as:
· Who will execute the training tasks (compute resource distribution and dynamic matching)
· How to verify the execution results (without full recomputation, only verifying disputed operators)
· How to distribute training rewards (Stake, Slashing, and multi-role game mechanisms)
RL Swarm: Collaborative Reinforcement Learning Training System
Gensyn’s pioneering RL Swarm is a decentralized multi-model collaborative optimization system designed for the post-training phase, featuring the following key characteristics:
Distributed inference and learning pipeline:
· Generation Phase (Answering): Each node independently outputs an answer;
· Critique Phase: Nodes critique each other's outputs, selecting the best answers and reasoning;
· Consensus Phase (Resolving): Nodes predict the majority's preference and adjust their own responses accordingly, achieving local weight updates.
In RL Swarm, each node runs an independent model and trains locally without gradient synchronization. The system naturally adapts to heterogeneous computational resources and unstable network environments, supporting elastic node entry and exit. This mechanism draws on concepts from RLHF (Reinforcement Learning from Human Feedback) and multi-agent game theory, yet aligns more closely with the dynamic evolution logic of collaborative reasoning networks. Nodes earn rewards based on how well their outputs align with the group consensus, driving continuous optimization of reasoning capabilities and convergent learning. RL Swarm significantly enhances model robustness and generalization in open networks and has already been deployed as a core execution module in Gensyn's Ethereum Rollup-based Testnet Phase 0.
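The three-phase loop can be caricatured in a few lines; this toy is not Gensyn's implementation, only an illustration of how consensus alignment can serve as the reward signal:

```python
# Toy illustration of the answer/critique/resolve loop: nodes answer
# independently, vote on the best answer, and are rewarded by how well
# their output matches the emerging group consensus.
import random
from collections import Counter

random.seed(0)
NODES = 8

def answer(node):                 # Generation: independent local output
    return random.choice(["A", "B", "B", "C"])   # biased toy distribution

def critique(node, answers):      # Critique: endorse the strongest answer
    return Counter(answers).most_common(1)[0][0]

answers = [answer(i) for i in range(NODES)]
votes = [critique(i, answers) for i in range(NODES)]
consensus = Counter(votes).most_common(1)[0][0]

# Resolving: reward alignment with consensus; in RL Swarm this signal
# drives the local weight update on each node.
rewards = [1.0 if a == consensus else 0.0 for a in answers]
print("answers:", answers, "consensus:", consensus, "rewards:", rewards)
```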
Verde + Proof-of-Learning: Trusted Verification Mechanism
The Verde module of Gensyn incorporates three mechanisms:
1. Proof-of-Learning: Determines whether training has genuinely occurred based on gradient trajectories and training metadata;
2. Graph-Based Pinpoint: Identifies divergent nodes in the computational graph, requiring recomputation of only specific operations;
3. Refereed Delegation: Implements an arbitration-based verification system, where verifiers and challengers raise disputes that are locally verified, significantly reducing verification costs.
Compared to ZKP (Zero-Knowledge Proofs) or full recomputation verification solutions, the Verde approach strikes a better balance between verifiability and efficiency.
SkipPipe: Communication Fault Tolerance Optimization Mechanism
SkipPipe is designed to address communication bottlenecks in "low bandwidth + node dropout" scenarios. Its core capabilities include:
· Skip Ratio: Skips constrained nodes, avoiding training bottlenecks;
· Dynamic Scheduling Algorithm: Generates optimal execution paths in real time;
· Fault-Tolerant Execution: Maintains inference accuracy with only an approximate 7% drop, even if 50% of nodes fail.
This enables a training throughput improvement of up to 55% and supports key features such as "early-exit inference," "seamless rearrangement," and "inference completion."
HDEE: Cross-Domain Heterogeneous Expert Clusters
The HDEE (Heterogeneous Domain-Expert Ensembles) module is designed to optimize the following scenarios:
· Multi-domain, multi-modal, multi-task training;
· Imbalanced data distribution and significant variations in task difficulty;
· Task allocation and scheduling in environments with heterogeneous computational capabilities and inconsistent communication bandwidth.
Core Features:
· MHe-IHo: Assigns models of varying sizes to tasks of different difficulties (model heterogeneity, consistent training steps);
· MHo-IHe: Unified task difficulty, but with asynchronous adjustments to training steps;
· Supports heterogeneous expert models + pluggable training strategies, enhancing adaptability and fault tolerance;
· Emphasizes "parallel collaboration + ultra-low communication + dynamic expert allocation," suitable for complex task ecosystems in real-world scenarios.
Multi-Role Game Mechanism: Trust and Incentives in Parallel
The Gensyn network introduces four types of participants:
1. Submitter: Publishes training tasks, sets structure and budget;
2. Solver: Executes training tasks and submits results;
3. Verifier: Validates training behavior to ensure compliance and validity;
4. Whistleblower: Challenges verifiers, earning arbitration rewards or facing penalties.
This mechanism draws inspiration from the Truebit economic game design, leveraging forced error insertion + random arbitration to incentivize honest collaboration among participants and ensure trustworthy network operations.
Nous Research is one of the few decentralized training teams balancing philosophical depth with engineering implementation. Its core vision is rooted in the "Desideratic AI" concept: conceptualizing AI as an intelligent agent with subjectivity and the capacity to evolve, rather than a merely controllable tool. What sets Nous Research apart is that it does not treat AI training as an "efficiency problem" to be optimized, but as the formation of a "cognitive agent." Driven by this vision, Nous focuses on building an open training network in which heterogeneous nodes collaborate without central scheduling and with censorship-resistant validation, implemented systematically through a full-stack toolchain.
Rather than investing heavily in incentive design or protocol economics, Nous seeks to alter the philosophical premise of training itself:
· Opposition to "alignmentism": Disagreeing with "alignment training" that solely aims to control AI under human preferences; advocating for training methods that encourage models to develop independent cognitive styles.
· Emphasis on model agency: Suggesting that foundational models should retain uncertainty, diversity, and the ability to generate hallucinations (hallucination as virtue).
· Model training as cognitive formation: Believing that a model should not merely be optimized for task completion, but treated as an individual taking part in a process of cognitive evolution.
This approach to training, while "romantic," reflects the core logic of Nous's design of training infrastructures: how to enable heterogeneous models to evolve within an open network, rather than being uniformly regimented.
Nous's most critical contribution to decentralized training lies in the construction of the Psyche Network and the foundational communication optimizer DisTrO (Distributed Training Over-the-Internet), which together function as the execution hub for training tasks:
DisTrO and the Psyche Network together provide several key capabilities:
· Communication compression: DCT transforms combined with 1-bit sign encoding drastically reduce bandwidth requirements;
· Node adaptability: support for heterogeneous GPUs, reconnection after disconnection, and voluntary exit;
· Asynchronous fault tolerance: training continues without strict synchronization, with high tolerance for failures;
· Decentralized scheduling: no central coordinator; the blockchain handles consensus and task distribution.
This design provides a practical technical foundation for a low-cost, resilient, and verifiable open training network: it operates without central servers, adapts to global volunteer nodes, and keeps training results traceable on-chain.
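The compression pipeline named above (DCT plus 1-bit sign encoding) can be sketched as follows; this is an illustrative reconstruction from the description, not Nous's published DisTrO code, and the `keep` ratio and error-feedback details are assumptions:

```python
# Illustrative DCT + 1-bit sign compression with error feedback: project
# the gradient into a DCT basis, transmit only the signs of the leading
# coefficients plus one shared magnitude, and carry the quantization
# error forward so information is not permanently lost.
import numpy as np
from scipy.fft import dct, idct

def compress(grad, residual, keep=0.05):
    x = dct(grad + residual, norm="ortho")          # include carried error
    k = max(1, int(keep * x.size))
    idx = np.argpartition(np.abs(x), -k)[-k:]       # leading coefficients
    signs = np.sign(x[idx])
    scale = np.abs(x[idx]).mean()                   # one shared magnitude
    # Decode locally to compute the new residual (error feedback).
    x_hat = np.zeros_like(x)
    x_hat[idx] = signs * scale
    residual = grad + residual - idct(x_hat, norm="ortho")
    return (idx, signs, scale), residual

def decompress(msg, n):
    idx, signs, scale = msg
    x_hat = np.zeros(n)
    x_hat[idx] = signs * scale
    return idct(x_hat, norm="ortho")

rng = np.random.default_rng(0)
g = rng.normal(size=4096)
msg, res = compress(g, np.zeros_like(g))
print("recovered correlation:", np.corrcoef(g, decompress(msg, g.size))[0, 1])
```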
In addition to building decentralized training infrastructures, Nous Research has spearheaded several exploratory system experiments revolving around the concept of "AI agency."
Hermes open-source model series
The Hermes 1 through 3 series are Nous's representative open-source large models; the latest generation, Hermes 3, is trained on Llama 3.1 at the 8B, 70B, and 405B parameter scales. The series reflects Nous's training philosophy of "de-instructionalization and diversity retention," demonstrating stronger expressiveness and generalization in long-context retention, role-playing, and multi-turn dialogue.
Forge Reasoning API: Multi-Modal Reasoning System
Forge is a proprietary reasoning framework developed by Nous, combining three complementary mechanisms to achieve more flexible and creative reasoning capabilities:
· MCTS (Monte Carlo Tree Search): Suitable for strategic searches in complex tasks;
· CoC (Chain of Code): Reasoning paths that interleave code execution with logical inference;
· MoA (Mixture of Agents): Allows multiple models to collaborate, enhancing output breadth and diversity.
This system emphasizes "non-deterministic reasoning" and compositional generation pathways, providing a strong alternative to the traditional instruction alignment paradigm.
TEE_HEE: AI Autonomous Agent Experiment
TEE_HEE represents Nous' frontier exploration in autonomous agents, aiming to verify whether AI can independently operate within a Trusted Execution Environment (TEE) while possessing a unique digital identity. This agent is equipped with dedicated Twitter and Ethereum accounts, with all control permissions managed remotely by a verifiable enclave, preventing developers from intervening in its actions. The experiment's goal is to construct AI entities with "immutability" and "independent behavioral intent," marking a significant step towards building autonomously intelligent agents.
AI Behavior Simulation Platform
Nous has also developed several simulators, including WorldSim, Doomscroll, and Gods & S8n, to study how AI behaves in multi-role social environments and the mechanisms behind the evolution of value systems. While these simulators do not directly contribute to training processes, their experiments form a semantic-layer foundation for cognitive behavior modeling in long-term autonomous AI.
Flock.io is a blockchain-based federated learning platform designed to decentralize data, computation, and model training. Flock adopts an integrated framework of "Federated Learning + Blockchain Incentive Layer," essentially representing an on-chain evolution of traditional FL architectures rather than a systemic exploration of new training protocols. Compared to decentralized training projects like Gensyn, Prime Intellect, Nous Research, and Pluralis, Flock focuses on privacy protection and usability improvements rather than theoretical breakthroughs in communication, validation, or training methodologies. Its most suitable comparisons are with federated learning systems like Flower, FedML, and OpenFL.
Federated Learning Architecture: Emphasizing Data Sovereignty and Privacy Preservation
Flock is built on the classic Federated Learning (FL) paradigm, allowing multiple data owners to collaboratively train a unified model without sharing raw data, focusing on addressing issues related to data sovereignty, security, and trust. The core process includes:
· Local Training: Each participant (Proposer) trains the model on their local device without uploading raw data;
· On-chain Aggregation: After training, local weight updates are submitted and aggregated into a global model by on-chain miners;
· Committee Evaluation: Voter nodes, randomly selected via VRF, evaluate the aggregated model using independent test datasets and provide scores;
· Incentivization and Penalization: Based on the scores, rewards are distributed or staked deposits are slashed, enabling anti-malicious behavior and dynamic trust maintenance.
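The aggregation step at the heart of this process follows the classic FedAvg pattern, sketched here generically (this is textbook federated averaging, not Flock's on-chain implementation):

```python
# Minimal FedAvg-style aggregation: average client weight updates,
# weighted by how much data each client trained on; only weights,
# never raw data, leave the client.
import torch

def fed_avg(client_states, client_sizes):
    """client_states: list of state_dicts; client_sizes: samples per client."""
    total = float(sum(client_sizes))
    global_state = {}
    for key in client_states[0]:
        global_state[key] = sum(
            state[key] * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state

# Usage: three clients fine-tune locally, then submit weights for aggregation.
clients = [{"w": torch.randn(4, 4)} for _ in range(3)]
sizes = [100, 300, 600]
global_model = fed_avg(clients, sizes)
```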
Blockchain Integration: Establishing a Trustless Coordination System
Flock integrates all key stages of the training process—task assignment, model submission, evaluation and scoring, incentive execution—onto the blockchain to ensure transparency, verifiability, and censorship resistance. The main mechanisms include:
· VRF-Based Random Election Mechanism: Enhances fairness and resistance to manipulation for Proposer and Voter rotations;
· Proof-of-Stake (PoS) Mechanism: Uses token staking and penalization to constrain node behavior and improve system robustness;
· On-chain Incentive Automation: Leverages smart contracts to bind task completion and evaluation results to reward distribution and slashing, creating a trustless collaborative network.
zkFL: Privacy-Protecting Innovation with Zero-Knowledge Aggregation Mechanism
Flock introduces the zkFL zero-knowledge aggregation mechanism, enabling Proposers to submit zero-knowledge proofs of local updates. Voters can verify their correctness without accessing the original gradients, achieving improved privacy protection and enhanced trust in the training process. This represents a significant innovation in federated learning by merging privacy preservation and verifiability.
AI Arena: This is the decentralized training platform of Flock.io, where users can participate in model tasks via train.flock.io by taking on roles as trainers, validators, or delegators. Users are rewarded for submitting models, assessing performance, or delegating tokens. Currently, tasks are published by the official team, but in the future, this will gradually open to community co-creation.
FL Alliance: This is the federated learning client of Flock, enabling participants to fine-tune models further using private data. Through mechanisms like VRF selection, staking, and slashing, the platform ensures the honesty and collaborative efficiency of the training process. It serves as the critical bridge between community pre-training and real-world deployment.
AI Marketplace: This is a platform for co-creation and deployment of models. Users can propose models, contribute data, and invoke model services. It supports database integration and RAG-enhanced reasoning, driving AI models' application and circulation across various practical scenarios.
Compared to decentralized training projects, systems like Flock, based on federated learning, offer advantages in training efficiency, scalability, and privacy protection. These attributes make it especially suitable for collaborative training of small- to medium-scale models. The approach is pragmatic and ready for deployment, emphasizing engineering feasibility optimizations. In contrast, projects such as Gensyn and Pluralis focus on deeper theoretical breakthroughs in training methods and communication mechanisms. While they tackle larger system challenges, they are more aligned with the exploration of truly "trustless and decentralized" training paradigms.
EXO is a highly representative AI project in edge computing scenarios, aiming to enable lightweight AI training, inference, and agent applications on consumer-grade home devices. Its decentralized training path emphasizes "low communication overhead + local autonomous execution," utilizing the DiLoCo asynchronous delayed synchronization algorithm and the SPARTA sparse parameter exchange mechanism to significantly reduce bandwidth demands for multi-device collaborative training. On the system side, EXO does not construct an on-chain network or introduce economic incentive mechanisms. Instead, it offers a single-machine multi-process simulation framework called EXO Gym, allowing researchers to easily conduct rapid validation and experimentation of distributed training methods within local environments.
· DiLoCo Asynchronous Training: Synchronizes nodes every H steps, suitable for unstable networks;
· SPARTA Sparse Synchronization: Exchanges only a tiny fraction of parameters per step (e.g., 0.1%), keeping the models correlated while reducing bandwidth requirements (sketched after this list);
· Asynchronous Composite Optimization: Both methods can be used in combination to achieve a better trade-off between communication and performance.
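A sketch of the sparse-exchange idea follows, with mechanics assumed from the description above rather than taken from EXO's code:

```python
# SPARTA-style sparse parameter exchange toy: each step, two peers average
# only a tiny random slice of the parameter vector, so per-step bandwidth
# is a fraction of a percent while the models slowly drift together.
import numpy as np

def sparta_exchange(params_a, params_b, fraction=0.001, rng=None):
    """Average `fraction` of coordinates between two peers, in place."""
    rng = rng or np.random.default_rng()
    n = params_a.size
    k = max(1, int(n * fraction))               # e.g. 0.1% of parameters
    idx = rng.choice(n, size=k, replace=False)  # shared random coordinates
    mean = 0.5 * (params_a[idx] + params_b[idx])
    params_a[idx] = mean
    params_b[idx] = mean

a = np.random.randn(1_000_000)
b = np.random.randn(1_000_000)
for _ in range(100):
    sparta_exchange(a, b)                        # bandwidth per step: ~0.1%
print("parameter gap:", np.linalg.norm(a - b))
```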
evML Verification Mechanism Exploration: Edge-Verified Machine Learning (evML) proposes leveraging TEE / Secure Context for low-cost computation verification. Through remote verification and spot-check mechanisms, evML enables trusted participation of edge devices without staking requirements, serving as an engineered trade-off between economic security and privacy protection.
· EXO Gym: Simulates a multi-node training environment on a single device, supporting experiments on communication strategies for models like NanoGPT, CNN, and Diffusion;
· EXO Desktop App: A desktop AI tool designed for individual users, offering features such as local large model execution, iPhone mirroring control, and privacy-friendly personalized functions like private context integration (e.g., SMS, calendar, video recordings).
EXO Gym is more akin to an exploration-driven decentralized training experimental project, primarily achieved through integrating existing communication compression techniques (e.g., DiLoCo and SPARTA) to streamline the training pipeline. Compared to projects like Gensyn, Nous, and Pluralis, EXO has not yet entered the core phases of on-chain collaboration, verifiable incentive mechanisms, or real distributed network deployments.
In addressing core challenges in decentralized training such as device heterogeneity, communication bottlenecks, coordination issues, and a lack of trusted execution, projects like Gensyn, Prime Intellect, Pluralis, and Nous Research each propose differentiated system architecture pathways. From the perspectives of training methods and communication mechanisms, these four projects exhibit unique technological focuses and engineering implementation logics.
In terms of training method optimization, the four projects explore key dimensions such as collaborative strategies, update mechanisms, and asynchronous controls, covering all stages from pretraining to post-training.
Prime Intellect's PRIME-RL framework adopts an asynchronous scheduling structure tailored for the pretraining phase. By employing a strategy of "local training + periodic synchronization," it enables efficient and verifiable training scheduling in heterogeneous environments. This method is highly generalizable and flexible, offering significant theoretical innovation through its clear paradigms in training control structures. However, its engineering implementation demand is moderate to high, requiring substantial capabilities in underlying communication and control modules.
The DeMo optimizer, developed by Nous Research, focuses on addressing the issue of training stability in asynchronous, low-bandwidth environments. It enables a fault-tolerant gradient update process under heterogeneous GPU conditions, making it one of the few solutions that unify theory and engineering in the closed-loop "asynchronous communication compression" domain. Its theoretical innovation is highly significant, especially in the area of compression and scheduling synergy. From an engineering perspective, it is challenging, as it heavily relies on achieving precision in asynchronous parallel coordination.
Pluralis' SWARM + NAG is one of the most systematic and groundbreaking designs in the asynchronous training domain today. Built upon an asynchronous model parallel framework, it introduces column-space sparse communication and NAG momentum correction, forming a stable large-model training solution under low-bandwidth conditions. Its theoretical innovation is extremely high, positioning it as a structural pioneer in asynchronous collaborative training. The engineering difficulty is equally substantial, requiring deep integration of multi-level synchronization and model partitioning.
Gensyn's RL Swarm primarily serves the post-training phase, focusing on policy fine-tuning and multi-agent collaborative learning. Its training process follows a three-step cycle of "Generate - Evaluate - Vote," making it particularly suitable for dynamic adjustments of complex behaviors in multi-agent systems. The theoretical innovation is medium to high, primarily reflected in its collaborative logic for agents. The engineering implementation difficulty is moderate, with the main challenges revolving around system scheduling and behavior convergence control.
In terms of communication mechanism optimization, these four projects have each carved out targeted approaches, with a common focus on addressing system solutions for bandwidth bottlenecks, heterogeneity among nodes, and scheduling stability issues.
Prime Intellect's PCCL is a lower-level communication library designed to replace traditional NCCL, aiming to provide a more robust collective communication foundation for upper-layer training protocols. The theoretical innovation is medium to high, with certain breakthroughs in fault-tolerant communication algorithms. The engineering difficulty is moderate, with strong modular adaptability.
Nous Research's DisTrO serves as the communication core for DeMo, emphasizing minimal communication overhead under low-bandwidth conditions while ensuring closed-loop coherence in training. The theoretical innovation is high, offering generalizable value in scheduling collaborative structures. The engineering difficulty is high, with demanding requirements on compression precision and training synchronization.
Pluralis' communication mechanism is deeply embedded within the SWARM architecture, significantly reducing communication overhead in asynchronous training of large models while maintaining convergence with high throughput. The theoretical innovation is high, setting a paradigm for asynchronous model communication design. The engineering difficulty is extremely high, depending on distributed model orchestration and structural sparsity control.
Gensyn's SkipPipe is a fault-tolerant scheduling component designed to complement RL Swarm. It has a low deployment cost and mainly enhances training stability at the engineering deployment layer. The theoretical innovation is relatively modest, focusing more on the engineering application of known mechanisms. The engineering difficulty is low, but it proves to be highly practical in real-world deployments.
Furthermore, the value of decentralized training projects can be assessed via two overarching dimensions: the blockchain collaboration layer and the AI training system layer:
· Blockchain Collaboration Layer: Emphasizing protocol trustworthiness and incentivized collaboration logic
· Verifiability: Whether the training process is verifiable and whether game-theoretic or cryptographic mechanisms are introduced to establish trust;
· Incentive Mechanism: Whether task-driven token rewards/role mechanisms are designed;
· Openness and Admission Threshold: Whether nodes are easy to connect to, and whether there is centralization or permission control.
· AI Training System Layer: Highlighting engineering capabilities and performance attainability
· Scheduling and Fault Tolerance Mechanisms: Whether it supports fault tolerance, asynchronous, dynamic, and distributed scheduling;
· Training Method Optimization: Whether there are optimizations for model training algorithms or structures;
· Communication Path Optimization: Whether gradient compression/sparse communication is applied, adapting to low bandwidth.
The following table systematically evaluates the technical depth, engineering maturity, and theoretical innovations of Gensyn, Prime Intellect, Pluralis, and Nous Research in the context of decentralized training pathways, based on the above indicator framework.
In the complete value chain of decentralized training, projects such as Prime Intellect, Pluralis.ai, Gensyn, and Nous Research primarily focus on front-end foundational infrastructure such as model pre-training, communication mechanisms, and collaborative optimization. However, another category of projects specializes in the post-training phase, focusing on model adaptation and inference deployment (post-training fine-tuning & inference delivery). These projects do not directly engage in systematic training processes such as pre-training, parameter synchronization, or communication optimizations. Representative projects include Bagel, Pond, and RPS Labs. They all center on LoRA fine-tuning methods, forming a critical "post-chain" component within the ecosystem of decentralized training.
LoRA (Low-Rank Adaptation) is an efficient parameter fine-tuning method. Its core idea involves inserting low-rank matrices into pre-trained large models to learn new tasks while keeping the original model parameters frozen. This strategy significantly reduces training costs and resource consumption, improves fine-tuning speed, and enhances deployment flexibility, making it especially suitable for Web3 scenarios characterized by modular and composable invocation.
Traditional large language models like LLaMA and GPT-3 often have billions or even hundreds of billions of parameters, making direct fine-tuning prohibitively expensive. However, by training only the inserted low-rank parameter matrices, LoRA enables efficient adaptation of large models, establishing itself as one of the most practical mainstream methods today.
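A minimal LoRA layer looks like this; shapes and hyperparameters are illustrative:

```python
# Minimal LoRA sketch: the frozen base weight W stays untouched; only the
# low-rank factors A and B are trained, computing W x + (alpha/r) * B A x.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r             # B starts at zero, so the
                                             # adapter is a no-op initially
    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable} vs frozen: {768 * 768 + 768}")
```

Because only `A` and `B` carry gradients, the trainable footprint is a small fraction of the full weight matrix, which is what makes LoRA adapters cheap to train, exchange, and compose in the Web3 scenarios described above.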
Direct Preference Optimization (DPO), a post-training method for language models that has gained traction in recent years, is often used in conjunction with LoRA fine-tuning mechanisms during the model alignment stage. Compared to traditional RLHF (Reinforcement Learning from Human Feedback) methods, DPO achieves preference learning through direct optimization of paired samples, eliminating the need for complex reward modeling and reinforcement learning processes. Its structure is simpler, convergence is more stable, and it is particularly suitable for fine-tuning tasks in lightweight or resource-constrained environments. Due to its efficiency and ease of use, DPO is gradually becoming the preferred solution for many decentralized AI projects during the model alignment phase.
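The DPO objective itself is compact, which is much of its appeal; below is its standard published form (Rafailov et al., 2023), with stand-in tensors for per-sequence log-probabilities:

```python
# DPO loss in its standard form: implicit rewards are log-ratios against a
# frozen reference model, so no explicit reward model or RL rollout is needed.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of 4 preference pairs (log-probs of full responses).
pc = torch.tensor([-12., -9., -15., -8.])   # policy, chosen
pr = torch.tensor([-14., -11., -15., -10.]) # policy, rejected
rc = torch.tensor([-13., -10., -15., -9.])  # reference, chosen
rr = torch.tensor([-13., -10., -14., -9.])  # reference, rejected
print(dpo_loss(pc, pr, rc, rr))
```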
From a long-term perspective, an increasing number of projects regard Reinforcement Learning (RL) as a core pathway with greater adaptability and evolutionary potential in decentralized training. Compared to supervised learning or parameter fine-tuning mechanisms that rely on static data, RL emphasizes continuously optimizing strategies in dynamic environments. This makes it inherently compatible with the asynchronous, heterogeneous, and incentive-driven collaborative patterns of Web3 networks. Through sustained interaction with the environment, RL enables highly personalized and incremental learning processes, laying the foundation for "behavior intelligence" infrastructure in Agent networks, on-chain task markets, and smart economic systems.
This paradigm not only aligns philosophically with the decentralized ethos but also offers significant system advantages. However, due to high engineering complexity and challenging orchestration mechanisms, RL's implementation faces substantial hurdles at this stage, making widespread adoption difficult in the short term.
Notably, projects like Prime Intellect's PRIME-RL and Gensyn's RL Swarm are advancing RL from a post-training fine-tuning mechanism towards being a core pre-training architecture. They aim to build a collaborative training framework centered around RL, one that requires no trust-based coordination.
Bagel, based on the LoRA fine-tuning mechanism, introduces zero-knowledge proof (ZK) technology, aiming to solve the challenges of trust and privacy protection in the "on-chain model fine-tuning" process. zkLoRA does not participate in actual training computations; instead, it provides a lightweight and verifiable mechanism that enables external users to confirm that a fine-tuned model indeed originates from a specified base model and LoRA parameters, without accessing the original data or weights.
Unlike Gensyn's Verde or Prime Intellect's TOPLOC, which focus on dynamic verifications of "whether training activity genuinely occurred," Bagel emphasizes static verification of "whether fine-tuned results are trustworthy." The biggest advantages of zkLoRA are its low resource consumption for verification and strong privacy protection. However, its application scope is typically limited to fine-tuning tasks with minimal parameter changes.
Pond is currently the only decentralized training project in the industry that focuses on fine-tuning graph neural networks (GNNs), serving structured data applications such as knowledge graphs, social networks, and transaction graphs. It allows users to upload graph-structured data and participate in model training feedback, offering a lightweight and controllable platform for personalized training and inference tasks.
Like Bagel, Pond also adopts efficient fine-tuning mechanisms such as LoRA. Its core objective is to establish modular, deployable intelligent agent systems on GNN architectures, pioneering a new exploration path in the decentralized context for "small model fine-tuning + multi-agent collaboration."
RPS Labs is a decentralized training project built on Transformer architecture, dedicated to applying fine-tuned AI models to DeFi liquidity management, primarily deployed within the Solana ecosystem. Its flagship product, UltraLiquid, is a proactive market-making engine that dynamically adjusts liquidity parameters using fine-tuned models. This reduces slippage, enhances depth, and optimizes the token issuance and trading experience.
Additionally, RPS has launched the UltraLP tool, which enables liquidity providers to optimize their capital allocation strategies on DEXs in real-time. This enhances capital efficiency and reduces the risks of impermanent loss, showcasing the practical value of AI fine-tuning in financial scenarios.
In the complete ecosystem map of decentralized training, the whole process can be divided into two major phases: the pre-chain engine, corresponding to the model pre-training stage, and the post-chain ecosystem, corresponding to the model fine-tuning and deployment stage. Together, these form a complete closed loop from infrastructure to application delivery.
The pre-chain engine focuses on building the foundational protocols for model pre-training. It is represented by projects such as Prime Intellect, Nous Research, Pluralis.ai, and Gensyn. They are committed to developing system architectures that enable asynchronous updates, sparse communication, and verifiable training. These efforts aim to achieve efficient and reliable distributed training capabilities in a trustless network environment, establishing the technical foundation for decentralized training.
Simultaneously, Flock, acting as an intermediary layer, applies federated learning approaches, integrating mechanisms such as model aggregation, on-chain verification, and multi-party incentives. By bridging the gap between training and deployment, Flock provides a practical paradigm for collaborative learning across multiple nodes.
The post-chain ecosystem focuses on model fine-tuning and application-layer deployment. Projects like Pond, Bagel, and RPS Labs revolve around the LoRA fine-tuning methodology: Bagel offers an on-chain verifiable trust mechanism; Pond specializes in evolving small-scale models for graph neural networks; and RPS applies fine-tuned models to DeFi scenarios for intelligent market-making. Through components like inference APIs and Agent SDKs, they provide low-barrier, composable model invocation and personalized customization solutions for developers and end-users, serving as critical entry points for decentralized AI adoption.
We believe that decentralized training is not only a natural extension of the blockchain ethos into the AI era but also a foundational prototype for a globally collaborative intelligent productivity system. In the future, when we look back on this challenging journey, we will still be inspired by this guiding principle: Decentralization is not merely a means; it is a value in itself.