The Integration of AI and Web3: Opening a New Era of Data, Computing Power, and Innovation Landscape
AI+Web3: Towers and Squares
TL;DR
Web3 projects built around AI concepts have become magnets for capital in both primary and secondary markets.
Web3's opportunities in the AI industry lie in using distributed incentives to coordinate potential long-tail supply across data, storage, and computation, while also building open-source models and a decentralized market for AI Agents.
AI's main applications in the Web3 industry are on-chain finance (crypto payments, trading, data analysis) and developer assistance.
The utility of AI+Web3 is reflected in the complementarity of the two: Web3 is expected to counteract the centralization of AI, while AI is expected to help Web3 break boundaries.
Introduction
In the past two years, the development of AI has been like pressing the fast-forward button. The butterfly effect triggered by ChatGPT has not only opened up a new world of generative artificial intelligence but has also stirred up a wave on the other shore, in Web3.
With the support of AI concepts, there is a noticeable boost in financing in the slowing cryptocurrency market. Media statistics show that in the first half of 2024, a total of 64 Web3+AI projects completed financing, with the AI-based operating system Zyber365 achieving a maximum financing amount of 100 million USD in its Series A.
The secondary market is even more prosperous. Data from crypto aggregation websites shows that in just over a year, the total market value of the AI sector has reached $48.5 billion, with 24-hour trading volume close to $8.6 billion. The halo of mainstream AI breakthroughs is evident: after a company released its Sora text-to-video model, the average price of the AI sector rose by 151%. The AI effect has also spread to Meme, one of crypto's capital-magnet sectors: GOAT, the first MemeCoin built on the AI Agent concept, quickly gained popularity and reached a $1.4 billion valuation, sparking an AI Meme craze.
Research and discussion around AI+Web3 are equally hot, ranging from AI+DePIN to AI Memecoins, and now to AI Agents and AI DAOs, with FOMO sentiment struggling to keep up with the speed at which new narratives rotate.
AI+Web3, a pairing flush with hot money, hype, and visions of the future, is inevitably seen as a marriage arranged by capital. It is hard to tell whether beneath this glamorous robe lies a playground for speculators or the eve of a genuine breakout.
To answer this question, the key is to ask whether each side becomes better with the other: can each benefit from the other's paradigm? In this article, standing on the shoulders of earlier work, we examine exactly that: how can Web3 play a role at each layer of the AI technology stack, and what new vitality can AI bring to Web3?
Part 1: What opportunities does Web3 have in the AI stack?
Before diving into this topic, we need to understand the technology stack of AI large models:
To express the entire process in simpler terms: the "large model" is like the human brain. In the early stages, this brain belongs to a newborn baby who must observe and absorb a vast amount of information from the surrounding world to understand it; this is the "collection" phase of data. Since computers do not possess human-like senses such as vision and hearing, before training, the large-scale unlabelled information from the outside world must be transformed through "preprocessing" into a format that computers can understand and use.
After inputting data, the AI constructs a model with understanding and prediction capabilities through "training," which can be seen as the process of a baby gradually understanding and learning about the outside world. The parameters of the model are like the language skills that the baby continuously adjusts during the learning process. When the content of learning begins to specialize or when feedback is received from communication with others and corrections are made, it enters the "fine-tuning" stage of the large model.
As children gradually grow up and learn to speak, they can understand meanings and express their feelings and thoughts in new conversations. This stage is similar to the "inference" of large AI models, which can predict and analyze new language and text inputs. Children express feelings, describe objects, and solve various problems through language, which is also similar to how large AI models, once trained and deployed, are applied to specific tasks during the inference phase, such as image classification and speech recognition.
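As a concrete (if deliberately toy) companion to the analogy above, the sketch below walks the same stages in plain Python, with a word-bigram "model" standing in for a real large model; every function, corpus, and name here is invented for illustration only.

```python
# A deliberately tiny, self-contained sketch of the stages described above,
# using a toy word-bigram "model" instead of a real large language model.
from collections import defaultdict, Counter

def collect() -> list[str]:
    # "Collection": raw, unlabelled text gathered from the outside world.
    return ["the baby watches the world", "the baby learns the language"]

def preprocess(corpus: list[str]) -> list[list[str]]:
    # "Preprocessing": turn raw text into a format the model can consume.
    return [sentence.lower().split() for sentence in corpus]

def train(tokenized: list[list[str]]) -> dict:
    # "Training": learn parameters (here, bigram counts) from the data.
    model = defaultdict(Counter)
    for sentence in tokenized:
        for prev, nxt in zip(sentence, sentence[1:]):
            model[prev][nxt] += 1
    return model

def fine_tune(model: dict, domain_corpus: list[str]) -> dict:
    # "Fine-tuning": adjust the same parameters on specialised data.
    for sentence in preprocess(domain_corpus):
        for prev, nxt in zip(sentence, sentence[1:]):
            model[prev][nxt] += 1
    return model

def infer(model: dict, prompt: str) -> str:
    # "Inference": use the trained model on new input.
    last = prompt.lower().split()[-1]
    follow_ups = model.get(last)
    return follow_ups.most_common(1)[0][0] if follow_ups else "<unknown>"

model = fine_tune(train(preprocess(collect())), ["the baby answers the question"])
print(infer(model, "the baby watches the"))  # prints the most likely next word
```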
The AI Agent is moving closer to the next form of large models - capable of independently executing tasks and pursuing complex goals, possessing not only the ability to think but also to remember, plan, and interact with the world using tools.
Currently, in response to the pain points of AI across various stacks, Web3 has initially formed a multi-layered and interconnected ecosystem that covers all stages of the AI model process.
1. Basic Layer: The Airbnb of Computing Power and Data
Computing Power
Currently, one of the highest costs of AI is the computing power and energy required for training and inference models.
For example, a company's LLaMA3 requires 16,000 H100 GPUs produced by a certain company (a top-of-the-line graphics processor designed for artificial intelligence and high-performance computing workloads) to complete training in 30 days. The 80GB version is priced at $30,000-$40,000 per unit, implying an investment of $400-700 million in computing hardware (GPUs plus network chips). Meanwhile, monthly training consumes 1.6 billion kilowatt-hours, with energy expenses approaching $20 million per month.
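As a rough sanity check, the snippet below only rearranges the figures quoted in the paragraph above (GPU count, unit price range, monthly energy draw and spend) rather than adding any new estimates of its own.

```python
# Back-of-the-envelope check of the cost figures quoted above.
gpu_count = 16_000                               # H100 GPUs reportedly used for training
gpu_price_low, gpu_price_high = 30_000, 40_000   # USD per 80GB card

hardware_low = gpu_count * gpu_price_low         # GPUs only, excludes network chips
hardware_high = gpu_count * gpu_price_high
print(f"GPU hardware alone: ${hardware_low/1e6:.0f}M - ${hardware_high/1e6:.0f}M")
# -> roughly $480M - $640M, in the ballpark of the quoted $400-700M
#    once network chips and other infrastructure are considered.

monthly_kwh = 1.6e9                              # quoted monthly energy consumption
monthly_energy_cost = 20e6                       # quoted monthly energy spend, USD
print(f"Implied electricity price: ${monthly_energy_cost/monthly_kwh:.4f}/kWh")
```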
Easing the AI computing power crunch is precisely where Web3 first intersects with AI: DePIN (Decentralized Physical Infrastructure Networks). A data website currently lists more than 1,400 such projects, among which GPU computing power sharing is represented by multiple projects.
The main logic is that the platform allows individuals or entities with idle GPU resources to contribute their computing power in a permissionless, decentralized manner. By creating an online marketplace between buyers and sellers, similar to certain sharing-economy platforms, it raises the utilization of underused GPU resources, and end users gain access to more cost-effective computing. At the same time, a staking mechanism ensures that resource providers face corresponding penalties if they violate quality-control rules or disrupt the network.
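The marketplace-plus-staking logic described above can be sketched in a few dozen lines. The class names, matching rule, and slashing rate below are purely illustrative assumptions, not any specific protocol's design.

```python
# A minimal, hypothetical sketch of the marketplace logic described above:
# providers post a stake alongside their idle GPUs, jobs are matched to the
# cheapest staked provider, and failing a quality check slashes the stake.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    price_per_hour: float   # what the provider charges for GPU time
    stake: float            # collateral that can be slashed for misbehaviour
    earned: float = 0.0

class ComputeMarket:
    SLASH_RATE = 0.5        # fraction of stake lost on a failed quality check

    def __init__(self):
        self.providers: list[Provider] = []

    def register(self, provider: Provider):
        # Permissionless entry: anyone with idle GPUs and a stake can join.
        self.providers.append(provider)

    def match(self) -> Provider:
        # Naive matching: route the job to the cheapest staked provider.
        staked = [p for p in self.providers if p.stake > 0]
        return min(staked, key=lambda p: p.price_per_hour)

    def settle(self, provider: Provider, hours: float, passed_quality: bool):
        if passed_quality:
            provider.earned += provider.price_per_hour * hours
        else:
            provider.stake -= provider.stake * self.SLASH_RATE

market = ComputeMarket()
market.register(Provider("home_rig", price_per_hour=1.2, stake=100.0))
market.register(Provider("small_dc", price_per_hour=2.5, stake=500.0))

chosen = market.match()
market.settle(chosen, hours=10, passed_quality=True)
print(chosen)   # home_rig earns 12.0 and its stake is untouched
```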
Its characteristics are:
Aggregating idle GPU resources: suppliers are mainly small and medium-sized independent data centers, surplus computing power from operators such as cryptocurrency mining farms, and mining hardware idled by the shift to the PoS consensus mechanism, such as certain projects' mining machines. Some projects are also dedicated to lowering the hardware entry barrier, for example by using specific consumer devices to build computing power networks for running large model inference.
Facing the long-tail market of AI computing power:
a. "From a technical perspective," a decentralized computing power market is more suitable for inference steps. Training relies more on the data processing capabilities brought by large-scale GPU clusters, while inference has relatively lower demands on GPU computing performance, such as certain projects focusing on low-latency rendering work and AI inference applications.
b. "From the demand side perspective," small and medium computing power demanders will not train their own large models separately, but will only choose to optimize and fine-tune around a few leading large models, and these scenarios are naturally suitable for distributed idle computing power resources.
Data
Data is the foundation of AI. Without data, computation is as useless as duckweed drifting on water, and the relationship between data and models follows the saying "garbage in, garbage out": the quantity and quality of the input data determine the quality of the model's final output. For the training of current AI models, data determines the model's language ability, understanding, and even its values and human-like behavior. Currently, AI's data demand dilemma centers on the following four aspects:
Data hunger: AI model training relies on massive data input. Public information shows that a certain company trained a model with a parameter count at the trillion scale.
Data Quality: With the integration of AI and various industries, the timeliness of data, diversity of data, professionalism of vertical data, and the incorporation of emerging data sources such as social media sentiment have raised new requirements for its quality.
Privacy and compliance issues: Currently, countries and companies are gradually recognizing the importance of high-quality datasets and are imposing restrictions on dataset scraping.
High data-processing costs: data volumes are large and processing is complex. Public information shows that over 30% of AI companies' R&D costs go to basic data collection and processing.
Currently, Web3's solutions are reflected in the following four aspects:
Web3's vision is to let users who genuinely contribute share in the value their data creates, and to obtain more private, more valuable data from users at low cost through distributed networks and incentive mechanisms.
A certain project is a decentralized data layer and network where users can run nodes to contribute idle bandwidth and relay traffic to capture real-time data from the entire internet and receive token rewards;
A certain project has introduced a Data Liquidity Pool (DLP) concept: users can upload private data (such as shopping records, browsing habits, and social media activity) to a specific DLP and flexibly choose whether to authorize particular third parties to use it (a minimal sketch of this authorization flow follows these examples);
In a certain project, users can add designated tags on a specific platform and @ a designated account to have their data collected.
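The sketch below illustrates the DLP-style authorization idea mentioned above: users contribute records to a pool and explicitly grant or revoke access per third party. The class, method names, and sample data are invented for illustration and do not describe any project's actual API.

```python
# A minimal sketch of the Data Liquidity Pool idea: users contribute private
# records and explicitly control which third parties may read them.
from collections import defaultdict

class DataLiquidityPool:
    def __init__(self, name: str):
        self.name = name
        self.records: dict = defaultdict(list)   # user -> contributed data
        self.grants: dict = defaultdict(set)     # user -> approved third parties

    def upload(self, user: str, record: str):
        self.records[user].append(record)

    def grant(self, user: str, third_party: str):
        self.grants[user].add(third_party)

    def revoke(self, user: str, third_party: str):
        self.grants[user].discard(third_party)

    def read(self, third_party: str, user: str) -> list:
        # Data is only released to parties the contributing user has approved.
        if third_party not in self.grants[user]:
            raise PermissionError(f"{third_party} is not authorised by {user}")
        return self.records[user]

pool = DataLiquidityPool("shopping-history")
pool.upload("alice", "2024-05-01 bought running shoes")
pool.grant("alice", "ad_model_trainer")
print(pool.read("ad_model_trainer", "alice"))   # allowed while the grant exists
pool.revoke("alice", "ad_model_trainer")        # access can be withdrawn later
```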
Currently, some projects are considering incorporating data labeling as a key step.
A certain project has proposed the concept of "Train2earn", emphasizing data quality, where users can earn rewards by providing labeled data, annotations, or other forms of input.
A certain data labeling project gamifies the labeling tasks and allows users to stake points to earn more points.
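For concreteness, here is a rough sketch of how a "Train2earn"-style reward with point staking, as described above, might settle a single labeling task. The thresholds, multiplier, and function names are invented for illustration and do not describe any specific project.

```python
# Toy settlement logic for one labelling task: a quality score gates the
# payout, and staked points act as a multiplier that can be forfeited.
BASE_REWARD = 10          # points per accepted labelling task (assumed)
QUALITY_THRESHOLD = 0.8   # minimum agreement with consensus labels (assumed)

def settle_task(quality_score: float, staked_points: int) -> int:
    """Return the contributor's point change for one labelling task."""
    if quality_score < QUALITY_THRESHOLD:
        return -staked_points                      # low quality: stake is forfeited
    bonus = int(staked_points * quality_score)     # high quality: stake boosts payout
    return BASE_REWARD + bonus

print(settle_task(quality_score=0.95, staked_points=20))  # 10 + 19 = 29 points
print(settle_task(quality_score=0.50, staked_points=20))  # -20, stake lost
```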
The currently common privacy technologies in Web3 include:
Trusted Execution Environments (TEE), for example, a certain project;
Fully Homomorphic Encryption (FHE), for example, certain projects;
Zero-knowledge technology (ZK): for example, a certain project uses zkTLS to generate zero-knowledge proofs over HTTPS traffic, allowing users to securely import activity, reputation, and identity data from external websites without exposing sensitive information.
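Real zkTLS proofs are beyond a short example, but a much simpler building block, salted hash commitments, already illustrates the selective-disclosure pattern these privacy technologies aim at: commit to every field once, then reveal only what a verifier needs. The data fields below are invented, and this is explicitly not zero-knowledge, only a weaker stand-in for the idea.

```python
# Salted hash commitments: publish a fingerprint of every field up front,
# later reveal only the fields a verifier needs, keeping the rest hidden.
import hashlib
import os

def commit(value: str, salt: bytes) -> str:
    return hashlib.sha256(salt + value.encode()).hexdigest()

# User side: commit to all fields once (e.g. data exported from a website).
fields = {"follower_count": "12000", "account_age_days": "731", "email": "alice@example.com"}
salts = {k: os.urandom(16) for k in fields}
commitments = {k: commit(v, salts[k]) for k, v in fields.items()}   # shared publicly

# Later, reveal only one field to a verifier; "email" never leaves the user.
revealed_key = "follower_count"
revealed_value, revealed_salt = fields[revealed_key], salts[revealed_key]

# Verifier side: check the revealed field against the earlier commitment.
assert commit(revealed_value, revealed_salt) == commitments[revealed_key]
print(f"{revealed_key} verified without exposing the remaining fields")
```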
However, the field is still in its early stages, and most projects remain exploratory. One current dilemma is that computing costs are far too high. Some examples:
A certain framework takes about 80 minutes to generate a proof for a 1M-parameter nanoGPT model.
According to data from a certain company, the overhead of zkML is more than 1000 times higher than pure computation.