The Integration of AI and Crypto Assets: A Comprehensive Analysis of the Deep Learning Industry Chain
AI x Crypto: From Zero to Peak
Introduction
Recent developments in the artificial intelligence industry are viewed by some as the fourth industrial revolution. The emergence of large models has significantly improved efficiency across various sectors, with studies suggesting that GPT has raised work efficiency in the U.S. by about 20%. Meanwhile, the generalization ability these large models bring is considered a new paradigm in software design: where past software design relied on precise code, modern software design increasingly integrates generalized large-model frameworks, giving software better performance and support for a wider range of modal inputs and outputs. Deep learning technology has indeed ushered in a new wave of prosperity for the AI industry, and this trend has also extended into the cryptocurrency sector.
This report will detail the development history of the AI industry, the classification of its technologies, and the impact of deep learning on the industry. It will then analyze in depth the upstream and downstream of the deep learning industry chain, including GPUs, cloud computing, data sources, and edge devices, along with their current status and trends. After that, we will explore in detail the relationship between cryptocurrency and the AI industry and outline the structure of the AI industry chain related to cryptocurrency.
The Development History of the AI Industry
The AI industry began in the 1950s. To realize the vision of artificial intelligence, academia and industry have, across different historical periods and academic backgrounds, developed many schools of thought.
Modern artificial intelligence technology mostly goes by the term "machine learning." The idea is to let machines iteratively improve their performance on a task by learning from data. The main steps are feeding data to an algorithm, using that data to train a model, testing and deploying the model, and then using the model to perform automated prediction tasks.
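As a minimal sketch of this workflow, here is a toy linear model in numpy; everything in it is illustrative rather than any particular library's training API:

```python
import numpy as np

# Toy illustration of the machine-learning loop described above:
# feed data to an algorithm, train a model, test it, then predict with it.
rng = np.random.default_rng(42)

# 1. Data: inputs X and targets Y (here generated from Y = 3X + 1 plus noise).
X = rng.uniform(-1, 1, size=100)
Y = 3 * X + 1 + 0.05 * rng.standard_normal(100)
X_train, Y_train, X_test, Y_test = X[:80], Y[:80], X[80:], Y[80:]

# 2. Train: fit the model's parameters (a, b) to the training data.
a, b = np.polyfit(X_train, Y_train, deg=1)

# 3. Test: measure performance on data the model has not seen.
test_mse = np.mean((a * X_test + b - Y_test) ** 2)
print(f"learned a={a:.2f}, b={b:.2f}, test MSE={test_mse:.4f}")

# 4. Deploy / predict: use the trained model for automated prediction.
print("prediction for X=0.5:", a * 0.5 + b)
```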
Currently, there are three main schools of thought in machine learning: connectionism, symbolism, and behaviorism, which respectively mimic the human nervous system, thinking, and behavior.
Currently, connectionism, represented by neural networks (also known as deep learning), is dominant. The main reason is that this architecture has an input layer, an output layer, and multiple hidden layers. Once the number of layers and neurons (parameters) becomes large enough, there is enough capacity to fit complex general tasks. By feeding in data, the parameters of the neurons can be continually adjusted, and after many rounds of data the neurons reach an optimal state (parameters). This is what is meant by "great effort leads to miracles," and it is also where the word "deep" comes from: enough layers and neurons.
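To make the layer structure concrete, here is a minimal numpy sketch of a forward pass through such a network; the random weights stand in for the parameters that training would adjust:

```python
import numpy as np

# Minimal sketch of the architecture described above: an input layer,
# hidden layers, and an output layer. Weights and biases are the
# "parameters"; training would iteratively adjust them to fit the data.
rng = np.random.default_rng(0)

def layer(x, n_out):
    W = rng.standard_normal((x.size, n_out))  # this layer's neuron parameters
    b = rng.standard_normal(n_out)
    return np.maximum(0.0, x @ W + b)         # ReLU non-linearity

x = np.array([1.0, 2.0, 3.0])           # input layer: 3 features
h1 = layer(x, 8)                        # hidden layer 1: 8 neurons
h2 = layer(h1, 8)                       # hidden layer 2: 8 neurons
y = h2 @ rng.standard_normal(h2.size)   # output layer: a single value
print("output:", y)
```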
The deep learning technology based on neural networks has gone through multiple iterations and evolutions, from the earliest neural networks through feedforward neural networks, RNNs, CNNs, and GANs, finally evolving into modern large models such as GPT that use the Transformer architecture. The Transformer is just one direction of neural network evolution: it adds a converter (the transformer) used to encode data from all modalities (such as audio, video, and images) into corresponding numerical representations. These are then fed into the neural network, allowing it to fit any type of data and thus achieve multimodality.
The development of AI has gone through three technological waves. The first wave occurred in the 1960s, a decade after AI technology was proposed. It was triggered by the development of symbolist techniques, which addressed problems of general natural language processing and human-computer dialogue. During the same period, expert systems were born, notably the DENDRAL expert system completed under the supervision of Stanford University and NASA in the United States. The system possessed very strong chemistry knowledge and inferred answers like a chemistry expert from the questions posed to it; it can be seen as a combination of a chemistry knowledge base and an inference engine.
After expert systems, in the 1980s, the Israeli-American scientist and philosopher Judea Pearl proposed Bayesian networks, also known as belief networks. During the same period, Brooks introduced behavior-based robotics, marking the birth of behaviorism.
In 1997, IBM's Deep Blue defeated chess champion Garry Kasparov 3.5:2.5, and this victory was seen as a milestone for artificial intelligence, marking the peak of the second wave of AI development.
The third wave of AI technology occurred in 2006. The three giants of deep learning, Yann LeCun, Geoffrey Hinton, and Yoshua Bengio, introduced the concept of deep learning, an algorithm based on artificial neural networks for representation learning of data. Subsequently, deep learning algorithms gradually evolved, from RNN and GAN to Transformer and Stable Diffusion, which together shaped this third technological wave, marking the peak of connectionism.
Many iconic events have gradually emerged along with the exploration and evolution of deep learning technology, including:
In 2011, IBM's Watson defeated human contestants and won the championship on the quiz show Jeopardy!.
In 2014, Goodfellow proposed the GAN (Generative Adversarial Network), which learns by having two neural networks compete against each other and can generate photos indistinguishable from real ones. Goodfellow also co-authored the book "Deep Learning," known as the "flower book," one of the important introductory texts in the field of deep learning.
In 2015, Hinton and his co-authors published the paper "Deep Learning" in the journal Nature; it immediately caused a huge response in both academia and industry.
In 2015, OpenAI was founded, and several prominent individuals announced a joint investment of $1 billion.
In 2016, AlphaGo, based on deep learning technology, competed against the world champion and professional 9-dan Go player Lee Sedol in a man-machine Go showdown, winning with a total score of 4 to 1.
In 2017, the Hong Kong-based company Hanson Robotics developed the humanoid robot Sophia, referred to as the first robot in history to be granted citizenship, with a rich array of facial expressions and human language understanding capabilities.
In 2017, Google, which has a wealth of talent and technical reserves in the field of artificial intelligence, published the paper "Attention Is All You Need," proposing the Transformer algorithm, and large-scale language models began to emerge.
In 2018, OpenAI released GPT (Generative Pre-trained Transformer), built on the Transformer algorithm, one of the largest language models at the time.
In 2018, Google's DeepMind team released AlphaFold, based on deep learning, which is capable of predicting protein structures and is regarded as a major milestone in the field of artificial intelligence.
In 2019, OpenAI released GPT-2, which has 1.5 billion parameters.
In 2020, OpenAI developed GPT-3, which has 175 billion parameters, 100 times more than its predecessor GPT-2. The model was trained on 570GB of text and achieves state-of-the-art performance on multiple NLP (natural language processing) tasks (such as question answering, translation, and article writing).
In November 2022, OpenAI launched ChatGPT, initially based on GPT-3.5; by January 2023 it had reached one hundred million users, becoming the fastest application in history to reach that milestone.
In 2023, OpenAI released GPT-4, reportedly with 1.76 trillion parameters, about 10 times as many as GPT-3, and ChatGPT was upgraded to run on it.
In 2024, OpenAI released GPT-4o (omni).
Deep Learning Industry Chain
Today's large language models are all based on neural-network deep learning methods. Led by GPT, these large models have created a wave of artificial intelligence enthusiasm, with a large number of players entering the field, and we have seen a significant surge in market demand for data and computing power. This part of the report therefore explores the industrial chain of deep learning algorithms: how the upstream and downstream are composed in an AI industry dominated by deep learning algorithms, what their current status and supply-demand relationships are, and how they will develop in the future.
First, we need to clarify that training a large model of the GPT type, based on Transformer technology, is divided into three steps.
Before training, because the model is based on the Transformer, a converter must turn the text input into numerical values, a process called "Tokenization"; the resulting values are called Tokens. As a general rule of thumb, one English word or character can be roughly treated as one Token, while each Chinese character can be roughly treated as two Tokens. This is also the basic unit used for GPT pricing.
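As an illustration of that rule of thumb, here is a small Python sketch; it is only an estimate, and a real tokenizer (such as a BPE tokenizer) will segment text differently:

```python
import re

def rough_token_estimate(text: str) -> int:
    """Rough token count using the rules of thumb above:
    ~1 token per English word, ~2 tokens per Chinese character.
    A real tokenizer will differ; this is only an estimate."""
    chinese_chars = re.findall(r"[\u4e00-\u9fff]", text)
    # Replace Chinese characters with spaces, then count the remaining words.
    non_chinese = re.sub(r"[\u4e00-\u9fff]", " ", text)
    english_words = non_chinese.split()
    return len(english_words) + 2 * len(chinese_chars)

print(rough_token_estimate("Hello world"))  # ~2 tokens
print(rough_token_estimate("人工智能"))      # ~8 tokens under this rule of thumb
```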
Step one, pre-training. The input layer is given enough data pairs, like the (X, Y) examples in the first part of the report, to find the optimal parameters for each neuron in the model. This stage requires a large amount of data and is also the most computationally intensive, as the neurons must iterate through many candidate parameters. After one batch of data pairs has been trained on, the same batch is generally used for a second pass to further iterate the parameters.
Step two, fine-tuning. Fine-tuning involves training on a smaller batch of very high-quality data, which will lead to higher quality output from the model. Pre-training requires a large amount of data, but much of it may contain errors or be of low quality. The fine-tuning step can enhance the model's quality through the use of high-quality data.
Step three, reinforcement learning. First, a brand-new model is built, which we call the "reward model". Its purpose is very simple: to rank output results, so implementing it is relatively straightforward, as the business scenario is quite vertical. This model is then used to judge whether the output of our large model is of high quality, allowing a reward model to automatically iterate the parameters of the large model. (However, sometimes human involvement is also needed to assess the output quality of the model.)
In short, during the training process of large models, pre-training has a very high requirement for the amount of data, and the GPU computing power needed is also the highest. Fine-tuning requires higher quality data to improve parameters, and reinforcement learning can iteratively adjust parameters through a reward model to output higher quality results.
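To make step three concrete, here is a toy sketch of the reward-model idea; every function in it is a hypothetical stand-in, and in real training the preference signal would drive parameter updates through an algorithm such as RLHF with PPO:

```python
import random

random.seed(0)

# Hypothetical stand-in for sampling several candidate answers from the large model.
def large_model_generate(prompt: str, n_candidates: int = 4) -> list[str]:
    return [f"{prompt} -> answer {i}" + "!" * random.randint(0, 3)
            for i in range(n_candidates)]

# Hypothetical stand-in reward model: its only job is to rank outputs.
# Here it simply prefers answers with fewer exclamation marks.
def reward_model_score(answer: str) -> float:
    return -answer.count("!")

candidates = large_model_generate("Explain tokenization")
ranked = sorted(candidates, key=reward_model_score, reverse=True)
print("best candidate:", ranked[0])
# In real training this ranking (sometimes checked by humans) is the signal
# that iterates the large model's parameters toward higher-quality output.
```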
During the training process, the more parameters there are, the higher the ceiling of the model's generalization ability. Take the example function Y = aX + b: it has only two parameters, a and b, so however those parameters change, the model remains a straight line and can fit only a limited range of data. With more neurons, more parameters can be iterated, allowing more data to be fit. This is why large models achieve miraculous results, and also why they are commonly called large models: in essence, a massive number of neurons and parameters and a large amount of data, which together demand immense computing power.
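A minimal numpy sketch of this point, using extra polynomial coefficients as a loose stand-in for extra neurons/parameters:

```python
import numpy as np

# Toy data generated by a curved function that a straight line cannot capture.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200)
Y = np.sin(X) + 0.1 * rng.standard_normal(X.size)

# 2 parameters (a, b): the straight line Y = aX + b.
line = np.polyfit(X, Y, deg=1)
line_mse = np.mean((np.polyval(line, X) - Y) ** 2)

# 10 parameters: a degree-9 polynomial, loosely standing in for "more neurons".
poly = np.polyfit(X, Y, deg=9)
poly_mse = np.mean((np.polyval(poly, X) - Y) ** 2)

print(f"2-parameter line  MSE: {line_mse:.4f}")
print(f"10-parameter poly MSE: {poly_mse:.4f}")  # far lower: more parameters fit more data
```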
Therefore, the performance of large models is mainly determined by three aspects: the number of parameters, the amount and quality of data, and computing power. These three factors jointly affect the quality of the results and the generalization ability of large models. Assume the number of parameters is p and the amount of data is n (counted in Tokens). We can then estimate the required computing power and training time using general empirical rules.
Computing power is generally measured in Flops, one Flop being a single floating-point operation. Floating-point operations are additions, subtractions, multiplications, and divisions of non-integer values, such as 2.5 + 3.557; "floating point" refers to the ability to carry a decimal point. FP16 denotes half precision and FP32 single precision, a more commonly used format. According to empirical rules in practice, pre-training a large model (which generally involves multiple passes over the data) requires approximately 6np Flops, where 6 is referred to as the industry constant. Inference (the process where we input data and wait for the large model's output), divided into inputting n tokens and outputting n tokens, requires about 2np Flops in total.
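A quick back-of-the-envelope calculation using these rules; the parameter and token counts below are assumptions chosen for illustration:

```python
# Illustration of the 6np (training) and 2np (inference) rules of thumb above.
p = 175e9  # parameters, GPT-3 scale (from the report)
n = 300e9  # training tokens, a commonly cited figure for GPT-3 (assumed here)

train_flops = 6 * n * p  # ~3.15e23 Flops for pre-training
infer_flops = 2 * n * p  # per the rule of thumb for n tokens in and n out

# Rough wall-clock estimate on one accelerator, assuming ~312 TFLOPS FP16
# peak (an A100-class figure) at an assumed 50% utilization.
sustained = 312e12 * 0.5
print(f"training: {train_flops:.2e} Flops, "
      f"~{train_flops / sustained / 86400:,.0f} single-GPU days")
```

Spread across thousands of GPUs, that figure shrinks to weeks, which is why pre-training drives such heavy demand for GPU computing power.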
In the early days, CPU chips were used to supply training compute, but GPUs gradually replaced them, for example Nvidia's A100 and H100 chips. This is because a CPU is a general-purpose processor, whereas a GPU can act as dedicated compute, far surpassing the CPU in energy efficiency for these workloads. GPUs perform floating-point operations mainly through modules called Tensor Cores, which is why chips are generally quoted with Flops figures at FP16 / FP32 precision.