Once a technology enters a bubble, people inevitably start worrying about when it will burst, and AI chips are now widely recognized to be in one.
From Cambricon's DianNao at ASPLOS'14 to Google's TPUv3 today, AI chips have achieved tremendous success in only five years. With demand for AI computing exploding and the end of Moore's Law being loudly proclaimed, Domain Specific Architecture seems to be the only way out.
But when innumerable giants and start-ups design one AI chip after another, we need to answer the question: Do we really need so many AI chips?
With the rapid development of AI chips, one unavoidable problem is the exponential growth in software complexity. Many companies build a chip in two years or less, then find it takes even longer to support a wide range of frameworks, keep up with the progress of algorithms, and adapt to platforms ranging from mobile phones to data centers. Once the window for deployment and mass production is missed, the chip itself soon falls behind.
Unlike general-purpose architecture design, AI chip architecture design has to take the design and optimization of the software into account as well. Chip companies often underestimate the cost of software adaptation and optimization and expect middleware and compilers to solve every problem. In fact, from Intel to Google to Nvidia, large numbers of software engineers are being put to work adapting to various platforms and hand-optimizing network performance. Among startups, it is common for a chip to tape out while its delivery keeps slipping.
Essentially, the harder we push to tap the potential of a chip architecture, the harder the software layer becomes to abstract, because the model or parameters of the underlying architecture have to be exposed in the upper abstraction. The usual approach today is to put middleware between the underlying chip architecture and the upper-level software. However, the cost of developing this middleware is often underestimated. Some time ago, a former classmate at a chip startup asked me how many people and how much time it takes to develop inference middleware like TensorRT. It wasn't an easy question to answer, so I asked how many resources he had for the project.
Surprisingly, his boss had allocated only three or four engineers, on the assumption that they already had a low-level compiler and a high-level model conversion tool, so middleware for architectural abstraction would not need much effort. I expect that level of investment can produce a working product, but I don't believe the final product will reach the desired performance in practical applications. After all, chips are not made just to run benchmarks like ResNet-50.
Software engineers have long wanted to write one set of code that runs on different platforms. The fragmentation of AI chips across different architectures will greatly dampen their enthusiasm for applying AI in real software products. Unlike in the past, the poor interpretability of deep learning can lead to many unexpected defects. For example, a common problem is that a private model achieves satisfactory results on a local CPU, but its results degrade badly when it is deployed to a particular device. How should these problems be debugged, who is responsible for debugging them, with what tools, and can the debugging engineers even get access to the private model? These questions are difficult to answer.
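One common way to approach the CPU-versus-device discrepancy described above is to dump intermediate outputs on both sides and find the first layer where they disagree. The following is only a minimal sketch under assumptions: the helper names, the tolerance, and the idea that per-layer outputs can be captured as flat lists of floats are all hypothetical and not tied to any real framework or vendor tool.

```python
from typing import Dict, List, Optional


def max_abs_diff(a: List[float], b: List[float]) -> float:
    """Largest element-wise absolute difference between two layer outputs."""
    if len(a) != len(b):
        raise ValueError("layer outputs differ in length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergent_layer(
    reference: Dict[str, List[float]],
    device: Dict[str, List[float]],
    tol: float = 1e-2,  # hypothetical tolerance; real values depend on precision
) -> Optional[str]:
    """Return the first layer (in execution order) whose device output
    deviates from the CPU reference by more than `tol`, or None."""
    for name, ref_out in reference.items():  # dicts preserve insertion order
        if name not in device:
            raise KeyError(f"device dump is missing layer {name!r}")
        if max_abs_diff(ref_out, device[name]) > tol:
            return name
    return None


# Made-up dumps for illustration: the device diverges starting at "relu1".
cpu_dump = {"conv1": [0.5, 1.0], "relu1": [0.5, 1.0], "fc": [2.0, -1.0]}
dev_dump = {"conv1": [0.5, 1.0], "relu1": [0.5, 0.25], "fc": [0.5, -1.0]}
print(first_divergent_layer(cpu_dump, dev_dump))
```

Even this simple comparison presupposes that someone has both the model and a way to capture per-layer outputs on the device, which is exactly what is in question when the model is private.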
Fragmentation also shows up in the way proprietary architectures tend to abandon forward compatibility in pursuit of absolute performance. As mentioned above, one end of the middleware faces fragmented AI software frameworks, and the other faces generation after generation of chip architectures. How can a company maintain multiple incompatible instruction set architectures at the same time and ensure that every software update covers all devices? There is no alternative but to invest more manpower. A common argument is that, as with current consumer chips, only short-term (2-3 years) software support needs to be provided. However, in common applications of AI chips, such as smart cameras, industrial intelligence, and autonomous driving, the life cycle of a chip can be as long as 10 years. It is hard to imagine how large a company must be to provide technical support for that long. If a startup is not expected to survive more than two or three years, how can its products be safely deployed in consumer-oriented production vehicles?
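The maintenance burden can be illustrated with some rough arithmetic. The sketch below counts the (framework, chip generation, platform) combinations that every software update would have to cover. All figures are hypothetical assumptions chosen for illustration: three frameworks, two platforms, and a new incompatible chip generation every two years.

```python
from itertools import product
from typing import List, Sequence, Tuple


def generations_in_support(now: int, release_every: int, support_years: int) -> List[int]:
    """Release years (0, release_every, ...) of generations still supported at year `now`."""
    return [t for t in range(0, now + 1, release_every) if now - t < support_years]


def support_matrix(
    frameworks: Sequence[str],
    generations: Sequence[int],
    platforms: Sequence[str],
) -> List[Tuple[str, int, str]]:
    """Every combination that a software update must be validated against."""
    return list(product(frameworks, generations, platforms))


# Hypothetical inputs, ten years after the first chip generation shipped.
frameworks = ["framework_a", "framework_b", "framework_c"]
platforms = ["mobile", "datacenter"]

short_term = support_matrix(frameworks, generations_in_support(10, 2, 3), platforms)
long_term = support_matrix(frameworks, generations_in_support(10, 2, 10), platforms)
print(len(short_term), len(long_term))
```

Under these assumptions, a 3-year support window leaves 12 combinations to validate, while a 10-year window leaves 30, and each additional framework or platform multiplies the total further.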
AI chips are just transitional products
From a software engineer's point of view, I personally believe that customized AI processors will only be a transitional product. A unified, programmable, and highly concurrent architecture should be the direction we pursue. Looking back over the past two decades, we have witnessed the shrinking market for minicomputers with proprietary architectures, the evolution of graphics processors into general-purpose vector processors, and even the convergence of the platforms behind our mobile phones and computers. There is reason to believe that putting resources into customized AI chips is by no means a good investment.