800G Optical Transceiver Market Analysis

How much 800G DSP share does Marvell own?

In the past, for PAM4 DSPs (that is, per-lane speeds from 50G and 56G all the way up to 100G), Marvell’s market share was at least 60-70%. Calculated by shipments, its share can reach 80%.

At first, only Google demanded 800G, because two years ago Google had already planned to upgrade its data centers, including switches, from 400G to 800G. Google’s 800G deployment basically reached mass-production level by the end of last year. It uses Broadcom’s TH5 (Tomahawk 5) switch chips, and the optical modules use Marvell DSPs. Google’s optical transceivers have mainly used Marvell DSP chips, whether for 200G and 400G in the past or for 800G now. The current 800G modules mainly use the Inphi chip platform (Inphi is now part of Marvell).

What will be the demand for 800G optical transceivers in 2024?

Under a relatively conservative estimate, if total demand is 5 million units, Google alone may need at least 2-3 million. If AI demand continues to heat up, the split between Google and NVIDIA should be around 4:6, with Google taking 4 parts and NVIDIA-related products taking 6. For NVIDIA-related products, Marvell has not yet built demand forecasts from the guidance of companies such as Amazon, Meta, or Microsoft, so it mainly relies on NVIDIA’s own projections.
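The 4:6 split turns into unit counts with simple arithmetic. Below is a minimal Python sketch, assuming the 5-million-unit total and the 4:6 Google-to-NVIDIA ratio quoted above. The function name `split_demand` is hypothetical and only illustrates the calculation.

```python
def split_demand(total_units: int, google_parts: int = 4, nvidia_parts: int = 6) -> dict:
    """Split a total unit estimate by a google:nvidia ratio (illustrative arithmetic only)."""
    parts = google_parts + nvidia_parts
    google = total_units * google_parts // parts  # integer units for Google
    # Any rounding remainder goes to the NVIDIA-related bucket so the totals still match.
    return {"google": google, "nvidia_related": total_units - google}


# The conservative 5-million-unit scenario from the answer above.
print(split_demand(5_000_000))  # {'google': 2000000, 'nvidia_related': 3000000}
```

At a 4:6 ratio, Google’s portion comes to 2 million units, which is the low end of the 2-3 million range given for Google.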

Next year, there should be a large system called DGX GH200. It integrates Grace Hopper superchips (a Grace CPU plus an H100-class GPU) and connects them through Mellanox switches. Going forward, demand estimates will depend on the sales mix of A100, H100, and the new DGX GH200, on when DGX GH200 enters production, and on overall sales. The current ConnectX-7 network card is 400G, but an 800G card, ConnectX-8, should be available next year. Today, A100 and H100 systems use some 400G optical modules and some 800G modules. The trend may shift toward 800G, however, so forecasts for these components may change.


Can the demand for DSP this year and next year be broken down?

Marvell’s revenue for the year should be around $6 billion, and DSP-related business should be around $1.4-1.5 billion. This covers DSPs for 800G, 400G, 200G, and 100G modules, as well as coherent DSPs. Coherent DSPs are used for links spanning hundreds to thousands of kilometers, such as connections between data centers or telecom networks for customers such as Huawei. There is also a customized module business for Microsoft, and together these categories make up the $1.4-1.5 billion. PAM4 DSP alone may be around $700-800 million.

Did AWS add 600,000 400G single-mode fibers for AI data centers or for long-distance optical modules?

These are existing 400G PAM4 DSP modules, not new AI products. In the past, Google’s leading 400G suppliers were FiberMall and Cloud Light. Google’s 400G demand is relatively high, at around 2 million units per year. AWS uses 400G DR4, and its demand is higher than Google’s, at over 3 million units per year.

After demand slowed in the second half of last year, these large North American data center customers canceled or suspended some of their capital investment. As a result, Amazon’s demand for 400G DR4 modules decreased, which left high inventories of 400G DSP modules. Recently, 400G volumes may have recovered, either through share gains or because Amazon’s demand has increased. Demand pull from the second half of last year through the first half of this year was relatively weak because inventory was still being worked down. Amazon’s 400G inventory may decrease in the second half of this year.

What is the situation of major data center customers such as Google, Microsoft, Meta, and Amazon?

In the past, Google’s data center architecture used 400G with an 8×50G electrical interface converted to an 8×50G optical interface, which requires an 8-in-8-out DSP chip. Amazon’s 400G uses the DSP chip to convert an 8×50G electrical interface into a 4×100G optical interface. Therefore, Google and Amazon use different types of chips for 400G.

Microsoft and Meta, by comparison, should still mainly use 100G in their data centers. Meta initially wanted to move to 200G or 400G, but its main demand is currently at 200G, reportedly more than one million units per year. Microsoft originally planned to upgrade its data centers to 400G last year. Because of various internal considerations and the slowdown in market demand, however, it did not start the firmware or other upgrade work for 400G. Some say it may upgrade to 400G in the second half of this year. Others say it may reconsider because of AI and go to 800G instead.

So, before NVIDIA’s AI-related demand arrived, Google was the fastest to upgrade its data center environment and the earliest to move toward 800G. Amazon was at 400G and Meta at 200G. Microsoft wanted to move to 400G but had not yet started building the infrastructure. That is the plan for the existing data centers.
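The difference between the Google and Amazon 400G designs can be captured by checking that aggregate bandwidth matches on both sides and then comparing lane layouts. Below is a minimal Python sketch. The `PortConfig` and `dsp_mode` names are hypothetical. The labels "retimer" (same lane count in and out) and "gearbox" (lane count changes) are used here as common shorthand, not as terms from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Lane layout on one side of a module: lane count x per-lane rate in Gb/s."""
    lanes: int
    rate_gbps: int

    @property
    def total_gbps(self) -> int:
        return self.lanes * self.rate_gbps


def dsp_mode(electrical: PortConfig, optical: PortConfig) -> str:
    """Classify how a DSP must map electrical lanes to optical lanes (illustrative)."""
    if electrical.total_gbps != optical.total_gbps:
        raise ValueError("aggregate bandwidth must match on both sides")
    if electrical == optical:
        # Same lane count and rate: lanes pass through one-to-one.
        return f"{electrical.lanes}-in-{optical.lanes}-out retimer"
    # Different lane count: lanes must be multiplexed or demultiplexed.
    return f"{electrical.lanes}:{optical.lanes} gearbox"


print(dsp_mode(PortConfig(8, 50), PortConfig(8, 50)))   # Google-style 400G: 8-in-8-out retimer
print(dsp_mode(PortConfig(8, 50), PortConfig(4, 100)))  # Amazon 400G DR4: 8:4 gearbox
```

Both designs carry 400 Gb/s. Only the optical-side lane layout differs, and that difference is why the two customers need different DSP chips.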

Now, because of AI, Microsoft, Meta, and even Amazon will accelerate upgrades of their data center components. Amazon seems likely to stay with 400G. If AI-related applications require 800G, it will upgrade to 800G, but 400G remains its main focus.

Is the demand upgrade for long or short distances?

Links within the data center are considered short-distance. The distance between a switch and a server or storage is usually no more than three meters, and most of these links use AOCs (active optical cables). DR modules are used between spine and leaf switches, mainly over single-mode fiber. These links usually run 100 to 500 meters and sometimes one or two kilometers. In addition, some very large data centers also use FR modules, which reach up to one or two kilometers. Therefore, there are at least three module types in the data center: AOC, DR, and FR.
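The three module types map roughly onto link length. Below is a minimal Python sketch of that mapping. The name `pick_module` and the exact cut-offs (3 m, 500 m, 2 km) are simplifying assumptions drawn from the approximate ranges above. The answer itself notes that DR sometimes reaches 1-2 km, and real choices also depend on fiber plant, cost, and vendor reach specifications.

```python
def pick_module(distance_m: float) -> str:
    """Map an intra-data-center link length to a module family.

    Thresholds are simplified assumptions based on the rough ranges in the text,
    not vendor reach specifications.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    if distance_m <= 3:
        return "AOC"  # switch to server/storage, within a few meters
    if distance_m <= 500:
        return "DR"  # spine-leaf links over single-mode fiber
    if distance_m <= 2_000:
        return "FR"  # longer runs in very large data centers
    return "beyond intra-DC reach"


for d in (2, 300, 1_500, 10_000):
    print(d, pick_module(d))
```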

[Figure: leaf-spine network]

Will the servers required for AI inference use 800G modules?

I think inference servers may also use 800G modules in the future. Currently, however, AI servers generally connect through network cards. The fastest network card from Mellanox, which NVIDIA acquired, is only 400G. Other high-speed network card vendors such as Broadcom and Intel can also reach only 400G for now. Therefore, AI servers currently mainly use 400G network cards and can connect only to 400G modules. Some 800G network cards should become available next year, which will raise the likelihood of using 800G optical transceivers. The wiring inside the NVIDIA DGX GH200 also needs to be considered. Its CPUs and GPUs are interconnected through copper NVLink. Some of them also need to connect to the Mellanox switch, and these links currently use mainly 400G or 800G AOC modules. If such systems are connected more widely across the data center in the future, their external interfaces may also use 800G modules.

Can Marvell’s network switch chips enter NVIDIA’s system?

I think it is unlikely that Marvell’s switch chips can enter NVIDIA’s systems. In traditional data centers, Broadcom’s Tomahawk series (Tomahawk 3, Tomahawk 4, and Tomahawk 5) holds the main market share. Most data centers, Google included, use TH5 chips in their 800G switches. Marvell previously acquired a company called Innovium, which makes 12.8T switch chips for the 400G generation, and Innovium’s main customer was Amazon. After completing the acquisition, Marvell supplied some 400G switch chips to Amazon, which could sustain revenue of about $100 million a year. Marvell is now developing a new 51.2T switch chip. It should be available by the end of the year and may enter mass production next year if all goes well. This chip competes with Broadcom’s Tomahawk 5 class. If there are no major problems, Marvell should win a certain share at Amazon and can try to take some market share from Broadcom at other companies.
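The link between switch-chip capacity and port speed ("12.8T" for the 400G generation, 51.2T for the 800G generation) is a simple division. Below is a minimal Python sketch. `port_count` is a hypothetical helper, and it assumes every port runs at the same speed with no oversubscription.

```python
def port_count(capacity_gbps: int, port_gbps: int) -> int:
    """Ports a switch ASIC can expose if all ports run at port_gbps (no oversubscription)."""
    if port_gbps <= 0 or capacity_gbps % port_gbps:
        raise ValueError("port speed must evenly divide the switch capacity")
    return capacity_gbps // port_gbps


print(port_count(12_800, 400))  # 32 ports: a 12.8T chip such as Innovium's
print(port_count(51_200, 800))  # 64 ports: 51.2T, the Tomahawk 5 class
print(port_count(51_200, 400))  # 128 ports if the same chip runs at 400G
```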

However, neither Marvell’s nor Broadcom’s switch chips have any chance of getting into the NVIDIA DGX GH200 system, because it is a total solution that integrates Mellanox’s Spectrum series switch chips into the overall system architecture.

As for the DSP business, this year’s revenue is $1.4-$1.5 billion, and $200 million comes from 800G products. Coherent DSP business accounts for about $300 million. Another part is the customized module for Microsoft, which uses Marvell’s own DSP inside. This module is counted as a separate business unit because it has a higher price and lower volume than other products. The customized module is used for connecting with data centers over a distance of up to 80-100 kilometers. This part may contribute more than $200 million in revenue as well. These two parts add up to about $500-$600 million in revenue.
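The figures in this answer can be checked by subtracting the non-PAM4 parts from the DSP total. Below is a minimal Python sketch using (low, high) ranges in millions of dollars. The `subtract` helper is hypothetical, and it returns the widest possible remainder range.

```python
Range = tuple  # (low, high) in millions of US dollars


def subtract(total: Range, part: Range) -> Range:
    """Widest remainder range: low total minus high part, high total minus low part."""
    return (total[0] - part[1], total[1] - part[0])


dsp_total = (1_400, 1_500)          # all DSP-related revenue this year
coherent_plus_custom = (500, 600)   # coherent (~$300M) + Microsoft custom module (>$200M)
remainder = subtract(dsp_total, coherent_plus_custom)
print(remainder)  # (800, 1000): what is left for 800G/400G/200G/100G PAM4
```

This remainder of roughly $800 million to $1 billion is somewhat above the $700-800 million PAM4 figure given earlier in the article. Both should be read as rough estimates rather than reconciled numbers.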

We can group DSP products into three categories by function. PAM4 DSPs are mainly used in data centers at speeds from 100G to 800G. The customized modules for Microsoft are also based on PAM4 DSPs but have a different form factor and application scenario. Coherent DSPs are mainly used for long-distance transmission, such as in telecom networks, at speeds from 100G to 400G.

Forecast of the situation in 2024

Next year, revenue from 400G PAM4 DSPs may remain stable or decline slightly. Shipment volume will not grow significantly, and Google’s shift to 800G will slow 400G demand. The 800G optical transceiver is therefore the main growth driver: if 800G revenue is $200 million this year, it could reach $400 million next year. Other DSP businesses will grow slightly, probably by 10-20%. The Microsoft customized module and coherent DSP businesses may grow at an average of about 20%, an increase of roughly $100-200 million. Therefore, if this year’s total revenue is $1.4 billion, it will increase by $300-400 million next year to a total of $1.7-$1.8 billion.
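The forecast can be reproduced from the pieces the answer quantifies explicitly. Below is a minimal Python sketch in millions of dollars. It assumes that 400G and other PAM4 lines stay flat, that 800G doubles from $200 million to $400 million, and that the coherent plus Microsoft custom module base ($500-600 million, from the earlier breakdown) grows 20%. The function name `forecast_next_year` is hypothetical.

```python
def forecast_next_year(base_total: int, g800_now: int, g800_next: int,
                       coherent_custom_now: int, growth_pct: int) -> int:
    """Next-year DSP revenue in $M.

    Assumes 400G and other PAM4 lines stay flat; only 800G growth and the
    coherent + Microsoft custom module growth are modeled.
    """
    increase_800g = g800_next - g800_now
    increase_other = coherent_custom_now * growth_pct // 100
    return base_total + increase_800g + increase_other


low = forecast_next_year(1_400, 200, 400, 500, 20)
high = forecast_next_year(1_400, 200, 400, 600, 20)
print(low, high)  # 1700 1720
```

Under these assumptions the total lands at $1.70-1.72 billion, the low end of the $1.7-1.8 billion range. Reaching the upper end would require the 10-20% growth mentioned for the other DSP businesses.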

[Figure: sales forecast]

Considering the popularity of LPO solutions in the future, how will the market share change in the next 2-3 years?

Currently, the smaller DSP players are Credo and MaxLinear. I think Marvell will remain the mainstream DSP supplier over the next two or three years, and its share will stay high. Broadcom’s DSP has no major problems, but Google basically does not use it, and Amazon, Microsoft, and Meta have no particular preference for it. Google and Marvell cooperate very closely, and much of their information is not shared with Broadcom, so Marvell’s business with Google should be relatively stable over the next few years.

Amazon, Microsoft, and Meta started later than Google. They discuss technical specifications and production schedules with Marvell, but they rely more on the module manufacturers. When it comes to demand, they mainly negotiate price and volume for the complete module with FiberMall or Coherent. Once FiberMall’s or Coherent’s newly developed 400G or 800G optical transceivers adopt Marvell’s DSP, switching to another DSP becomes hard. So for these customers, FiberMall and Coherent definitely take priority over Marvell.

I think it is very difficult for MaxLinear and Credo to do better than Broadcom or Marvell because they have very few resources of their own. Credo previously worked with Microsoft on AECs (active electrical cables), but Microsoft did not really adopt Credo’s solution. That caused a sharp decline in Credo’s revenue in the first quarter, and the stock price fell from nearly $20 to $8-$10. It has risen slowly since, on reports that Credo is discussing new AEC solutions with Meta and Amazon. If Credo finds opportunities, they will more likely be in alternatives to traditional optical modules. Some customers may also want non-Marvell solutions and look for second sources as a backup.

How much revenue can an ASIC project provide?

Generally speaking, a project can reach a scale of several million dollars. Some larger projects may last 3-5 years, and their revenue may exceed $100 million.

Is the main reason TSMC expects very fast growth in ASIC tape-out volume next year that some projects did not reach mass production this year but will next year?

Yes, that is part of the reason. The other part is that there were fewer large AI chips in the past. Over the past few years, more customers built chips for network cards, SSDs, and similar products. These chips were not as large or complex and did not need CoWoS packaging. Next year, there will be more business for high-end AI chips, and CoWoS packaging will be used relatively more.

What is the gross margin of ASIC business?

I heard that ASIC gross margin is lower than Marvell’s average gross margin, which is roughly 60%-65%. Calculating ASIC gross margin requires considering how much manpower is used, how many years the project takes, and other factors. The cost structure of ASIC chips is also transparent to customers. ASIC gross margin is about 50%.
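The margin gap can be expressed as gross profit per project and as a revenue-weighted blend. Below is a minimal Python sketch. The $100 million project size comes from the previous answer, and the 50% and 60-65% margins come from this one. The $900 million "rest of business" figure in the blend is purely hypothetical and is not from the source. `gross_profit` and `blended_margin` are illustrative helpers.

```python
def gross_profit(revenue_musd: int, margin_pct: int) -> int:
    """Gross profit in $M for a given revenue and whole-percent gross margin."""
    return revenue_musd * margin_pct // 100


def blended_margin(segments: dict) -> float:
    """Revenue-weighted gross margin (%) for {name: (revenue_musd, margin_pct)}."""
    total_revenue = sum(rev for rev, _ in segments.values())
    total_profit = sum(rev * pct for rev, pct in segments.values()) / 100
    return 100 * total_profit / total_revenue


# A $100M ASIC project at ~50% versus the same revenue at the 60-65% company average.
print(gross_profit(100, 50), gross_profit(100, 60), gross_profit(100, 65))  # 50 60 65

# Hypothetical mix, not from the source: $900M at 62% plus $100M of ASIC at 50%.
print(round(blended_margin({"rest": (900, 62), "asic": (100, 50)}), 1))  # 60.8
```

Any segment with a below-average margin pulls the blended figure down, in proportion to its share of revenue.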

Among the 40 projects this year and 50 projects next year, how many are related to AI?

Probably no more than 5. Custom chip development, like Marvell’s own internal chip development, sometimes takes a long time. Customers sometimes change specifications or modules, and some core technologies are developed by the customers themselves. Both sides therefore have to work closely together, which can stretch chip development to 1-2 years.

How do you see the growth of Enterprise Networking and Auto next year?

Enterprise revenue is roughly around $1 billion, and a quarter should be $300-$400 million. In principle, the overall market does not have much room for growth, but Marvell may do slightly better than the market average. Broadcom has not invested much in this area and usually charges higher prices, so some customers choose Marvell. In the Enterprise business, Marvell’s solution is basically on par with Broadcom’s, and sometimes customers do not want to give this business to Broadcom. Enterprise is a stable business, but its growth rate is relatively low. Auto currently accounts for a relatively small proportion of the company’s revenue, but it is growing well. Most major car manufacturers use Marvell’s automotive Ethernet switch chips, and in this field Marvell does better than Broadcom and Taiwan’s Realtek. Although the long-term outlook is still good, the automotive downturn from last year to this year may lead car manufacturers to cut some programs over the next one or two years. Some car manufacturers may bring no revenue at all in the future.

In the long run, do you think LPO will be something that Marvell should worry about?

I think it will not have a big market impact in the next two or three years. The principle of LPO (linear pluggable optics) is to remove the DSP. Marvell itself makes DSPs, drivers, and TIAs, so it can also offer a solution without the DSP. Marvell has run many experimental tests to evaluate the technical difficulty and market potential of LPO, and it will use its own technology to compare LPO’s performance, pros, and cons. From what I have learned so far, few manufacturers can demonstrate LPO solutions. There is more theory in the literature than physical hardware and test results. There was a similar situation before the 400G era: before 400G, analog solutions were mainly used, and then the industry entered the era of Marvell’s and Broadcom’s PAM4 DSPs. So I think it will take time for LPO to become mature and stable. Many technical challenges remain, both theoretical and practical, along with coordination and compatibility between different manufacturers, so I don’t think it will happen quickly.
