Huawei Dorado 5000 V6 review and test: all-flash storage for large enterprises
The Dorado V6 series is an all-flash storage solution for large enterprises such as banks, transportation companies, think tanks and data centers. For such customers, Huawei offers scaling up to 16 controllers, a 100G RDMA fabric for connecting disk shelves, PCI Express drives, always-on deduplication, predictive analysis of component failures, triple-parity RAID and up to 20 million IOPS for the entire storage system.
While examining the Dorado 5000 V6 "head" that came to us for testing, I kept catching myself thinking that in real deployments an enterprise will install several such devices at once, creating a single fault-tolerant data ecosystem that can be expanded horizontally and upgraded without stopping services, migrating smoothly from an old controller to a new one: in the world of high speeds and large volumes, even a second of storage downtime is unacceptable.
It is hard for some to accept that today a storage system with 1,600 drives and 16 controllers is a kind of middle class, and 100G Ethernet on the front end is a stock item in the supplier's warehouse. The world is changing, requirements are growing, and storage vendors are trying to offer you more: not just more hardware, but more software and service for the same money. So in this test, what matters is not only the hardware from which Huawei forged its storage system, but also the main functions and options the company offers the end customer.
Storage systems based on Kunpeng ARM processors
I would like to start with the Dorado 5000 V6 architecture. This storage system is built on two 64-core Kunpeng ARM processors, developed by Huawei itself and introduced to the market last year. We have already covered the Kunpeng 920 series in detail in a separate article, so you can get an idea of what these processors are capable of. From a technical point of view, they perform at the level of AMD EPYC; from a political point of view, they are a huge victory for Huawei, letting the company break its dependence on the x86 platform, since it previously used mainly Intel processors in its storage systems. Huawei can therefore be deployed where geopolitical risk prevents companies from installing x86 processors, or where they want to reduce the x86 share of their server fleet (read the article CPU vendor diversification). And for a customer with the strictest security requirements, Huawei is a kind of "one-stop shop": a single vendor responsible for the absence of backdoors in everything from the processor to the device firmware.
But politics is far from the main reason Huawei switched to Kunpeng in its storage systems. Note how progressive the controller architecture is: two 64-core Kunpeng 920 processors allow deduplication to run simultaneously in inline and background modes, a capability that is enabled by default. Each controller carries 128 GB of cache memory, which can be doubled to 256 GB if necessary. From a platform standpoint, each Dorado 5000 V6 head unit is a powerful compute module, comparable in performance to top-tier servers for cloud applications.
By the way, when ordering it is better to go straight for the maximum memory configuration, because the Huawei OceanStor Dorado 5000 V6 uses shared cache at the head-unit level: every controller always participates in disk operations, and the write cache is constantly mirrored between the controllers within one head unit, so their capacities cannot simply be added together. Higher-end systems such as the Dorado 18000 V6 can run with triple mirroring of the write cache, remaining fault tolerant even if 2 controllers fail simultaneously. In addition, there is no traditional LUN-to-controller binding here at all, so the loss of one controller does not affect speed: each pool is always spread across all controllers, with the load constantly balanced between them. This is one reason the OceanStor Dorado V6 achieves remarkable reliability: N-1 controllers can fail, and the last one will keep serving all the disk arrays.
We ran a test measuring average access time while a controller was being switched off; the results are in the diagram above. Try to guess during which interval the controller was disabled, then select the text in brackets to see the answer: [40-60 seconds].
Of course, Huawei, as the developer of the Ascend series of AI chips (read our Ascend review), could not resist endowing its storage systems with advanced functions implemented by the Ascend 310 processor, in particular a "smart" cache, an AI add-on to traditional memory caching. Yes, it is exactly what you think: artificial intelligence analyzes your disk access patterns and uses them to decide what to admit to and evict from the cache. In the Dorado 5000 V6 series, however, this functionality requires the purchase and installation of an AI board, which our test sample did not have.
In the OceanStor Dorado 5000 V6, RAID is not quite RAID and LUNs are not quite LUNs, because the storage system is based on proprietary block virtualization technology, which we examined in great detail in our Huawei OceanStor 2800 V3 review. Here we will just recall that RAID arrays are assembled not from whole SSDs but from "chunks", slices of disk space taken from each SSD. Thanks to this design, the storage system is tied to neither the capacity nor the number of drives: if, for example, you created a RAID 5 out of 8 SSDs, then up to 5 drives can fail one after another, and each time the system will rebuild the array into the free space on the remaining disks, keeping it up and running. The main conditions are that the drives fail with enough time between them for each rebuild to complete, and that the data still fits in the remaining free space.
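The chunk mechanics described above can be sketched in a few lines of Python. This is a toy model under our own assumptions about chunk granularity and allocation policy, not Huawei's actual implementation: RAID groups are built from chunks on distinct disks, and a failed disk's chunks are rebuilt into free space elsewhere.

```python
# Toy model of chunk-based block virtualization (illustrative only):
# RAID is built from fixed-size "chunks" carved from each SSD, not
# from whole drives, so rebuilds land in free space on survivors.
import random

CHUNKS_PER_DISK = 16  # assumed chunk granularity

class Pool:
    def __init__(self, n_disks):
        self.free = {d: CHUNKS_PER_DISK for d in range(n_disks)}
        self.alive = set(range(n_disks))
        self.groups = []  # each RAID group = set of disks holding its chunks

    def allocate_group(self, width):
        # pick `width` distinct live disks that still have a free chunk
        candidates = [d for d in self.alive if self.free[d] > 0]
        if len(candidates) < width:
            raise RuntimeError("not enough free chunks for a full group")
        chosen = random.sample(candidates, width)
        for d in chosen:
            self.free[d] -= 1
        self.groups.append(set(chosen))

    def fail_disk(self, disk):
        # rebuild: every chunk on the failed disk is re-created in
        # free space on some other surviving disk
        self.alive.discard(disk)
        for g in self.groups:
            if disk in g:
                g.discard(disk)
                spare = [d for d in self.alive - g if self.free[d] > 0]
                if not spare:
                    raise RuntimeError("out of free space: pool degraded")
                self.free[spare[0]] -= 1
                g.add(spare[0])

pool = Pool(n_disks=8)
for _ in range(10):
    pool.allocate_group(width=5)   # RAID-5-like groups, 4+1 chunks wide

pool.fail_disk(0)                  # array returns to Optimal after rebuild
assert all(len(g) == 5 for g in pool.groups)
```

As in the real system, the model degrades only when free chunks run out, which is exactly what we saw after the third drive was pulled.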
Another improvement over the hybrid OceanStor line is that LUNs and disk pools (RAID arrays) are created instantly, and in the expansion shelves the pools are managed by the shelves' own processors: the "head" only issues a command such as "repair RAID" and takes no further part in the process, freeing CPU resources for other tasks.
| Test: RAID 5, 8 x 7.68 TB SSD, rebuild speed | Time |
| --- | --- |
| First disk failure | — |
| Second disk failure | 1 hour 57 minutes |
| Third disk failure | 24 minutes |
| Disk returned to the array, rebuild to original state | instant (no rebuild) |
For an array that starts with 8 drives of 7.68 TB each, a rebuild within an hour or two is a very good result. By the failure of the third SSD, the free space in the pool had run out; that rebuild took only 24 minutes, after which the storage system honestly warned that the array would not survive the loss of another SSD, so we decided to stop there. Note that when the removed disks are reinstalled, the array is not actually rebuilt: the entire volume becomes available immediately.
This is not the only advantage of block virtualization. Huawei has also managed to reduce SSD wear by adapting the garbage collection mechanisms: the controller distributes data across lists of flash blocks depending on how frequently it changes. By sorting data into frequently and infrequently modified classes, the controller shortens garbage collection by pointing the algorithm at exactly the areas where hot data usually lives. This reduces the write amplification (WA) effect by 60% compared with typical SSD usage in an array. In addition, by aggregating I/O operations at the controller level into so-called full-stripe writes, it was possible to eliminate the extra read/write operations inherent in traditional RAID arrays.
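The hot/cold separation idea can be illustrated with a minimal sketch. The threshold and the two-stream routing below are our assumptions for illustration; Huawei does not publish the actual algorithm:

```python
# Illustrative sketch (not Huawei's actual algorithm): segregate
# frequently-rewritten ("hot") data from static ("cold") data so that
# garbage collection mostly finds flash blocks that are entirely stale.
from collections import defaultdict

write_counts = defaultdict(int)   # per-LBA rewrite counter
HOT_THRESHOLD = 3                 # assumed cutoff for "hot" data

def pick_stream(lba):
    """Route a write to the hot or cold append stream."""
    write_counts[lba] += 1
    return "hot" if write_counts[lba] >= HOT_THRESHOLD else "cold"

# Hot LBAs (e.g. database pages) get rewritten often; once classified,
# all their future versions land in the same flash blocks.  When those
# versions are invalidated, GC can erase the block with little or no
# live data to copy out -- which is what lowers write amplification.
for _ in range(5):
    stream = pick_stream(lba=42)
print(stream)
```

Because a "hot" flash block tends to be invalidated wholesale, the collector copies far less live data around, which is the mechanism behind the claimed 60% WA reduction.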
Thus, the RAID penalty effect (which you can read about here) is completely leveled out in the Huawei Dorado 5000 V6. The storage system also performs global SSD wear leveling, and for extra reliability a RAID-TP array is offered, which can withstand the failure of up to 3 disks while preserving the original volume. Clearly, SSD longevity gets a lot of attention here. That said, over roughly a week of our testing, the 7.68 TB SSDs showed wear of 0.1%, which I find too high for drives of this capacity.
At the beginning of the article we mentioned that the abundant processing power allows on-the-fly deduplication, but Huawei actually uses a different, faster scheme. From ZFS experience we know that deduplication steals performance even on the fastest CPUs, so to preserve speed some vendors, including Microsoft with the ReFS file system, run the search for identical extents in the background or on a schedule. Both approaches are flawed compromises, and it seems Huawei has found a balance between them.
As each extent is written, the storage system checks its hash against the existing duplicate table. If the entry is already present (meaning the extent is a copy of data already on disk), no physical write is made; only the duplicate table is updated. If the extent is not in the duplicate list, it is written to disk, but not verbatim: it is compressed first, with the compression level chosen according to the current load on the storage system. Later, in the background, the system searches the newly written blocks for duplicates, eliminates them, and records them in the duplicate extent table.
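The write path just described maps naturally onto a small Python sketch. SHA-256 and zlib are stand-ins chosen for illustration; Huawei's actual hash function and compressor are not documented:

```python
# Minimal sketch of the inline dedup write path (assumed model, not
# Huawei's code): hash each extent; known hashes only bump a refcount,
# new extents are compressed before hitting "disk".
import hashlib, zlib

dedup_table = {}   # extent hash -> (location, refcount)
disk = []          # stand-in for physical storage

def write_extent(data: bytes):
    h = hashlib.sha256(data).hexdigest()
    if h in dedup_table:               # duplicate: no physical write
        loc, refs = dedup_table[h]
        dedup_table[h] = (loc, refs + 1)
        return loc
    compressed = zlib.compress(data)   # new extent: written compressed
    disk.append(compressed)
    loc = len(disk) - 1
    dedup_table[h] = (loc, 1)
    return loc

a = write_extent(b"A" * 8192)
b = write_extent(b"A" * 8192)   # same content: deduplicated, no new write
c = write_extent(b"B" * 8192)
assert a == b and b != c and len(disk) == 2
```

The background pass the article mentions would then walk `disk` looking for duplicates missed inline, folding them into `dedup_table` the same way.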
The storage array was provided by ELKO (https://www.elko.ru/):
ELKO Group is an international distributor of IT products and household appliances. The Russian office of ELKO Group has been operating since 1995, headquartered in Moscow, with regional offices in St. Petersburg and Krasnoyarsk and logistics centers in Moscow and St. Petersburg. The group's interests in Russia are represented by ELKO Russia (IT products) and Trading House "Absolute" (household appliances). More than 500 employees of the office serve over 10,000 clients and partners of the company.
Unlike ZFS or ReFS, where the extent length used for deduplication is fixed, Huawei's implementation works with variable-length extents, which makes it possible to find duplicate data even when it sits at different offsets in different files. This works especially well when backing up virtual machines, and with object access that uses versioning.
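Variable-length, content-defined extents are what make such offset-independent matching possible. The sketch below shows the general technique, not Huawei's specific scheme; real implementations declare boundaries with a rolling hash (e.g. a Rabin fingerprint), for which a single trigger byte stands in here:

```python
# Content-defined chunking, the general idea behind variable-length
# extents: boundaries depend on the data itself, so identical payloads
# split into identical extents even when their offsets shift.
# (Toy boundary rule: a 0xFF byte stands in for a rolling-hash match.)

def chunk(data: bytes):
    extents, start = [], 0
    for i, byte in enumerate(data):
        if byte == 0xFF:               # stand-in for rolling-hash boundary
            extents.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        extents.append(data[start:])
    return extents

base = bytes(range(256)) * 4       # payload with periodic content
shifted = b"XYZ" + base            # same payload at a different offset
common = set(chunk(base)) & set(chunk(shifted))
assert bytes(range(256)) in common # shared extents survive the shift
```

With fixed-size extents, the 3-byte shift would misalign every block and nothing would deduplicate; content-defined boundaries resynchronize after the first chunk.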
Unfortunately, you would only know deduplication exists from the Dorado 5000 V6 documentation: the system exposes no settings or reports for it. During testing, I repeatedly wrote the same 8 GB file to LUNs presented via ESXi to a Windows virtual machine, and the only metric that hints at deduplication activity is the indicator on the volume information tab.
The background search for duplicate extents consumes roughly 2-3% of the CPU resources of each controller (so all 4 CPUs are involved). Unfortunately, the storage system produces no more detailed statistics than that.
SSD and expansion cards
The Huawei Dorado 5000 V6 uses Huawei's own proprietary PCI Express drives. First, they come in a slightly different form factor, only 9.5 mm thick, which made it possible to fit a record 36 hot-swappable drives on the front panel: one and a half times more than standard 2U chassis hold with 2.5-inch SSDs!
Second, Huawei fitted these drives with its own controller, optimized for minimal latency. As a result, typical access time for read/write operations is 0.1 ms, and yes, that figure covers the entire storage path, from the front-end interface to a disk installed in an expansion shelf.
Huawei calls these drives Palm SSDs, and thinness is not their only advantage: each drive has two physical ports for connecting to the backplane instead of the single wide port of SAS drives. This allowed the backplane to be mounted horizontally, parallel to the motherboard, improving airflow and heat dissipation. At the time of writing, the maximum capacity of a single Palm SSD was an impressive 15.36 TB.
The cache contents are protected by powerful lithium-ion batteries, installed as cartridges inside the power supplies. Keep in mind that a battery can only be replaced with its power supply disconnected, though this does not affect storage operation in any way.
Interface cards are unchanged from previous Dorado generations, and in general there is nothing unusual here: each controller accepts six interface cards for front-end connections. Today the options are:
- 4-port 8/16/32 Gb FC / FC-NVMe
- 4-port 10/25 Gb Ethernet
- 2-port 40/100 Gb ETH
To connect the control network, plug-in modules with 1GBase-T and RS-232 interfaces are used. For interconnection between controller chassis, 4-port 25G RDMA host adapters are used.
Each controller uses two 100 Gigabit RDMA ports to communicate with the disk shelves. Once again, I want to remind you that Dorado 5000 V6 uses active shelves with their own processors, which themselves rebuild arrays in case of breakdown.
Notice what the RDMA boards look like: just a piece of printed circuit board with traces, no controllers, no memory, no other supporting components. Nothing more is needed to move data along a direct memory-to-memory path, and that is the pinnacle of modern technology.
As for the management interface, it seems to me Huawei has oversimplified it. On the one hand, presenting LUNs to hosts has become much easier: you no longer have to create LUN groups, port groups and host groups; you create a LUN, select a host, and it is mapped. But in replication and scheduled snapshots the confusion remains: you create protection groups, add LUN groups to them, set the snapshot frequency... and if you later want to delete the LUN, you will have to roll all of this back in reverse order, manually, opening a new interface tab at every step.
Among the unpleasant moments: the interface is clearly designed for robots, not people. It would be logical, say, to surface disk-pool reconstruction on the start page, since it is hardly an everyday event, but no, it is hidden in one short line in the properties of the disk pool itself. And so it is with everything: there are practically no settings related to data storage. You can choose a preset for a LUN when creating it (database, VDI or a shared pool for VMware), but you cannot dive into its parameters.
Current status monitoring of the head unit components is done in the traditional style: you can click on drives, power supplies, interface ports and transceivers to see the speed, date of manufacture, firmware version and serial number of each component.
While the storage system itself remains sparing with information, the Dorado 5000 V6 has very strong performance statistics: for literally every port, disk, pool and LUN you can get speed metrics and keep their history for as long as you like. I think that, as Huawei conceives it, the web interface remains only for configuring hosts and LUNs, while all statistics are meant to be processed through a cloud service.
eService: monitoring with predictive analysis
I have already said that I find the Dorado 5000 V6 interface extremely uninformative, but you can also monitor the storage system through the eService cloud. When you register a device, it starts sending telemetry to the cloud and is simultaneously registered with Huawei's technical support service.
Before an alert is displayed in the system, it is inspected in the cloud, and along with a severity level you receive a recommendation on what to do. For example, when an SSD was pulled from the array, the service recommended closing the alert and not worrying: the system had rebuilt the RAID into free space and returned to a healthy state. When registering with the service, I specifically asked technical support to exclude our system from staff monitoring; otherwise they would have started calling to arrange a visit to replace the failed SSD. This service comes with the system by default and greatly eases the work of IT personnel: even if you missed or overlooked something, Huawei is watching over you, and in case of a failure a replacement is already on its way.
In some respects eService is more informative than the storage system itself: it evaluates your entire infrastructure (including Huawei servers), predicts bottlenecks, builds patterns of resource access intensity and, more importantly, predicts which pieces of equipment will fail. In our case it showed only the probability of SSD failure, which was zero for new drives. Under the hood, though, is a massive analysis of more than 500,000 disks across various installations, whose metrics were collected over 600 days. As a result, eService predicts a drive failure up to 14 days before it happens with 80% probability and only 0.1% false positives. Cool? You bet!
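To put those accuracy figures in context, here is a quick back-of-envelope calculation. The detection rate, false-positive rate and fleet size come from the claims above; the annual failure rate is our assumption, not a Huawei figure:

```python
# Back-of-envelope on the claimed prediction accuracy (fleet of
# 500,000 disks, 80% of failures flagged 14 days ahead, 0.1% false
# positives).  The annual failure rate below is assumed, not quoted.
fleet = 500_000
detection_rate = 0.80
false_positive_rate = 0.001
annual_failure_rate = 0.01        # assumed ~1% AFR for enterprise SSDs

failures = fleet * annual_failure_rate          # ~5,000 real failures/yr
caught_early = failures * detection_rate        # ~4,000 flagged in advance
false_alerts = fleet * false_positive_rate      # ~500 healthy disks flagged

precision = caught_early / (caught_early + false_alerts)
print(f"caught early: {caught_early:.0f}, false alerts: {false_alerts:.0f}, "
      f"precision: {precision:.0%}")
```

Under this assumed failure rate, roughly nine out of ten advance warnings would point at a disk that really is about to die, which is why Huawei can afford to dispatch engineers on a prediction alone.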
But again, from a system administrator's point of view, there are plenty of graphs, yet information such as component temperatures is not available anywhere. There is an overall temperature for the controllers, but you cannot get readings from the disks or the processor: the system collects and analyzes them in its cloud but does not share them. As soon as something goes out of range, an alert fires and a Huawei engineer calls you.
In the same eService window you can chat with technical support, generate a report and view logs. About the only thing missing here is the detailed performance analytics of the storage nodes; that lives in the web interface and is striking in its detail: you can get graphs of load and access times for ports, disks, pools, LUNs, memory, caches and so on, and save the report in .pdf or .csv format.
By the way, a separate program is used to collect logs: without interrupting the storage system, it runs the full set of tests and checks and saves the report in an archive, which you then upload to the support service when filing requests.
Test: Access Time
There was some confusion with the speed tests: I connected the storage system via 4 FC 8 Gbps ports in multipath mode and expected to see 500-600K IOPS, but could barely squeeze out 130K. After talking with the nice guys from tech support, I learned that the ceiling for this configuration is 200K IOPS. As it later turned out, the SSDs are the reason: they deliver 10-15K IOPS for reads/writes in 8K blocks (the default internal block size of the storage system). Recall that we have a RAID 5 of 8 SSDs at 7.68 TB each, while the SSD in my laptop delivers 500 thousand IOPS at any time of day or night. Turning to Huawei's engineers for help and working my way up the chain, I received a rather original answer: I was apparently the only one who had initialized this storage system with a starter kit of 8 disks. A typical customer installs hundreds of drives, duplicates head units and joins them into a single cluster, so the cumulative performance reaches millions of IOPS across dozens of front-end ports, and that is the speed quoted in all the presentations. Every SSD in the Dorado 5000 V6 is instead optimized for the fastest response time, and that is the one parameter we can measure adequately in our setup. Huawei is proud that the response time, even for data located in an expansion shelf, hovers around 0.1-0.2 ms. Well, let's see. We will use an 800 GB LUN so that we can, on the one hand, observe the cache at work and, on the other, not be limited by its size.
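Multiplying out the per-drive figures quoted by tech support explains our result. This is our own arithmetic, not Huawei's sizing method:

```python
# Back-of-envelope reconciliation of the numbers above.  Per-drive
# IOPS and the 200K ceiling come from tech support; the simple
# multiplication is our own estimate.
drives = 8
per_drive_iops_8k = (10_000, 15_000)   # read/write IOPS per SSD, 8K blocks

low_est = drives * per_drive_iops_8k[0]    # pessimistic aggregate
high_est = drives * per_drive_iops_8k[1]   # optimistic aggregate
print(f"raw aggregate: {low_est:,} - {high_est:,} IOPS")
# The ~130K IOPS we measured sits right at this raw aggregate (cache
# hits plausibly explain the small overshoot), far below both the 200K
# configuration ceiling and the multi-million-IOPS figures quoted for
# fully populated multi-head clusters.
```

In other words, with only 8 drives the back end, not the 4 x 8 Gbps front end, is the bottleneck.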
Let's start by looking at how block size affects random reads.
Average access time under normal read load ranges from 0.23-0.24 ms, while increasing the block size 16-fold raises access time by only 40%. I interpret the access time curve as follows: the upper values correspond to reads from the disks, the lower values to cache hits. To check how this holds under multi-threaded access, let's look at 8-thread access times under a constant 10-hour load:
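The two-band interpretation above can be expressed as a simple mixture model. The hit ratio and per-tier latencies below are assumptions chosen to reproduce the measured 0.23-0.24 ms average, not figures from Huawei:

```python
# Cache-hit mixture model for the latency curve (all inputs assumed):
# average latency = p * t_cache + (1 - p) * t_disk
t_cache = 0.10    # ms: claimed typical latency for a cache hit
t_disk = 0.30     # ms: assumed latency for a read that misses the cache
hit_ratio = 0.35  # assumed fraction of reads served from cache

avg = hit_ratio * t_cache + (1 - hit_ratio) * t_disk
print(f"blended average: {avg:.2f} ms")
```

The point of the model is that the average sits between the two bands, weighted by how often the workload hits the cache; a purely random workload keeps the hit ratio, and hence the gain, modest.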
Even under a fully synthetic random load, the cache is clearly visible, but on drives this fast it yields only a 30% gain in speed.
On complex patterns the Dorado 5000 V6 performs just fine, even under a load of 1024 threads.
Both VDI and SQL patterns show record low latency:
Huawei itself claims that the Dorado 5000 V6 series storage system can easily service 7200 VDI machines when installing 100 SSD drives.
Expansion and upgrade options
Let's talk big numbers: each Dorado 5000 V6 head unit supports up to 48 external ports and can connect 5 NVMe SSD shelves or 7 SAS SSD shelves, for a total of up to 200 drives (including those installed in the head unit itself).
But the most interesting part is the FlashEver program available to Huawei customers. Its essence is that for 10 years you can swap the nodes of your storage system for new ones without stopping services. For example, having bought a Dorado 5000 V6, you will in time be able to replace the controllers with V7 ones; you can also replace disk shelves along with their drives, and you are not obliged to dispose of the old shelf when you do. A single storage system may mix disk shelves of different generations (V6 and newer), and naturally the replacement happens without interrupting your services.
Within your organization you can federate storage systems, uniting Dorado V6, V7 and V8 into a single network of up to 128 controllers in total, and move data between them without losing host connectivity by using a failover storage cluster.
Usually, when developing a storage system, the manufacturer designs around the capabilities of the third-party chips it will use in the device and the data-handling algorithms it applies in the software layer. With Huawei everything is different: the company was not constrained by anyone else's framework and created its own chips for the SSD controllers and host adapters, its own processors and its own artificial intelligence. From start to finish this is a closed development that uses no third-party code, let alone open-source solutions. That is always better for security, but much more demanding of the manufacturer's resources. Some features simply did not make it into the final release, among them file access, which will only appear here in 2021 with a firmware update.
Clearly, moving from x86 to ARM64 is a formidable task even for a giant like Huawei, but it also opens up remarkable possibilities. Where else have you seen a storage system with 256 physical cores in a 2U chassis? Or one where each shelf has its own Kunpeng 920 processor to speed up array rebuilds? Huawei makes its own processors and is not tied to anyone else's pricing policy, so it can generously scale up the performance of the AI module, add cores or raise frequencies in the controllers, and change interface speeds. That is the real advantage of an ecosystem built entirely on in-house development. The resulting solutions are impressive: predictive analytics, upgrade and federation options, reduced SSD wear, and an immortal RAID array that stays in the Optimal state until you run out of free space.
All in all, the Huawei Dorado 5000 V6 today combines the IT industry's developments of the past 20 years and can be used in mission-critical applications that put a heavy load on the SAN.
Mikhail Degtyarev (aka LIKE OFF)