EPYC hosting: exploring how AMD is changing the VDS hosting market

The data center market shows double-digit year-over-year growth, keeping pace with the exponential growth of information on the Internet. Today, having your own VDS "in the cloud" has become as commonplace for active people as a credit card once was. Fifteen years ago the word "hosting" was associated exclusively with websites; today it is inextricably linked with virtual machines, inside which anything can be run. Whatever you need to work around the clock, reliably, quickly and stably, you send to "live in the cloud". This is the foundation of IoT, the foundation for IT startups, and just a damn convenient thing.

In the retail segment, a VDS is bought for all sorts of needs: backups, bookkeeping and warehouse accounting, processing data from IoT sensors, trading robots and, of course, website hosting. As a result, a typical hosting provider that works with individuals and small businesses has no general portrait of the consumer and his load for which it would be worth optimizing the service or designing the infrastructure. Roughly speaking, you have to organize your work so as to please everyone. If you are wondering whether to create your own cloud hosting in order to make money on moving workloads online, we have picked a good example for you.

VDSina.ru: the startup that could

There is a widespread myth among ordinary people that a hosting provider must have its own data center, even if it is in a basement, as long as it is its own. This long-standing stereotype has nothing to do with reality: most modern hosting providers simply rent server racks in data centers. Even the well-known service Cloudflare has neither its own data centers nor its own maintenance personnel, and pays only for the placement of its EPYC-based servers in data centers around the world. If you want to stay afloat, lease everything and buy nothing: that is what globalization dictates. And of course, the ability to rent a rack, or just space for a server, put equipment there and start selling VDS services looks like a very viable idea for a startup with very low capital investment.

Back in 2014, VDSina (www.vdsina.ru) was just such a startup. Today more than 12 thousand client virtual machines run on its servers, most of which have arrived since the end of 2019. Under the fiercest competition in the VDS market, the company managed to purchase equipment and set tariffs in such a way that the influx of customers turned it into one of the market leaders. Retail customers are not in the habit of getting attached to a provider: if they feel that the hosting is getting cramped, they will leave as suddenly as they came. That is why it is important for VDSina that purchased equipment be in the warehouse "as of yesterday" and that commissioning take a matter of hours. The build-to-order, factory-assembly-and-test sales model used by brands such as HPE, Dell and Cisco is not even considered here.

Pavel Karpenko, CEO Coloded (coloded.ru):

Our company is the exclusive supplier of server equipment for the hosting provider VDSina. Working with hosters has its own peculiarities: what matters most is having equipment in stock and being ready to ship it within an hour. Any equipment breakdown is an unavoidable loss for a hoster, so as a rule we replace a failed server with a new one within 1 day and only then deal with the manufacturer under warranty. You always have to anticipate what equipment your client may need next month and deliver it to the warehouse in advance. Over the years of cooperation, we have built up that experience.

The VDSina architecture is built on open source software: CentOS 7 is used as the virtualization platform and a Ceph cluster is dedicated to storage, so the company is not tied to a hardware vendor and pays no license fees for the hypervisor. Of course, renting a rack in a data center does not mean that you have to use the communication channels that the data center provides. According to VDSina representatives, relying on the data center's channels is the best way to bury a hosting business at the initial stage, so the first thing the provider does is improve its channel topology (peering) to the main traffic exchange points.

Moreover, a modern service is simply obliged to offer clients protection against DDoS attacks, or even a WAF (Web Application Firewall). Here VDSina's business model again relies on a third-party solution, DDoS-Guard, which is included in the price for every client of the service. So by default all virtual machines of the service are already protected from DDoS attacks, but a WAF is not offered yet.

Sergey Krasnov, CTO VDSina:

We at VDSina have a team of cool professionals who, in one way or another, have been providing hosting services for more than 15 years. During this time we have realized that you cannot save on the important things: hardware, data centers and software development. We always keep our finger on the pulse: we use the best data centers and equipment with the most modern components. We have the best and most convenient hosting control panel on the market, in which a lot of money and effort has been invested. We always stay in touch with clients and listen to their ideas and wishes. Probably all of this together is what allows us to win more and more customers every year.

In other words, if you want to become a leader in VDS services and serve tens of thousands of clients at the same time, follow VDSina's example: this company has no unnecessary spending on air conditioning, backup power for server racks, or smart security gateways. The only equipment a modern hosting provider purchases is ToR switches (manufactured by fs.com) and the servers on which the entire software-defined infrastructure runs. Servers are bought at wholesale prices, and a minimal model range reduces the cost of spare parts. And, of course, the choice of server determines whether next month you can offer a competitive price for your services or will be rushing to stop customer churn.

The hosting company's economy is completely CPU dependent

When choosing a hosting server, the provider, as a rule, has only one question: how many virtual machines can be placed on it, because on average one CPU core should bring in 4.28-7.14 USD per month, depending on the tariff. It is more profitable for a hoster to sell multi-core virtual machines, not only with 12-16 cores but even with 32-64 vCPUs: the price of such services includes a kind of "exclusivity" and is clearly aimed at the wealthy client. The mass client, however, is still content with 1 virtual processor. This means that processors with 12 cores or fewer are gradually moving into the category of "cloud exotica", reserved for cases such as single-threaded applications with poorly optimized code, interpreters that do not support caching, high-load VPN servers, and so on.

We have already said that in the retail segment it is impossible to predict the nature of the next client's load: one will run a backup program once a week, while another will be compiling projects non-stop. Nor can you allocate separate servers for SAP clients here, as the big cloud providers do when working with corporate specialists. Therefore, to avoid a wrong choice, hosting providers find it beneficial to install the most powerful x86 processors, with 48 and 64 cores.

VDSina.ru tariff schedule *

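| Number of cores | 1 | 1 | 1 | 2 | 4 | 4 | 8 | 12 | 16 | 32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Platform | AMD | Intel | Intel | Intel | Intel | Intel | Intel | AMD | AMD | AMD |
| Max. CPU frequency, GHz ** | 3.2 | 3.2 | 4.5 | 4.5 | 3.2 | 4.5 | 3.2 | 3.4 | 3.4 | 3.4 |
| Price per month, USD *** | 4.28 | 4.71 | 6.85 | 14.14 | 35.5 | 51.42 | 57 | 96.4 | 192.8 | 385.7 |
| Price for 1 core per month, USD | 4.28 | 4.71 | 6.85 | 7.07 | 8.88 | 12.85 | 7.11 | 8.02 | 12.04 | 12.04 |
| Estimated price of 1 GHz of 1 core, USD | 1.32 | 1.47 | 1.52 | 110 | 1.57 | 2.85 | 2.21 | 2.35 | 3.54 | 3.54 |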

* - Rates are as of September 2020.
** - using technologies of dynamic frequency increase
*** - no discounts, surcharges for excess traffic, bonuses, etc.
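
The two derived rows of the table can be recomputed from the price and frequency rows. The sketch below is only an illustration: the function names are my own, and it assumes that each derived value is simply the monthly price divided by the number of cores and then by the maximum frequency.

```python
# Tariff rows copied from the table: (cores, platform, max GHz, USD per month).
TARIFFS = [
    (1, "AMD", 3.2, 4.28),
    (1, "Intel", 3.2, 4.71),
    (1, "Intel", 4.5, 6.85),
    (2, "Intel", 4.5, 14.14),
    (4, "Intel", 3.2, 35.5),
    (4, "Intel", 4.5, 51.42),
    (8, "Intel", 3.2, 57.0),
    (12, "AMD", 3.4, 96.4),
    (16, "AMD", 3.4, 192.8),
    (32, "AMD", 3.4, 385.7),
]


def price_per_core(cores, monthly_usd):
    """Monthly price of one vCPU core."""
    return monthly_usd / cores


def price_per_core_ghz(cores, max_ghz, monthly_usd):
    """Monthly price of 1 GHz on one core (assumed formula, see the lead-in)."""
    return price_per_core(cores, monthly_usd) / max_ghz


for cores, platform, ghz, usd in TARIFFS:
    print(f"{cores:>2} cores {platform:<5} "
          f"{price_per_core(cores, usd):6.2f} USD/core "
          f"{price_per_core_ghz(cores, ghz, usd):5.2f} USD/GHz")
```

Small differences from the printed rows are possible if the original figures were rounded or derived from other inputs.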

Our table shows that in the economy segment the top-end AMD EPYC 7742 allows VDSina to undercut the market, selling frequency about 11% cheaper than is possible on Intel processors. Let's go over where this figure comes from. Suppose we sell 100 virtual machines of 3,200 MHz each (one core at its maximum frequency), for a total of 320,000 MHz. On the Intel platform we can sell them for about $470 per month, and on AMD for about $430. Note again that these are retail prices that already account for the Turbo Boost and Max Boost algorithms, purchase and warranty costs, electricity and even secretaries' salaries.
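
The comparison can be sketched in a few lines. This assumes that each of the 100 virtual machines is a 1-core tariff with a maximum frequency of 3.2 GHz (3,200 MHz), priced at 4.28 USD on AMD and 4.71 USD on Intel as in the table; the variable names are illustrative. With the table's per-GHz figures (1.32 and 1.47 USD) the gap comes out at about 11%, while computing directly from the tariff prices gives a slightly smaller gap.

```python
VM_COUNT = 100
VM_MHZ = 3200            # one core at up to 3.2 GHz (assumption stated above)
AMD_USD_PER_VM = 4.28    # 1-core AMD tariff
INTEL_USD_PER_VM = 4.71  # 1-core Intel tariff at the same frequency

total_mhz = VM_COUNT * VM_MHZ
amd_total = VM_COUNT * AMD_USD_PER_VM
intel_total = VM_COUNT * INTEL_USD_PER_VM

# How much cheaper AMD is relative to the Intel price.
amd_discount = (intel_total - amd_total) / intel_total
# How much more expensive Intel is relative to the AMD price.
intel_premium = (intel_total - amd_total) / amd_total
# The same gap taken from the table's rounded per-GHz row.
table_gap = (1.47 - 1.32) / 1.32

print(f"total frequency sold: {total_mhz} MHz")
print(f"AMD: {amd_total:.0f} USD, Intel: {intel_total:.0f} USD per month")
print(f"AMD discount: {amd_discount:.1%}, Intel premium: {intel_premium:.1%}")
print(f"gap from the per-GHz row: {table_gap:.1%}")
```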

Below we will figure out why hosters consider the frequency, and not the number of cores or the amount of RAM.

Comparative characteristics of 64-core EPYC

| Model | AMD EPYC 7702 | AMD EPYC 7662 | AMD EPYC 7742 |
| --- | --- | --- | --- |
| Number of cores | 64 | 64 | 64 |
| Base frequency, GHz | 2.0 | 2.0 | 2.25 |
| Base total frequency (frequency capacity), GHz | 128 | 128 | 144 |
| Configurable TDP range with all cores, W | 165 - 200 | 225 - 240 | 225 - 240 |
| Retail price, $ | 6450 | 6150 | 6950 |
| Price per GHz, $ | 50.4 | 48.04 | 48.2 |
| Max Boost: maximum frequency, GHz | 3.35 | 3.3 | 3.4 |
| Max Boost: expected maximum frequency, GHz * | 3.2 | 3.2 | 3.2 |
| Max Boost: maximum total frequency (frequency capacity), GHz | 214.4 | 211.2 | 217.6 |
| Max Boost: price per maximum GHz, $ | 30.08 | 29.11 | 31.9 |

* - In general, AMD EPYC processors dynamically adjust the frequency of each core depending on load, temperature and available power. Although our tests show that in second-generation EPYC all cores can operate at the maximum frequency, AMD itself adds the caveat "not always": if the 95 °C threshold is exceeded, the processor will reduce core frequencies in steps of 25 MHz until the temperature drops back below 95 °C. Therefore, along with the "maximum frequency" you should also take into account the "expected maximum frequency", which is 5-7% lower than the declared maximum. What's more, of the three 64-core processors, one, the 7702 (64 cores, TDP 200 W, cTDP 165-200 W), supports limiting TDP to 165 W, which lets the customer fit within the power limit of the server rack, if there is one. But if there are no limits on electricity and cooling, then the EPYC 7662 is the best buy in terms of price per GHz.
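
The frequency capacity and price-per-GHz rows follow directly from cores, frequency and retail price. A minimal sketch, with a `Cpu` class and method names that are my own invention rather than anything from a vendor tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    model: str
    cores: int
    base_ghz: float
    boost_ghz: float
    price_usd: float

    def capacity_ghz(self, ghz):
        """Total frequency: number of cores times per-core frequency."""
        return self.cores * ghz

    def usd_per_ghz(self, ghz):
        """Retail price divided by the total frequency at `ghz` per core."""
        return self.price_usd / self.capacity_ghz(ghz)


# Values copied from the comparison table.
CPUS = [
    Cpu("EPYC 7702", 64, 2.0, 3.35, 6450),
    Cpu("EPYC 7662", 64, 2.0, 3.3, 6150),
    Cpu("EPYC 7742", 64, 2.25, 3.4, 6950),
]

for cpu in CPUS:
    print(f"{cpu.model}: base {cpu.capacity_ghz(cpu.base_ghz):.1f} GHz "
          f"at {cpu.usd_per_ghz(cpu.base_ghz):.2f} USD/GHz, "
          f"boost {cpu.capacity_ghz(cpu.boost_ghz):.1f} GHz "
          f"at {cpu.usd_per_ghz(cpu.boost_ghz):.2f} USD/GHz")

cheapest = min(CPUS, key=lambda c: c.usd_per_ghz(c.boost_ghz))
print("lowest price per boost GHz:", cheapest.model)
```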

Memory comes second in importance, and here Linux guests are the better fit. The point is that the free hypervisor in CentOS 7, unlike VMware ESXi, cannot dynamically manage the RAM allocated to Windows virtual machines and gives them the entire specified amount. With Linux guests there is no such problem: their RAM is compressed and deduplicated at the hypervisor level, so the ideal client is the buyer of a 1-core Linux virtual machine with a minimum amount of RAM. In general, you can buy as much RAM or as many SSD drives as your business needs, and a server with 1-2 TB of RAM is no longer a rarity. You can scale everything except CPU frequency.

When it comes to disk space, surprisingly, customers don't really care what kind of SSD you have: NVMe, PCI Express, or SATA/SAS. By and large this is understandable: the difference in speed is almost impossible to notice, and the share of NVMe hosting is growing every day anyway. Any modern storage system scales well, so you shouldn't fixate on SSD costs: drives can be bought as needed, as your customer base grows along with profit.

AMD EPYC 7742: 1000 virtual machines on 1 server

In our article "what factors should be used to choose a server processor" I pointed out that the only correct parameter when choosing a CPU for the cloud segment is the total frequency capacity (the number of cores multiplied by their frequency) and the price per megahertz derived from it. Let's do a few small calculations. The latest generation of hypervisors (be it VMware ESXi, Microsoft Hyper-V or Linux KVM) makes it possible to run up to 1024 virtual machines on one server (theoretically you can hang 4 thousand virtual machines on KVM, but we do not consider such options). In practice, modern technologies allow placing up to 8 virtual computers on each processor core without any noticeable performance loss. When idle, a regular Linux VM, be it Ubuntu or CentOS, consumes about 33 MHz, and Windows Server 2016 about 26 MHz. Even by the most conservative estimate, 1024 idle virtual machines will require a total of about 33.8 GHz (1024 x 33 MHz), which is roughly 16 modern cores with a base frequency of 2.1 GHz.
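
The idle estimate is easy to reproduce. The sketch below assumes the full 1024-guest limit and takes the higher Linux idle figure for every guest, which is the conservative case; the constant names are illustrative.

```python
VM_COUNT = 1024          # hypervisor limit quoted above
IDLE_MHZ_LINUX = 33      # idle Linux guest
IDLE_MHZ_WINDOWS = 26    # idle Windows Server 2016 guest
CORE_BASE_GHZ = 2.1      # "modern core" used in the estimate

# Conservative case: every guest idles at the higher (Linux) figure.
idle_total_ghz = VM_COUNT * IDLE_MHZ_LINUX / 1000
# Optimistic case: every guest idles at the lower (Windows) figure.
idle_total_ghz_low = VM_COUNT * IDLE_MHZ_WINDOWS / 1000
# Number of 2.1 GHz cores that add up to the conservative total.
cores_equivalent = idle_total_ghz / CORE_BASE_GHZ

print(f"idle demand: {idle_total_ghz_low:.1f} to {idle_total_ghz:.1f} GHz")
print(f"equivalent 2.1 GHz cores: {cores_equivalent:.1f}")
```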

What does one EPYC 7742 have to offer? Its base frequency capacity is 144 GHz (64 cores at 2.25 GHz each), which is about 4 times more than it takes to keep 1024 idle client virtual machines running. We do not take the Hyper-Threading (SMT) mode into account, since it is usually useless in virtualization, so much so that VMware recommends disabling it altogether. Of course, during the day the load from clients' IT/OT tasks increases, but not as much as is commonly believed, and the more clients you have, the more the peaks are smoothed out across free resources. In our test cluster at HWP, which uses a hyper-converged architecture (NAS + gateway + mail + sites + Python Jupyter + Prometheus/Grafana, all in one box), the average consumption of one virtual machine between 11:00 and 15:00 is 280 MHz. At a scale of 1024 virtual machines, such a load requires a total core frequency of about 286 GHz, that is, twice what one EPYC 7742 can provide. That is why hosters buy 2-processor configurations.
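
The step from daytime load to a 2-processor configuration can be written out as follows. This is a rough sketch that treats frequency as a freely divisible resource, which is a simplification of how a hypervisor actually schedules vCPUs.

```python
import math

VM_COUNT = 1024
PEAK_MHZ_PER_VM = 280    # average daytime consumption measured on the test cluster
CORES_PER_SOCKET = 64    # EPYC 7742
BASE_GHZ = 2.25

socket_capacity_ghz = CORES_PER_SOCKET * BASE_GHZ
peak_demand_ghz = VM_COUNT * PEAK_MHZ_PER_VM / 1000
sockets_needed = math.ceil(peak_demand_ghz / socket_capacity_ghz)
# Frequency left over at base clocks once all guests are served.
headroom_ghz = sockets_needed * socket_capacity_ghz - peak_demand_ghz

print(f"one socket: {socket_capacity_ghz:.0f} GHz, demand: {peak_demand_ghz:.1f} GHz")
print(f"sockets needed: {sockets_needed}, headroom: {headroom_ghz:.2f} GHz")
```

At base clocks the two sockets cover the daytime load with almost no margin, which is why the boost behaviour discussed next matters.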

Frequency capacity (total frequency) of modern server processors

| Model | Intel Xeon 8380HL | AMD EPYC 7742 |
| --- | --- | --- |
| Number of cores | 28 | 64 |
| Base frequency, GHz | 2.9 | 2.25 |
| Base frequency capacity (total frequency of all cores in the socket), GHz | 81.2 | 144 |
| Base frequency capacity for 2 processors (total frequency of all cores in two sockets), GHz | 162.4 | 288 |

Intel has nothing comparable today: even the top-end 3rd-generation Xeon Scalable, the 8380HL, has 28 cores at 2.9 GHz, which gives a total capacity of 81.2 GHz. In other words, 1024 virtual machines under working load are out of the question, whether you install one processor or two. Yes, I am certainly aware that Intel has the top-of-the-line Xeon Platinum 9200, but our article "why xeon 9200 fails" describes in detail why we do not take that processor into account.

But the AMD EPYC 7742 has another trump card: the Max Boost system, which can raise the frequency of all cores at the same time up to 3.4 GHz. We demonstrated this in the EPYC versus Threadripper test, but I want to repeat: there is no guarantee that in your particular case the processor will boost to the maximum declared frequency. AMD guarantees that all cores of a second-generation EPYC processor (with the exception of the even higher-frequency 7Fx2 series) can simultaneously operate at around 3.2 GHz. What is especially nice is that the server can stay at this increased frequency for a long time, until the need for a boost disappears. In total, 2 processors at maximum speed offer 435.2 GHz of total frequency capacity, which is more than 1024 client virtual machines require.

Frequency capacity (total frequency) of modern server processors in Turbo Boost mode

| Model | Intel Xeon 8380HL | AMD EPYC 7742 |
| --- | --- | --- |
| Number of cores | 28 | 64 |
| Maximum possible frequency, GHz | 4.3 | 3.4 |
| Maximum possible frequency for all cores, GHz | 2.9 | 3.2 |
| Number of cores running simultaneously at maximum frequency with 100% load of all cores | 1 | 64 |
| Maximum frequency capacity of 1 processor, GHz | 82.6 | 204.8 |
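
The last row of the table can be reproduced under one assumption, which is my reading of how the figures were obtained: on the Xeon, one core reaches the single-core turbo frequency while the other 27 stay at the all-core frequency, and on the EPYC all 64 cores run at the expected all-core boost. The function name is hypothetical.

```python
def boost_capacity_ghz(cores, all_core_ghz, peak_ghz, cores_at_peak):
    """Total frequency when `cores_at_peak` cores run at `peak_ghz`
    and the remaining cores run at `all_core_ghz`."""
    return cores_at_peak * peak_ghz + (cores - cores_at_peak) * all_core_ghz


xeon_8380hl = boost_capacity_ghz(cores=28, all_core_ghz=2.9,
                                 peak_ghz=4.3, cores_at_peak=1)
epyc_7742 = boost_capacity_ghz(cores=64, all_core_ghz=3.2,
                               peak_ghz=3.2, cores_at_peak=64)
# Best case quoted in the text: two EPYC 7742 with every core at 3.4 GHz.
epyc_pair_best = 2 * boost_capacity_ghz(64, 3.4, 3.4, 64)

print(f"Xeon 8380HL: {xeon_8380hl:.1f} GHz")
print(f"EPYC 7742:   {epyc_7742:.1f} GHz")
print(f"2 x EPYC 7742 at 3.4 GHz: {epyc_pair_best:.1f} GHz")
```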

Need I explain that if you decide to migrate all 12 thousand of your clients to new AMD servers, today you will need only 3 boxes 2U high, each containing 4 dual-processor nodes (this format is called 2U4N: 2 units, 4 nodes), plus a storage cluster and switches? And this is nothing exotic: almost all major vendors have already presented their 2U4N solutions based on AMD EPYC.

What used to fill server racks from floor to ceiling can now fit under a desk. Or, in dry business language: if yesterday you rented 3-4 42U racks, today you need a single 42U rack (they simply don't rent out less), part of which you can sub-lease to other hosting providers for additional profit.

The creators of AMD EPYC gave cloud providers the ability to install 512 physical x86 cores in a 2U chassis, divided into 4 servers with 2 processors each, and Supermicro has brought this idea to life in its 2124BT-HNTR platform, which we will now test.
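
The "3 boxes" and "512 cores" figures come from simple rounding up. A minimal sketch, assuming 1024 virtual machines per dual-processor node as estimated earlier:

```python
import math

CLIENT_VMS = 12_000
VMS_PER_NODE = 1024       # per dual-socket node, as estimated earlier
NODES_PER_CHASSIS = 4     # 2U4N form factor
CHASSIS_HEIGHT_U = 2
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64     # EPYC 7742

nodes_needed = math.ceil(CLIENT_VMS / VMS_PER_NODE)
chassis_needed = math.ceil(nodes_needed / NODES_PER_CHASSIS)
rack_units = chassis_needed * CHASSIS_HEIGHT_U
cores_per_chassis = NODES_PER_CHASSIS * SOCKETS_PER_NODE * CORES_PER_SOCKET

print(f"nodes: {nodes_needed}, chassis: {chassis_needed}, rack space: {rack_units}U")
print(f"physical cores per 2U chassis: {cores_per_chassis}")
```

Storage nodes and switches come on top of this and are not counted here.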

TEST

For the test we were given a VIP virtual machine with 240 cores and 512 GB of RAM. The server node itself had 1 TB of memory, but as it turned out, not all modern software, Windows Server 2016 in particular, runs stably with that much RAM, so the memory was artificially limited. What can I say? In desktop applications you cannot tell how powerful this server is: Facebook and YouTube still take 10 seconds to open (on a 10-gigabit Internet channel), and an ordinary 200 MB Cinebench archive takes about a minute to unpack from the desktop to a PCI Express SSD. This Windows of yours, with its antiviruses and ad blockers, kills any speed, and your only joy is that Google Chrome will not gobble up all your memory, although who knows...

But as soon as you touch something multithreaded, even a single node tears to shreds everything you knew about server speed until now, and there are four such nodes.

Cinebench R20, a benchmark beloved by many of my colleagues, shows a record in rendering, and this absolute victory is confirmed in each of the AIDA64 tests. How should a hosting user interpret the test results? It's very simple!

[Image: AIDA64 results, 240 threads: https://hwp.media/Servers/Epyc_hosting/images/Aida64-240-threads_0008.png]

Look: the speed of AES encryption (used in VPNs) is higher than the combined speed of all the network interfaces that could be installed in this server. VeraCrypt is more restrained in its estimates, of course, but even there it is clear that we will never run into a lack of processor speed. The concept of data-at-work encryption, complementing Data-at-Flow and Data-at-Rest, no longer looks so crazy: you can adopt this latest 2020 cloud trend right now to stand out from the competition. And you can be sure that if a client of your hosting encrypts the disk of his virtual machine (and he will), this will not affect performance in any way.

The following configuration was used to test the database.

OS and software

CentOS 8, MariaDB 10.3.17, Sysbench 1.0.20

Additional DB settings

query_cache_type = on
query_cache_limit = 2M
query_cache_size = 32M
query_cache_min_res_unit = 8

join_buffer_size = 1M
read_rnd_buffer_size = 1M
max_heap_table_size = 32M
tmp_table_size = 32M

thread_cache_size = 32
innodb_sort_buffer_size = 2M
max_allowed_packet = 16M
innodb_log_file_size = 128M
expire_logs_days = 10
max_binlog_size = 100M

innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
transaction-isolation = READ-COMMITTED
default-storage-engine = innodb
innodb_buffer_pool_size = 4G
innodb_file_per_table = 1

Run test

sysbench ./src/lua/oltp_read_only.lua --mysql-db=test --mysql-user=root --warmup-time=30 --time=300 --threads=1 --table-size=200000000 run

Redis Server, in principle, suffers a little from virtualization: at its best, in 256-thread mode, it delivers just under 1 million transactions per second. From our other tests we know that even with memory encryption enabled, Redis can deliver more than 1.15 million transactions per second.

Actually, we have talked so much about what the hoster gains from using AMD EPYC that we completely forgot about money.

Price calculation   

Even on the minimum tariff plan, one server with 1024 virtual machines on board will bring in $4,385 per month. The lower its purchase cost, the faster it will pay for itself and start making a profit. Let's calculate three different configurations to find out which type of machine is the most profitable.


| | Option 1: simple dual-processor AMD server | Option 2: high-density server with 4 dual-processor AMD nodes | Option 3: simple dual-processor Intel server |
| --- | --- | --- | --- |
| Platform | Gigabyte R282-Z91 | Gigabyte H262-Z63 | Supermicro case + X11DPI-N |
| Processor | 2 x AMD EPYC 7742 | 8 x AMD EPYC 7742 | 2 x Intel Xeon Platinum 8280 |
| Memory | 32 x 32 GB DDR4 ECC Reg | 64 x 64 GB DDR4 ECC Reg 2933 MHz | 16 x 64 GB DDR4 LR ECC DIMM |
| System drives | 2 x 240 GB SSD Samsung 883 DCT | 8 x 240 GB Intel SSD | 2 x 240 GB Intel SSD S4510 |
| 25 Gbps interface | Mellanox ConnectX-4 Lx EN 25 Gb/s SFP28 (21,570 rub) | 4 x 2 SFP28 LAN ports, 25 Gb/s per port, Marvell FastLinQ QL4102-A2G OCP | 1 x Mellanox ConnectX-4 SFP28 |
| Total | $25,460 | $119,153 | $37,490 |
| Cost of 1 virtual machine (1 vCPU, 1 GB RAM) | $26.5 | $31.2 | $39.2 |
| Server payback when hosting 1024 virtual machines per node at $4.28/month | 185 days | 218 days | 274 days |
| Server profit over the 36-month warranty period | $122,122 | $470,357 | $109,966 |

All prices are given without project discounts. With a bulk purchase of servers the payback period can shrink even more dramatically, since today project pricing for EPYC can be 30-40% below retail.
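
For readers who want to plug in their own prices, here is a simplified payback sketch. It counts the hardware price only and assumes a 30-day month and a full load of 1024 virtual machines per node on the minimum tariff. The figures in the table are somewhat higher than this formula gives, possibly because they include costs that are not itemized there (that is an assumption on my part), but the ranking of the three options is the same.

```python
VMS_PER_NODE = 1024
USD_PER_VM_MONTH = 4.28
DAYS_PER_MONTH = 30      # simplification

# (name, hardware price in USD, number of nodes) taken from the table.
OPTIONS = [
    ("Option 1: 2 x EPYC 7742", 25_460, 1),
    ("Option 2: 8 x EPYC 7742, 4 nodes", 119_153, 4),
    ("Option 3: 2 x Xeon Platinum 8280", 37_490, 1),
]


def monthly_revenue(nodes):
    """Revenue per month with every node fully sold on the minimum tariff."""
    return nodes * VMS_PER_NODE * USD_PER_VM_MONTH


def payback_days(price_usd, nodes):
    """Days until revenue equals the hardware price (hardware cost only)."""
    return price_usd / monthly_revenue(nodes) * DAYS_PER_MONTH


for name, price, nodes in OPTIONS:
    print(f"{name}: {payback_days(price, nodes):.0f} days, "
          f"{price / (nodes * VMS_PER_NODE):.2f} USD of hardware per VM")
```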

Is EPYC overheating?

We did not use the Supermicro 2124BT-HNTR 4-node, 8-processor server in our calculation, and here's why. All modern top-end processors, be it AMD EPYC 77x2 or Intel Xeon Platinum, work at the limit of what a server's cooling system can handle. Some server manufacturers manage to remove 1800 W of processor heat from a 2U case (Gigabyte, for example), and some do not (Supermicro, for example). Supermicro's page states that although the server supports 225 W CPUs, "some processors may only work under certain conditions, please contact us for more information." It reminds me of promotional "letters of happiness", in which the terms of the offer were written in small print inside the envelope. And of course, VDSina specialists installed all 8 of the most powerful EPYC 7742 processors in the 4-node 2124BT-HNTR and ran into overheating and throttling on a server platform that costs almost $150 thousand. According to them, the temperature in the cold aisle of the server room was 21 degrees Celsius, against a recommended 25. Meanwhile, in the 4-node Gigabyte H262-Z63 servers the fans are installed in 2 rows to protect against overheating, and our colleagues from ServeTheHome did not notice any overheating when testing the 4-node Dell PowerEdge C6525 either.

In dual-processor servers this problem does not exist at all, but to be on the safe side you can choose 64-core AMD EPYC 7702 processors: their base frequency is 2.0 GHz, rising to 3.35 GHz with boost, but the thermal package of each CPU is 200 W. Compared to the EPYC 7742, this saves 50 W in a 2-processor server and 200 W in an 8-processor one.
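
The thermal arithmetic is short enough to check by hand, but a sketch makes the chassis comparison explicit. It uses the default TDP values mentioned in this article (225 W for the 7742, 200 W for the 7702) and ignores memory, drives and fans; the helper names are illustrative.

```python
TDP_W = {"EPYC 7742": 225, "EPYC 7702": 200}


def cpu_heat_w(model, sockets):
    """Total CPU thermal design power for a chassis with `sockets` processors."""
    return TDP_W[model] * sockets


def saving_w(sockets):
    """Watts saved by fitting EPYC 7702 instead of EPYC 7742."""
    return cpu_heat_w("EPYC 7742", sockets) - cpu_heat_w("EPYC 7702", sockets)


for sockets in (2, 8):
    print(f"{sockets} sockets: 7742 = {cpu_heat_w('EPYC 7742', sockets)} W, "
          f"7702 = {cpu_heat_w('EPYC 7702', sockets)} W, "
          f"saving = {saving_w(sockets)} W")
```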

Finally, it is worth remembering that modern motherboards allow you to tightly limit the TDP of the processor through the BIOS, as well as change the power consumption profile, which gives some room for maneuver if the machine starts to overheat during operation.

Conclusions

Perhaps for the first time, we have been able to calculate the benefits of using the AMD EPYC platform in hosting and explain them in simple terms. Strange as it may sound, it is the most powerful and most expensive processors that make VDS hosting available to literally everyone. Moreover, tariff plans with 16-32 cores are appearing, VDS tariff configurators are coming into use, and providers can now compete not only on price but also on capabilities. You shouldn't assume that everything in this market is already divided up and taken: VDSina has proved that with a competent approach you can quickly break into the market leaders and build a huge client base.

The experience of American hyperscalers shows that such sales models as "pay per frequency", "pay for consumed resources", sales of SaaS services and protected virtual machines are in demand, and our calculations show that AMD EPYC multi-core processors reduce the payback period of investments in the server to 6 months.

We would like to thank the companies that contributed to this article:

Mikhail Degtyarev (aka LIKE OFF)
17.09.2020

