Characteristics
The target specifications are:
- Number of cells: 64
- Process technology: 28 nm
- Clock frequency: 1.6 GHz
- On-chip memory size: 8 MB
- Die area: 40 mm²
- Power consumption: 6 W
Actual figures will be announced after the manufactured samples are tested in 2020.
In addition to the characteristics of the die itself, the processor will support up to 16 GB of DDR4-3200 RAM, a PCI Express bus and a PLL. Note that 28 nm was chosen because it is the finest process node available for general use without special permits. Other cell counts, 128 and 256, were also considered, but as die area grows, so does the share of defective dies. We settled on 64 cells and, accordingly, a relatively small area, which gives a higher yield of good dies per wafer. Further development is possible in the form of a SiP (system in package), which would combine several 64-cell dies in one package.
It should be said that the purpose of the processor is changing dramatically. S1 will not be an embedded microprocessor, as P1 and R1 were, but a compute accelerator. Like a GPGPU, a board with S1 can be plugged into the PCI Express slot of an ordinary PC motherboard and used for data processing.
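The die-size trade-off can be illustrated with a toy yield model. This is a minimal sketch under loudly stated assumptions, not S1 project data: every square millimetre is assumed to carry a killer defect independently with the same small probability, wafer-edge losses are ignored, and die area is assumed (hypothetically) to grow in proportion to the cell count. The wafer size, the defect probability and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative, hypothetical parameters (not S1 project data). */
#define WAFER_AREA_MM2   70686.0 /* ~300 mm wafer: pi * 150^2, edge loss ignored */
#define P_DEFECT_PER_MM2 0.001   /* assumed chance that a given mm^2 holds a killer defect */

/* Fraction of defect-free dies: each mm^2 is assumed to be defective
 * independently, so a die of area_mm2 survives with probability (1 - p)^area. */
static double die_yield(int area_mm2, double p)
{
    double y = 1.0;
    assert(area_mm2 > 0 && p >= 0.0 && p < 1.0);
    for (int i = 0; i < area_mm2; i++)
        y *= 1.0 - p;
    return y;
}

/* Expected number of defect-free dies on one wafer. */
static double good_dies(int area_mm2, double p)
{
    return (WAFER_AREA_MM2 / area_mm2) * die_yield(area_mm2, p);
}

/* Compare 64-, 128- and 256-cell dies, assuming area proportional to cells. */
static void compare_die_sizes(void)
{
    const int cells[] = {64, 128, 256};
    const int area[]  = {40, 80, 160}; /* hypothetical areas in mm^2 */
    for (int i = 0; i < 3; i++) {
        double good = good_dies(area[i], P_DEFECT_PER_MM2);
        printf("%3d cells, %3d mm^2: yield %.3f, %6.0f good dies, %8.0f good cells\n",
               cells[i], area[i], die_yield(area[i], P_DEFECT_PER_MM2),
               good, good * cells[i]);
    }
}
```

With these assumed numbers, a 40 mm² die yields about 1,698 good dies per wafer (roughly 108,700 good cells), while a hypothetical 160 mm² 256-cell die yields about 376 (roughly 96,400 good cells): the same silicon delivers more working cells when it is cut into smaller dies.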
Architecture
In S1, the minimal computational unit is now the “multicell”: a group of 4 cells that together execute one sequence of commands.
Initially, the plan was to combine multicells into groups, called clusters, for joint execution of commands: each cluster was to contain 4 multicells, giving 4 separate clusters on the chip. However, each cell is fully connected to every other cell in its cluster, and as the group grows, the number of connections becomes too large, which greatly complicates the physical layout of the chip and degrades its characteristics. We therefore abandoned the division into clusters, since the added complexity does not justify the results. In addition, for maximum performance it is most efficient to run code in parallel on each multicell. In total, the processor now contains 16 separate multicells.
Although a multicell consists of 4 cells, it differs from the 4-cell R1, in which each cell had its own memory, its own instruction fetch unit and its own ALU. S1 is designed a little differently. The ALU has 2 parts: a floating-point unit and an integer unit. Each cell has its own integer unit, but a multicell has only two floating-point units, so each pair of cells shares one. This was done mainly to reduce die area: 64-bit floating-point arithmetic, unlike integer arithmetic, takes up a lot of space. Having such a unit in every cell turned out to be redundant: instruction fetch could not keep the ALUs busy, so they sat idle. As practice has shown, when the number of ALUs is reduced while the instruction and data fetch rate is maintained, the total time to solve problems changes little or not at all, and the remaining ALUs are fully loaded. Moreover, floating-point arithmetic is used less often than integer arithmetic.
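To see why the cluster interconnect became unwieldy, assume as a simplification that “full communication” means a dedicated link between every pair of cells; the real interconnect may be organized differently, and the function name below is ours. The link count then grows quadratically with group size:

```c
#include <assert.h>

/* Links needed to connect n cells all-to-all with dedicated point-to-point
 * links (a simplifying assumption about the interconnect). */
static unsigned full_mesh_links(unsigned n)
{
    assert(n > 0);
    return n * (n - 1) / 2;
}
```

Under this assumption, a 4-cell multicell needs full_mesh_links(4) = 6 links, while a 16-cell cluster of 4 multicells would need full_mesh_links(16) = 120: four times as many cells, twenty times as many links.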
The diagram below shows the block structure of the R1 and S1 processors, where:
- CU (Control Unit) – instruction fetch unit
- ALUFX – arithmetic logic unit for integer arithmetic
- ALUFP – arithmetic logic unit for floating point arithmetic
- DMS (Data Memory Scheduler) – data memory management unit
- DM – data memory
- PMS (Program Memory Scheduler) – program memory management unit
- PM – program memory
Architectural differences of S1:
- Commands can now access the results of commands in previous paragraphs. This is a very important change that significantly speeds up transitions when code branches. In P1 and R1, the only option was to write the needed results to memory and read them back with the very first commands of the new paragraph. Even with on-chip memory, write and read operations take 2 to 5 clock cycles each; these cycles are saved by simply referring to the result of a command in the previous paragraph.
- Writes to memory and registers now take effect immediately rather than at the end of the paragraph, so write commands can execute before the paragraph ends. As a result, potential idle time between paragraphs is reduced.
- The instruction set has been optimized, namely:
- Added 64-bit integer arithmetic: addition, subtraction, multiplication of 32-bit numbers, returning a 64-bit result.
- The way memory is read has changed: any command argument can now simply be the address to read data from, and the order of read and write commands is preserved. This also made the separate memory-read command obsolete. Instead, the command that loads a value into the switch (formerly get) is used, with the memory address as its argument:
    .data
    foo:
        .long 0x1234
    .text
    habr:
        load_l foo            ; loads the address of label foo into the switch
        load_l [foo]          ; loads 0x1234 into the switch
        add_l [foo], 0xABCD   ; reads the value and adds the constant to it
                              ; in a single command
        complete
- A command format with 2 constant arguments has been added. Previously, only the second argument could be a constant; the first argument always had to be a reference to a result in the switch. The change applies to all two-argument commands. The constant field is always 32 bits wide, so this format makes it possible, for example, to generate a 64-bit constant with a single command.
Before:
    load_l 0x12345678
    patch_q @1, 0xDEADBEEF
Now:
    patch_q 0x12345678, 0xDEADBEEF
- Vector data types have been changed and extended. What were previously called “packed” data types can now rightly be called vector types. In P1 and R1, operations on packed numbers accepted only a constant as the second argument: in addition, for example, every element of the vector was added to the same number, which had little practical use. Now the same operations can be applied to two full-fledged vectors. Moreover, this way of working with vectors fully matches the vector mechanism in LLVM, which will allow the compiler to generate code that uses vector types:
    patch_q 0x00010002, 0x00030004
    patch_q 0x00020003, 0x00040005
    mul_ps @1, @2        ; result: 00020006000C0014
- Processor flags have been removed. As a result, about 40 commands that relied solely on flag values were dropped, which significantly reduced the number of commands and, accordingly, the die area. All the necessary information is now stored directly in the switch cell:
  - For comparisons with zero, the value in the switch is now used instead of the zero flag.
  - Instead of the sign flag, the bit corresponding to the instruction type is used: bit 7 for byte, bit 15 for short, bit 31 for long and bit 63 for quad. Because the sign is extended up to bit 63 regardless of the type, numbers of different types can be compared:
        .data
        long:
            .long -0x1000
        byte:
            .byte -0x10
        .text
        habr:
            a := load_b [byte]    ; the switch receives 0xFFFFFFFFFFFFFFF0:
                                  ; per the byte type, bit 7 is extended up to bit 63
            b := loadu_b [byte]   ; the switch receives 0x00000000000000F0,
                                  ; because loadu_b does not extend the sign
            c := load_l [long]    ; the switch receives 0xFFFFFFFFFFFFF000
            ge_l @a, @c           ; "greater than or equal" yields 1:
                                  ; the comparison uses bit 31, per its type
            lt_s @a, @b           ; 1, because b was read as a positive number
            complete
  - The carry flag is no longer needed, since 64-bit arithmetic is available.
- The transition time from one paragraph to the next has been reduced to 1 clock cycle (instead of 2-3 in R1).
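The vector and typed-comparison rules above can be checked with a small C model. The helpers below (patch_q, mul_ps, sext, ge_typed, lt_typed) are hypothetical, written for this illustration from the examples in this section only; they are not a specification of the S1 instruction set. In particular, placing the first patch_q argument in the upper 32 bits is inferred from the mul_ps example, and per-lane wraparound in mul_ps is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of a few S1 commands, inferred from the examples
 * in this section; not an official specification. */

/* patch_q hi, lo: combine two 32-bit constants into one 64-bit value. */
static uint64_t patch_q(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* mul_ps: multiply four 16-bit lanes pairwise, keeping the low 16 bits
 * of each product. */
static uint64_t mul_ps(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        unsigned shift = 16u * lane;
        uint64_t x = (a >> shift) & 0xFFFFu;
        uint64_t y = (b >> shift) & 0xFFFFu;
        r |= ((x * y) & 0xFFFFu) << shift;
    }
    return r;
}

/* Sign-extend the low `bits` bits of v to 64 bits, as load_b (bits = 8)
 * or load_l (bits = 32) do; loadu_* would zero-extend instead. */
static uint64_t sext(uint64_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    uint64_t sign = 1ull << (bits - 1);
    uint64_t mask = (bits == 64) ? ~0ull : ((1ull << bits) - 1);
    v &= mask;
    return (v ^ sign) - sign;
}

/* Typed signed comparisons: only the low `bits` bits are compared, with
 * bit (bits - 1) as the sign bit (bits = 32 for ge_l, 16 for lt_s). */
static int ge_typed(uint64_t a, uint64_t b, unsigned bits)
{
    return (int64_t)sext(a, bits) >= (int64_t)sext(b, bits);
}

static int lt_typed(uint64_t a, uint64_t b, unsigned bits)
{
    return (int64_t)sext(a, bits) < (int64_t)sext(b, bits);
}
```

In this model, mul_ps(patch_q(0x00010002, 0x00030004), patch_q(0x00020003, 0x00040005)) gives 0x00020006000C0014, and with a = sext(0xF0, 8), b = 0xF0 and c = sext(0xFFFFF000, 32), both ge_typed(a, c, 32) and lt_typed(a, b, 16) return 1, matching the comments in the assembly.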
Testing the Aixin A1 25TH ASIC
Claimed specifications of the A1 ASIC:
- Hashrate: 25 TH/s.
- Mines Bitcoin (we wrote about mining algorithms in more detail earlier).
- Power consumption: 2100 W, at least according to the specification. The results of our check are at the end of the article.
- Income from operating the device: 475 rubles, excluding electricity costs.
- Payback of the Aixin A1 25TH under current conditions: from 3.7 months.
- Price: from 48,000 to 52,000 rubles (depending on quantity).
13 minutes after being connected, the device reached only 23.3 TH/s, and the hashrate did not climb any further. Unfortunately, a deviation like this is no grounds for a claim against the manufacturer, since the specification usually states “TH/s ±5%” or “TH/s ±8%”.
The ASIC control panel is in Chinese by default; for easy use, it is enough to switch it to English.
Later the ASIC reached 23.8 TH/s; it was still too early to measure power consumption.
The panel offers convenient charts and shows the pools and the 4 hash boards. The “Dashboard” tab shows information about the operating system and the registered pools and workers.
You can also switch between a maximum-hashrate mode and a minimum-power mode; according to the panel, the new figures take up to 10 minutes to appear. The minimum-power mode is currently active. There is also a section for installing firmware.
LLVM-based compiler
The C compiler for S1 is similar to the one for R1, and since the architecture has not changed fundamentally, the problems described in the previous article, unfortunately, remain.
However, the size of the output code has decreased, simply thanks to the updated instruction set. There are also many minor optimizations that will reduce the number of commands in the code; some of them are already done (for example, generating a 64-bit constant with one command). But more serious optimizations still need to be made; in order of increasing effectiveness and implementation complexity, they are:
- Generating any two-argument command with two constants. Generating a 64-bit constant via patch_q is just a special case; a general mechanism is needed. In essence, this optimization lets commands take a constant as the first argument, since the second argument could always be a constant and that has long been implemented. This is not a very common case, but, for example, when you need to call a function and write the return address to the top of the stack, you can optimize
    load_l func
    wr_l @1, #SP
into
    wr_l func, #SP
- Using a memory access as an argument of any command. For example, if you need to add two numbers from memory,
    load_l [foo]
    load_l [bar]
    add_l @1, @2
can be optimized to
    add_l [foo], [bar]
This optimization extends the previous one, but here analysis is already required: the replacement can only be done if the loaded values are used once, in this addition, and nowhere else. If a read result is used in even two commands, it is more efficient to read from memory once with a separate command and reference the result through the switch in both.
- Optimizing the transfer of virtual registers between basic blocks. In R1, all virtual registers were passed through memory, which generates a very large number of memory reads and writes, but there was simply no other way to pass data between paragraphs. S1 lets commands access the results of commands in previous paragraphs, so, in theory, many memory operations could be removed, which would give the greatest effect of all these optimizations. However, this approach is still limited by the switch: at most 63 previous results can be referenced, so not every virtual register transfer can be done this way. How to do this is a non-trivial problem, and the possible solutions have yet to be analyzed. The compiler sources will probably be made public, so anyone who has ideas and would like to join the development is welcome to do so.
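The analysis required by the second optimization in the list above comes down to counting uses. The sketch below works on a tiny IR that is entirely hypothetical (op_kind, struct op, count_uses and can_fold_load are our names, and the real LLVM backend is far more involved); it only encodes the stated rule that a load may be folded into its consumer when its result has exactly one use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini-IR for one paragraph; for illustration only. */
enum op_kind { OP_LOAD, OP_ADD };

struct op {
    enum op_kind kind;
    int arg[2]; /* OP_ADD: indices of the ops producing its operands; OP_LOAD: unused */
};

/* uses[i] = number of times the result of ops[i] is used as an operand. */
static void count_uses(const struct op *ops, size_t n, int *uses)
{
    for (size_t i = 0; i < n; i++)
        uses[i] = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind != OP_ADD)
            continue;
        for (int k = 0; k < 2; k++) {
            int src = ops[i].arg[k];
            assert(src >= 0 && (size_t)src < i); /* operands must be defined earlier */
            uses[src]++;
        }
    }
}

/* A load may be folded into its consumer (e.g. add_l [foo], [bar]) only if
 * its result is used exactly once; otherwise it stays a separate load_l
 * and every consumer references it through the switch. */
static int can_fold_load(const struct op *ops, const int *uses, size_t i)
{
    return ops[i].kind == OP_LOAD && uses[i] == 1;
}
```

For the sequence load foo, load bar, add, both loads are foldable, giving add_l [foo], [bar]. If foo's value also feeds a second add, its use count becomes 2, so it keeps its own load_l and both adds reference it through the switch.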
System requirements for running PhoenixMiner:
- AMD or NVIDIA - the miner supports cards from both manufacturers.
- Windows x64 (Windows 7 is better for GTX 9x0 cards).
- For AMD GPUs, you should use the latest drivers, version 18.1.1 or 18.2.1.
- If you plan to use third-party monitoring programs, the only ones that work correctly with PhoenixMiner are GPU-Z and MSI Afterburner.
Our first impression of PhoenixMiner was quite pleasant, from the detailed and constantly updated thread on the BitcoinTalk forum to its ease of setup and use. Working with PhoenixMiner is no harder than working with everyone's favorite Claymore Ethereum Dual Miner.
PhoenixMiner can be launched with or without command-line parameters. An additional configuration file, "epools", serves as the miner's list of pools. This makes PhoenixMiner a good choice for large mining farms: besides running a separate instance of the miner on each card in a rig, you can also copy your configuration files to other rigs.
Like Claymore, PhoenixMiner supports remote management. However, in this case you cannot use RDP (Windows Remote Desktop) - use VNC or TeamViewer instead. Both configuration files can be updated remotely in real time. A complete list of commands supported by the miner can be found in the official thread on the BitcoinTalk forum.
Although PhoenixMiner is a completely standalone project and not a fork of the Claymore miner, it supports Claymore's command-line and configuration-file syntax. This makes switching from Claymore to Phoenix simple and painless: you can use the same configuration files without changing anything.
Benchmarks
Since the processor has not yet been manufactured, it is difficult to assess its actual performance.
However, the RTL code of the core is already complete, so performance can be evaluated through simulation or on an FPGA. For the benchmarks below, we used ModelSim simulation to obtain exact execution times in clock cycles. Since simulating the whole die is difficult and time-consuming, a single multicell was simulated and, for tasks intended for multithreading, the result was multiplied by 16, since each multicell can work completely independently of the others. In parallel, a multicell was run on a Xilinx Virtex-6 FPGA to test the processor code on real hardware.
CoreMark
CoreMark is a set of tests for a comprehensive assessment of the performance of microcontrollers and central processors, as well as their C compilers.
The S1 processor is neither of those. However, it is designed to execute absolutely arbitrary code, i.e. any code that could run on a central processor, so CoreMark is just as suitable for evaluating the S1. CoreMark covers linked-list processing, matrix operations, state machines and CRC calculation. Overall, most of the code is strictly sequential (which puts the multicellular hardware parallelism to a hard test) and heavily branched, so the compiler's capabilities play a significant role in the final result. The compiled code contains many short paragraphs, and although transitions between them have become faster, branching still involves memory accesses, which we would like to avoid as much as possible.
Comparative table of CoreMark indicators:
Processor | Multiclet R1 (LLVM compiler) | Multiclet S1 (LLVM compiler) | Elbrus-4S (R500/E) | Texas Instruments AM5728 ARM Cortex-A15 | Baikal-T1 | Intel Core i7 7700K |
Year of release | 2015 | 2019 | 2014 | 2018 | 2016 | 2017 |
Clock frequency, MHz | 100 | 1600 | 700 | 1500 | 1200 | 4500 |
Overall CoreMark | 59 | 18356 | 1214 | 15789 | 13142 | 182128 |
CoreMark/MHz | 0.59 | 11.47 | 5.05 | 10.53 | 10.95 | 40.47 |
The result of one multicell is 1147, or 0.72/MHz, which is higher than that of R1 (0.59/MHz). This shows the benefit of the improvements to the multicellular architecture in the new processor.
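The chip-level and per-megahertz figures follow from simple arithmetic, under the assumption stated in the benchmark setup that the 16 multicells scale linearly. A minimal sketch; the function names are ours:

```c
#include <assert.h>

#define S1_MULTICELLS 16 /* independent multicells per S1 chip */

/* Chip score from a single-multicell simulation, assuming perfectly
 * linear scaling across independent multicells. */
static double chip_score(double multicell_score)
{
    return multicell_score * S1_MULTICELLS;
}

/* Normalize a score by clock frequency. */
static double per_mhz(double score, double mhz)
{
    assert(mhz > 0.0);
    return score / mhz;
}
```

A single-multicell score of 1147.25 gives chip_score(1147.25) = 18356; dividing by 1600 MHz gives about 11.47 CoreMark/MHz for the chip and about 0.72 for one multicell, against 59 / 100 = 0.59 for R1.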
Whetstone
Whetstone is a set of tests for measuring processor performance when working with floating point numbers.
Here the situation is much better: the code is also sequential, but without many branches and with good internal parallelism. Whetstone consists of several modules, which makes it possible to measure not only the overall result but also the performance of each individual module:
- Array elements
- Array as parameter
- Conditional jumps
- Integer arithmetic
- Trigonometric functions (tan, sin, cos)
- Procedure calls
- Array references
- Standard functions (sqrt, exp, log)
The modules fall into categories: modules 1, 2 and 6 measure floating-point performance (rows MFLOPS1-3); modules 5 and 8, mathematical functions (COS MOPS, EXP MOPS); modules 4 and 7, integer arithmetic (FIXPT MOPS, EQUAL MOPS); module 3, conditional jumps (IF MOPS). In the table below, the MWIPS row (the second one) is the overall score. Unlike CoreMark, Whetstone is compared on a single core or, in our case, a single multicell. Since the number of cores varies greatly between processors, for a fair comparison we use per-megahertz figures.
Whetstone Comparison Chart:
CPU | MultiClet R1 | MultiClet S1 | Core i7 4820K | ARM v8-A53 |
Frequency, MHz | 100 | 1600 | 3900 | 1300 |
MWIPS/MHz | 0.311 | 0.343 | 0.887 | 0.642 |
MFLOPS1/MHz | 0.157 | 0.156 | 0.341 | 0.268 |
MFLOPS2/MHz | 0.153 | 0.111 | 0.308 | 0.241 |
MFLOPS3/MHz | 0.029 | 0.124 | 0.167 | 0.239 |
COS MOPS/MHz | 0.018 | 0.008 | 0.023 | 0.028 |
EXP MOPS/MHz | 0.008 | 0.005 | 0.014 | 0.004 |
FIXPT MOPS/MHz | 0.714 | 0.116 | 0.998 | 1.197 |
IF MOPS/MHz | 0.081 | 0.196 | 1.504 | 1.436 |
EQUAL MOPS/MHz | 0.143 | 0.149 | 0.251 | 0.439 |
Whetstone contains far more pure computation than CoreMark (this is very noticeable in the code below), so it is important to remember that the number of floating-point ALUs has been halved.
Nevertheless, computation speed is almost unchanged compared with R1. Some modules fit the multicellular architecture very well.
For example, module 2 computes many values in a loop, and thanks to full support for double-precision floating-point numbers in both the processor and the compiler, compilation produces large, clean paragraphs that truly reveal the computing capabilities of the multicellular architecture. Here is one such paragraph, 120 commands long:
    pa:
        SR4 := loadu_q [#SP + 16]
        SR5 := loadu_q [#SP + 8]
        SR6 := loadu_l [#SP + 4]
        SR7 := loadu_l [#SP]
        setjf_l @0, @SR7
        SR8 := add_l @SR6, 0x8
        SR9 := add_l @SR6, 0x10
        SR10 := add_l @SR6, 0x18
        SR11 := loadu_q [@SR6]
        SR12 := loadu_q [@SR8]
        SR13 := loadu_q [@SR9]
        SR14 := loadu_q [@SR10]
        SR15 := add_d @SR11, @SR12
        SR11 := add_d @SR15, @SR13
        SR15 := sub_d @SR11, @SR14
        SR11 := mul_d @SR15, @SR5
        SR15 := add_d @SR12, @SR11
        SR12 := sub_d @SR15, @SR13
        SR15 := add_d @SR14, @SR12
        SR12 := mul_d @SR15, @SR5
        SR15 := sub_d @SR11, @SR12
        SR16 := sub_d @SR12, @SR11
        SR17 := add_d @SR11, @SR12
        SR11 := add_d @SR13, @SR15
        SR13 := add_d @SR14, @SR11
        SR11 := mul_d @SR13, @SR5
        SR13 := add_d @SR16, @SR11
        SR15 := add_d @SR17, @SR11
        SR16 := add_d @SR14, @SR13
        SR13 := div_d @SR16, @SR4
        SR14 := sub_d @SR15, @SR13
        SR15 := mul_d @SR14, @SR5
        SR14 := add_d @SR12, @SR15
        SR12 := sub_d @SR14, @SR11
        SR14 := add_d @SR13, @SR12
        SR12 := mul_d @SR14, @SR5
        SR14 := sub_d @SR15, @SR12
        SR16 := sub_d @SR12, @SR15
        SR17 := add_d @SR15, @SR12
        SR15 := add_d @SR11, @SR14
        SR11 := add_d @SR13, @SR15
        SR14 := mul_d @SR11, @SR5
        SR11 := add_d @SR16, @SR14
        SR15 := add_d @SR17, @SR14
        SR16 := add_d @SR13, @SR11
        SR11 := div_d @SR16, @SR4
        SR13 := sub_d @SR15, @SR11
        SR15 := mul_d @SR13, @SR5
        SR13 := add_d @SR12, @SR15
        SR12 := sub_d @SR13, @SR14
        SR13 := add_d @SR11, @SR12
        SR12 := mul_d @SR13, @SR5
        SR13 := sub_d @SR15, @SR12
        SR16 := sub_d @SR12, @SR15
        SR17 := add_d @SR15, @SR12
        SR15 := add_d @SR14, @SR13
        SR13 := add_d @SR11, @SR15
        SR14 := mul_d @SR13, @SR5
        SR13 := add_d @SR16, @SR14
        SR15 := add_d @SR17, @SR14
        SR16 := add_d @SR11, @SR13
        SR11 := div_d @SR16, @SR4
        SR13 := sub_d @SR15, @SR11
        SR4 := loadu_q @SR4
        SR5 := loadu_q @SR5
        SR6 := loadu_q @SR6
        SR7 := loadu_q @SR7
        SR15 := mul_d @SR13, @SR5
        SR8 := loadu_q @SR8
        SR9 := loadu_q @SR9
        SR10 := loadu_q @SR10
        SR13 := add_d @SR12, @SR15
        SR12 := sub_d @SR13, @SR14
        SR13 := add_d @SR11, @SR12
        SR12 := mul_d @SR13, @SR5
        SR13 := sub_d @SR15, @SR12
        SR16 := sub_d @SR12, @SR15
        SR17 := add_d @SR15, @SR12
        SR15 := add_d @SR14, @SR13
        SR13 := add_d @SR11, @SR15
        SR14 := mul_d @SR13, @SR5
        SR13 := add_d @SR16, @SR14
        SR15 := add_d @SR17, @SR14
        SR16 := add_d @SR11, @SR13
        SR11 := div_d @SR16, @SR4
        SR13 := sub_d @SR15, @SR11
        SR15 := mul_d @SR13, @SR5
        SR13 := add_d @SR12, @SR15
        SR12 := sub_d @SR13, @SR14
        SR13 := add_d @SR11, @SR12
        SR12 := mul_d @SR13, @SR5
        SR13 := sub_d @SR15, @SR12
        SR16 := sub_d @SR12, @SR15
        SR17 := add_d @SR15, @SR12
        SR15 := add_d @SR14, @SR13
        SR13 := add_d @SR11, @SR15
        SR14 := mul_d @SR13, @SR5
        SR13 := add_d @SR16, @SR14
        SR15 := add_d @SR17, @SR14
        SR16 := add_d @SR11, @SR13
        SR11 := div_d @SR16, @SR4
        SR13 := sub_d @SR15, @SR11
        SR15 := mul_d @SR13, @SR5
        SR13 := add_d @SR12, @SR15
        SR12 := sub_d @SR13, @SR14
        SR13 := add_d @SR11, @SR12
        SR12 := mul_d @SR13, @SR5
        SR13 := sub_d @SR15, @SR12
        SR16 := sub_d @SR12, @SR15
        SR17 := add_d @SR14, @SR13
        SR13 := add_d @SR11, @SR17
        SR14 := mul_d @SR13, @SR5
        SR5 := add_d @SR16, @SR14
        SR13 := add_d @SR11, @SR5
        SR5 := div_d @SR13, @SR4
        wr_q @SR15, @SR6
        wr_q @SR12, @SR8
        wr_q @SR14, @SR9
        wr_q @SR5, @SR10
        complete
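For reference, the paragraph above appears to be the compiled form of procedure PA (“array as parameter”) from the classic C version of Whetstone, which is typically called with t = 0.499975 and t2 = 2.0. The compiler unrolled all six iterations: note the six div_d commands, one per update of e[3]. A C rendering of that procedure, with the original goto loop and 1-based indexing rewritten as a for loop over a 0-based array:

```c
#include <assert.h>

/* Whetstone procedure PA: six passes of a fixed update over a
 * 4-element array passed as a parameter. */
static void pa(double e[4], double t, double t2)
{
    assert(t2 != 0.0);
    for (int j = 0; j < 6; j++) {
        e[0] = (e[0] + e[1] + e[2] - e[3]) * t;
        e[1] = (e[0] + e[1] - e[2] + e[3]) * t;
        e[2] = (e[0] - e[1] + e[2] + e[3]) * t;
        e[3] = (-e[0] + e[1] + e[2] + e[3]) / t2; /* the div_d in each unrolled pass */
    }
}
```

Each pass depends on the previous one, but within a pass many partial sums (for example e[0] - e[1] and e[1] - e[0], computed as SR15 and SR16 in the assembly) are independent, which is what lets the multicell spread the work across its cells.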
popcnt
To reflect the characteristics of the architecture itself, independent of the compiler, we will measure code written in assembly language that takes all the features of the architecture into account.
For example, counting the set bits in a 512-bit number (popcnt). For clarity, we take the results of a single multicell so that they can be compared with R1. Comparison table, clock cycles per 32-bit computation:
Algorithm | Multiclet R1 | Multiclet S1 (one multicell) |
BitHacks | 5.0 | 2.625 |
Here the new vector instructions were used, which halved the number of instructions compared with the same algorithm implemented in R1 assembler.
Accordingly, the speed nearly doubled (2.625 clock cycles instead of 5.0). The popcnt code:
    bithacks:
        b0 := patch_q 0x1, 0x1
        v0 := loadu_q [v]
        v1 := loadu_q [v+8]
        v2 := loadu_q [v+16]
        v3 := loadu_q [v+24]
        v4 := loadu_q [v+32]
        v5 := loadu_q [v+40]
        v6 := loadu_q [v+48]
        v7 := loadu_q [v+56]
        b1 := patch_q 0x55555555, 0x55555555
        i00 := slr_pl @v0, @b0
        i01 := slr_pl @v1, @b0
        i02 := slr_pl @v2, @b0
        i03 := slr_pl @v3, @b0
        i04 := slr_pl @v4, @b0
        i05 := slr_pl @v5, @b0
        i06 := slr_pl @v6, @b0
        i07 := slr_pl @v7, @b0
        b2 := patch_q 0x33333333, 0x33333333
        i10 := and_q @i00, @b1
        i11 := and_q @i01, @b1
        i12 := and_q @i02, @b1
        i13 := and_q @i03, @b1
        i14 := and_q @i04, @b1
        i15 := and_q @i05, @b1
        i16 := and_q @i06, @b1
        i17 := and_q @i07, @b1
        b3 := patch_q 0x2, 0x2
        i20 := sub_pl @v0, @i10
        i21 := sub_pl @v1, @i11
        i22 := sub_pl @v2, @i12
        i23 := sub_pl @v3, @i13
        i24 := sub_pl @v4, @i14
        i25 := sub_pl @v5, @i15
        i26 := sub_pl @v6, @i16
        i27 := sub_pl @v7, @i17
        i30 := and_q @i20, @b2
        i31 := and_q @i21, @b2
        i32 := and_q @i22, @b2
        i33 := and_q @i23, @b2
        i34 := and_q @i24, @b2
        i35 := and_q @i25, @b2
        i36 := and_q @i26, @b2
        i37 := and_q @i27, @b2
        i40 := slr_pl @i20, @b3
        i41 := slr_pl @i21, @b3
        i42 := slr_pl @i22, @b3
        i43 := slr_pl @i23, @b3
        i44 := slr_pl @i24, @b3
        i45 := slr_pl @i25, @b3
        i46 := slr_pl @i26, @b3
        i47 := slr_pl @i27, @b3
        b4 := patch_q 0x4, 0x4
        i50 := and_q @i40, @b2
        i51 := and_q @i41, @b2
        i52 := and_q @i42, @b2
        i53 := and_q @i43, @b2
        i54 := and_q @i44, @b2
        i55 := and_q @i45, @b2
        i56 := and_q @i46, @b2
        i57 := and_q @i47, @b2
        i60 := add_pl @i50, @i30
        i61 := add_pl @i51, @i31
        i62 := add_pl @i52, @i32
        i63 := add_pl @i53, @i33
        i64 := add_pl @i54, @i34
        i65 := add_pl @i55, @i35
        i66 := add_pl @i56, @i36
        i67 := add_pl @i57, @i37
        b5 := patch_q 0xf0f0f0f, 0xf0f0f0f
        i70 := slr_pl @i60, @b4
        i71 := slr_pl @i61, @b4
        i72 := slr_pl @i62, @b4
        i73 := slr_pl @i63, @b4
        i74 := slr_pl @i64, @b4
        i75 := slr_pl @i65, @b4
        i76 := slr_pl @i66, @b4
        i77 := slr_pl @i67, @b4
        b6 := patch_q 0x1010101, 0x1010101
        i80 := add_pl @i70, @i60
        i81 := add_pl @i71, @i61
        i82 := add_pl @i72, @i62
        i83 := add_pl @i73, @i63
        i84 := add_pl @i74, @i64
        i85 := add_pl @i75, @i65
        i86 := add_pl @i76, @i66
        i87 := add_pl @i77, @i67
        b7 := patch_q 0x18, 0x18
        i90 := and_q @i80, @b5
        i91 := and_q @i81, @b5
        i92 := and_q @i82, @b5
        i93 := and_q @i83, @b5
        i94 := and_q @i84, @b5
        i95 := and_q @i85, @b5
        i96 := and_q @i86, @b5
        i97 := and_q @i87, @b5
        iA0 := mul_pl @i90, @b6
        iA1 := mul_pl @i91, @b6
        iA2 := mul_pl @i92, @b6
        iA3 := mul_pl @i93, @b6
        iA4 := mul_pl @i94, @b6
        iA5 := mul_pl @i95, @b6
        iA6 := mul_pl @i96, @b6
        iA7 := mul_pl @i97, @b6
        iB0 := slr_pl @iA0, @b7
        iB1 := slr_pl @iA1, @b7
        iB2 := slr_pl @iA2, @b7
        iB3 := slr_pl @iA3, @b7
        iB4 := slr_pl @iA4, @b7
        iB5 := slr_pl @iA5, @b7
        iB6 := slr_pl @iA6, @b7
        iB7 := slr_pl @iA7, @b7
        wr_q @iB0, c
        wr_q @iB1, c+8
        wr_q @iB2, c+16
        wr_q @iB3, c+24
        wr_q @iB4, c+32
        wr_q @iB5, c+40
        wr_q @iB6, c+48
        wr_q @iB7, c+56
        complete
RMC holding, the Multiclet miner and the ICO generator: full report
August 14, 2017
On August 7, the leaders of the Party of Growth held a press conference dedicated to the launch of the first project brought to market through the ICO generator created by the party. Boris Titov, the Presidential Commissioner for Entrepreneurs' Rights and Chairman of the Party of Growth, opened the press conference.
He spoke briefly about the party and its areas of activity. Boris called the new economy its core focus. To that end, the Party of Growth has created a so-called ICO generator, whose main task is expert vetting and professional preparation of projects for an ICO. According to Boris Titov: “If we put our stamp on it, there will be no risks with these people.” The first project of the generator was the mining project of Dmitry Marinichev and Sergei Bobylev, which Boris Titov called a domestic breakthrough. Airat Gibadullin spoke next. According to him, what matters most is what is inside the project. There will be no competition as such within the generator: all projects that pass the review will be launched as ICOs. Dmitry Marinichev noted that the essence of an ICO is simple on the one hand and complex on the other, and that at the moment there are not enough clear rules and legal support. For this reason, a video call was arranged with Elina Sidorenko, head of the State Duma's interdepartmental working group on assessing the risks of cryptocurrency circulation.
Elina said she was very enthusiastic about the idea of a generator, because it is a unique opportunity to see how conscientious the participants are. At the moment there are no laws regulating this area, and from a legal point of view, freedom of contract allows such transactions to be concluded. In her opinion, buying tokens for fiat currency is equivalent to buying a service, and this scheme works within the Russian legal framework.
Elina Sidorenko: “There is liability if this service is provided improperly. If tokens are purchased for virtual currency, this cannot be called a sale; in fact, cryptocurrencies are a kind of digital entity that gives a person the right to a discount, and although the discount is very large, we can say that the transaction is carried out in fiat currency.”
RMC tokens are regarded as an asset that certifies a person's right to membership in a closed mining club. There is currently no legal definition of mining itself; it is neither prohibited nor permitted. According to the central banks of some countries, it is a form of independent economic activity that has not yet been regulated in any way. At the same time, Russian citizens whose rights are violated can turn to Russian courts to resolve disputes over the provision of services.
Elina Sidorenko: “This project is interesting precisely because we are finding the first optimal solution within the Russian jurisdiction. This distinguishes the project from all others, and, as it seems to me, it provides certain guarantees.”
Elina also noted that she has only a very indirect connection to this project and acts as a representative of the generator. A fundamentally important issue for her is protecting investors' interests, even when they pay in cryptocurrency, and this is achieved in a very interesting way:
Elina Sidorenko: “Transferring Bitcoin into a so-called information unit, which does not have a financial nature in the context of this agreement, opens up Russian jurisdiction for us. Even if there is a settlement of some dispute, we can demand the return of these same information units in the same amount, because in this part they are not money.”
After Elina, Sergei Bobylev took the floor and began with Russia's advantages for mining.
He mentioned cheap electricity, a cold climate, a reliable power grid with a surplus of 20 gigawatts, and the availability of a skilled workforce. According to Sergei, existing miners are all built on von Neumann principles, which date back to the 1940s. The uniqueness of the Multiclet chip, which is planned for installation in new miners, lies in its fundamentally different architecture. He said that the von Neumann architecture performs all actions sequentially, and everything needed for parallelism is added artificially. For the multicellular architecture, parallel computation is natural, which pays off in well-parallelizable tasks. You can learn more about the multicellular architecture from the video with explanations by Boris Zyryanov and Dmitry Marinichev:
Video. Multiclet for mining: analysis of architecture from Boris Zyryanov and Dmitry Marinichev
Dmitry Marinichev later clarified that there are currently plans to release processors with no more than 256 cells. However, increasing the number of chips on the board will make it possible to scale the miner effectively; the architecture allows this. For those interested, Dmitry explained the technical details:
Dmitry Marinichev:
“The main advantage of this microprocessor is that it does not fetch standard commands from memory and execute them, but assigns mnemonics to data and mnemonics to tasks for each cell of the processor. As a result, we get neither a scalar architecture nor a hyperpipelined architecture, but simply an architecture that runs on a common bus.”
A question was also asked about working with memory, because this can be a bottleneck when mining Ethereum.
Dmitry Marinichev: “The answer is simple. Our memory is exactly the same as on video cards, in some moments it is even worse. But working with memory is again a command at the clock moment. Our multicell processor does not have this; it takes data from memory when it needs it for work. In one clock cycle, while simultaneously counting other data. Therefore, working with memory does not affect the multicell as much as, for example, the video card, which collects tasks and distributes tasks. In any case, it has artificial parallelism, unlike a multicell.”
At the same time, the multiclet chip can be switched for various forks such as Ethereum, Monero, etc., unlike ASIC miners.
No special knowledge is required to flash or change the operating algorithm, as is the case with FPGAs; programs in C or assembler are sufficient, which is much more understandable for most programmers. Dmitry even predicted the possibility of the emergence of programmers specializing in such miners, because they can be used for a wider range of tasks than mining. Sergey Bobylev emphasized that this is not about mining bitcoins, but only altcoins, and the comparison results are with non-specialized equipment such as video cards. The processor itself already exists and is working. The goal is to simplify it by removing all the blocks that are not needed for mining, increase the number of cells and transfer the production technology from 180 nm to 28 nm.
Sergey also noted that his company produces heating equipment that uses miners as the heating element, so efficiency can be even higher, since there is no need to cool the miners and dispose of their heat.
Another problem Sergey raised is the excessive concentration of mining capacity in China. This poses a threat even to the Chinese miners themselves, because such a concentration of power in one country undermines confidence in the cryptocurrency. Decisions by the Chinese authorities can already have a very significant impact on the network, since China accounts for more than 50% of mining capacity and most mining equipment manufacturers are based there. If China is not the only center, confidence in cryptocurrencies will only grow.
The conversation then turned to the RMC holding, whose abbreviation has been expanded differently in different sources: Russian Miner Coin, Russian Mining Center, and, at the press conference, Russian Mining Company. We previously wrote about the RMC holding and tours of its mining farm. The holding already operates a 20 MW mining center in Technopolis Moscow, and two more of the same size are under construction. It developed and produced miners under the Pantech brand; next in line are Sunrise miners on 16 nm Bitfury chips, followed by miners on multiclet chips. The chips are to be manufactured in Taiwan, while board production, assembly and software will be handled in-house. The token will have the same name, RMC; it will allow its holders to obtain next-generation miners and join a closed mining club. Tokens are to be issued using colored coin technology and the Mycelium wallet. According to Sergey Bobylev, this wallet has 170 thousand users holding a total of approximately $1.5 billion in cryptocurrency, so he is confident about the upcoming $100 million ICO. This solution was chosen for its simplicity and reliability: the technology has been around for several years, it needs no smart contracts, which often contain errors, and it does not require third-party platforms. It is enough to download the wallet from the app store to start working with the RMC token and receive income to your wallet. At the presale, one RMC token will cost $4,000, with an entry threshold of $250,000. During the ICO, tokens will be priced at $4,100-$4,900. The ICO runs from August 28 to September 20. The presale is already underway; anyone who wants to participate can come to the office and sign a legally binding agreement.
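The prices and cap above bound the number of tokens that can be issued. A quick check, assuming (hypothetically) that the entire $100 million cap were raised at a single price; the article does not say how sales would split between presale and ICO, or whether tokens are divisible. The constant and function names are illustrative only.

```python
# Figures quoted in the article; the single-price assumption is ours.
CAP_USD = 100_000_000
PRESALE_PRICE_USD = 4_000
ICO_PRICE_RANGE_USD = (4_100, 4_900)
PRESALE_MIN_USD = 250_000

def tokens_for(amount_usd, price_usd):
    """Number of tokens (possibly fractional) that amount_usd buys at price_usd."""
    return amount_usd / price_usd

print(tokens_for(CAP_USD, PRESALE_PRICE_USD))              # 25000.0
print(round(tokens_for(CAP_USD, ICO_PRICE_RANGE_USD[1])))  # 20408
print(tokens_for(PRESALE_MIN_USD, PRESALE_PRICE_USD))      # 62.5
```

Under this assumption the cap corresponds to roughly 20,400 to 25,000 tokens, and the minimum presale ticket to 62.5 tokens at $4,000 each.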
Payment in rubles is possible; foreign partners can also conclude transactions under Austrian law. The ICO will accept Bitcoin and Ether, with Ether converted to Bitcoin at the current rate. One token entitles its holder to one miner with multiclet chips in 13 months; until those are produced, the holder receives income from classic Bitcoin mining and can buy Sunrise miners at a discount starting in September.
The second idea behind the ICO, after the multiclet chip, is a security chip. It will sit directly on the boards and will be responsible for splitting the hashrate. Most of the hashrate and income, 80%, will go to the physical owner of the miner, who can dispose of it as he pleases, while 20% of the hashrate will go to the RMC pool, from which token holders will be paid. According to the developers, desoldering the chip or changing the board layout to avoid giving away this 20% will not be economically worthwhile. And given that Sunrise miners will have twice the ROI of competing models, losing 20% of the hashrate should not deter buyers. The authors hope to take advantage of benefits available to individuals but not to organizations, such as social electricity tariffs. The plan is thus to create a distributed farm operating throughout the country. The owners themselves have an interest in keeping their miners running rather than idle, so RMC hopes to get a stable hashrate for its pool. On the economic side, Sergey noted that this ICO will be limited, with a maximum of $100 million raised, because producing a huge number of multiclet miners is not practical: internal competition would begin. In total, 25,000 multiclet miners are planned. The expected return from the ICO is lower than that of many other ICOs, which sometimes skyrocket, but it is easy to predict and clearly higher than the average for a classic business. Investors will also be able to receive dividends from the joint mining club and to colocate miners in the RadiusHost data center. A conventional contract for the supply of goods can also be concluded. Since the team consists of developers who have worked in the market for many years, Sergey Bobylev has no doubts that production will succeed.
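The article does not explain how the security chip divides the hashrate. One plausible approach, similar to the developer fee built into some mining software, is time slicing: for a fixed fraction of the time the hardware works for another pool. A minimal sketch, assuming equal time slots and the 20% pool share quoted above; `assign_slots` and its parameters are hypothetical and not RMC's design.

```python
def assign_slots(n_slots, pool_percent=20):
    """Hypothetical schedule for a hashrate-splitting chip.

    Mining time is cut into equal slots. An integer accumulator gives the pool
    every slot in which it reaches 100, so pool slots are spread evenly: over
    the first n slots the pool gets exactly n * pool_percent // 100 of them.
    All other slots go to the owner.
    """
    if not 0 <= pool_percent <= 100:
        raise ValueError("pool_percent must be between 0 and 100")
    schedule = []
    acc = 0
    for _ in range(n_slots):
        acc += pool_percent
        if acc >= 100:
            schedule.append("pool")
            acc -= 100
        else:
            schedule.append("owner")
    return schedule

print(assign_slots(10))
# ['owner', 'owner', 'owner', 'owner', 'pool', 'owner', 'owner', 'owner', 'owner', 'pool']
print(assign_slots(1000).count("pool"))  # 200
```

Integer arithmetic avoids floating-point drift, so the long-run split stays exactly 80/20.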
He also compared his miners with those of BITMAIN, the current leader in the production of such devices. According to Sergey, Chinese miners are deliberately built to a lower quality standard, because it benefits the manufacturer when a miner fails after six months to a year and the buyer has to purchase a new one.
Sergey Bobylev: “Therefore, such miners lack any protection; the circuitry is designed so that everything runs at the very, very limit. Many models lack even something as simple as thermal protection. So when the cooling fan clogs with dust and fluff, the miner has no choice but to burn out.”
Miners produced by RMC are equipped with protection, which makes them somewhat more expensive but increases their reliability.
It is in RMC's interest to keep miners working as long as possible, since 20% of the hashrate goes to its pool, also known as the joint mining club. Income from this pool is distributed among token holders in proportion to their share of ownership. The miners will have a three-year warranty, which is quite long for this class of device. Physically, the miners are expected to remain operational for at least 5-7 years. In terms of specific power consumption, Sunrise miners draw quite a lot for 16 nm Bitfury chips. This is because at the beginning of the life cycle a high hashrate matters more, giving more time to earn, so the equipment is overclocked for maximum performance. Later, when difficulty rises significantly and energy efficiency comes to the fore, the specific energy consumption can be cut in half at the cost of some hashrate, and the same miners will remain profitable. Chinese miners do not allow this, said Sergey Bobylev. Shipments of Sunrise s11i miners are scheduled to begin at the end of September 2020, and when purchased with RMC the price will be fixed at $1,600 per device, including the power supply. A sample of this miner was shown at the press conference, but a version with some technical modifications will go into production. According to the presentation slides, the Pantech sx6 and Bitmain s9 have a payback period of 12 months, while for the Sunrise s11i it will be 5 months, even after deducting the 20% hashrate share. Sergey noted that the prices include power supplies and delivery to the Russian Federation.
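Distributing pool income “in proportion to the ownership share” is a pro-rata split. A minimal sketch, assuming income is paid in satoshis and that rounding leftovers go to the holders with the largest remainders (the largest remainder method); the article specifies neither the unit nor the rounding rule, and `distribute` and the holder names are hypothetical.

```python
def distribute(pool_income_sat, holdings):
    """Split pool income (in satoshis) among token holders pro rata.

    Each holder first gets the floor of their exact share; the few satoshis
    lost to rounding go, one each, to the holders with the largest
    remainders, so the payouts always sum to pool_income_sat.
    """
    total_tokens = sum(holdings.values())
    if total_tokens == 0:
        raise ValueError("no tokens outstanding")
    payouts = {}
    remainders = []
    for holder, tokens in holdings.items():
        share, rem = divmod(pool_income_sat * tokens, total_tokens)
        payouts[holder] = share
        remainders.append((rem, holder))
    leftover = pool_income_sat - sum(payouts.values())
    for _, holder in sorted(remainders, reverse=True)[:leftover]:
        payouts[holder] += 1
    return payouts

print(distribute(1_000_001, {"alice": 3, "bob": 1}))
# {'alice': 750001, 'bob': 250000}
```

Working in whole satoshis with integer division avoids floating-point rounding errors in payouts.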
Sergey Bobylev: “Here is an interesting synergy: we sell the miner at cost, while the owner of the miner, if he is an individual, pays a low price for electricity. Sharing 20% of the hashrate is easy for him; it is as if his miner consumed 20% more electricity. In other words, everyone contributes what is cheapest for them: the investor contributes money, we produce the miner at cost, and the end owner contributes his electricity and his time monitoring the miners. As a result, we have built an ecosystem that will outperform even the largest Chinese farms in efficiency.”
The team plans to release the multiclet chip in 10 months. If that succeeds, token holders will be able to choose between continuing to mine on Sunrise ASIC miners and exiting to receive a multiclet miner three months later. The three months are counted from the arrival of a working chip sample and are the time needed to set up production of finished miners. According to the plan, physical devices on multiclet chips should therefore appear in 13 months. In this way, RMC protects consumers in case something goes wrong with the multiclet chip: token holders will still receive income from Sunrise miners. The scheme can be seen in more detail in the diagram:
Author: Ivan Tikhonov
Attached documents
RMC_v.10_RU.pdf
Source: Bits Media
Link: https://bits.media/rmc-holding-multiclet-mainer-ico-generator/