Neuromorphic packet-based architecture for a neural CPU built on traditional CPU cores (without memristors)
Slide 2. A major constraint on the development of neural networks is that they are emulated on the traditional von Neumann architecture, in which the processor that executes the code is separated from the memory that holds the data to be processed.
Slide 3. IBM experts estimate that moving to a specialized neural CPU architecture will yield a speedup of several tens of thousands of times. However, IBM relies on a memristor architecture, which is still in the R&D phase.
Slide 4. The limitation of the von Neumann architecture is the bottleneck in data transfer between the CPU and memory: large volumes of computation require huge bus bandwidth. Natural neural networks are built on a distributed principle: each neuron acts as a small computing system, and the network as a whole is, in theory, fully meshed, so any neuron can in principle connect to any other neuron.
Slide 5. The packet metaphor. Imagine the "neurons" as computing units, each emulating a small neural network (10-20 nodes), that send each other "signals" as data packets over a common data bus. Each "neuron" can send a packet to any other "neuron". These packets can likewise be wrapped and encapsulated in a virtual medium, such as the Internet Protocol (IP), and transferred to other neural networks.
Slide 6. Such a "neuron" can be emulated by simple microcontroller cores (without memristors, and optionally even with a GPU) connected to a common bus that has (for example, for compatibility with IP) 32 "FROM" address lines, 32 "TO" address lines, and an 8- or 16-bit data bus. Control lines are added on top of this: a "training mode" line (Teach mode forward line), a back-propagation mode line (Back propagation line), and a "Transfer / Busy" line.
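A minimal sketch of how one bus transfer could be represented in software may help here. The slides only name the lines; the byte layout, the 16-bit data width, the single flags byte standing in for the three control lines, and all names (Packet, FLAG_TEACH and so on) are assumptions made for illustration.

```python
import struct
from dataclasses import dataclass

# Assumed encoding of the three control lines as bits of one flags byte.
FLAG_TEACH = 0x01     # Teach mode forward line
FLAG_BACKPROP = 0x02  # Back propagation line
FLAG_BUSY = 0x04      # Transfer / Busy line

# Assumed layout: big-endian 32-bit FROM, 32-bit TO, 16-bit data, 8-bit flags.
PACKET_FORMAT = ">IIHB"


@dataclass(frozen=True)
class Packet:
    src: int        # 32-bit FROM address
    dst: int        # 32-bit TO address
    data: int       # 16-bit payload (the slides allow 8 or 16 bits)
    flags: int = 0  # snapshot of the control lines

    def pack(self) -> bytes:
        """Serialize the packet into the assumed 11-byte frame."""
        return struct.pack(PACKET_FORMAT, self.src, self.dst, self.data, self.flags)

    @classmethod
    def unpack(cls, raw: bytes) -> "Packet":
        """Rebuild a packet from an 11-byte frame."""
        return cls(*struct.unpack(PACKET_FORMAT, raw))


example = Packet(src=0xABCDEF98, dst=0xFFFFFFFF, data=1, flags=FLAG_TEACH)
frame = example.pack()
```

A fixed-size frame like this would also be easy to carry as the payload of an IP packet, which is the kind of encapsulation slide 5 describes.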
Slide 7. The MCUs are connected to a common bus. Each is initially assigned a 32-bit address of the form #ABCDEF98. This creates a physically addressable information-sharing environment in which packet distribution is limited only by physical properties (the speed of light and the frequency limits of dielectrics).
1. Network programming mode
Slide 8. Teaching mode is set by external control. The program code that emulates the neural network is loaded over the bus with the FROM address #00000000. The controller cores are programmed so that data arriving from this address in this mode is treated as program code.
Slide 9. If the "TO:" address is the broadcast address (TO: #FFFFFFFF), the code is loaded into all microcontrollers. You can also use network masks of the form #AB.CD.EE.00, similar to an IP mask, which load the code only into the cores whose addresses match the given mask. This makes it possible to load different neural network emulators (CNN, RNN/LSTM, etc.) into different controllers, creating layers of different neuron types and thus a flexible network architecture.
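The addressing rule on this slide can be sketched as a small acceptance test that each core would apply to an incoming packet. The slides do not define the mask semantics precisely; the sketch assumes IP-style matching, where a separate bit mask selects which address bits must agree. The function name and parameters are hypothetical.

```python
BROADCAST = 0xFFFFFFFF  # TO address that every core accepts (from the slide)


def accepts(core_addr: int, to_addr: int, mask: int = 0xFFFFFFFF) -> bool:
    """Return True if a core at core_addr should take a packet sent to to_addr.

    Assumption: as with an IP subnet mask, only the bits set in `mask`
    are compared; the remaining bits act as wildcards.
    """
    if to_addr == BROADCAST:
        return True
    return (core_addr & mask) == (to_addr & mask)


# Load code only into cores in the #AB.CD.EE.xx segment.
segment = 0xABCDEE00
segment_mask = 0xFFFFFF00
in_segment = accepts(0xABCDEE17, segment, segment_mask)
out_of_segment = accepts(0xABCDEF98, segment, segment_mask)
```

With the default mask of all ones, the test reduces to an exact address match, so ordinary point-to-point packets need no special case.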
2. Training mode: "growing axons"
Slide 10. This mode emulates the establishment of new connections between neurons. The "neurons" are, as usual, initialized with random values. Training data is fed to the system input, and the "neurons" try to "grow" axons toward other neurons.
Slide 11. A given core ("neuron") "listens" on the bus for packets from other "neurons", just as dendrites receive signals from the axons of other neurons. The neural network algorithm in the core accumulates this information and records which addresses the packets came from (that is, which "neurons" sent the signals).
Slide 12. When a particular "neuron" is triggered by the data it has received, it does the following:
1. Waits for the Transfer / Busy line to be released and takes it.
2. Sends a broadcast packet (to all "neurons") from its own address, carrying the calculated value (or simply "1").
3. Releases the Transfer / Busy line.
4. The other neurons receive the packet, store its source address and value, and begin processing the signal in their own neural networks (see the previous slide).
Thus the "neurons" try to establish "all-to-all" connections, "sprouting axons" toward each other. To optimize operation, the area in which connections are established can be restricted to "layers" or "domains" by applying a mask to packet addressing.
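The four steps of slide 12 can be sketched as a single-threaded simulation. This is an illustration under stated assumptions, not the hardware protocol: the Bus and Core classes are hypothetical, the busy flag stands in for the Transfer / Busy line, and a real core would wait for the line rather than raise an error.

```python
class Bus:
    """Shared bus; `busy` stands in for the Transfer / Busy line."""

    def __init__(self):
        self.busy = False
        self.cores = []

    def attach(self, core):
        self.cores.append(core)

    def broadcast(self, src, value):
        if self.busy:
            # Assumption: hardware would wait here; the sketch never contends.
            raise RuntimeError("bus is busy")
        self.busy = True                      # step 1: take the line
        try:
            for core in self.cores:           # step 2: broadcast to all others
                if core.address != src:
                    core.receive(src, value)
        finally:
            self.busy = False                 # step 3: release the line


class Core:
    """One "neuron": remembers which addresses it has heard from."""

    def __init__(self, address, bus):
        self.address = address
        self.sources = {}                     # source address -> last value
        self.bus = bus
        bus.attach(self)

    def receive(self, src, value):
        self.sources[src] = value             # step 4: store address and value

    def fire(self, value=1):
        self.bus.broadcast(self.address, value)


bus = Bus()
a = Core(0xABCDEE01, bus)
b = Core(0xABCDEE02, bus)
c = Core(0xABCDEE03, bus)
a.fire()        # a announces itself to b and c
b.fire(value=7)
```

After these two firings, c has recorded both a and b as candidate sources, which is the "sprouting axons" effect in miniature. Restricting the loop in broadcast to addresses that pass a mask test would confine the growth to one "layer" or "domain".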
3. Error back-propagation mode
Slide 13. After the forward teaching mode, the system switches to error feedback mode (the Back propagation line is ON). In this mode, the target values from the training sample are applied to the output of the entire system.
Slide 14. Back-propagation is implemented as a comparison of a core's own output data with the values received from the teacher. The neural network in a given core accordingly recalculates the weights of the signal packets received from other "neurons". The addresses of the "neurons" with the greatest error are "crossed out", that is, forgotten.
Slide 15. Packets with a corrective value are sent to the upstream "neurons" (those whose signal packets carried the most correct values).
Slide 16. Accordingly, the "neurons" that receive an error packet addressed to them repeat the previous steps and adjust their own sets of "reliable sources".
Thus the network is trained and connections are built much like axons and dendrites in natural neural networks: packets from "neurons" that best predict the training sample values receive greater weight, and those that predict less well receive smaller weight. However, if a neuron experiences a "deficit" of information (too few packets arrive at its input to trigger it), it can enter "listen" mode (slide 10) to find new sources, which gives the network flexibility.
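The per-core bookkeeping of slides 14-16 can be sketched as follows. The slides do not give an update rule, so everything numeric here is an assumption: the absolute-error measure, the pruning threshold, the learning rate, and the function name. This is also not gradient back-propagation in the textbook sense; it only illustrates rewarding accurate sources, forgetting the worst ones, and producing corrective values to send upstream.

```python
def update_sources(weights, last_values, target, rate=0.1, prune_above=0.5):
    """Re-weight a core's sources against the teacher's target value.

    weights:     source address -> current weight
    last_values: source address -> value carried by its last packet
    Returns (new_weights, corrections), where corrections maps each
    surviving source address to the corrective value to send back to it.
    Assumed rule: sources whose error exceeds `prune_above` are forgotten;
    the rest gain weight in proportion to how accurate they were.
    """
    new_weights = {}
    corrections = {}
    for src, weight in weights.items():
        error = abs(target - last_values[src])
        if error > prune_above:
            continue  # "crossed out": this address is forgotten
        new_weights[src] = weight + rate * (1.0 - error)
        corrections[src] = target - last_values[src]
    return new_weights, corrections


weights = {0x01: 0.5, 0x02: 0.5, 0x03: 0.5}
last_values = {0x01: 1.0, 0x02: 0.9, 0x03: 0.0}
new_weights, corrections = update_sources(weights, last_values, target=1.0)
```

In this run the source at 0x03 is dropped, and 0x01 ends with a slightly higher weight than 0x02 because its packet matched the target exactly. A core left with too few sources after pruning would be the one that returns to "listen" mode.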
4. Operating mode
Slide 17. In operating mode, real values are fed to the network input, and the network itself classifies the incoming data.
Slide 18. Each "neuron" receives packets from the "neurons" that acquired the largest weights during training and processes them with its own neural network.
Slide 19. When triggered, a "neuron" waits for the bus to be released, captures it, and sends a broadcast packet from its own address (or a packet to a particular network segment, if it is restricted by a network mask).
Thus, after processing by the "neurons" of one network region, data packets are formed that can be processed further or sent to another part of the neural network for processing.
Scale and economics
Slide 20. Data flows in the cerebral cortex.
How many such units are actually enough to emulate the work of the human cerebral cortex? If we do the calculation, the data streams in the cortex are of course large, but they are within reach of modern technology. We can say that within one lobe of the cortex, gigabits of data are transmitted per second.
Slide 21. How so?
The whole cortex contains ~14 billion neurons, so one hemisphere contains ~7 billion, and we can say that each lobe of the cortex contains on the order of 1 billion neurons. If we take the characteristic frequency of brain activity to be 7-14 Hz, with no more than 10% of the neurons "working" at any time (actually 5-7%), the data flow within one lobe of the cortex will be about 1 gigabit per second, that is, the speed of an office network.
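The estimate above can be reproduced with a few lines of arithmetic. The neuron count, the active fraction, and the frequency range come from the slide; the assumption that each firing event contributes one bit to the data flow is not stated in the source and is only the simplest reading that yields the quoted figure.

```python
NEURONS_PER_LOBE = 1_000_000_000  # "of the order of 1 billion" (from the slide)
ACTIVE_FRACTION = 0.10            # upper bound from the slide; 5-7% is cited as typical
BITS_PER_EVENT = 1                # assumption: one bit per firing event


def lobe_bitrate(freq_hz, active_fraction=ACTIVE_FRACTION, bits=BITS_PER_EVENT):
    """Estimated data flow inside one cortical lobe, in bits per second."""
    return NEURONS_PER_LOBE * active_fraction * freq_hz * bits


low = lobe_bitrate(7)    # lower end of the 7-14 Hz range
high = lobe_bitrate(14)  # upper end of the range

for freq in (7, 14):
    print(f"{freq} Hz: {lobe_bitrate(freq) / 1e9:.1f} Gbit/s")
```

Under these assumptions the flow falls between roughly 0.7 and 1.4 Gbit/s, which brackets the "about 1 gigabit per second" figure. Carrying full 32-bit source addresses with every event would raise the estimate accordingly.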
Slide 22. That is, to process such a data stream we can use many relatively simple cores connected by a bus of medium speed by modern standards, avoiding the "von Neumann bottleneck".
Slide 23. For comparison: PCI Express bus speed is up to 16 Gbit/s.
Slide 24. Developing the model and software architecture of the system will require 1-2 years and 2-3 million dollars for a group of 20 developers. Starting production and manufacturing of systems that emulate the cerebral cortex will cost 400-600 million dollars.
Slide 25. Possible applications:
neural cards for computers, "neural brains" for robots, etc.
Slide 26. Strong artificial intelligence
Using 32-bit addressing, we can link up to 4 billion cores. If each core emulates 10 neurons and we place up to 1,000 cores on a chip, we can well assemble a cortex of 14 billion neurons from 14,000 computers in 350 racks.
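The sizing on this slide can be checked step by step. All input figures come from the slide; the two derived densities (chips per computer and computers per rack) are not stated in the source and are only what the quoted totals imply.

```python
ADDRESS_BITS = 32
NEURONS = 14_000_000_000   # neurons in the cortex (from the slides)
NEURONS_PER_CORE = 10
CORES_PER_CHIP = 1_000
COMPUTERS = 14_000
RACKS = 350

addressable_cores = 2 ** ADDRESS_BITS             # about 4.3 billion addresses
cores_needed = NEURONS // NEURONS_PER_CORE        # 1.4 billion cores
chips_needed = cores_needed // CORES_PER_CHIP     # 1.4 million chips

# Implied by the slide's totals, not stated in it:
chips_per_computer = chips_needed // COMPUTERS    # 100
computers_per_rack = COMPUTERS // RACKS           # 40

fits_in_address_space = cores_needed <= addressable_cores
```

The 1.4 billion cores use about a third of the 32-bit address space, so the addressing scheme leaves headroom for masks that carve the space into "layers" and "domains".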
Slide 27.
For transmission between network segments, a flow on the order of 0.4 Gb/s is sufficient, which is the speed of an ordinary $200 office switch.
Slide 28. This entire system will consume up to 400 MW in training mode and 40 MW in operation.
Slide 29. The cost of this project will amount to 400-600 million dollars.
Slide 30. To fully unlock the potential of neural networks, emulation on the traditional von Neumann architecture is not enough; they need their own processor architecture.