
Programming a 144-computer chip to minimize power



Chip/computer design

  • About

    the GA144 chips

    • A matrix of 144 "computers" on a ~5mm chip

      • The whole chip

        • 96 GIPS, 550 mW, 15 µW idle

      • Each one of the 144 computers

        • ~1.5 ns/instruction -> 666 MIPS

        • 4 mW of power running -> ~7 pJ/instruction

          • 100 nW idle

  • Relevant, recent design concerns

    Interested in reducing power consumption both for portable devices and for server farms

  • Computer parameters

    • 18 bits/word

    • Up to 4 instructions/word (the last slot is only 3 bits wide, so it holds a subset of the opcodes)

    • A 32 instruction (5-bit) ISA (instruction set architecture)

    • 64-word RAM, 5 ns access

    • 2 x 9-deep push-down stacks

      • t

        top of stack register

      • s

        parameter stack

      • r

        return stack

    • 3 address registers: a, b (registers) and p (program counter)

    • Can communicate with 4 "neighbours" (up, down, left, right)
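The per-computer and whole-chip figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original talk: the constant and function names are made up for illustration, and the instruction time is derived from the 666 MIPS figure rather than taken from a data sheet.

```python
# Per-computer figures as quoted in the outline above.
COMPUTERS = 144
MIPS_PER_COMPUTER = 666        # million instructions per second
RUN_POWER_W = 4e-3             # 4 mW while running
IDLE_POWER_W = 100e-9          # 100 nW while idle


def instruction_time_ns() -> float:
    """Instruction time implied by the MIPS figure, in nanoseconds."""
    return 1e9 / (MIPS_PER_COMPUTER * 1e6)


def energy_per_instruction_pj() -> float:
    """Running power times instruction time, in picojoules."""
    return RUN_POWER_W * instruction_time_ns() * 1e-9 * 1e12


def chip_totals() -> dict:
    """Scale the per-computer figures up to all 144 computers."""
    return {
        "gips": COMPUTERS * MIPS_PER_COMPUTER / 1000.0,
        "run_mw": COMPUTERS * RUN_POWER_W * 1e3,
        "idle_uw": COMPUTERS * IDLE_POWER_W * 1e6,
    }


if __name__ == "__main__":
    print(f"{instruction_time_ns():.2f} ns/instruction")
    print(f"{energy_per_instruction_pj():.1f} pJ/instruction")
    print(chip_totals())
```

Scaling up gives about 96 GIPS, 576 mW running and 14.4 µW idle, close to the quoted whole-chip numbers. The energy works out to roughly 6 pJ per instruction, so the quoted power and energy figures are evidently rounded.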

Programming

Optimum programming

Fast

  • Minimize instructions executed

  • Use all slots in each instruction word

  • Place fetches (@) and stores (!) early in the word, so the address bus is free to prefetch the next instruction word

  • Position code so that jumps fit in slot 2

    • A slot-2 jump has only 3 address bits left, so it can reach only within an 8-word page

  • Use a better algorithm
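The 8-word limit for slot-2 jumps follows from the word layout. Below is a simplified sketch of that reasoning, assuming the 18-bit word is split into slots of 5, 5, 5 and 3 bits and that a jump uses whatever bits remain after its opcode as the destination address. The names are hypothetical, and real hardware may not use every remaining bit for jumps in the earlier slots.

```python
WORD_BITS = 18
# Assumed slot widths: three full 5-bit slots plus one short 3-bit slot.
SLOT_BITS = (5, 5, 5, 3)


def jump_address_bits(slot: int) -> int:
    """Bits left for a destination address when the jump opcode sits in `slot`."""
    if slot not in (0, 1, 2):
        raise ValueError("a jump needs a 5-bit slot followed by address bits")
    return WORD_BITS - sum(SLOT_BITS[: slot + 1])


def jump_reach_words(slot: int) -> int:
    """Number of word addresses a jump in `slot` can select."""
    return 1 << jump_address_bits(slot)


def reachable(pc: int, target: int, slot: int) -> bool:
    """True if `target` shares with `pc` every address bit the jump cannot change."""
    bits = jump_address_bits(slot)
    return (pc >> bits) == (target >> bits)


if __name__ == "__main__":
    for slot in (0, 1, 2):
        print(slot, jump_address_bits(slot), jump_reach_words(slot))
```

Under this model a loop that starts and ends inside the same 8-word page can close with a slot-2 jump, leaving the first two slots free for useful instructions.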

Compact

  • minimize instructions

  • avoid literals

    • dup or (duplicate, exclusive or) instead of the literal 0

    • A literal is a fetch from the address in the p register, so it costs an extra word of memory

  • prefer unext to next

  • initialize registers, stacks from port

  • better algorithm
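The dup or idiom works because any value exclusive-or'ed with itself is zero. Here is a small simulation of that, assuming an 18-bit word and a plain Python list standing in for the parameter stack. The helper names are invented for this sketch and are not colorForth syntax.

```python
MASK18 = (1 << 18) - 1  # values are 18 bits wide


def dup(stack: list) -> None:
    """Push a copy of the top of the stack."""
    stack.append(stack[-1])


def xor_op(stack: list) -> None:
    """Pop two values and push their exclusive or (the chip's `or`)."""
    b = stack.pop()
    a = stack.pop()
    stack.append((a ^ b) & MASK18)


def zero_via_dup_or(top: int) -> list:
    """Run `dup or` on a one-item stack and return the resulting stack."""
    stack = [top & MASK18]
    dup(stack)
    xor_op(stack)
    return stack


if __name__ == "__main__":
    print(zero_via_dup_or(0x2AAAA))
```

Note that the value previously on top is overwritten by the zero, so the idiom fits best where the top of the stack holds something no longer needed. Both instructions fit in slots of the current word, whereas a literal 0 would occupy a word of its own.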

Low energy

  • measure with a µA meter

  • low duty cycle

    • wait on a neighbour or a pin (a waiting computer draws close to zero power)

    • avoid timing loops

  • zero stacks and drop garbage

  • position loops to minimize the number of address bits that change

  • avoid literals

  • better algorithm
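The benefit of a low duty cycle can be estimated from the running and idle power figures quoted earlier (4 mW and 100 nW per computer). This is a rough sketch under the assumption that a computer is either fully running or fully idle, with nothing in between. The function name is illustrative only.

```python
RUN_POWER_W = 4e-3     # 4 mW while executing instructions
IDLE_POWER_W = 100e-9  # 100 nW while waiting on a neighbour or pin


def average_power_w(duty_cycle: float) -> float:
    """Average power of one computer that runs for `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * RUN_POWER_W + (1.0 - duty_cycle) * IDLE_POWER_W


if __name__ == "__main__":
    for duty in (1.0, 0.1, 0.01, 0.0):
        print(f"duty {duty:>5}: {average_power_w(duty) * 1e6:.2f} µW")
```

Because idle power is about 40,000 times lower than running power, the average is dominated by the running term. That is why a timing loop, which keeps the computer executing, costs far more than waiting on a port.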