A pattern to express data parallelism. It separates hardware-related behavior from application behavior, then uses automated infrastructure to specialize a single source to multiple hardware platforms. A specializer exists for multi-core; coming soon (as of March 2009) are specializers for GPGPU and large distributed-memory machines. Sample code includes H.264 deblocking written in C and Matrix Multiply written in Java.
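The separation can be sketched as follows. This is a minimal illustration of the pattern, not the project's actual infrastructure; the names (`elementwise_map`, `specialize`, `square`) and the multi-core threshold are my own assumptions.

```python
from multiprocessing import Pool
import os

def square(x):
    return x * x

def elementwise_map(f, data):
    """Single application-level source: apply f to every element.
    The application never mentions hardware."""
    return specialize(f, data)

def specialize(f, data):
    # Hardware-related behavior lives here, separate from the application.
    # (The size threshold is an illustrative assumption, not from the project.)
    if (os.cpu_count() or 1) > 1 and len(data) > 1000:
        with Pool() as pool:              # multi-core specializer
            return pool.map(f, data)
    return [f(x) for x in data]           # serial fallback for small inputs

print(elementwise_map(square, [1, 2, 3]))   # → [1, 4, 9]
```

The application code stays identical whichever specializer runs; only `specialize` would change to target a GPGPU or a distributed machine.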
An extensible language for parallel programming with a visual representation. Originally designed to let theoretical physicists define custom equation-manipulation transforms that are invoked symbolically, then automatically run data sets through the resulting equation. Uses on-the-fly specialization to generate efficient executables from code modified on the fly. Embeds properties of operations and operands in the syntax graph, enabling specializers to perform symbolic transformations of the source. Specialization can therefore apply transforms that alter the code to better fit particular parallel hardware.
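A toy rendering of the idea of properties embedded in the syntax graph, in my own terms (these are not EQNLang's data structures): each operator node carries algebraic properties, and a transform consults them before rewriting, so only legal rewrites fire.

```python
class Node:
    """One node of a syntax graph; props holds embedded operator properties."""
    def __init__(self, op, left=None, right=None, value=None, props=()):
        self.op, self.left, self.right = op, left, right
        self.value = value
        self.props = set(props)          # e.g. {"commutative", "associative"}

def swap_if_commutative(node):
    """A trivial symbolic transform: reorder operands, but only when the
    operator's embedded properties say the reordering is legal."""
    if node.op != "const" and "commutative" in node.props:
        node.left, node.right = node.right, node.left
    return node

a = Node("const", value=2)
b = Node("const", value=3)
add = Node("+", a, b, props=("commutative", "associative"))
swap_if_commutative(add)
print(add.left.value, add.right.value)   # → 3 2
```

A specializer could use the same mechanism to pick hardware-friendly rewrites, since the legality of each transform is carried by the graph itself rather than hard-coded into the compiler.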
The operating system used by EQNLang, also at the heart of the CodeTime platform. CTOS is based on the abstraction that everything is a processor. Data exists only inside a processor, and processors have pure names that carry no implied information. Hence an instance of the operating system is itself a processor, and files are processors that live inside it. Each OS instance is persistent, maintaining state across power cycles. Programs are data used to create one or more processors inside an OS instance.
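A toy sketch of the abstraction (my construction, not CTOS code): every entity, including a file, is a processor addressed by a pure, meaning-free name, and its data is reachable only by sending that processor a message.

```python
import uuid

registry = {}        # name -> processor; names carry no implied information

class Processor:
    def __init__(self):
        self.name = uuid.uuid4().hex     # pure name: no path, no type, no owner
        registry[self.name] = self
    def receive(self, msg):
        raise NotImplementedError

class FileProcessor(Processor):
    """A 'file' is just a processor whose internal data is its contents."""
    def __init__(self, contents):
        super().__init__()
        self._contents = contents        # data exists only inside the processor
    def receive(self, msg):
        if msg == "read":
            return self._contents

f = FileProcessor("hello")
# The only handle to the data is the processor's pure name:
print(registry[f.name].receive("read"))  # → hello
```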
A platform that includes an OS interface, languages, and development tools. It is intended to be stewarded by a non-profit entity that organizes development, publishes standards, and maintains compatibility with those standards through testing and by offering services to help implementors.
A treatise describing the mental framework behind the CodeTime OS and EQNLang.
A low-power, high-throughput multi-threaded architecture that may deliver as much as 200 times lower energy per completed instruction, while delivering 10 times higher throughput than an out-of-order core of the same die size in the same process. It makes extensive use of banking and queues to achieve the effect of pipelining and to decouple levels of the memory hierarchy. Its memory system uses tens of narrow main-memory banks to sustain bandwidth. A loosely coupled form of VLIW increases single-software-thread execution rate while filling empty VLIW slots with work from other threads.
A method of modelling out-of-order pipelines as a series of non-traditional queues. Each queue has an invariant function that determines how its state updates. Any system of such queues has a recurrence relation that is cycle-accurate. A semi-analytic technique approximates the recurrence relation to estimate the throughput of a given set of hardware choices on a given input program. Because the state-update function is invariant, the model does not require the "tuning" that typical processor models do. Each hardware choice has a simple, fixed translation into the equivalent state-update functions.
A massively parallel processor designed in 1995 and briefly funded as a startup to perform geometry transforms for 3Dfx chips. It is SIMD circuit-wise, but runs SPMD code with MIMD hardware behavior. The on-chip network introduces the "H-Bridge" topology, which provides mesh-like connectivity with only two connections per PE. The programming model uses cumulative conditions to enable each instruction. The global effect is to make the kernel a vector instruction with a rich mask, letting vectorized code run efficiently and enabling time-tested compiler techniques for code generation. The PEs are bus-centric rather than register-centric, and provide an explicit form of multiple contexts to overlap latency.
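The cumulative-condition idea can be sketched in scalar code (my own rendering, not the actual ISA): each PE accumulates the AND of all enclosing conditions into an enable bit, and an instruction takes effect only on enabled PEs, so the kernel behaves like one vector instruction under a rich per-PE mask.

```python
N = 8
x = list(range(N))                       # one data element per PE
enable = [True] * N                      # cumulative enable bit per PE

def push_cond(pred):
    """AND a new condition into each PE's cumulative enable; return the
    previous enables so the condition can later be popped."""
    saved = enable[:]
    for i in range(N):
        enable[i] = enable[i] and pred(x[i], i)
    return saved

def pop_cond(saved):
    enable[:] = saved

def vec_add(dst, imm):
    """Executes only on enabled PEs — effectively a masked vector add."""
    for i in range(N):
        if enable[i]:
            dst[i] += imm

saved = push_cond(lambda v, i: v % 2 == 0)   # "if (x is even)" on every PE
vec_add(x, 100)                              # only even lanes update
pop_cond(saved)
print(x)   # → [100, 1, 102, 3, 104, 5, 106, 7]
```

Because the mask is implicit in the condition stack, an ordinary vectorizing compiler can emit straight-line kernels and let the hardware suppress disabled lanes.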