Thanks to the exponential growth in the number of active components on integrated circuits, their capability has increased and their cost per function has fallen over many years. At the same time, the challenges have grown as well – key among them being the reliability of an integrated circuit and its energy consumption. Modern ICs contain more transistors than there are human beings on this planet. Mechanisms such as redundancy have been put in place to compensate for fallout. Further scaling of dimensions and reduction of the energy consumed per operation will increase susceptibility to interference from external factors (such as ionizing particles) and internal effects (such as electromagnetic coupling), increase variations in the properties of components, and lower the noise margin as supply voltages are reduced (for reliability and energy-saving reasons). Employing well-known failure-mitigation concepts at the architecture and circuit levels incurs additional cost in area and energy, which would outweigh any savings achieved through scaling. With physical modelling of reliability and the development of statistical error models on the physical implementation layers, we target the quantitative assessment of reliability at the system level as well as the specification of countermeasures. This includes the selective replacement of individual components, such as standard cells, which allows selected issues to be mitigated on a specific layer.
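As a minimal sketch of the redundancy idea mentioned above, consider triple modular redundancy (TMR): three copies of a module compute the same result, and a bitwise majority vote masks a fault in any single copy. The function name and values below are illustrative, not taken from any specific design:

```python
# Hypothetical sketch of TMR majority voting. A fault in any one of the
# three redundant module outputs is masked by the other two copies.

def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant module outputs."""
    return (a & b) | (a & c) | (b & c)

golden = 0b1011_0010
faulty = golden ^ 0b0000_0100  # one module suffers a single bit flip
assert tmr_vote(golden, golden, faulty) == golden
```

The area and energy cost is apparent even in this toy: three modules plus a voter for every protected function, which is exactly the overhead that can outweigh the savings of further scaling.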

Moreover, we work on concepts to tolerate random and/or deterministic errors. Memory design presents a classical example of error protection. Fabricated in aggressively miniaturized technology nodes for high integration density, on-chip memories suffer from process variations more profoundly than logic devices. Traditional error-protection schemes, such as ECC, introduce substantial overhead, especially in memory storage capacity, and also show limited energy efficiency when coping with the relatively high bit-error rates of low-voltage operation. Memory design for high density and low leakage power at a specified reliability is a topic of continuous research on all levels, from the architecture level through the circuit level down to the device level. A classic application example is digital signal processing at the end of a physical transmission channel that already provides error-correction measures. Furthermore, modern algorithms in the domain of machine learning (such as DNNs) offer a new dimension, as the statistical nature of the results becomes relevant – in contrast to the exactness of individual results required in banking and similar applications.
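To make the ECC storage overhead concrete, here is a sketch of a textbook Hamming(7,4) single-error-correcting code: 3 parity bits protect every 4 data bits, i.e. 75 % extra storage at this short block length (real memory ECC uses longer words to reduce the relative overhead). This is a generic illustration of the scheme class named above, not any specific memory design:

```python
# Hypothetical sketch: Hamming(7,4) single-error correction.
# Codeword layout (positions 1..7): p1 p2 d1 p3 d2 d3 d4.

def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword with 3 parity bits."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = c[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1  # correct the single-bit error in place
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[4] ^= 1  # inject a single bit flip, e.g. from an ionizing particle
assert hamming74_decode(codeword) == data
```

The limits noted above also show up here: a second flip in the same word defeats the code, which is why the higher bit-error rates of low-voltage operation push classical ECC toward costlier, longer codes.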