

# **Application Note**



# Zynq Platform with UTIA EdkDSP Accelerator and Toshiba Sensor Video Processing in HW for TE0720-03-2IF SoM on TE0701-05 Carrier

Jiří Kadlec, Zdeněk Pohl, Lukáš Kohout

kadlec@utia.cas.cz, xpohl@utia.cas.cz, kohoutl@utia.cas.cz

phone: +420 2 6605 2216

UTIA AV CR, v.v.i.

#### Revision history:

| Rev. | Date       | Author      | Description                              |
|------|------------|-------------|------------------------------------------|
| 1    | 09.07.2016 | Jiří Kadlec | Evaluation package for Xilinx SDK 2015.4 |

#### Acknowledgements:

This work has been partially supported by ENIAC JU project THINGS2DO No. 621221 and by related MEYS (CZ NFA) project 7H14007.

# **Table of contents**

1. Summary
   - 1.1 Key features
   - 1.2 Project sh01: EdkDSP accelerator with edge detection in single HLS accelerator
   - 1.3 Project sh02: EdkDSP accelerator with edge detection in two HLS accelerators
   - 1.4 Project sh03: EdkDSP accelerator with edge detection in three HLS accelerators
   - 1.5 Project md01: EdkDSP accelerator with motion detection in HLS accelerators
2. Installation of evaluation package
   - 2.1 Import of SW projects in Xilinx SDK 2015.4
   - 2.2 HW setup
   - 2.3 Test demos
   - 2.4 EdkDSP C compiler
3. Conclusions
4. References
5. Evaluation license

Disclaimer



© 2016 ÚTIA AV ČR, v.v.i.

All disclosure and/or reproduction rights reserved

# **Table of figures**

- Figure 1: Toshiba Full HD evaluation platform HW
- Figure 2: Project sh01 - Edge detection with single HW accelerator
- Figure 3: Project sh01 - Energy per frame reduction and used HW resources
- Figure 4: Project sh02 - Edge detection with two HW accelerators
- Figure 5: Project sh02 - Energy per frame reduction and used HW resources
- Figure 6: Project sh03 - Edge detection with three HW accelerators
- Figure 7: Project sh03 - Energy per frame reduction and used HW resources
- Figure 8: Project sh01 - EdkDSP and edge detection in one Arm SW function
- Figure 9: Project sh01 - EdkDSP and edge detection in one HLS accelerator
- Figure 10: Project sh02 - EdkDSP and edge detection with two Arm SW functions
- Figure 11: Project sh02 - EdkDSP and edge detection with two HLS accelerators
- Figure 12: Project sh03 - EdkDSP and edge detection with three Arm SW functions
- Figure 13: Project sh03 - EdkDSP and edge detection with three HLS accelerators
- Figure 14: Project md01 - Motion detection with single HW accelerator data path
- Figure 15: Project md01 - Energy per frame reduction and HW resources
- Figure 16: Project md01 - EdkDSP and motion detection in Arm SW functions
- Figure 17: Project md01 - EdkDSP and motion detection with single HLS accelerator path
- Figure 18: Select the SDK Workspace
- Figure 19: Import Existing Projects into Workspace
- Figure 20: Select "Copy projects into workspace" and finish the import of all projects
- Figure 21: All projects are compiled in debug mode
- Figure 22: USB-based serial cables for Arm and JTAG. RS232 with Pmod for MicroBlaze
- Figure 23: Serial console. Reset board and stop autoboot by any key
- Figure 24: Download bitstream to the PL part of Zynq
- Figure 25: Select demo application for debug
- Figure 26: Demo app is booted to Arm and the debugger is waiting on the first executable line
- Figure 27: Arm is waiting on HW Mutex for the MicroBlaze start
- Figure 28: Select the MicroBlaze application for debug
- Figure 29: MicroBlaze application is loaded and debugger stops on the first instruction
- Figure 30: Arm is running. It indicates the number of frames per second
- Figure 31: MicroBlaze is running. It indicates MFLOPs
- Figure 32: Accelerated edge detection Toshiba sensor and Zynq with EdkDSP
- Figure 33: Edge detection (Sobel filter) output on Full HD monitor



## 1. Summary

#### 1.1 Key features

This application note describes a HW platform that integrates the runtime-reprogrammable EdkDSP floating point accelerator with edge detection and motion detection video processing for the Toshiba Full HD colour video sensor with fixed resolution (1920x1080p60).

- The Xilinx Zynq device xc7z020-2I has two Arm Cortex A9 processors (operating at 666 MHz) and a memory controller with two levels of caches and high-performance DDR3 memory access ports. It also provides the programmable logic area, which is used for:
  - The UTIA EdkDSP (8xSIMD) floating point processor (operating at 120 MHz), connected to a Xilinx MicroBlaze 32bit processor (operating at 100 MHz).
  - The input chain of video processing IPs, which connects the Full HD Toshiba video sensor to the input video frame buffers. The input video DMA (VDMA) controller operates at 150 MHz.
  - The area reserved for HLS HW accelerators and data movers defined in the Xilinx SDSoC 2015.4 environment. These accelerators can be controlled from Arm Cortex A9 C programs compiled in SDK 2015.4 C projects. The HLS accelerators operate at 150 MHz.
  - The chain of output video processing IPs, which connects the output frame buffers to the Full HD display connected by an HDMI cable. The output VDMA controller operates at 150 MHz.
- The UTIA EdkDSP is an 8xSIMD floating point accelerator that is reprogrammable at runtime by changing the firmware of its built-in PicoBlaze6 8bit controller. This controller serves as a scheduler of the vector operations performed in the 8xSIMD floating point data paths of the EdkDSP. The scheduler is programmed by simple C programs compiled by a simple C compiler and assembler, which respect the minimal resources of the PicoBlaze6 controller.
- The UTIA EdkDSP 8xSIMD floating point accelerator is controlled by the 32bit MicroBlaze processor. The MicroBlaze processor executes C programs from the DDR3 memory, including complex C algorithms. These algorithms can benefit from the efficient execution of selected operations on the EdkDSP coprocessor, which is connected to the MicroBlaze by local dual-ported memories. MicroBlaze C programs can overlap the data communication from DDR3 to the EdkDSP dual-ported memories with parallel computations in the EdkDSP accelerator.
- The platform also includes the video processing chain of IPs controlled by the Arm Cortex A9 processor.
- The Arm Cortex A9 processor of the Xilinx Zynq performs the initialisation and synchronisation of the video processing chain. The program and the FPGA image are downloaded to the board from the Xilinx SDK 2015.4 via USB JTAG to the 1GB DDR3 located on the Zynq system on module. The system can also be started directly from the SD card. The Arm processor initialises the IP cores in the programmable logic (PL) part of the Zynq. It also initialises the Toshiba video sensor and the video output to the Full HD monitor with fixed 1920x1080p60 resolution and the standard Full HD pixel clock of 148.5 MHz.
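The 148.5 MHz pixel clock quoted above follows from the 1080p60 frame timing. The short check below is only a sketch: it assumes the standard 1080p60 totals of 2200 x 1125 pixel periods per frame (active area plus blanking), which are not stated in this note.

```python
# Worked check of the Full HD pixel clock quoted in the key features.
# Assumption (not stated in this note): standard 1080p60 timing with
# 2200 x 1125 pixel periods per frame, blanking included.
ACTIVE_W, ACTIVE_H = 1920, 1080
TOTAL_W, TOTAL_H = 2200, 1125
FPS = 60

pixel_clock_hz = TOTAL_W * TOTAL_H * FPS
active_pixels_per_frame = ACTIVE_W * ACTIVE_H
active_fraction = active_pixels_per_frame / (TOTAL_W * TOTAL_H)

print(f"pixel clock   : {pixel_clock_hz / 1e6:.1f} MHz")
print(f"active pixels : {active_pixels_per_frame} per frame")
print(f"active share  : {active_fraction:.1%} of each frame period")
```

The active pixel count per frame computed here is the quantity that the energy per pixel figures in the following sections refer to.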





Figure 1: Toshiba Full HD evaluation platform HW.

Details of the Full HD video processing chain:

- Raw video data are provided by the Toshiba video sensor.
- Data are processed into the YCrCb 16 bit per pixel format and stored by Video DMA (VDMA) to input video frame buffers (VFBs) defined in the DDR3.
- HW DMA controller(s) send data from/to the VFBs to the processing accelerators. Clock is 150 MHz.
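The memory footprint and traffic implied by this chain can be estimated from the numbers above. The sketch below assumes 2 bytes per pixel (the 16 bit YCrCb format), 60 frames per second at the sensor, and the ring of 8 video frame buffers mentioned later in this note; it ignores blanking and any bus overhead.

```python
# Rough sizing of the video frame buffers (VFBs) in DDR3.
# Assumptions: 16 bit YCrCb = 2 bytes per pixel, 60 frames per second,
# and a ring of 8 frame buffers as described for the demos.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 2
FPS = 60
NUM_VFB = 8

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
stream_bytes_per_s = frame_bytes * FPS      # one direction, e.g. sensor to DDR3
ring_bytes = frame_bytes * NUM_VFB

print(f"one frame      : {frame_bytes / 2**20:.2f} MiB")
print(f"input stream   : {stream_bytes_per_s / 1e6:.1f} MB/s")
print(f"ring of {NUM_VFB} VFBs : {ring_bytes / 2**20:.2f} MiB")
```

Each additional pass over the frame (VFB to accelerator, accelerator to output VFB, output VFB to display) adds a stream of similar size, which is why the data movers use dedicated DMA engines.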

The projects described in the next sections summarise the energy per frame measured on the platform for different accelerated image processing algorithms, as defined by the individual C projects, in these main configurations:

1. MicroBlaze with the EdkDSP coprocessor computes a floating point FIR filter (in parallel to the dedicated video processing accelerator chain).
2. MicroBlaze with the EdkDSP coprocessor computes a floating point LMS adaptive filter (in parallel to the dedicated video processing accelerator chain).
3. MicroBlaze computes the FIR or LMS filter in SW, using only its floating point unit (in parallel to the dedicated video processing accelerator chain). The EdkDSP accelerator is not used.
4. MicroBlaze and EdkDSP are not present in the PL logic, and only the dedicated video processing accelerator chain processes the Full HD video from the Toshiba sensor.

The SW figures indicate the energy per pixel consumed by the complete system when the computation runs on Arm. The C/C++ code was compiled with -O3 optimisation (but without NEON) in the SDSoC 2015.4 environment, and no HLS accelerators are present.

The evaluation designs with HLS accelerators have been created from these C/C++ functions in SDSoC 2015.4.



#### 1.2 Project sh01: EdkDSP accelerator with edge detection in single HLS accelerator

- Zynq module: TE0720-02-2IF (Trenz)
- Carrier board: TE0701-05 (Trenz)
- FMC interface: BD-FMC-HDMI-CAM-G (Avnet)
- FMC card input: Full HD Toshiba video sensor (color 1920x1080p60)
- FMC card output: Full HD HDMI output to display (color 1920x1080p60)
- Accelerator: 8xSIMD EdkDSP (120 MHz) with MicroBlaze (100 MHz)
- Accelerator: Edge detection single data path (150 MHz)
- Platform: C:\S\t20i2tm1\hio
- Demos (SDK 2015.4): C:\VM07\t20i2tm1\sh01\_rows\_resize\_25\_to\_100.c and C:\VM07\t20i2tm1\sh01\_rows\_fixed\_100.c
- Design by UTIA in Xilinx SDSoC 2015.4. Date: 2016\_05\_27
- SDK C projects for: ARM Cortex A9, MicroBlaze, PicoBlaze (EdkDSP)
- HLS function: sobel\_filter\_htile(\*in,\*out,tilerows), instantiated as sobel\_filter\_htile\_0

Block diagram labels: DDR3 RAM (3/4) with ARM code and DDR3 RAM (1/4) with MB code; video frame buffers (VFB); Toshiba sensor input IPs and HDMI output IPs on the t20i2tm1\hio platform; MicroBlaze, PicoBlaze6 (PB6, EdkDSP C compiler code) and EdkDSP floating point 8xSIMD data paths with 8x DP BRAMs; clock labels 75 MHz (150 MHz), 100 MHz and 120 MHz.

Figure 2: Project sh01 - Edge detection with single HW accelerator

Energy per pixel (nJ/p = nano Joule/pixel), te0701-05 te0720-2i:

| Configuration | SW [nJ/p] | HW [nJ/p] | Reduced    |
|---------------|-----------|-----------|------------|
| EdkDSP FIR    | 515.93    | 94.73     | **5.44 x** |
| EdkDSP LMS    | 514.40    | 94.46     | **5.44 x** |
| Filter by MB  | 505.22    | 92.79     | **5.44 x** |
| Without MB    | 479.19    | 88.34     | **5.42 x** |

Frame rate and used HW resources, te0701-05 te0720-2i:

| Metric     | SW    | HW    |
|------------|-------|-------|
| FPS        | 7.56  | 41.6  |
| Slices [%] | 66.64 | 83.54 |
| BRAMs [%]  | 51.79 | 59.64 |
| LUTs [%]   | 45.46 | 55.55 |
| FFs [%]    | 26.23 | 34.69 |

Figure 3: Project sh01 - Energy per frame reduction and used HW resources.
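The reduction factors in Figure 3 can be re-derived from the SW and HW energy per pixel values. The sketch below also estimates the total power implied by each value, under the assumption (not spelled out in this note) that energy per pixel equals total power divided by the product of frame rate and 1920 x 1080 pixels per frame.

```python
# Energy per pixel for project sh01, values copied from Figure 3 (nJ/pixel).
FRAME_PIXELS = 1920 * 1080        # active pixels per Full HD frame
FPS_SW, FPS_HW = 7.56, 41.6       # frame rates shown in Figure 3

# configuration: (SW nJ/pixel, HW nJ/pixel, reduction quoted in Figure 3)
SH01 = {
    "EdkDSP FIR":   (515.93, 94.73, 5.44),
    "EdkDSP LMS":   (514.40, 94.46, 5.44),
    "Filter by MB": (505.22, 92.79, 5.44),
    "Without MB":   (479.19, 88.34, 5.42),
}

def reduction(sw_nj, hw_nj):
    """Ratio of SW to HW energy per pixel."""
    return sw_nj / hw_nj

def implied_power_w(nj_per_pixel, fps):
    """Total power in watts, assuming E_pixel = P / (fps * pixels per frame)."""
    return nj_per_pixel * 1e-9 * FRAME_PIXELS * fps

for name, (sw, hw, quoted) in SH01.items():
    print(f"{name:13s} ratio {reduction(sw, hw):.3f} (quoted {quoted}), "
          f"implied power SW {implied_power_w(sw, FPS_SW):.2f} W, "
          f"HW {implied_power_w(hw, FPS_HW):.2f} W")

print(f"frame rate ratio HW/SW: {FPS_HW / FPS_SW:.2f}")
```

Under this assumption the implied power is nearly the same for the SW and HW variants, so the energy per pixel reduction closely tracks the ratio of the two frame rates.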




Akademie věd České republiky

Ústav teorie informace a automatizace AV ČR, v.v.i.

#### 1.3 Project sh02: EdkDSP accelerator with edge detection in two HLS accelerators

- Zynq module: TE0720-02-2IF (Trenz)
- Carrier board: TE0701-05 (Trenz)
- FMC interface: BD-FMC-HDMI-CAM-G (Avnet)
- FMC card input: Full HD Toshiba video sensor (color 1920x1080p60)
- FMC card output: Full HD HDMI output to display (color 1920x1080p60)
- Accelerator: 8xSIMD EdkDSP (120 MHz) with MicroBlaze (100 MHz)
- Accelerator: Edge detection two data paths (150 MHz)
- Platform: C:\S\t20i2tm1\hio
- Demos (SDK 2015.4): C:\VM07\t20i2tm1\sh02\_rows\_resize\_25\_to\_100.c and C:\VM07\t20i2tm1\sh02\_rows\_fixed\_100.c
- Design by UTIA in Xilinx SDSoC 2015.4. Date: 2016\_05\_27
- SDK C projects for: ARM Cortex A9, MicroBlaze, PicoBlaze (EdkDSP)
- HLS functions: sobel\_filter\_htile1(\*in,\*out,tilerows) and sobel\_filter\_htile2(\*in,\*out,tilerows), instantiated as sobel\_filter\_htile1\_0 and sobel\_filter\_htile2\_0

Block diagram labels: DDR3 RAM (3/4) with ARM code and DDR3 RAM (1/4) with MB code; video frame buffers (VFB); Toshiba sensor input IPs and HDMI output IPs on the t20i2tm1\hio platform; MicroBlaze, PicoBlaze6 (PB6, EdkDSP C compiler code) and EdkDSP floating point 8xSIMD data paths with 8x DP BRAMs; clock labels 75 MHz (150 MHz), 100 MHz and 120 MHz.

Figure 4: Project sh02 - Edge detection with two HW accelerators

Energy per pixel (nJ/p = nano Joule/pixel):

| Configuration | SW [nJ/p] | HW [nJ/p] | Reduced    |
|---------------|-----------|-----------|------------|
| EdkDSP FIR    | 518.15    | 67.13     | **7.72 x** |
| EdkDSP LMS    | 516.62    | 66.94     | **7.72 x** |
| Filter by MB  | 507.42    | 65.78     | **7.72 x** |
| Without MB    | 479.83    | 62.40     | **7.69 x** |





Figure 5: Project sh02 - Energy per frame reduction and used HW resources.



#### 1.4 Project sh03: EdkDSP accelerator with edge detection in three HLS accelerators

- Zynq module: TE0720-02-2IF (Trenz)
- Carrier board: TE0701-05 (Trenz)
- FMC interface: BD-FMC-HDMI-CAM-G (Avnet)
- FMC card input: Full HD Toshiba video sensor (color 1920x1080p60)
- FMC card output: Full HD HDMI output to display (color 1920x1080p60)
- Accelerator: 8xSIMD EdkDSP (120 MHz) with MicroBlaze (100 MHz)
- Accelerator: Edge detection three data paths (150 MHz)
- Platform: C:\S\t20i2tm1\hio
- Demos (SDK 2015.4): C:\VM07\t20i2tm1\sh03\_rows\_resize\_25\_to\_100.c and C:\VM07\t20i2tm1\sh03\_rows\_fixed\_100.c
- Design by UTIA in Xilinx SDSoC 2015.4. Date: 2016\_05\_27
- SDK C projects for: ARM Cortex A9, MicroBlaze, PicoBlaze (EdkDSP)
- HLS functions: sobel\_filter\_htile1(\*in,\*out,tilerows), sobel\_filter\_htile2(\*in,\*out,tilerows) and sobel\_filter\_htile3(\*in,\*out,tilerows), instantiated as sobel\_filter\_htile1\_0, sobel\_filter\_htile2\_0 and sobel\_filter\_htile3\_0

Block diagram labels: DDR3 RAM (3/4) with ARM code and DDR3 RAM (1/4) with MB code; video frame buffers (VFB); Toshiba sensor input IPs and HDMI output IPs on the t20i2tm1\hio platform; MicroBlaze, PicoBlaze6 (PB6, EdkDSP C compiler code) and EdkDSP floating point 8xSIMD data paths with 8x DP BRAMs; clock labels 75 MHz (150 MHz), 100 MHz and 120 MHz.

Figure 6: Project sh03 - Edge detection with three HW accelerators

Energy per pixel (nJ/p = nano Joule/pixel):

| Configuration | SW [nJ/p] | HW [nJ/p] | Reduced    |
|---------------|-----------|-----------|------------|
| EdkDSP FIR    | 518.68    | 67.13     | **7.72 x** |
| EdkDSP LMS    | 517.14    | 66.94     | **7.72 x** |
| Filter by MB  | 507.90    | 65.78     | **7.72 x** |
| Without MB    | 481.74    | 63.08     | **7.63 x** |





Figure 7: Project sh03 - Energy per frame reduction and used HW resources.






Figure 8: Project sh01 – EdkDSP and edge detection in one Arm SW function



Figure 9: Project sh01 - EdkDSP and edge detection in one HLS accelerator





Figure 10: Project sh02 – EdkDSP and edge detection with two Arm SW functions



Figure 11: Project sh02 – EdkDSP and edge detection with two HLS accelerators





Figure 12: Project sh03 – EdkDSP and edge detection with three Arm SW functions



Figure 13: Project sh03 – EdkDSP and edge detection with three HLS accelerators



#### 1.5 Project md01: EdkDSP accelerator with motion detection in HLS accelerators



Figure 14: Project md01 - Motion detection with single HW accelerator data path



Figure 15: Project md01 - Energy per frame reduction and HW resources.






Figure 16: Project md01 – EdkDSP and motion detection in Arm SW functions



Figure 17: Project md01 – EdkDSP and motion detection with single HLS accelerator path



## 2. Installation of evaluation package

#### 2.1 Import of SW projects in Xilinx SDK 2015.4

Unzip the evaluation package to a directory of your choice. The directory C:\VM\_07 is used in this application note, so the package is unpacked to:

C:\VM\_07\t20i2tm1\_V54\_IMPORT

Create an empty directory for the Xilinx SDK workspace:

C:\VM\_07\t20i2tm1

Start Xilinx SDK 2015.4 and select C:\VM\_07\t20i2tm1 as the directory for the SDK 2015.4 workspace. See Figure 18.



Figure 18: Select the SDK Workspace

The HW and SW projects can now be imported into SDK. Select:

File -> Import -> General -> Existing Projects into Workspace

Click on the Next button. See Figure 19.





Figure 19: Import Existing Projects into Workspace

Enter the directory with the projects to be imported. See Figure 20.

C:\VM\_07\t20i2tm1\_V54\_IMPORT

Set the "Copy projects into workspace" check box.

Click on the Finish button. See Figure 20.

The compilation starts automatically. This first compilation of all SDK SW projects can take several minutes. It should finish without errors.





Figure 20: Select "Copy projects into workspace" and finish the import of all projects.





Figure 21: All projects are compiled in debug mode.

SDK 2015.4 compiles SW of all imported demos in debug mode.



#### 2.2 HW setup

The HW setup uses commercially available components [1], [2], [3], [4], [5], [6]:

| Component                                                            | Price         |
|----------------------------------------------------------------------|---------------|
| TE0720-03-2IF; Part: XC7Z020-2CLG484I; 1 GByte DDR; Industrial Grade | €269.00 [1]   |
| Heatsink for TE0720, spring-loaded embedded                          | €19.00 [2]    |
| TE0701-05 Carrier Board for Trenz Electronic 7 Series                | €249.00 [3]   |
| AES-FMC-HDMI-CAM-G FMC card with HDMI I/O and CAM interface          | \$250.00 [4]  |
| Toshiba Industrial 1080P60 Camera Module                             | \$229.00 [5]  |
| PmodRS232: Serial converter & interface                              | €13.54 [6]    |

#### HW Options:

- **TE0720-03-2IF** can be replaced by **TE0720-02-2IF** (same price, both boards from Trenz) [1].
- **TE0701-05** can be replaced by **TE0701-04** (same price, both boards from Trenz) [3].

The Trenz TE0701-04 and TE0701-05 carriers require modifications to run the FMC Imageon card AES-FMC-HDMI-CAM-G with the Zynq TE0720-03-2IF system on module. The modification is related to the swapped polarity of the differential clock signal for the FMC board. Evaluation HW systems with TE0701-04 or TE0701-05 carriers provided by UTIA already include these modifications.

UTIA can implement these HW modifications for original Trenz TE0701-04 and TE0701-05 carriers. This requires a written e-mail request to kadlec@utia.cas.cz. The request will first be confirmed by UTIA. The interested party has to cover the cost of shipping the carrier board to and from UTIA. The modification can be done in 5 working days and is offered free of charge.



#### 2.3 Test demos

To test the demos, follow these steps:

- Insert the Toshiba Full HD video sensor into the connector on the Imageon board.
- Connect a Full HD (or DVI) monitor by HDMI cable to the HDMI OUT on the Imageon FMC card.
- Switch the monitor ON.
- Connect the carrier board to the PC by a USB-to-microUSB cable. This cable supports both the JTAG link and the standard serial terminal.
- Connect the PmodRS232 serial converter & interface module to the carrier board as indicated in Figure 22. Connect the RS232 cable to the COM1 serial port of your PC. This serial line supports the serial terminal of the MicroBlaze processor.



Figure 22: USB-based serial cables for Arm and JTAG. RS232 with Pmod for MicroBlaze.

- Connect the power supply (DC 12V).
- Open and configure a standard serial terminal client (PuTTY or similar) on the PC for the Arm serial terminal (USB emulated).
  - (Speed: 115200 baud; Data bits: 8; Stop bits: 1; Parity: None; Flow control: None).
- Open and configure a standard serial terminal client (PuTTY or similar) on the PC for the MicroBlaze serial terminal. It is COM1.
  - (Speed: 115200 baud; Data bits: 8; Stop bits: 1; Parity: None; Flow control: None).
- Reset the board. The board starts the first stage boot loader from the internal flash as set up by Trenz and writes messages to the serial terminal. When prompted with "Hit any key to stop autoboot", press any key to stop the auto-boot of Linux.
- If you need to switch-off the power, close first the serial terminal on the PC. This will help to avoid problems





Figure 23: Serial console. Reset board and stop autoboot by any key.


Download bitstream to the board. Demo **sh01\_rows\_fixed\_100** will be used as an example. The **bitstream.bit** for demo **sh01** is located in the directory:

C:\VM\_07\t20i2tm1\sh01\_hw\_platform\_0





Figure 24: Download bitstream to the PL part of Zynq.

Select Program to download the bitstream to the PL part of Zynq via the USB cable in JTAG mode.



Figure 25: Select demo application for debug.





Figure 26: Demo app is booted to Arm and the debugger is waiting on the first executable line.

```
U-Boot 2013.01-00011-gc260602-dirty (Apr 11 2014 - 06:18:54)

I2C: ready
DRAM: 256 MiB
WARNING: Caches not enabled
MMC: zynq_sdhci: 0
Using default environment

In: serial
Out: serial
Err: serial
Net: Gem.e000b000
Hit any key to stop autoboot: 0
zynq-uboot> ARMCPU0: place 0xb8000000 at start of MB0 vectors
```

Figure 27: Arm is waiting on HW Mutex for the MicroBlaze start.




http://zs.utia.cas.cz




Figure 28: Select the Microblaze application for debug.

The program for MicroBlaze is downloaded while Arm is already running. For this reason:

- Unselect "Run ps7\_init"
- Unselect "Run ps7\_post\_config"
- Select No reset

Click on "Apply" button.

Click on "Debug" to download the **sh01\_edkdsp\_fp12\_1x8\_all.elf** to DDR3 as program for MicroBlaze.

The debugger will download this code by JTAG (connected to PC by the USB cable shared with the serial terminal) and stop Microblaze at the first executable instruction. See Figure 29.





Figure 29: MicroBlaze application is loaded and debugger stops on the first instruction.

- Arm Thread [1] is running.
- Microblaze Thread [1] is currently suspended at breakpoint hit. See Figure 29.

Click on the |> icon to start the execution of MicroBlaze. The handshake of Arm and Microblaze on HW mutex IP is completed and both processors start to run uninterrupted.

Arm initialises the Toshiba Full HD video sensor and all video processing IP cores. In SW, it controls the status of the VDMA units and sets the correct pointers to the active video frame buffers. Video processing is performed by HLS IP cores in HW. Data are moved from the input video frame buffers to HW and back to the output video frame buffers by HW data mover IPs. These IPs are set up by the Arm SW via AXI-Lite.

Input HW data movers act as HW masters controlling the DMA engines moving data from DDR3 as input to the chain(s) of HLS IP cores. Output HW data movers act as HW masters controlling the DMA engines moving data from the output of chain(s) of HLS IP cores to DDR3 output video frame buffers. See Figure 30.



```
COM17 - PuTTY
Image processing time: 16016710, Total FPS: 41.611107
Image processing time: 16016636, Total FPS: 41.611183
Image processing time: 16016744, Total FPS: 41.610924
Image processing time: 16016644, Total FPS: 41.611217
Image processing time: 16016682, Total FPS: 41.611008
Image processing time: 16016628, Total FPS: 41.611217
Image processing time: 16016604, Total FPS: 41.611332
Image processing time: 16016572, Total FPS: 41.611485
Image processing time: 16016568, Total FPS: 41.611423
Image processing time: 16016632, Total FPS: 41.611179
Image processing time: 16016622, Total FPS: 41.611263
Image processing time: 16016576, Total FPS: 41.611366
Image processing time: 16016658, Total FPS: 41.611221
Image processing time: 16016608, Total FPS: 41.611351
Image processing time: 16016578, Total FPS: 41.611420
Image processing time: 16016646, Total FPS: 41.611179
Image processing time: 16016498, Total FPS: 41.611607
Image processing time: 16016630, Total FPS: 41.611206
Image processing time: 16016486, Total FPS: 41.611626
Image processing time: 16016854, Total FPS: 41.610634
Image processing time: 16016622, Total FPS: 41.611279
Image processing time: 16016626, Total FPS: 41.611233
```

Figure 30: Arm is running. It indicates the number of frames per second.
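The console output in Figure 30 can be post-processed on the PC to summarise the frame rate. The sketch below parses a few lines copied from the figure; the function and variable names are illustrative, and the unit of the "Image processing time" counter is not stated in this note, so the counter is only reported as is.

```python
import re
from statistics import mean

# Three lines copied from the Arm console output in Figure 30.
LOG = """\
Image processing time: 16016710, Total FPS: 41.611107
Image processing time: 16016636, Total FPS: 41.611183
Image processing time: 16016744, Total FPS: 41.610924
"""

PATTERN = re.compile(r"Image processing time:\s*(\d+),\s*Total FPS:\s*([\d.]+)")

def parse_fps_log(text):
    """Return a list of (raw_time_counter, fps) tuples found in the text."""
    return [(int(t), float(f)) for t, f in PATTERN.findall(text)]

samples = parse_fps_log(LOG)
fps_values = [fps for _, fps in samples]
avg_fps = mean(fps_values)

print(f"frames parsed : {len(samples)}")
print(f"FPS min/avg/max: {min(fps_values):.3f} / {avg_fps:.3f} / {max(fps_values):.3f}")
print(f"frame period  : {1000.0 / avg_fps:.2f} ms")
```

Lines captured from the serial terminal into a text file can be passed to `parse_fps_log` in the same way.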

In parallel, the MicroBlaze processor executes its program from DDR3 and communicates firmware and data to the (8xSIMD) EdkDSP floating point accelerator.

It tests basic floating point operations and compares the EdkDSP results with the MicroBlaze floating point results.

In the next stage, it programs the EdkDSP to perform the FIR filter and the LMS adaptive filter.

The performance of the combination of MicroBlaze with the EdkDSP accelerator is measured by a HW timer instantiated as a MicroBlaze AXI-Lite IP core. See Figure 31.




```
COM1 - PuTTY
Initialize TmrCtr for axi timer 0...
MB0 : (EdkDSP 8xSIMD) Write firmware ...
MB0 : (EdkDSP 8xSIMD) Capabilities1 = 13ffff
MB0 : (HW FP unit   ) Far-end signal ...
MB0 : (EdkDSP 8xSIMD) FIR room response ... 1137 MFLOPs
MB0 : (HW FP unit   ) Add near-end signal ...
MB0 : (EdkDSP 8xSIMD) LMS Identification ... 731 MFLOPs
MB0 : (HW FP unit   ) LMS Identification ... 3 MFLOPs
MB0 : (EdkDSP 8xSIMD) OK
MB0 : (EdkDSP 8xSIMD) Write firmware ...
MB0 : (EdkDSP 8xSIMD) Capabilities1 = 13ffff
MB0 : (EdkDSP 8xSIMD) VZ2A 'worker1' ..... OK
MB0 : (EdkDSP 8xSIMD) VB2A 'worker1'
MB0 : (EdkDSP 8xSIMD) VZ2B 'worker1'
MB0 : (EdkDSP 8xSIMD) VA2B 'worker1'
MB0 : (EdkDSP 8xSIMD) VADD 'worker1' ..... OK
MB0 : (EdkDSP 8xSIMD) VADD_BZ2A 'worker1' .. OK
MB0 : (EdkDSP 8xSIMD) VADD_AZ2B 'worker1' .. OK
MB0 : (EdkDSP 8xSIMD) VSUB 'worker1' ..... OK
MB0 : (EdkDSP 8xSIMD) VSUB_BZ2A 'worker1' .. OK
MB0 : (EdkDSP 8xSIMD) VSUB_AZ2B 'worker1' .. OK
MB0 : (EdkDSP 8xSIMD) VMULT 'worker1' ..... OK
MB0 : (EdkDSP 8xSIMD) VMULT_BZ2A 'worker1' . OK
MB0 : (EdkDSP 8xSIMD) VMULT_AZ2B 'worker1' . OK
MB0 : (EdkDSP 8xSIMD) VPROD 'worker1' ..... OK
MB0 : (EdkDSP 8xSIMD) VMAC 'worker1' ..... OK
MB0 : (EdkDSP 8xSIMD) VMSUBAC 'worker1' .... OK
MB0 : (EdkDSP 8xSIMD) VPROD_S8 'worker1' ... OK
MB0 : (EdkDSP 8xSIMD) VDIV 'worker1' ..... OK
MB0 : (EdkDSP 8xSIMD) Write firmware ...
MB0 : (EdkDSP 8xSIMD) Capabilities1 = 13ffff
MB0 : (HW FP unit   ) Far-end signal ...
MB0 : (EdkDSP 8xSIMD) FIR room response ... 1138 MFLOPs
MB0 : (HW FP unit   ) Add near-end signal ...
MB0 : (EdkDSP 8xSIMD) LMS Identification ... 732 MFLOPs

Figure 31: Microblaze is running. It indicates MFLOPs.
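The MFLOPs values printed by MicroBlaze in Figure 31 allow a rough comparison of the EdkDSP accelerator with the MicroBlaze HW floating point unit on the same LMS task. The sketch below parses three lines taken from the figure; the regular expression and the names are illustrative, and the ratio is only approximate because the log prints whole MFLOPs.

```python
import re

# Lines taken from the MicroBlaze console output in Figure 31.
LOG = """\
MB0 : (EdkDSP 8xSIMD) FIR room response ... 1137 MFLOPs
MB0 : (EdkDSP 8xSIMD) LMS Identification ... 731 MFLOPs
MB0 : (HW FP unit   ) LMS Identification ... 3 MFLOPs
"""

LINE = re.compile(
    r"MB0 : \((?P<unit>[^)]*?)\s*\)\s+(?P<task>.+?)\s+\.\.\.\s+(?P<mflops>\d+) MFLOPs"
)

def parse_mflops(text):
    """Map (unit, task) to the MFLOPs value printed for it."""
    return {
        (m["unit"], m["task"]): int(m["mflops"])
        for m in LINE.finditer(text)
    }

results = parse_mflops(LOG)
edkdsp_lms = results[("EdkDSP 8xSIMD", "LMS Identification")]
mb_fpu_lms = results[("HW FP unit", "LMS Identification")]
speedup = edkdsp_lms / mb_fpu_lms

for (unit, task), mflops in results.items():
    print(f"{unit:14s} {task:20s} {mflops:5d} MFLOPs")
print(f"LMS speed-up EdkDSP vs HW FP unit: about {speedup:.0f} x")
```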






Figure 32: Accelerated edge detection Toshiba sensor and Zynq with EdkDSP.



Figure 33: Edge detection (Sobel filter) output on Full HD monitor.



- All evaluation demos can also be compiled into release versions with optimisation set to -O2 or -O3. These optimisations can be set for both Arm and MicroBlaze.
- Demo **sh01\_rows\_fixed\_100** processes the complete frame with a single HW accelerator data path.
- Demo **sh01\_rows\_resize\_25\_to\_100** works with identical HW, but the SW dynamically scales the number of lines to be processed, from ¼ of the frame to the complete frame. The part of the frame that is not processed propagates the input video signal automatically via the cyclic structure of 8 video frame buffers. The SW instructs the HW data movers about the number of lines to be processed by writing this information to an AXI-Lite configuration register of the data mover IP core.
- Demos sh02\_rows\_fixed\_100 and sh02\_rows\_resize\_25\_to\_100 work with 2 data paths.
- Demos sh03\_rows\_fixed\_100 and sh03\_rows\_resize\_25\_to\_100 work with 3 data paths.
- Demo md01\_rows\_fixed\_100 works with one HW video processing chain with a fixed set of processed lines.
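The row scaling and the cyclic frame buffer handling described above can be illustrated with a small model. This is only a sketch under stated assumptions: the function names are hypothetical, the even split of rows between data paths is an assumption (the actual tiling used by the demos is not described in this note), and the real demos write the row count to an AXI-Lite register instead of returning it.

```python
FRAME_ROWS = 1080   # rows in a Full HD frame
NUM_VFB = 8         # cyclic structure of 8 video frame buffers

def rows_to_process(percent):
    """Number of frame rows processed for a setting between 25 and 100 percent."""
    if not 25 <= percent <= 100:
        raise ValueError("percent must be between 25 and 100")
    return FRAME_ROWS * percent // 100

def split_rows(rows, data_paths):
    """Assumed even split of the rows between 1, 2 or 3 accelerator data paths."""
    base, extra = divmod(rows, data_paths)
    return [base + (1 if i < extra else 0) for i in range(data_paths)]

def next_buffer(index):
    """Advance to the next video frame buffer in the ring."""
    return (index + 1) % NUM_VFB

buffer_index = 0
for percent in (25, 50, 75, 100):
    rows = rows_to_process(percent)
    print(f"VFB {buffer_index}: {percent:3d}% -> {rows:4d} rows, "
          f"two paths {split_rows(rows, 2)}, three paths {split_rows(rows, 3)}")
    buffer_index = next_buffer(buffer_index)
```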

#### 2.4 EdkDSP C compiler

This section describes how to use the UTIA EdkDSP C compiler. It cross-compiles (on PC) simple C programs for the PicoBlaze6 controller. This controller acts as programmable finite state machine in the (8xSIMD) EdkDSP accelerator. It is setting the wide instructions for the 8xSIMD floating point data path of the EdkDSP accelerator.

The evaluation package includes also precompiled firmware files. These files can be used without the need to install the EdkDSP C compiler to your PC.

The UTIA EdkDSP C compiler is included as Ubuntu binaries. The "VMware player" software with a compatible Ubuntu image is needed to run the UTIA EdkDSP C compiler on a Windows 7 PC.

The Ubuntu image used in UTIA needs two DVDs (8GB) for installation. That is why it is not included in the evaluation package. If you need this image, send an e-mail request to kadlec@utia.cas.cz to get these two DVDs with the correct Ubuntu image from UTIA (free of charge).

Install VMware Workstation 12 Player [9] on a Win 7 64 bit PC.

Open the VMware Workstation 12 Player and select the "**Ubuntu\_EdkDSP**" image. Ubuntu will start.

Log in as user **devel** with password **devuser**.

The PC directory C:\VM\_07 needs to be shared by Windows 7 with Ubuntu. In Windows 7, set the directory C:\VM\_07 and its subdirectories as shared with the \_\_vmware\_user\_\_ for Read and Write.


In Ubuntu, open a terminal and mount the PC directory **C:\VM\_07** to Ubuntu by typing:

**cd bin**

**samba\_07.sh**

The Windows 7 **C:\VM\_07** directory is mounted to the Ubuntu OS as /mnt/cdrive.

In the Ubuntu terminal, change the directory to:

/mnt/cdrive/t20i2tm1/edkdsp

The EdkDSP C compiler utilities have to be on the Ubuntu PATH. This is done by sourcing the **settings.sh** script in this directory. Type in the Ubuntu terminal:

**source settings.sh**

In the Ubuntu terminal, change to the example directory:

**cd a**

The terminal prompt then shows:

/mnt/cdrive/t20i2tm1/edkdsp/a\$

The provided C source code examples can be compiled by the script **ca\_fp11.sh** with parameter **a**. Type in the Ubuntu terminal:

**ca\_fp11.sh a**

This will compile and assemble four C firmware programs to header files with the firmware binary code for the EdkDSP accelerator:

```
a_fp1101p0.c is compiled to fill_FA1101P0_program_store.h
a_fp1101p1.c is compiled to fill_FA1101P1_program_store.h
a_fp1124p0.c is compiled to fill_FA1124P0_program_store.h
a_fp1124p1.c is compiled to fill_FA1124P1_program_store.h
```

To use the compiled headers in the SDK project, copy and paste

```
edkdsp/a/fill_FA1101P0_program_store.h
edkdsp/a/fill_FA1101P1_program_store.h
edkdsp/a/fill_FA1124P0_program_store.h
edkdsp/a/fill_FA1124P1_program_store.h
```

to the SDK project directory (in case of sh01\_edkdsp\_fp12\_1x8\_all):

**C:\VM\_07\t20i2tm1\sh01\_edkdsp\_fp12\_1x8\_all\src**
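The copy step can also be scripted. The following is a minimal sketch, not part of the evaluation package. The function name is hypothetical, and the fourth header name is assumed to be `fill_FA1124P1_program_store.h`, following the naming pattern of the other three files.

```python
import shutil
from pathlib import Path

# Headers produced by "ca_fp11.sh a". The fourth name (P1) is an assumption
# based on the naming pattern of the first three.
FIRMWARE_HEADERS = (
    "fill_FA1101P0_program_store.h",
    "fill_FA1101P1_program_store.h",
    "fill_FA1124P0_program_store.h",
    "fill_FA1124P1_program_store.h",
)


def copy_firmware_headers(example_dir, sdk_src_dir):
    """Copy the compiled firmware headers into the SDK project src directory.

    All headers are checked first; FileNotFoundError is raised before any
    file is copied, so the SDK project never ends up with a mix of old and
    new firmware headers.
    """
    example_dir, sdk_src_dir = Path(example_dir), Path(sdk_src_dir)
    missing = [h for h in FIRMWARE_HEADERS if not (example_dir / h).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {example_dir}: {', '.join(missing)}")
    copied = []
    for header in FIRMWARE_HEADERS:
        copied.append(Path(shutil.copy2(example_dir / header, sdk_src_dir / header)))
    return copied
```

When run inside the Ubuntu image, where C:\VM\_07 is mounted as /mnt/cdrive, the call would presumably be `copy_firmware_headers("/mnt/cdrive/t20i2tm1/edkdsp/a", "/mnt/cdrive/t20i2tm1/sh01_edkdsp_fp12_1x8_all/src")`.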

Recompile the MicroBlaze project "sh01\_edkdsp\_fp12\_1x8\_all". The compiled firmware for the (8xSIMD) EdkDSP will be used by the MicroBlaze C code of the demo as data for the runtime (re)configurations of the (8xSIMD) EdkDSP accelerator PicoBlaze6 controller.

The change of firmware is demonstrated by the runtime change of firmware for computation of FIR and LMS filters in the EdkDSP accelerator.



#### 3. Conclusions

This application note documents the following general observations and conclusions:

- The programmable logic of the Zynq xc7z020-2I device can implement, in parallel, the UTIA (8xSIMD) EdkDSP floating point accelerator together with the Full HD video processing chain for the Toshiba color sensor.
- The total power consumption for the HW accelerated video processing in Full HD (measured at the 12V DC power supply) is up to 8.57 W. This is relatively high for passive cooling in a small space, even if the dedicated passive heat sink is used.
- The total power consumption for the SW solution without HLS Video IPs (measured at the 12V DC power supply) is 8.33 W. This is also relatively high for passive cooling.
- The energy per pixel of the complete system is significantly reduced in the HW accelerated designs with HLS IP accelerators. An energy per pixel reduction of up to 30x can be reached for chained HLS IP cores (motion detection).
- The main source of the energy per pixel saving is the increased frame rate of the video processing.
- The combination of the 32bit MicroBlaze with the (8xSIMD) EdkDSP floating point accelerator adds the capability to compute in single precision floating point with a performance of approximately 1 GFLOP/s (1.139 GFLOP/s in case of the FIR filter) at the expense of a relatively moderate increase of total power consumption:
  - 8.03 W without MicroBlaze + (8xSIMD) EdkDSP: 0 GFLOP/s
  - 8.57 W with MicroBlaze + (8xSIMD) EdkDSP: 1.139 GFLOP/s (an increase of 540 mW)
- Instantiation of MicroBlaze + (8xSIMD) EdkDSP takes a significant part of the Zynq PL resources. This limits the possibilities for designing video systems with an increased number of parallel video processing chains.
- The bill of materials for the system [1]-[6] is €1030.00.
- The Toshiba Industrial 1080P60 Camera Module is connected to the FMC card by a connector. This opens the possibility of replacing the Toshiba module with the Python 1300 colour video sensor module (provided by Avnet) with resolution 1280x1024p60. The designs included in this evaluation package and the corresponding SW projects support only the Toshiba module.
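The power and energy figures in the list above can be related with a few lines of arithmetic. The sketch below assumes that energy per pixel is the total power divided by the pixel throughput, and that the quoted totals of 8.57 W and 8.33 W are representative of the HW and SW designs being compared. The function names are illustrative only, and no frame rates are taken from measurements.

```python
# Figures quoted in the conclusions (measured at the 12 V DC power supply).
P_HW_W = 8.57          # HW accelerated video processing with MicroBlaze + EdkDSP
P_SW_W = 8.33          # SW solution without HLS video IPs
P_NO_EDKDSP_W = 8.03   # design without MicroBlaze + (8xSIMD) EdkDSP
FIR_GFLOPS = 1.139     # EdkDSP FIR filter performance

FULL_HD_PIXELS = 1920 * 1080


def energy_per_pixel(power_w, fps, pixels_per_frame=FULL_HD_PIXELS):
    """Energy per processed pixel in joules: E = P / (fps * pixels)."""
    return power_w / (fps * pixels_per_frame)


def required_frame_rate_ratio(reduction, p_fast_w, p_slow_w):
    """Frame-rate ratio f_fast / f_slow that gives E_slow / E_fast = reduction.

    From E = P / (fps * pixels): E_slow / E_fast = (P_slow / P_fast) * (f_fast / f_slow).
    """
    return reduction * p_fast_w / p_slow_w


edkdsp_delta_w = P_HW_W - P_NO_EDKDSP_W               # about 0.54 W
gflops_per_extra_watt = FIR_GFLOPS / edkdsp_delta_w   # about 2.1 GFLOP/s per W
ratio_for_30x = required_frame_rate_ratio(30, P_HW_W, P_SW_W)  # about 31

print(f"EdkDSP power increase: {edkdsp_delta_w * 1000:.0f} mW")
print(f"Marginal efficiency:   {gflops_per_extra_watt:.2f} GFLOP/s per W")
print(f"Frame-rate ratio needed for a 30x energy-per-pixel reduction: {ratio_for_30x:.1f}")
```

Because the two power totals differ by less than 3 %, nearly all of a 30x reduction in energy per pixel has to come from a roughly 31x higher frame rate, which is consistent with the observation that the increased frame rate is the main source of the saving.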

This application note documents how designs developed and debugged in the high level SDSoC 2015.4 environment can be exported to the end-user in the form of SDK 2015.4 projects.

The enclosed SDK 2015.4 projects allow the end-user to make SW adaptations and customisations of the final application without the need to disclose to the end-user the complete low level details of the used IP cores, the Vivado 2015.4 project and the SDK 2015.4 board support package.



#### 4. References

- [1] TE0720-03-2IF; Part: XC7Z020-2CLG484I; 1 GByte DDR; Grade: Industrial; Price: €269.00. http://shop.trenz-electronic.de/en/TE0720-03-2IF-Xilinx-Zynq-module-XC7Z020-2CLG484I-ind-temp.-range-1-Gbyte
- [2] Heatsink for TE0720, spring-loaded embedded; Price: €19.00. https://shop.trenz-electronic.de/en/26922-Heatsink-for-TE0720-spring-loaded-embedded?c=38
- [3] TE0701-05 Carrier Board for Trenz Electronic 7 Series; Price: €249.00. https://shop.trenz-electronic.de/en/TE0701-05-Carrier-Board-for-Trenz-Electronic-7-Series
- [4] AES-FMC-HDMI-CAM-G; Price: \$250.00. http://products.avnet.com/shop/en/ema/3074457345623664802
- [5] Toshiba Industrial 1080P60 Camera Module; Price: \$229.00. http://zedboard.org/sites/default/files/product\_briefs/PB-AES-CAM-TOSH-1080P-G-v5-web.pdf
- [6] PmodRS232: Serial converter & interface; Price: €13.54. https://shop.trenz-electronic.de/de/23331-PmodRS232-Serial-converter-und-interface?c=215
- [7] VMware Workstation Player Documentation. https://www.vmware.com/support/pubs/player\_pubs.html



#### 5. Evaluation license

The **evaluation version of the package** can be downloaded from the UTIA web pages free of charge for evaluation of the EdkDSP accelerator with HW accelerated edge detection and motion detection algorithms for the Toshiba Full HD video sensor [5] on the TE0720-03-2IF module [1] located on the TE0701-05 carrier [3] with the FMC card [4].

The evaluation package includes SDK 2015.4 SW projects with C source code for Arm Cortex A9 processor (32bit) in standalone mode, C source code for MicroBlaze and C source code for the EdkDSP Picoblaze6 controller.

The evaluation package includes these static libraries for Arm Cortex A9 processor (32bit) for standalone mode:

| Library          | Description                                                                |
|------------------|----------------------------------------------------------------------------|
| libfmc_imageon.a | SDK 2015.4 UTIA static library with interface functions for video IP cores |
| libwal.a         | SDK 2015.4 UTIA static library with EdkDSP API for MicroBlaze              |
| libsh01.a        | SDSoC 2015.4 static library for HW accelerator in project sh01             |
| libsh02.a        | SDSoC 2015.4 static library for HW accelerator in project sh02             |
| libsh03.a        | SDSoC 2015.4 static library for HW accelerator in project sh03             |
| libmd01.a        | SDSoC 2015.4 static library for HW accelerator in project md01             |
| libmd02.a        | SDSoC 2015.4 static library for HW accelerator in project md02             |

These libraries have no time restriction. Source code of these libraries is not provided in this evaluation package.

The UTIA (8xSIMD) EdkDSP accelerators are compiled with HW limit on number of vector operations. The termination of the nonexclusive, non-transferable evaluation license is reported in advance by the demonstrator on the terminal.

The evaluation package includes SDK 2015.4 SW projects with source code for the MicroBlaze processor and the ARM processor. The SW projects support the family of UTIA (8xSIMD) EdkDSP accelerators for the Trenz TE0720-03-2IF Xilinx Zynq module [1] on the Trenz TE0701-05 Carrier Board [3].

The evaluation package includes these binary applications for Ubuntu:

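| Binary    | Description                                                                 |
|-----------|-----------------------------------------------------------------------------|
| edkdsppp  | EdkDSP C pre-processor binary for Ubuntu in VMware Workstation 12 Player    |
| edkdspcc  | EdkDSP C compiler binary for Ubuntu in VMware Workstation 12 Player         |
| edkdspasm | EdkDSP ASM compiler binary for Ubuntu in VMware Workstation 12 Player       |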

These binary applications have no time restriction. The user of the evaluation package has a nonexclusive, non-transferable license from UTIA to use these utilities for compilation of the firmware for the Xilinx PicoBlaze6 processor inside the UTIA EdkDSP accelerators in precompiled designs. The source code of these compilers is owned by UTIA and is not provided in the evaluation package.

The evaluation package includes demonstration firmware in C source code for the Xilinx PicoBlaze6 processor for the family of UTIA EdkDSP accelerators for the Trenz TE0720-03-2IF module on the TE0701-05 carrier board.

The evaluation package also includes compiled versions of this firmware in the form of .h header files. These compiled firmware files can be used for an initial test of the UTIA EdkDSP accelerators on the Trenz TE0720-03-2IF module on the TE0701-05 carrier board without the need to install the UTIA compiler binaries and the Ubuntu image under VMware Workstation 12 Player [7]. On email request to kadlec@utia.cas.cz, UTIA will send the Ubuntu image for VMware Workstation 12 Player [7] on DVD free of charge.






#### **Disclaimer**

This disclaimer is not a license and does not grant any rights to the materials distributed herewith. Except as otherwise provided in a valid license issued to you by UTIA AV CR v.v.i., and to the maximum extent permitted by applicable law:

- (1) THIS APPLICATION NOTE AND RELATED MATERIALS LISTED IN THIS PACKAGE CONTENT ARE MADE AVAILABLE "AS IS" AND WITH ALL FAULTS, AND UTIA AV CR V.V.I. HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and
- (2) UTIA AV CR v.v.i. shall not be liable (whether in contract or tort, including negligence, or under any other theory of liability) for any loss or damage of any kind or nature related to, arising under or in connection with these materials, including for any direct, or any indirect, special, incidental, or consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage suffered as a result of any action brought by a third party) even if such damage or loss was reasonably foreseeable or UTIA AV CR v.v.i. had been advised of the possibility of the same.

#### **Critical Applications:**

UTIA AV CR v.v.i. products are not designed or intended to be fail-safe, or for use in any application requiring fail-safe performance, such as life-support or safety devices or systems, Class III medical devices, nuclear facilities, applications related to the deployment of airbags, or any other applications that could lead to death, personal injury, or severe property or environmental damage (individually and collectively, "Critical Applications"). Customer assumes the sole risk and liability of any use of UTIA AV CR v.v.i. products in Critical Applications, subject only to applicable laws and regulations governing limitations on product liability.

