Pipistrello “POC” project

This project implements a micro-machine entirely in an FPGA, using the Pipistrello FPGA board. The aim of this project was to use the available resources of the Pipistrello board such as

This is a Proof Of Concept project. I did not want to optimize it for speed or FPGA area, I just wanted to make it work and keep it as simple as possible. So it should be considered a base for more complex solutions. I used a lot of existing resources to build the system. I kept all the copyright messages and credits in the original source code, so I do not claim anything here to be my own work, apart from putting all the pieces together.

See it in action!



I use Xilinx’s Microblaze MCS embedded soft-core processor. This processor is a lightweight version of the “real” (meaning licensed) Microblaze product, and it has many limitations compared to the real Microblaze or to hard processor cores. The CPU runs at 100 MHz and has only 64K of internal code and data memory. (This is really a big limitation and disadvantage.) It has some basic IO elements, like UART and GPIO ports. It also has a component called IO bus, which can address 1G of external resources.

These are used as follows:

Unfortunately the CPU cannot execute code in the IO area, so the code space is really limited to 64K. Data can be stored in the DDR area, but there is no cache between the CPU and the MCB, so access is slow.

The main advantage of the Microblaze MCS CPU is that it’s supported by the Xilinx SDK, therefore we can write applications in C (or C++). Although the 64K code space is not big, some basic functions can be implemented easily, and there is no need for external (3rd party) compilers or tools.

VGA video generator

This is a modified design of Mike Field’s (aka Hamster) MCB framebuffer design. The original design is here:


This page describes the basic steps to build the MCB part, too. It is necessary to access the DDR memory via the MCB, using two separate ports: Port0 is a read-only port for the VGA circuit, while Port1 is a read/write port for the CPU.

I removed the pattern writer part from the project and changed the color depth to 16 bits. So in my POC design there are 64K colors (R5G6B5). I also changed the clocks and reduced the resolution to 640x480 pixels. The main pixel clock is 25MHz, which is slightly off the standard (it should be 25.175MHz), but an HDMI TV or DVI monitor can handle it.
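The effect of the rounded pixel clock can be estimated with a little arithmetic. The sketch below assumes the common 640x480 timing with 800 pixel clocks per line and 525 lines per frame; these totals are not stated in this project, and the function names are made up for illustration.

```c
#include <stdio.h>

/* Assumed totals of the common 640x480 timing (visible area plus blanking). */
#define H_TOTAL 800
#define V_TOTAL 525

/* Frame rate in Hz for a given pixel clock in Hz. */
double refresh_hz(double pixel_clock_hz)
{
    return pixel_clock_hz / ((double)H_TOTAL * (double)V_TOTAL);
}

/* Print the frame rate for the clock used here and for the nominal one. */
void print_refresh_rates(void)
{
    printf("25.000 MHz -> %.2f Hz\n", refresh_hz(25000000.0));
    printf("25.175 MHz -> %.2f Hz\n", refresh_hz(25175000.0));
}
```

Under these assumptions the 25MHz clock gives roughly 59.5 frames per second instead of about 59.94, a difference of less than one percent.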

The output of the VGA video generator is encoded by the DVI-D encoder. Again this is Mike Field’s work; the URL of the original design is http://hamsterworks.co.nz/mediawiki/index.php/Dvid_test.


The main oscillator runs at 50MHz on the Pipistrello board. The MCB design mentioned above generates the other necessary clocks:

SD-CARD controller

It’s a software controlled interface, using only the CPU’s GPIO ports. All the SPI send/receive functions are written in the software part, a simple bit-banging solution.

PS2 KEYBOARD controller

The original design was downloaded from DigilentInc’s website (http://digilentinc.com/) and slightly modified. The scan codes are read by a simple interface connected to the GPIO ports. There is a small FIFO (16 entries) between the PS2 controller and the CPU. Pipistrello does not have a PS/2 connector itself, so I had to make a basic interface, which is connected to the PMOD connector of the board.



For accessing the file system on the SD-CARD I ported FatFS to Microblaze. FatFS is a tiny but very useful software designed for microcontrollers to access FAT filesystems on modern media (CF cards, SD/MMC cards, etc)

Found here: http://elm-chan.org/fsw/ff/00index_e.html

Apart from FatFS, I wrote the rest.


Design is created with the Xilinx ISE Design Suite (WebPack) version 14.7

Link: http://www.xilinx.com/products/design-tools/ise-design-suite/ise-webpack.html (free)

Software part is created with the Xilinx SDK, version 2014.4

Link: http://www.xilinx.com/tools/sdk.htm (eclipse based and free)

For terminal (serial communication) I use PUTTY

Link: http://www.chiark.greenend.org.uk/~sgtatham/putty/ (free)

Programming the Pipistrello board with a small tool called miniSprog

Link: http://www.saanlima.com/download/miniSProg-win.zip (free)



This document shows the steps to create the Microblaze MCS processor with the Core Generator tool, and then the steps in the SDK to create a “Hello world” application. Very useful.



The hardware

System block diagram

The diagram shows the basic elements of the system. The SD-card SPI interface and the PS/2 controller are directly connected to the CPU using the GPIO ports. The memory controller implements two 32-bit ports: a read-only port for the video generator and a read/write port for the CPU. The memory controller gives us simplified command/data paths for accessing the DDR ram and completely hides the DDR operations, like refresh, row opening, bank switching, etc.

The read-only port of the controller uses a burst length of 32 words (which means 32x32 bits), so in practice it reads 64 pixels in one shot. However, the CPU read/write port only uses a burst length of 1 (1x32 bits), so that is a slow port. A production system would need a d-cache to make the memory access significantly faster, but for a proof of concept this is enough.

The IObus-MCB bridge component connects the MCB signals to the Microblaze IO bus. The CPU waits for a handshake signal from the IO device, so it can be stalled if a read or write operation takes longer (e.g. when the controller needs to wait for an auto-refresh to complete).

This operation could cause glitches and “snowing” on the screen, if the VGA generator could not read the pixels that it needs, but I have not experienced this kind of behaviour at all.

Ok, so let’s see the interesting parts of the VHDL code now. The design elements are:


This is where everything is connected together. Top level port (to the outer world):

entity top_level is
    Port (
        clk_50          : in    STD_LOGIC; -- Main clock is 50MHz (on-board oscillator)

        -- Reset for the cpu
        reset_sw        : in    STD_LOGIC;

        -- USB Serial interface
        rxd             : in    STD_LOGIC;
        txd             : out   STD_LOGIC;

        -- SD-Card interface (in SPI mode)
        sd_miso         : in    STD_LOGIC;
        sd_cs           : out   STD_LOGIC;
        sd_mosi         : out   STD_LOGIC;
        sd_sck          : out   STD_LOGIC;

        -- HDMI/DVI-D Port
        tmds            : out   STD_LOGIC_VECTOR(3 downto 0);
        tmdsb           : out   STD_LOGIC_VECTOR(3 downto 0);

        -- PS2 keyboard
        kbdclk          : in    std_logic;
        kbddat          : in    std_logic;

        -- Status LEDs
        led_calibrate   : out   STD_LOGIC;
        led_written     : out   STD_LOGIC;
        led_status      : out   STD_LOGIC;

        -- Memory Signals
        mcb3_dram_dq    : inout std_logic_vector(15 downto 0);
        mcb3_dram_a     : out   std_logic_vector(12 downto 0);
        mcb3_dram_ba    : out   std_logic_vector( 1 downto 0);
        mcb3_dram_cke   : out   std_logic;
        mcb3_dram_ras_n : out   std_logic;
        mcb3_dram_cas_n : out   std_logic;
        mcb3_dram_we_n  : out   std_logic;
        mcb3_dram_dm    : out   std_logic;
        mcb3_dram_udqs  : inout std_logic;
        mcb3_rzq        : inout std_logic;
        mcb3_dram_udm   : out   std_logic;
        mcb3_dram_dqs   : inout std_logic;
        mcb3_dram_ck    : out   std_logic;
        mcb3_dram_ck_n  : out   std_logic
    );
end top_level;

The real pin positions for these signals are found in the file pipistrello.ucf and for the DDR ram in the mem32.ucf. Signal names are self explanatory I guess.

The memory wrapper interface needs a lot of signals, I am not going to write down all… see the MIG documentation for more details about them.

What is more interesting is the CPU component:

    -- CPU I/O bus signals
    signal IO_Addr_Strobe  : STD_LOGIC;
    signal IO_Read_Strobe  : STD_LOGIC;
    signal IO_Write_Strobe : STD_LOGIC;
    signal IO_Address      : STD_LOGIC_VECTOR(31 DOWNTO 0);
    signal IO_Byte_Enable  : STD_LOGIC_VECTOR(3 DOWNTO 0);
    signal IO_Write_Data   : STD_LOGIC_VECTOR(31 DOWNTO 0);
    signal IO_Read_Data    : STD_LOGIC_VECTOR(31 DOWNTO 0);
    signal IO_Ready        : STD_LOGIC;

    -- mcs_0 Microblaze MCS CPU running @ 100MHz
    COMPONENT microblaze_mcs
      PORT (
        Clk             : IN  STD_LOGIC;
        Reset           : IN  STD_LOGIC;
        IO_Addr_Strobe  : OUT STD_LOGIC;
        IO_Read_Strobe  : OUT STD_LOGIC;
        IO_Write_Strobe : OUT STD_LOGIC;
        IO_Address      : OUT STD_LOGIC_VECTOR(31 DOWNTO 0);
        IO_Byte_Enable  : OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
        IO_Write_Data   : OUT STD_LOGIC_VECTOR(31 DOWNTO 0);
        IO_Read_Data    : IN  STD_LOGIC_VECTOR(31 DOWNTO 0);
        IO_Ready        : IN  STD_LOGIC;
        UART_Rx         : IN  STD_LOGIC;
        UART_Tx         : OUT STD_LOGIC;
        GPO1            : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
        GPO2            : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
        GPI1            : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
        GPI1_Interrupt  : OUT STD_LOGIC;
        GPI2            : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
        GPI2_Interrupt  : OUT STD_LOGIC
      );
    END COMPONENT;



The signals beginning with prefix “IO_” belong to the Microblaze IO bus. These signals will be connected to the bridge circuit later.

The UART RX and TX signals are connected directly to the top-level ports, and they reach the PC via the USB device chip (FTDI FT2232H). (It uses a fixed bitrate, 57600bps 8N1, i.e. 8 data bits, no parity, 1 stop bit.)

The GPI and GPO ports will control the SPI bus for the SD card and the PS2 reader component. They need some signals in the top level design:

    -- Microblaze CPU GPIO ports
    signal gpo1 : std_logic_vector(7 downto 0);
    signal gpo2 : std_logic_vector(7 downto 0);
    signal gpi1 : std_logic_vector(7 downto 0);
    signal gpi2 : std_logic_vector(7 downto 0);

As you can see, there are 2 input and 2 output ports, each has only 8 bits.

Signals for the VGA video generator:

    -- VGA Signals
    signal blank   : std_logic;
    signal hsync   : std_logic;
    signal vsync   : std_logic;
    signal red     : std_logic_vector(4 downto 0);
    signal green   : std_logic_vector(5 downto 0);
    signal blue    : std_logic_vector(4 downto 0);

    -- DVI-D / HDMI signals
    signal red_s   : std_logic;
    signal green_s : std_logic;
    signal blue_s  : std_logic;
    signal clock_s : std_logic;

And this is how the SPI and PS2 are connected:

-- SPI assignments
-- using the microblaze's GPIO ports, bit-banging mode
sd_cs   <= gpo2(0); -- SPI SS    (slave select)
sd_sck  <= gpo2(1); -- SPI CLK   (clock)
sd_mosi <= gpo1(7); -- SPI MOSI  (master out slave in)
gpi1(0) <= sd_miso; -- SPI MISO  (master in slave out)

-- PS2 keyboard
kbd : entity work.ps2reader
 port map(
   mclk  => clk_25m,
   PS2C  => kbdclk,
   PS2D  => kbddat,
   rst   => reset_sw,
   CLKRD => gpo2(2), -- C code needs to give a H pulse
   DRDY  => gpi1(7), -- H if keycode is waiting in the FIFO
   DOUT  => gpi2     -- fifo output
 );


Note the bit positions for the GPI and GPO ports. These are going to be memory mapped registers in the CPU’s programming model. Well, in the Microblaze environment, everything is memory mapped anyway.


This wrapper (originally written by Hamster) simplifies the signals of the memory controller block. Even though it is considered a simple interface, you can still see a lot of signals, because the memory controller itself needs them. For each port (see the MIG documentation for the possible combinations) we need a command path, then data paths for read and write. The memory controller is a dedicated hardware block on the Spartan-6 FPGA, so it handles the DDR memory as fast as possible. Having more than one port to the memory allows us to handle the ram as if it were a multiport ram. For each transaction you need to provide some initial data on the command path (e.g. the command itself, like read or write, and the number of words, also known as burst length), then you need to provide the data if you write, or fetch the data once it is available if you read. Written and read data go through a FIFO with a maximum length of 64 words. The whole design reflects the approach of using caches between the system and the ram, which is the usual way DDR ram is handled nowadays.

In our case there are no caches. The read path for the video memory uses a burst length of 32 words, which in practice means 64 pixels in each run. The memory controller can handle this without any issue.
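The numbers behind that burst size are easy to check. The following sketch is only arithmetic on values quoted in this document (32-word bursts, 32-bit words, 16-bit pixels, 640x480); the macro and function names are made up for illustration.

```c
#include <stdint.h>

#define BURST_WORDS     32u   /* 32-bit words per video read burst */
#define BYTES_PER_WORD  4u
#define BYTES_PER_PIXEL 2u    /* 16-bit colour */
#define H_VISIBLE       640u
#define V_VISIBLE       480u

/* Pixels delivered by one read burst. */
uint32_t pixels_per_burst(void)
{
    return BURST_WORDS * BYTES_PER_WORD / BYTES_PER_PIXEL;
}

/* Read bursts needed for one visible line. */
uint32_t bursts_per_line(void)
{
    return H_VISIBLE / pixels_per_burst();
}

/* Size of one frame in the DDR ram, in bytes. */
uint32_t frame_bytes(void)
{
    return H_VISIBLE * V_VISIBLE * BYTES_PER_PIXEL;
}
```

So each visible line needs 10 bursts of 64 pixels, and one frame occupies 614400 bytes of the DDR ram.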

The read port for the video generator has these signals:

  -- Reads are in a burst length of 32 (32*2 => 64 pixels, each pixel has 16bits)
  vid_read_cmd_enable  : in  std_logic;
  vid_read_cmd_address : in  std_logic_vector(29 downto 0);
  vid_read_cmd_full    : out std_logic;
  vid_read_cmd_empty   : out std_logic;

  vid_read_data_enable : in  std_logic;
  vid_read_data        : out std_logic_vector(31 downto 0);
  vid_read_data_empty  : out std_logic;
  vid_read_data_full   : out std_logic;
  vid_read_data_count  : out std_logic_vector(6 downto 0);

The signals above with “cmd” belong to the command path. Not all signals of the memory controller are exposed, because we know that we will only use this port for reading. The vid_read_cmd_full and vid_read_cmd_empty signals show whether the command path can accept new commands or is full. (Even the command path has a FIFO, so it is possible to issue more than one command on a port.) The memory address is provided by the signal called vid_read_cmd_address. This signal uses 30 bits. This is because the memory controller port is configured to use 32-bit data, therefore it always reads 32 bits, and the lower 2 bits (selecting the actual byte) aren’t necessary. (Later the video generator circuit uses the upper or lower half of the data to display the pixels.)

On the data path the data is presented on the vid_read_data bus. When we issue a read command, the controller needs some time (cycles) to collect the data, so there is a read latency. Once the data is available, vid_read_data_empty goes low, which means the read FIFO has data for us and we can start reading. The burst length is known (we set it), so it is safe to read that many words (pixel data).

The CPU port is a read/write port, so it has a command path and two separate data paths for read and write:

  -- On the CPU bus writes and reads are in a burst length of 1
  cpu_cmd_instr   : in  std_logic;
  cpu_cmd_enable  : in  std_logic;
  cpu_cmd_empty   : out std_logic;
  cpu_cmd_full    : out std_logic;
  cpu_cmd_address : in  std_logic_vector(29 downto 0);

  -- write signals
  cpu_write_data_enable : in  std_logic;
  cpu_write_mask        : in  std_logic_vector(3 downto 0);
  cpu_write_data        : in  std_logic_vector(31 downto 0);
  cpu_write_data_empty  : out std_logic;
  cpu_write_data_full   : out std_logic;
  cpu_write_data_count  : out std_logic_vector(6 downto 0);

  -- read signals
  cpu_read_data_enable  : in  std_logic;
  cpu_read_data         : out std_logic_vector(31 downto 0);
  cpu_read_data_empty   : out std_logic;
  cpu_read_data_full    : out std_logic;
  cpu_read_data_count   : out std_logic_vector(6 downto 0);

The signal cpu_cmd_instr selects read (1) or write (0). In case of reading the memory, the data path is quite the same as for the video generator. The only difference is that in this case we only read one word (data burst length is 32bits).

For the write path we have to provide a write mask that selects which 8-bit parts of the 32-bit data are to be written. The mask is a 4-bit vector, where 0 selects the byte lane for writing. For example, the mask “1110” means that only the least significant byte will be written. Fortunately, the Microblaze CPU has (almost) the same signal, so the mask is controlled by the CPU itself and we do not have to deal with it.
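The "almost" refers to polarity: the bridge shown later derives the mask with wrt_mask <= not IO_Byte_Enable. As a small illustration, the same conversion can be written in C. The function name is hypothetical, and the sketch assumes that bit 0 of the byte enable corresponds to the least significant byte lane.

```c
#include <stdint.h>

/* Convert the active-high 4-bit IO_Byte_Enable into the active-low
   4-bit write mask expected by the memory controller. */
uint8_t mcb_write_mask(uint8_t io_byte_enable)
{
    return (uint8_t)(~io_byte_enable & 0x0Fu);
}
```

With this convention a byte write to the lowest lane (byte enable 0001) gives the mask 1110 from the example above, and a full word write (1111) gives 0000.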

Again, there is a FIFO on both the read and the write path, so the data_empty and data_full signals can be used to cope with timing differences (when the memory is not ready, the CPU must be stalled).


This one I call the bridge. This is the module that joins the CPU IO bus and the memory controller. The CPU IO bus has a handshake signal named “IO_Ready”, which is asserted by the connected slave device when the read or write is done. This also means that the CPU will wait for this signal. In a real project that would slow down the communication (this is where a cache is a must), but for a proof of concept it will do. Once the CPU starts a read or write in the IO space, we issue a read or write command to the DDR and wait until it finishes. In our case we don’t care how long it takes.


-- Engineer: Jozsef Laszlo <rbendr@gmail.com>
--
-- Create Date:    13:19:54 02/05/2015
-- Design Name:
-- Module Name:    mcsoi_to_ddr - Behavioral
-- Project Name:   Pipistrello POC
-- Description:    Interface between MIG memory controller and Microblaze MCS

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity mcsoi_to_ddr is
    PORT (
        Clk : IN STD_LOGIC;

        -- Microblaze IO bus signals
        IO_Addr_Strobe  : in  STD_LOGIC;
        IO_Read_Strobe  : in  STD_LOGIC;
        IO_Write_Strobe : in  STD_LOGIC;
        IO_Address      : in  STD_LOGIC_VECTOR(31 DOWNTO 0);
        IO_Byte_Enable  : in  STD_LOGIC_VECTOR(3 DOWNTO 0);
        IO_Write_Data   : in  STD_LOGIC_VECTOR(31 DOWNTO 0);
        IO_Read_Data    : out STD_LOGIC_VECTOR(31 DOWNTO 0);
        IO_Ready        : out STD_LOGIC;

        -- Spartan6 MCB signals (simplified)
        memory_ready      : IN  std_logic;
        completed         : OUT std_logic;
        cmd_instr         : OUT std_logic;
        cmd_enable        : OUT std_logic;
        cmd_address       : OUT std_logic_vector(29 downto 0);
        cmd_empty         : IN  std_logic;
        cmd_full          : IN  std_logic;

        write_data_empty  : IN  std_logic;
        write_data_count  : IN  std_logic_vector(6 downto 0); -- How many words are queued
        write_data_enable : OUT std_logic;
        write_mask        : OUT std_logic_vector(3 downto 0);
        write_data        : OUT std_logic_vector(31 downto 0);

        read_data_enable  : out std_logic;
        read_data         : in  std_logic_vector(31 downto 0);
        read_data_empty   : in  std_logic;
        read_data_full    : in  std_logic;
        read_data_count   : in  std_logic_vector(6 downto 0)
    );
end mcsoi_to_ddr;

architecture Behavioral of mcsoi_to_ddr is
    signal mem_addr    : STD_LOGIC_VECTOR(31 DOWNTO 0);
    signal wrt_data    : STD_LOGIC_VECTOR(31 DOWNTO 0);
    signal wrt_mask    : STD_LOGIC_VECTOR( 3 DOWNTO 0);
    signal readyPulse  : std_logic_vector(2 downto 0) := "000";
    signal pending_cmd : std_logic := '0';
    signal wait_read   : std_logic := '0';
    signal start_write : std_logic := '0';
    signal iobus_write : std_logic := '0';
    signal start_read  : std_logic := '0';
    signal iobus_read  : std_logic := '0';
    signal instruction : std_logic := '0'; -- write=0, read=1
begin
    IO_Ready  <= readyPulse(2);
    cmd_instr <= instruction;

    process(Clk)
    begin
        if rising_edge(Clk) then
            cmd_address <= mem_addr(29 downto 2) & "00";
            cmd_enable  <= pending_cmd;
            write_mask  <= wrt_mask;
            write_data  <= wrt_data;

            -- latch a write request from the IO bus
            if IO_Write_Strobe='1' and iobus_write='0' then
                mem_addr    <= IO_Address and x"03ffffff"; -- limits to 64M and masks out high nibble
                wrt_data    <= IO_Write_Data;
                wrt_mask    <= not IO_Byte_Enable;
                iobus_write <= '1';
                instruction <= '0'; -- write
            end if;

            -- latch a read request from the IO bus
            if IO_Read_Strobe='1' and iobus_read='0' then
                mem_addr    <= IO_Address and x"03ffffff"; -- limits to 64M and masks out high nibble
                iobus_read  <= '1';
                instruction <= '1'; -- read
            end if;

            if start_write = '1' then
                pending_cmd       <= '1';
                write_data_enable <= '1';
                readyPulse        <= "001";
            elsif start_read = '1' then
                pending_cmd <= '1';
                wait_read   <= '1';
            else
                write_data_enable <= '0';
                pending_cmd       <= '0';
                readyPulse        <= readyPulse(1 downto 0) & '0';
            end if;

            -- read data has arrived in the FIFO
            if wait_read='1' and read_data_empty = '0' then
                IO_Read_Data     <= read_data;
                wait_read        <= '0';
                readyPulse       <= "010";
                read_data_enable <= '1';
            else
                read_data_enable <= '0';
            end if;

            start_write <= '0';
            start_read  <= '0';

            if write_data_count(6) = '0' and cmd_empty = '1' and write_data_empty = '1' and iobus_write='1' then
                start_write <= '1';
                iobus_write <= '0';
            end if;

            if cmd_empty = '1' and read_data_empty = '1' and iobus_read='1' then
                start_read <= '1';
                iobus_read <= '0';
            end if;
        end if;
    end process;
end Behavioral;

I hope it’s not too complicated. Every transfer starts with IO_Write_Strobe or IO_Read_Strobe. Some other signals are also set, like the instruction (write or read) and of course the memory address. Since the IO space of the CPU starts at address 0xC0000000, we mask out the high part of the address. In the memory controller the base address is 0x00000000 and we only have 64Mbytes (16M words x 32 bits).
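To make the address handling concrete, here is a small C model of what the two VHDL lines do (mem_addr <= IO_Address and x"03ffffff", then cmd_address <= mem_addr(29 downto 2) & "00"). It is only an illustration, and the function name is made up.

```c
#include <stdint.h>

#define DDR_ADDR_MASK 0x03FFFFFFu  /* 64 Mbytes, same constant as in the VHDL */

/* Byte address presented to the memory controller for a CPU IO-bus address. */
uint32_t mcb_address(uint32_t io_address)
{
    uint32_t masked = io_address & DDR_ADDR_MASK; /* drop the 0xC0000000 base */
    return masked & ~0x3u;                        /* two low bits forced to "00" */
}
```

For example, the first screen word at 0xC0000000 maps to DDR address 0, and a byte access at 0xC0000006 lands in the 32-bit word at address 4, where the byte lane is then chosen by the write mask. Note that addresses above the 64Mbyte range simply wrap around.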

These signals set the next stage signals iobus_write or iobus_read. When the controller is ready, start_write or start_read is set.

In case of a write, the write mask is used for selecting the byte lanes (wrt_mask <= not IO_Byte_Enable), and if the write FIFO is not full, we do it. There is a signal called readyPulse, which is a 3-bit shift register. This lets us add extra wait states when the write cycle is done.

Reading is a bit different. We assert the signal wait_read and wait for the data to become available (read_data_empty = '0'). When this is true, we read one word (read_data_enable <= '1';) and load the ready pulse register (readyPulse <= "010";). This is the end of the read cycle.
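The wait states produced by readyPulse can be illustrated with a tiny C model of the shift register. This is a simplified sketch, not a cycle-accurate model of the bridge: it only shifts the register left once per clock and watches bit 2, which drives IO_Ready. The function names are hypothetical.

```c
#include <stdint.h>

/* One clock of the shift register: bits move left, a '0' enters on the right. */
uint8_t ready_shift(uint8_t ready_pulse)
{
    return (uint8_t)((ready_pulse << 1) & 0x7u);
}

/* IO_Ready is the top bit (bit 2) of the register. */
int io_ready(uint8_t ready_pulse)
{
    return (ready_pulse >> 2) & 1;
}

/* Number of clocks after loading `start` until IO_Ready is high,
   or -1 if it never goes high. */
int clocks_until_ready(uint8_t start)
{
    uint8_t rp = (uint8_t)(start & 0x7u);
    for (int n = 0; n < 3; n++) {
        if (io_ready(rp))
            return n;
        rp = ready_shift(rp);
    }
    return -1;
}
```

In this model loading "001" (the write case) raises IO_Ready two clocks later, while loading "010" (the read case) raises it after one clock.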


This was done completely by Hamster (for a full description see the original design). I only modified small parts in order to read the data as 16-bit pixels (the original design read them as 8-bit pixels). Obviously this doubles the bandwidth required on the memory read bus (compared to the original version), but the MCB can still cope with it.


  -- Display pixels and trigger data FIFO reads
  if hCounter < hVisible and vCounter < vVisible then
      if hcounter(0)='0' then
          -- lower half of the 32 bits
          red   <= read_data(15 downto 11);
          green <= read_data(10 downto  5);
          blue  <= read_data( 4 downto  0);
          -- there is a read for 2 pixels, read FIFO should be filled by the MCB
          read_data_enable <= memory_ready and not read_data_empty;
      else
          -- upper half of the 32 bits
          red   <= read_data(31 downto 27);
          green <= read_data(26 downto 21);
          blue  <= read_data(20 downto 16);
      end if;
      blank <= '0';
  else
      red   <= (others => '0');
      green <= (others => '0');
      blue  <= (others => '0');
      blank <= '1';
  end if;

This is the main part for displaying the pixels. Since the read command for the MCB is already issued when the first line starts, the read is always one block ahead of the display part. Therefore the FIFO is able to provide the pixels we need. (In theory this could be a problem. The MCB does not guarantee that the timing is always like that, and it is possible to catch a refresh cycle… that could cause “snowing” on the screen. In practice, this never happened to me.)
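The way two 16-bit pixels share one 32-bit word can also be written down in C. The sketch below mirrors the bit slices of the VHDL above (lower half when the horizontal counter is even, upper half when it is odd); the type and function names are made up for illustration.

```c
#include <stdint.h>

/* The three colour fields of one 16-bit pixel. */
typedef struct {
    uint8_t r5;  /* 5 bits of red   */
    uint8_t g6;  /* 6 bits of green */
    uint8_t b5;  /* 5 bits of blue  */
} rgb565_parts;

/* Pick one pixel from a 32-bit FIFO word.
   which = 0: bits 15..0 (even hCounter), which = 1: bits 31..16 (odd hCounter). */
uint16_t pixel_from_word(uint32_t word, unsigned which)
{
    return (uint16_t)(which ? (word >> 16) : (word & 0xFFFFu));
}

/* Split a pixel into the red, green and blue fields sent to the encoder. */
rgb565_parts split_rgb565(uint16_t p)
{
    rgb565_parts c;
    c.r5 = (uint8_t)((p >> 11) & 0x1Fu);
    c.g6 = (uint8_t)((p >> 5) & 0x3Fu);
    c.b5 = (uint8_t)(p & 0x1Fu);
    return c;
}
```

For the word 0xF800001F this gives a pure blue pixel first (0x001F) and a pure red pixel second (0xF800).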


The keyboard handler is based on Digilent’s example design (I also have some Digilent boards, so I am entitled to use those. Well, anyone can download those designs from digilentinc.com by the way). I added a small (16 entries) FIFO to store the scan codes. (For a real design keyboard reading should be interrupt driven!)

This is how it works:

-- put new char into the FIFO



 if rst='1' then

   buffWrPtr<= "0000";


  if rising_edge(mclk) and state=end_char then



  end if;

 end if;

end process;

-- Reading FIFO by the CPU



 if rst='1' then

   buffRdPtr<= "0000";


  if rising_edge(CLKRD) then


  end if;

 end if;

end process;

DRDY <= '0' when buffWrPtr=buffRdPtr else '1'; -- H if buffer is not empty                                                                                                                               

-- note: overrun condition is not handled here

DOUT <= kbdBuff(conv_integer(buffRdPtr));
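The behaviour of this FIFO can be modelled in a few lines of C. This is only a sketch under assumptions: the pointer updates are taken to be simple increments of the 4-bit buffWrPtr and buffRdPtr, and all C names are hypothetical.

```c
#include <stdint.h>

#define KBD_FIFO_SIZE 16u

typedef struct {
    uint8_t buf[KBD_FIFO_SIZE];
    uint8_t wr;  /* models the 4-bit buffWrPtr */
    uint8_t rd;  /* models the 4-bit buffRdPtr */
} kbd_fifo;

/* rst = '1': both pointers go back to "0000". */
void fifo_reset(kbd_fifo *f)
{
    f->wr = 0;
    f->rd = 0;
}

/* A complete scan code arrived (state = end_char): store it and advance. */
void fifo_put(kbd_fifo *f, uint8_t code)
{
    f->buf[f->wr] = code;
    f->wr = (uint8_t)((f->wr + 1u) & 0x0Fu);
}

/* DRDY: high when the pointers differ, i.e. the buffer is not empty. */
int fifo_drdy(const kbd_fifo *f)
{
    return f->wr != f->rd;
}

/* DOUT: the entry at the read pointer. */
uint8_t fifo_dout(const kbd_fifo *f)
{
    return f->buf[f->rd];
}

/* Rising edge on CLKRD: advance the read pointer. */
void fifo_clkrd(kbd_fifo *f)
{
    f->rd = (uint8_t)((f->rd + 1u) & 0x0Fu);
}
```

The model also shows what the unhandled overrun means: after 16 writes without a read the pointers are equal again, so the full buffer looks empty.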

The software

For creating the software part I use the Xilinx SDK (2014.4), which is part of the Vivado package, but is also found in the ISE WebPack, as far as I know. In order to create your hardware platform and BSP (Board Support Package), you need to read this document:


This describes the method step by step, so anyone should be able to build a Microblaze system based on it. At least saying “Hello world” is straightforward (though there are many steps).

For Pipistrello POC I wanted to access the SD card - a minimal feature I guess -, so I used the FatFS system. Porting it to Microblaze was not too hard. The rest (apart from FatFS) is mine. I am going to show you the important parts from the C sources.

The workspace folder is inside the Xilinx project, so when you open your SDK, browse to the downloaded and extracted folder when SDK asks for the workspace root. For example, in my drive it is in the folder d:\work\fpga\pipstrello_mcs_ddr, so this is what I provide for the SDK, as the picture shows below.


Memory mapped I/O.

This was easy. The Microblaze MCS documentation shows the fixed addresses for each available I/O module.

typedef unsigned char byte;
typedef unsigned short word;

volatile word* screen = (volatile word*)0xc0000000;

volatile int* mmgpi1;
volatile int* mmgpo1;
volatile int* mmgpi2;
volatile int* mmgpo2;

void gpio_defaults()
{
    mmgpi1 = (volatile int*)0x80000020;
    mmgpo1 = (volatile int*)0x80000010;
    mmgpi2 = (volatile int*)0x80000024;
    mmgpo2 = (volatile int*)0x80000014;
}



The variables with the prefix “mmgp” refer to the GPIO ports, while screen is the video area in the DDR ram. Even though I used ints for the mmgp ports, they are only bytes! My guess is that the 32-bit CPU works better with ints than with bytes, and the hardware will do the truncating anyway.
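Since several functions share the same two output ports, it can help to give the individual bits names. The following sketch collects the bit positions from the top-level VHDL; the macro and function names are my own suggestion here and do not appear in the project sources.

```c
#include <stdint.h>

/* Bit assignments taken from the top-level VHDL. */
#define GPO1_SD_MOSI    0x80u  /* gpo1(7): SPI MOSI            */
#define GPO2_SD_CS      0x01u  /* gpo2(0): SPI slave select    */
#define GPO2_SD_SCK     0x02u  /* gpo2(1): SPI clock           */
#define GPO2_KBD_CLKRD  0x04u  /* gpo2(2): keyboard FIFO read  */
#define GPI1_SD_MISO    0x01u  /* gpi1(0): SPI MISO            */
#define GPI1_KBD_DRDY   0x80u  /* gpi1(7): scan code waiting   */

/* Value to write to GPO2 for a given state of its three outputs. */
uint32_t gpo2_value(int sd_cs, int sd_sck, int kbd_clkrd)
{
    uint32_t v = 0;
    if (sd_cs)     v |= GPO2_SD_CS;
    if (sd_sck)    v |= GPO2_SD_SCK;
    if (kbd_clkrd) v |= GPO2_KBD_CLKRD;
    return v;
}
```

With these names a write such as *mmgpo2 = gpo2_value(1, 0, 1) corresponds to the constant 0x05, and gpo2_value(1, 0, 0) to 0x01.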

Now the PS2 keyboard reader becomes a quite simple function:

int ps2key()
{
    int rv=0;

    if (*mmgpi1&0x80) // if there is scancode in the PS2 FIFO
    {
        rv = *mmgpi2 & 0xff;
        *mmgpo2 = 0x05; // H pulse on gpo2(2) while gpo2(0) remains H (SD-CARD SS)
        *mmgpo2 = 0x01;
    }
    return rv;
}


Returning 0x00 means no key is pressed. This is safe, because there is no scancode 0x00.

The SD card SPI functions in the mmc_bb.c file:

unsigned char spi_cs=0;

/* Transmit bytes to the card (bit-banging) */
static void xmit_mmc (
    const BYTE* buff,   /* Data to be sent */
    UINT bc             /* Number of bytes to send */
)
{
    BYTE i,d;

    do {
        d = *buff++;
        for (i=0;i<8;i++)   /* 8 bits per byte, MSB first */
        {
            *mmgpo1 = d; d<<=1; // gpo1 bit 7 => SD_MOSI
            *mmgpo2 = 0x02|spi_cs; // CLK H
            *mmgpo2 = spi_cs; // CLK L
        }
    } while (--bc);
}

/* Receive bytes from the card (bit-banging) */
static void rcvr_mmc (
    BYTE *buff,   /* Pointer to read buffer */
    UINT bc       /* Number of bytes to receive */
)
{
    BYTE r,i;

    *mmgpo1 = 0x80; // SD_MOSI = 1
    do {
        r = 0;
        for (i=0;i<8;i++)   /* 8 bits per byte, MSB first */
        {
            r <<= 1;
            if (*mmgpi1&1) r++;    // gpi1 bit 0 <= SD_MISO
            *mmgpo2 = 0x02|spi_cs; // CLK H
            *mmgpo2 = spi_cs; // CLK L
        }
        *buff++ = r;
    } while (--bc);
}

/* Deselect the card and release SPI bus */
static void deselect (void)
{
    BYTE d;

    spi_cs = 1;
    *mmgpo2 = 1; // CS = H (bit 0);
    rcvr_mmc(&d, 1);    /* Dummy clock (force DO hi-z for multiple slave SPI) */
}

/* Select the card and wait for ready */
static int select (void)    /* 1:OK, 0:Timeout */
{
    BYTE d;

    spi_cs = 0;
    *mmgpo2 = 0; // CS = L (bit 0);
    rcvr_mmc(&d, 1);    /* Dummy clock (force DO enabled) */
    if (wait_ready()) return 1;    /* OK */
    deselect();
    return 0;                    /* Failed */
}


Although this bit-banging method works, and is even fast enough, in a real application you should still use a hardware SPI module. Fortunately it is really easy to port the FatFS system, so it can’t be too hard to do so.

Drawing on the screen

Displaying pixels needs memory write. As our screen memory area is a linear array of pixels, it’s a piece of cake. I won’t explain. If you don’t understand the following two functions, this document is not for you at all. :-)

void pixel(int x, int y, word color)
{
    screen[x+y*640] = color;
}

word point(int x, int y)
{
    return screen[x+y*640];
}

Color encoding uses the most significant bits of the Red, Green and Blue colors:

word rgb(int r, int g, int b)
{
    return (r&0xf8)<<8 | (g&0xfc)<<3 | (b&0xf8)>>3;
}


See the result:


The memory read and write for the screen area goes through the CPU IO bus and the bridge module described in the VHDL section. This is going to be relatively slow, as I mentioned a couple of times. But works.
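For reference, here is a host-side sketch of the arithmetic behind pixel() and rgb(): the byte address the CPU ends up accessing, and a few packed colors. The constants come from this document; the function names (and the bounds check, which the original pixel() does not have) are additions for illustration.

```c
#include <stdint.h>

#define SCREEN_BASE 0xC0000000u  /* start of the IO space, i.e. the screen */
#define SCREEN_W    640u
#define SCREEN_H    480u

/* Byte address of pixel (x, y): 16-bit pixels in a linear array. */
uint32_t pixel_address(uint32_t x, uint32_t y)
{
    return SCREEN_BASE + 2u * (x + y * SCREEN_W);
}

/* Hypothetical guard: 1 if (x, y) is inside the visible area. */
int pixel_in_bounds(int x, int y)
{
    return x >= 0 && y >= 0 && x < (int)SCREEN_W && y < (int)SCREEN_H;
}

/* Same packing as rgb(): the top 5, 6 and 5 bits of 8-bit components. */
uint16_t rgb565(int r, int g, int b)
{
    return (uint16_t)(((r & 0xf8) << 8) | ((g & 0xfc) << 3) | ((b & 0xf8) >> 3));
}
```

White (255, 255, 255) packs to 0xFFFF, pure red to 0xF800, pure green to 0x07E0 and pure blue to 0x001F. The last pixel of the screen, (639, 479), sits at address 0xC0095FFE.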

Based on the pixel() function, it is now easy to display lines or even characters. I have created a very basic character set with 96 chars. If you have ever seen the legendary ZX Spectrum, this font will look familiar. I just copied the bitmaps from the Spectrum’s rom. The function called putchr() uses this character set to draw the characters.

void linefeed()




    if (cy>=475) cy=5;


void putchr(char c)


    if (c=='\n')





    byte b;

    byte* p = charbitmap + ((int)(c-32)<<3);

    int i,j;



            b = *p++;



                    if (b&0x80) pixel(cx+j,cy+i,ink);





    if (cx>=632) linefeed();


Note that the \n (line-feed) character is handled differently. This makes it easier to print strings line by line. Variables cx, cy and ink are globals. I am not afraid of global vars, no matter what they say. :-)
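The core of the character drawing is the walk over an 8x8 bitmap. The sketch below shows that idea on a plain byte array instead of the real screen so that it can be tried on a PC. It assumes that bit 7 of each row byte is the leftmost pixel and that the byte is shifted left after each pixel; the function names and the buffer layout are made up for the example.

```c
#include <stdint.h>

/* Draw one 8x8 glyph into a byte-per-pixel buffer (1 = ink).
   `stride` is the width of the buffer in pixels. */
void draw_glyph(uint8_t *fb, int stride, const uint8_t glyph[8], int x0, int y0)
{
    for (int i = 0; i < 8; i++) {          /* 8 rows */
        uint8_t b = glyph[i];
        for (int j = 0; j < 8; j++) {      /* 8 pixels per row */
            if (b & 0x80)
                fb[(y0 + i) * stride + (x0 + j)] = 1;
            b = (uint8_t)(b << 1);         /* next pixel moves into bit 7 */
        }
    }
}

/* Offset of a character's bitmap in a table that starts at the space
   character, the same expression as (c-32)<<3 in putchr(). */
int glyph_offset(char c)
{
    return ((int)c - 32) << 3;
}
```

With a 96-character table of 8 bytes each, the bitmap of 'A' (code 65) starts at offset 264.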

Printing a string with putchr() … well, could it be ever more pathetic?

void putstr(const char* str)
{
    const char *c = str;

    while (*c)
        putchr(*c++);
}


Reading files and displaying them

I have created some example pictures in RAW RGB format. This is not a real picture file format like JPG or PNG, it only contains the raw color information, and I know the dimensions, because I made them. (Original image is downloaded from http://www.howdoyousaythatword.com/word/fruit-fruits/ )


The image size is 250x180 pixels (same for all), and the pixels are stored in B G R format, each color needs one byte. So a file like this needs 3x250x180 bytes. Knowing all of these parameters, reading and displaying them isn’t too complicated. (I used the application called Paint Shop Pro, free version 3.12 to export pictures in raw format.)
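The conversion itself is small enough to try on a PC first. The sketch below is not the project code; it only shows, under the parameters given above (250x180 pixels, 3 bytes per pixel in B G R order), how a chunk of file bytes turns into 16-bit screen pixels. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PIC_W 250u
#define PIC_H 180u

/* Expected size of one raw picture file in bytes. */
size_t raw_file_size(void)
{
    return (size_t)3u * PIC_W * PIC_H;
}

/* Convert one pixel stored as B, G, R bytes to the R5G6B5 screen format. */
uint16_t bgr_to_rgb565(const uint8_t *bgr)
{
    uint8_t b = bgr[0], g = bgr[1], r = bgr[2];
    return (uint16_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Convert a buffer of npixels B G R triplets, e.g. one 25-pixel chunk. */
void convert_chunk(const uint8_t *bgr, uint16_t *out, size_t npixels)
{
    for (size_t k = 0; k < npixels; k++)
        out[k] = bgr_to_rgb565(bgr + 3u * k);
}
```

A file of this kind is 135000 bytes long. Reading it in chunks of 25 pixels (75 bytes) keeps the buffer on the stack small, which matters with only 64K of memory.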

void read_picture(const char* fn)


    FRESULT fr = f_open(&Fil, fn, FA_READ); // Fil is a global var (FIL Fil;)

    BYTE buff[3*25];

    BYTE *p;

    UINT i,j,k,l;

    if (fr==FR_OK)


            for(i=0;i<180;i++) // 180 pixels vertically


                    for(j=0;j<10;j++) // 10x25 => 250 pixels horizontally


                            f_read(&Fil,buff,3*25,&l); // 25 pixels (1 pixel has 3 bytes)

                            p = buff;











This is the result on Pipistrello POC’s screen:


Main execution loop

Before you run the main application, you need to prepare the SD card. (The system will work without the card, but in this case you cannot read and display images on the screen.)

Assuming all the preparations are done, the app should work the following way: the main loop of the application waits for a keypress on the PS2 keyboard. F1 to F4 read and display a picture stored on the SD card (f1.raw … f4.raw). F5 clears the screen and redraws the initial content. For any other key, it displays the scancode of the key that was pressed. See this table for reference:


As you usually press and then release a key, both the “make” (press) and the “break” (release) codes will be sent by the keyboard and then displayed by the system. For example, if you press key “A” it will show 1C, then for the release it will show F01C. If you press the key and hold it down, the keyboard will repeat the make code after a certain period, so it will be something like this:
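If you want to turn this byte stream into press and release events, a one-flag state machine is enough for the simple keys. The following sketch is not part of the project; the names are made up, and extended keys (those with an E0 prefix) are deliberately not handled.

```c
#include <stdint.h>

/* Decoder state: set after an F0 (break) prefix has been seen. */
typedef struct {
    int break_pending;
} kbd_decoder;

/* Feed one scan code byte.
   Returns 0 if the byte was only a prefix, +code for a key press (make)
   and -code for a key release (break). */
int kbd_feed(kbd_decoder *d, uint8_t code)
{
    if (code == 0xF0) {          /* break prefix: the next byte is the key */
        d->break_pending = 1;
        return 0;
    }
    if (d->break_pending) {
        d->break_pending = 0;
        return -(int)code;
    }
    return (int)code;
}
```

Feeding the bytes 1C, F0, 1C from the example above yields a press of 1C, nothing for the prefix, then a release of 1C. It would be fed with the non-zero values returned by ps2key().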


Bugs and untested use cases

Remember, this is a Proof of Concept project. A couple of factors can affect the way it works.


Complete project, software workspace also included, zipped: pipipoc.zip

C source files only: source.zip

Bitfile only for Pipistrello: top_level.bit

Raw pictures: f1.raw, f2.raw, f3.raw, f4.raw

Have fun with FPGAs!