arrays - More resource efficient way to get the maximum of the last 512 values


I have written some VHDL code that stores the last 512 values of an input signal and calculates the largest of the stored values. The code works, but it uses a lot of the LUT resources of my FPGA. The purpose of the code is to calculate the largest value of the last 512 samples; is there a more resource-efficient way of achieving this? (It's important that it calculates the largest of the last 512 values and not the largest value ever observed on the input; the latter could be achieved by storing a single number.)

Alternatively, is there some way to code the VHDL such that the synthesiser will implement the array as block RAM (BRAM) instead of LUTs?

The synthesiser I am using is LabVIEW FPGA (which I believe uses Xilinx ISE internally to compile/synthesise the VHDL).

My current code is shown below:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity recentmax is
  port (
    clk : in std_logic;
    reset : in std_logic;
    inputsignal : in std_logic_vector(15 downto 0);
    max : out std_logic_vector(15 downto 0)
    );
end recentmax;

architecture rtl of recentmax is
  -- declarations
  type array512 is array(0 to 511) of signed(15 downto 0);
  signal pastvals : array512;
  type array256 is array(0 to 255) of signed(15 downto 0);
  signal result : array256;

  signal calculationstate : unsigned(1 downto 0);
  signal nlefttocompute : unsigned(8 downto 0);
begin
  -- behaviour
  process(clk)
  begin
    if(rising_edge(clk)) then
      if(reset = '1') then
        -- reset values
        for i in pastvals'low to pastvals'high loop
          pastvals(i) <= (others => '0');
        end loop;
        for i in result'low to result'high loop
          result(i) <= (others => '0');
        end loop;
        calculationstate <= to_unsigned(0, 2);
        max <= std_logic_vector(to_signed(0, 16));
        nlefttocompute <= to_unsigned(256, 9);
      else
        -- do stuff
        case to_integer(calculationstate) is
          when 0 =>
            -- shift the new sample into the history and publish the last result
            for i in pastvals'low to pastvals'high-1 loop
              pastvals(i+1) <= pastvals(i);
            end loop;
            pastvals(0) <= signed(inputsignal);
            max <= std_logic_vector(result(0));
            nlefttocompute <= to_unsigned(256, 9);
            calculationstate <= to_unsigned(1, 2);
          when 1 =>
            -- first reduction stage: 512 stored values -> 256 pairwise maxima
            for i in 0 to 255 loop
              if (i <= to_integer(nlefttocompute)-1) then
                if pastvals(i*2) > pastvals(i*2+1) then
                  result(i) <= pastvals(i*2);
                else
                  result(i) <= pastvals(i*2+1);
                end if;
              end if;
            end loop;
            nlefttocompute <= shift_right(nlefttocompute, 1);
            calculationstate <= to_unsigned(2, 2);
          when 2 =>
            -- remaining stages: halve the number of candidates on each pass
            for i in 0 to 127 loop
              if (i <= to_integer(nlefttocompute)-1) then
                if result(i*2) > result(i*2+1) then
                  result(i) <= result(i*2);
                else
                  result(i) <= result(i*2+1);
                end if;
              end if;
            end loop;
            if nlefttocompute > 2 then
              nlefttocompute <= shift_right(nlefttocompute, 1);
            else
              calculationstate <= to_unsigned(0, 2);
            end if;
          when others =>
            -- do nothing - shouldn't get here
        end case;
      end if;
    end if;
  end process;
end rtl;

There are two ways to get what you want.

First, you can infer a BRAM by creating a dedicated process and array, so that the synthesizer will choose to use a block RAM instead of LUTs for the 512 stored values.

constant data_width     : integer := 16;
constant add_width      : integer := 9; -- 512 addresses
constant dpram_depth    : integer := 2**add_width; -- depth of memory

type dpram is array (0 to dpram_depth - 1) of std_logic_vector(data_width - 1 downto 0);
signal mem              : dpram;

And the process:

dpram_gen_p : process (clk_i)
begin
    if (rising_edge(clk_i)) then
        if (wr_req_i = '1') then
            mem(wr_add_s) <= wr_data_i;
        end if;
        rd_data_s <= mem(rd_add_s);
    end if;
end process;

For the main synthesizers, this syntax will be implemented as a block RAM. Then, instead of the pastvals signal, you need to use the data port of the RAM. Be careful with the read cycles, since you need one clock cycle to change the address of the read interface (rd_add_s) and one more to read the data (rd_data_s).
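As a rough illustration of that read timing, here is a minimal sketch of one way the RAM could replace pastvals: each new sample is written into a circular buffer, and all 512 addresses are then read back one per clock while a running maximum is kept. This is only one possible arrangement, not necessarily what was intended above. The entity name recentmax_bram, the ports reset_i and max_o, and the signals addr_valid_s, data_valid_s, data_last_s and running_max_s are hypothetical. It also assumes wr_req_i is a one-clock strobe per sample and that samples arrive at least about 520 clock cycles apart, so that a scan finishes before the next write.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity recentmax_bram is  -- hypothetical entity name
  port (
    clk_i     : in  std_logic;
    reset_i   : in  std_logic;  -- assumed synchronous reset (does not clear the RAM)
    wr_req_i  : in  std_logic;  -- assumed: one-clock strobe per new sample
    wr_data_i : in  std_logic_vector(15 downto 0);
    max_o     : out std_logic_vector(15 downto 0)
  );
end recentmax_bram;

architecture rtl of recentmax_bram is
  constant data_width  : integer := 16;
  constant add_width   : integer := 9;  -- 512 addresses
  constant dpram_depth : integer := 2**add_width;
  -- most negative 16-bit value, used to restart the running maximum
  constant min_val     : signed(data_width - 1 downto 0) :=
    to_signed(-2**(data_width - 1), data_width);

  type dpram is array (0 to dpram_depth - 1) of std_logic_vector(data_width - 1 downto 0);
  signal mem : dpram := (others => (others => '0'));  -- window starts as all zeros

  signal wr_add_s      : integer range 0 to dpram_depth - 1 := 0;
  signal rd_add_s      : integer range 0 to dpram_depth - 1 := 0;
  signal rd_data_s     : std_logic_vector(data_width - 1 downto 0) := (others => '0');
  signal addr_valid_s  : std_logic := '0';  -- rd_add_s is a scan address this cycle
  signal data_valid_s  : std_logic := '0';  -- rd_data_s is scan data this cycle
  signal data_last_s   : std_logic := '0';  -- rd_data_s is the last word of the scan
  signal running_max_s : signed(data_width - 1 downto 0) := (others => '0');
begin

  -- RAM template: synchronous write, synchronous read
  dpram_gen_p : process (clk_i)
  begin
    if rising_edge(clk_i) then
      if wr_req_i = '1' then
        mem(wr_add_s) <= wr_data_i;
      end if;
      rd_data_s <= mem(rd_add_s);
    end if;
  end process;

  scan_p : process (clk_i)
    variable new_max_v : signed(data_width - 1 downto 0);
  begin
    if rising_edge(clk_i) then
      if reset_i = '1' then
        wr_add_s      <= 0;
        rd_add_s      <= 0;
        addr_valid_s  <= '0';
        data_valid_s  <= '0';
        data_last_s   <= '0';
        running_max_s <= (others => '0');
        max_o         <= (others => '0');
      else
        -- data flags trail the address flags by one clock (RAM read latency)
        data_valid_s <= addr_valid_s;
        data_last_s  <= '0';

        -- fold the word read from the RAM into the running maximum
        if data_valid_s = '1' then
          if signed(rd_data_s) > running_max_s then
            new_max_v := signed(rd_data_s);
          else
            new_max_v := running_max_s;
          end if;
          running_max_s <= new_max_v;
          if data_last_s = '1' then
            max_o <= std_logic_vector(new_max_v);  -- scan complete
          end if;
        end if;

        if wr_req_i = '1' then
          -- new sample: advance the circular write pointer and start a scan
          if wr_add_s = dpram_depth - 1 then
            wr_add_s <= 0;
          else
            wr_add_s <= wr_add_s + 1;
          end if;
          rd_add_s      <= 0;
          addr_valid_s  <= '1';
          running_max_s <= min_val;
        elsif addr_valid_s = '1' then
          if rd_add_s = dpram_depth - 1 then
            addr_valid_s <= '0';
            data_last_s  <= '1';  -- the word arriving next cycle is the last one
          else
            rd_add_s <= rd_add_s + 1;
          end if;
        end if;
      end if;
    end if;
  end process;

end rtl;
```

With this arrangement max_o updates 513 clock cycles after each wr_req_i strobe, so it trades latency for LUTs compared with the parallel comparison tree in the question. Because a reset cannot clear the RAM contents, the window may still contain older samples after a reset until 512 new samples have been written.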

The second option (in my opinion the easiest and fastest) is to implement a FIFO memory (you can use the Xilinx IP core generator) with a size of 512 words.

https://www.xilinx.com/support/documentation/ip_documentation/fifo_generator/v13_1/pg057-fifo-generator.pdf

Then you just need to write into the FIFO until it's full, read the data word by word until it's empty, and register the highest value as you did in your design.

