Loading data from multiple files in Ruby using Hash

February 21, 2011 by Syed Aslam

Lately, I have been working applications which require data from an external source to be loaded into the local database. One such application is Sydrea (in the making!), which requires the Drug information to be loaded. Drugs@FDA is freely downloadable compressed zip file which contains about 9 CSV files having Drug related data.

The usual path to data loading is through a rails runner script. I was tired of writing, testing different scripts to load data. I had to come up with one general script which, with minimal customizations, could load data into whatever table.

I created the required models and tables. Database schema for the tables to hold drug related information is available at the Drugs@FDA website. I started with defining a Hash mapping the file containing data, to the Object.

files = {
    'AppDoc.txt'              => 'AppDoc',
    'AppDocType_Lookup.txt'   => 'AppDocTypeLookup',
    'application.txt'         => 'Application',
    'ChemTypeLookup.txt'      => 'ChemicalTypeLookup',
    'DocType_lookup.txt'      => 'DocTypeLookup',
    'product.txt'             => 'Product',
    'Product_tecode.txt'      => 'ProductTECode',
    'RegActionDate.txt'       => 'RegActionDate',
    'ReviewClass_Lookup.txt'  => 'ReviewClassLookup'
} 

What I need to do now is iterate over this hash, read file whose name is given by the Key and create new objects of the type Value. To the AR's new method you can pass a hash with key names matching the associated table column names. The column names are given by the file headers. Now, I need a method which would take two arrays and return me a hash. Like given

arr1 = ['col_name1', 'col_name2'] 
arr2 = ['val1', 'val2'] 
would return
{ 
    'col_name1' => 'val1',
    'col_name2' => 'val2'
}

I came up with a method that does exactly that and I added it to the Array class:

class Array
    def self.to_hash(headers, values) 
        hsh = Hash.new
        headers.each_with_index do |h, i|
            hsh[h.underscore] = values[i]
        end
    
        hsh
    end
end

Then, we're all set to put everything together and load data:

require 'rubygems'
require 'fastercsv'

files.each do |key, value|
    file = "#{RAILS_ROOT}/db/drugsatfda/" + key
    recs = 0
    
    puts "Working with #{value.pluralize}.."
    FasterCSV.foreach(file, :headers => true) do |row|
        begin
            obj = value.constantize.new(Array.to_hash(row.headers, row.fields))
            obj.save
        
            recs += 1
        rescue => e
            puts "Rows processed: " + recs.to_s
            puts e
        end
    end
    puts "Loaded #{recs} #{value.pluralize}"
end
Technicals: Rails 2.3.8, PostgreSQL, FasterCSV

© 2017 | Follow on Twitter