Lately, I have been working applications which require data from an external source to be loaded into the local database. One such application is Sydrea (in the making!), which requires the Drug information to be loaded. Drugs@FDA is freely downloadable compressed zip file which contains about 9 CSV files having Drug related data.

The usual path to data loading is through a rails runner script. I was tired of writing, testing different scripts to load data. I had to come up with one general script which, with minimal customizations, could load data into whatever table.

I created the required models and tables. Database schema for the tables to hold drug related information is available at the Drugs@FDA website. I started with defining a Hash mapping the file containing data, to the Object.

files = {
	'AppDoc.txt' 			  => 'AppDoc',
	'AppDocType_Lookup.txt'   => 'AppDocTypeLookup',
	'application.txt' 		  => 'Application',
	'ChemTypeLookup.txt'	  => 'ChemicalTypeLookup',
	'DocType_lookup.txt'	  => 'DocTypeLookup',
	'product.txt'			  => 'Product',
	'Product_tecode.txt'      => 'ProductTECode',
	'RegActionDate.txt'	      => 'RegActionDate',
	'ReviewClass_Lookup.txt'  => 'ReviewClassLookup'
} 

What I need to do now is iterate over this hash, read file whose name is given by the Key and create new objects of the type Value. To the AR's new method you can pass a hash with key names matching the associated table column names. The column names are given by the file headers. Now, I need a method which would take two arrays and return me a hash. Like given

arr1 = ['col_name1', 'col_name2'] 
arr2 = ['val1', 'val2'] 
would return
{ 
	'col_name1' => 'val1',
	'col_name2' => 'val2'
}

I came up with a method that does exactly that and I added it to the Array class:

class Array
	def self.to_hash(headers, values) 
		hsh = Hash.new
		headers.each_with_index do |h, i|
			hsh[h.underscore] = values[i]
		end
	
		hsh
	end
end

Then, we're all set to put everything together and load data:

require 'rubygems'
require 'fastercsv'

files.each do |key, value|
	file = "#{RAILS_ROOT}/db/drugsatfda/" + key
	recs = 0
	
	puts "Working with #{value.pluralize}.."
	FasterCSV.foreach(file, :headers => true) do |row|
		begin
			obj = value.constantize.new(Array.to_hash(row.headers, row.fields))
			obj.save
		
			recs += 1
		rescue => e
			puts "Rows processed: " + recs.to_s
			puts e
		end
	end
	puts "Loaded #{recs} #{value.pluralize}"
end
Technicals: Rails 2.3.8, PostgreSQL, FasterCSV
blog comments powered by Disqus