Split a file at delimiting lines

Type: Full Script
Category: File Management
License: Ruby License
Language: Ruby
 
Description:
This script splits a file at lines identified by a regular expression. Each matching line begins a new output file. The output files are named after the original file, with ".1", ".2", etc. appended to the name of each consecutive output file.
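To make the splitting rule concrete, here is a minimal in-memory sketch. It is not the script itself: the helper name chunks_at_delimiters and the sample data are hypothetical, and it uses Enumerable#slice_before rather than the explicit loop in the script further down.

```ruby
require 'stringio'

# Hypothetical helper (not part of the original script): groups lines into
# chunks, starting a new chunk at every line that matches the delimiter.
def chunks_at_delimiters(io, delimiter)
  io.each_line.slice_before { |line| line =~ delimiter }.to_a
end

# Made-up sample input: two "tables", each introduced by a headings line.
sample = StringIO.new(%("ID";"name"\n1;"a"\n"ID";"title"\n7;"b"\n))
chunks = chunks_at_delimiters(sample, /^"ID"/)

chunks.each_with_index do |chunk, index|
  # The script would write this chunk to "input_file.csv.<index + 1>".
  puts "input_file.csv.#{index + 1}: #{chunk.length} lines"
end
# prints:
#   input_file.csv.1: 2 lines
#   input_file.csv.2: 2 lines
```

Each chunk corresponds to one numbered output file, and the delimiting line is kept as the first line of its chunk.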

I wrote this script because I needed to break up a text file of MySQL table contents dumped out by phpMyAdmin into one big CSV-style file. The delimiting line in this case was the column headings declaration for each table, which started with "ID;".

The command line for this situation was (on OS X):

split_at_line input_file.csv ^\"ID\"

where I had to escape the " characters so they would not be stripped away by bash before it passed the argument to the script.
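The effect of the escaping can be checked from Ruby. The sketch below assumes the script receives the text ^"ID" in ARGV[1] (which is what bash passes for ^\"ID\", and also for the single-quoted form '^"ID"'); the sample lines are hypothetical.

```ruby
# What the script receives in ARGV[1] once bash has processed the escapes.
pattern_text = '^"ID"'
delimiter = Regexp.new(pattern_text)

header_line = %("ID";"name";"price"\n)  # hypothetical column headings line
data_line   = %(42;"widget";"9.99"\n)   # hypothetical data row

puts delimiter.match?(header_line)  # prints true
puts delimiter.match?(data_line)    # prints false

# Without the escapes, bash would strip the quotes and the script would see
# ^ID, which cannot match a line whose first character is a double quote.
unquoted = Regexp.new('^ID')
puts unquoted.match?(header_line)   # prints false
```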

The script implicitly assumes that its input file is text, not binary, as the format is not specified when opening the files.

Versions Of This Snippet:

Snippet ID: 178
Version: 0.1
Date Posted: 2006-09-04 05:52
Author: Albert Davidson Chou

Latest Snippet Version: 0.1

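#!/usr/bin/env ruby
# Warnings are enabled here instead of with "-w" on the shebang line, because
# Linux passes "ruby -w" to env as a single argument and fails to find it.
$VERBOSE = true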

def usage
  puts "usage:  #{File.basename( __FILE__ )} file_to_split regular_expression_identifying_lines_to_split_at"
end

if ARGV.length != 2 then
  usage
  exit( 1 )
else
  @file_to_split = ARGV[0]
  @expression_identifying_lines_to_split_at = ARGV[1]
end

regular_expression_identifying_lines_to_split_at = Regexp.new( @expression_identifying_lines_to_split_at )

output_file = nil
output_file_name_counter = 0
File.open( @file_to_split ) { |file|
  file.each { |line|
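    # Start a new output file at each delimiting line. Also start one if no
    # output file is open yet, so that lines before the first delimiter are
    # kept (in the ".1" file) instead of raising NoMethodError on nil.
    create_new_output_file = ( line =~ regular_expression_identifying_lines_to_split_at ) || output_file.nil?

    if create_new_output_file then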
      # If there is currently an output file open, close it.
      if ! output_file.nil? then
        output_file.close
      end

      # Increment the filename extension counter and create the new output file.
      output_file_name_counter += 1
      output_file = File.open( "#{@file_to_split}.#{output_file_name_counter}", 'w' )
    end

    output_file.print( line )
  }
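}

# Close the last output file explicitly rather than leaving it to interpreter exit.
output_file.close unless output_file.nil?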
		
