Parsing Text Files in Java

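The following method parses a delimited (comma, tab, etc.) text file in Java and returns one String[] per line. Note that String.split treats the delimiter as a regular expression, so a delimiter such as | has to be escaped, and quoted fields that contain the delimiter are not handled.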

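import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

private static List<String[]> parseDelimitedFile(String filePath, String delimiter) throws IOException
{
  List<String[]> rows = new ArrayList<>();

  // try-with-resources closes the reader even if readLine() throws
  try (BufferedReader br = new BufferedReader(new FileReader(filePath)))
  {
    String currentRecord;
    while ((currentRecord = br.readLine()) != null)
      rows.add(currentRecord.split(delimiter, -1)); // -1 keeps trailing empty fields
  }

  return rows;
}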

July 14, 2008 at 3:08 pm 18 comments

Peeking at Large Files in Python

I have been parsing files in the multi-gigabyte range. Python handles them pretty well, but it can still take a while to chug through them, and I honestly don’t know of any great tricks to speed this up. One thing that does help is reading just the first few lines to see the format. The following code copies the first 100 lines of a text file to a new file, so you can inspect the format of a large file without reading through all of it. To copy the entire file, take out the if/else statement.

inFileName = "associations.txt"
inFile = open(inFileName, 'r')
outFile = open("peek_%s" % inFileName, 'w')

count = 0

for line in inFile:
  count += 1

  # Copy only the first 100 lines, then stop reading
  if count <= 100:
    outFile.write(line)
  else:
    break

outFile.close()
inFile.close()
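The same idea can be sketched with itertools.islice, which stops after a fixed number of lines without a manual counter. This is only one possible variation; the function name peek and its parameters are made up for illustration.

```python
from itertools import islice


def peek(in_path, out_path, n=100):
    """Copy the first n lines of in_path to out_path; return how many were copied."""
    copied = 0
    # The with statement closes both files even if an error occurs
    with open(in_path, "r") as in_file, open(out_path, "w") as out_file:
        # islice stops after n lines, so the rest of the file is never read
        for line in islice(in_file, n):
            out_file.write(line)
            copied += 1
    return copied
```

Calling peek("associations.txt", "peek_associations.txt") would write the same output file as the code above.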

July 14, 2008 at 3:07 pm

How to Concatenate Two String Values in SQL

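(The following example was tested in SQLite. The || operator is standard SQL and also works in PostgreSQL and Oracle, but MySQL treats || as a logical OR by default and uses CONCAT() instead, and SQL Server uses +.)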

The syntax for combining two string values is the following.

SELECT Column1 || Column2
FROM TheTable
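To try this out quickly, here is a small sketch that runs the query against an in-memory SQLite database from Python. The sample rows and the function name concat_demo are invented for illustration. It also shows one thing to watch for: in SQLite, concatenating with NULL gives NULL, which COALESCE can work around.

```python
import sqlite3


def concat_demo():
    """Run the || query on made-up sample data; return (plain, null_safe) results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TheTable (Column1 TEXT, Column2 TEXT)")
    conn.executemany(
        "INSERT INTO TheTable VALUES (?, ?)",
        [("foo", "bar"), ("abc", None)],  # the second row has a NULL in Column2
    )
    plain = conn.execute(
        "SELECT Column1 || Column2 FROM TheTable ORDER BY rowid"
    ).fetchall()
    # COALESCE swaps NULL for an empty string before concatenating
    null_safe = conn.execute(
        "SELECT Column1 || COALESCE(Column2, '') FROM TheTable ORDER BY rowid"
    ).fetchall()
    conn.close()
    return plain, null_safe


plain, null_safe = concat_demo()
print(plain)      # [('foobar',), (None,)]
print(null_safe)  # [('foobar',), ('abc',)]
```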

July 12, 2008 at 5:35 pm 2 comments

How to Do an IF Statement in SQL

(The following example has been tested in SQLite, but it should work for most versions of SQL.)

SQLite’s query language does not have an IF statement like some other versions of SQL do, but you can get essentially the same result with a CASE expression. The following query works like an IF, ELSE IF, ELSE statement.

SELECT (CASE InRome
  WHEN 'Yes' THEN 'Do As the Romans Do'
  WHEN 'No' THEN 'Do As the French Do'
  ELSE 'Who knows what to do'
  END)
FROM TheTable
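Here is a rough sketch of a CASE expression running against an in-memory SQLite database from Python. The table name TheTable, the sample values, and the function name advice_for are assumptions made for the example. Note that a NULL value matches none of the WHEN branches and falls through to ELSE.

```python
import sqlite3


def advice_for(values):
    """Load values into a one-column table and return the CASE result for each row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TheTable (InRome TEXT)")
    conn.executemany("INSERT INTO TheTable VALUES (?)", [(v,) for v in values])
    query = """
        SELECT CASE InRome
                 WHEN 'Yes' THEN 'Do As the Romans Do'
                 WHEN 'No' THEN 'Do As the French Do'
                 ELSE 'Who knows what to do'
               END
        FROM TheTable
        ORDER BY rowid
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


print(advice_for(["Yes", "No", "Maybe"]))
# ['Do As the Romans Do', 'Do As the French Do', 'Who knows what to do']
```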

July 12, 2008 at 5:34 pm

Check to See If File Exists in Python

import os
os.path.exists(fileName)

This returns True if fileName refers to an existing path (a file or a directory) and False otherwise.
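Since os.path.exists is also True for directories, it can be worth telling the cases apart with os.path.isfile and os.path.isdir. The helper below is just a sketch; the name describe_path and its return strings are made up for illustration.

```python
import os


def describe_path(path):
    """Return 'file', 'directory', 'other', or 'missing' for the given path."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    if os.path.exists(path):
        # Exists, but is neither a regular file nor a directory
        return "other"
    return "missing"
```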

July 11, 2008 at 9:20 pm 6 comments

Increase the Text Size on Axes of Plots in R

An easy way to do this is to use the cex.lab parameter of the plot (or barplot, etc.) function, which scales the axis titles (xlab and ylab). The abbreviation cex stands for character expansion. If you set this value to 2, the characters will be twice the default size.

Similarly, cex.axis scales the tick-mark labels along each axis, and cex.names scales the bar labels drawn by barplot.

barplot(x,
  ylab='Proportion',
  cex.names=2.0,
  cex.axis=2.0,
  cex.lab=2.0
)

July 11, 2008 at 7:27 pm

Find Files in Directory Using Python

A nice solution to this is the path Python module. However, the following simple function will also do the trick: it returns the names in a directory that contain a given substring. It doesn’t support wildcards at this point, but that could easily be added with some regular expression code.

import os

def getFilesMatchingPattern(directory, nonWildCardPattern):
  # Keep only the names that contain the pattern as a plain substring
  fileList = os.listdir(directory)
  return [f for f in fileList if nonWildCardPattern in f]
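For wildcard support, one option that avoids hand-written regular expressions is the standard library’s fnmatch module, which understands shell-style patterns such as *.csv. This is a sketch of that alternative rather than the approach above; the function name get_files_matching_wildcard is made up.

```python
import fnmatch
import os


def get_files_matching_wildcard(directory, pattern):
    """Return the sorted names in directory that match a shell-style pattern."""
    # fnmatch.fnmatch supports *, ?, and [seq] wildcards
    return sorted(f for f in os.listdir(directory) if fnmatch.fnmatch(f, pattern))
```

For example, get_files_matching_wildcard(".", "*.csv") would list the CSV files in the current directory.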

July 10, 2008 at 5:53 pm 2 comments

How Many Files in a Linux Directory

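The following command will tell you how many files ending in .csv are in the current directory:

ls -1 | grep '\.csv$' | wc -l

The pattern is quoted so the shell doesn’t expand it, and the dot is escaped so it only matches a literal period. To count files in subdirectories as well, use ls -1R in place of ls -1.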

July 4, 2008 at 6:14 am 1 comment

Converting a String to a Boolean in Python

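Let’s say you have a string value that you want to convert to a boolean, but you’re not sure what format it will be in. Some languages have built-in functions for doing this, but to my knowledge Python doesn’t. (Calling bool() on a string doesn’t help, because any non-empty string, including "false", is True.) Here’s a way to do it, though it’s not comprehensive: it treats any string that starts with T or t as True and everything else as False. (Thanks to the commenter who helped me see a simpler way to do this.)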

def parseBoolString(theString):
  # Slicing with [:1] avoids an IndexError on an empty string
  return theString[:1].upper() == 'T'

parseBoolString("true")

True

parseBoolString("false")

False
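If you would rather reject unexpected input than silently treat it as False, a stricter variant might look like the sketch below. The accepted spellings and the name parse_bool_strict are my own assumptions; adjust the sets to whatever your data actually contains.

```python
# Hypothetical lists of accepted spellings; extend as needed
TRUE_STRINGS = {"true", "t", "yes", "y", "1", "on"}
FALSE_STRINGS = {"false", "f", "no", "n", "0", "off"}


def parse_bool_strict(value):
    """Convert a string to a boolean, raising ValueError for unknown input."""
    normalized = value.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    raise ValueError("not a recognized boolean string: %r" % value)
```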

April 8, 2008 at 1:11 am 29 comments

Simple Method to Search a Python List

Let’s say you have a list of objects of type Individual and that list is called individuals.

The Individual type contains an ID, name, and email address.

Let’s say you have an ID and want to get the corresponding Individual object from the list. How would you go about doing that?

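match = [ind for ind in individuals if ind.id == theID]

Note that match is a list of every Individual with that ID. If the IDs are unique, match[0] is the object you want, once you have checked that the list isn’t empty.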
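Two other ways to do the lookup are sketched below: next() with a generator stops at the first hit, and a dictionary keyed by ID is handy when you search the same list many times. The namedtuple definition of Individual and the function names are stand-ins for illustration, since the real class isn’t shown here.

```python
from collections import namedtuple

# Stand-in for the Individual type described above
Individual = namedtuple("Individual", ["id", "name", "email"])


def find_by_id(individuals, the_id):
    """Return the first Individual with the given ID, or None if there is none."""
    return next((ind for ind in individuals if ind.id == the_id), None)


def index_by_id(individuals):
    """Build a dict from ID to Individual for repeated lookups."""
    return {ind.id: ind for ind in individuals}


individuals = [
    Individual(1, "Ann", "ann@example.com"),
    Individual(2, "Bob", "bob@example.com"),
]
print(find_by_id(individuals, 2).name)  # Bob
print(find_by_id(individuals, 99))      # None
```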

April 4, 2008 at 3:19 pm 3 comments
