Hi,
I am trying to remove duplicate lines from a text file. To make things
difficult, the lines contain non-unique timestamps but a unique reference
number. Some references have as many as 10 duplicate lines, whereas others
have only 2.
Here are some examples of duplicate lines, in the format
<timestamp>,<reference>,<error message>:
08:47:22,95847170050,Problem inputting data.
08:48:28,96672540040,More problems inputting data.
08:49:29,95847170050,Problem inputting data.
08:55:28,106622510040,Extra issues inputting data.
08:56:35,95847170050,Problem inputting data.
08:57:35,106622510040,Extra issues inputting data.
09:02:35,96672540040,More problems inputting data.
09:03:41,96672540040,More problems inputting data.
09:04:41,106622510040,Extra issues inputting data.
For each reference, I want to delete all the duplicates but KEEP the most recent line.
I am new to C#. I originally wrote a Java program to do this but was told to
rewrite it in C#.
To assist, here is the Java code.
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;

/*
 Contents of the text file are read into an ArrayList (allData).
 Unique reference values are then extracted from allData and populated into references (another ArrayList).
*/
static DateFormat df = new SimpleDateFormat("HH:mm:ss");
...
ArrayList<String> latest = getLatestEntries(allData, references);

private static ArrayList<String> getLatestEntries(ArrayList<String> allData, ArrayList<String> references) {
    // For each reference, save the latest entry.
    ArrayList<String> list = new ArrayList<>();
    for (String ref : references) {
        Date date = null;
        int maxValIndex = -1; // index of the latest line seen for this reference
        //System.out.printf("ref = %s%n", ref);
        for (int j = 0; j < allData.size(); j++) {
            String[] fields = allData.get(j).split(",");
            if (fields[1].equals(ref)) {
                Date nextDate = parse(fields[0]);
                // Skip unparseable timestamps; otherwise keep the later one.
                if (nextDate != null && (date == null || nextDate.compareTo(date) > 0)) {
                    date = nextDate;
                    maxValIndex = j;
                }
            }
        }
        if (maxValIndex >= 0) {
            list.add(allData.get(maxValIndex));
        }
    }
    return list;
} // getLatestEntries

private static Date parse(String s) {
    try {
        return df.parse(s);
    } catch (ParseException e) {
        System.out.println("parse error: " + e.getMessage());
        return null;
    }
} //parse
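The nested loops above scan allData once per reference. A minimal single-pass sketch of the same idea follows, assuming every line has the form HH:mm:ss,<reference>,<error message> and that all timestamps fall on the same day (they carry no date, so a file crossing midnight would be ordered wrongly). The class name LatestPerReference and the method latestEntries are hypothetical names used only for illustration, and java.time.LocalTime is swapped in for Date/SimpleDateFormat.

```java
import java.time.LocalTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class and method names, for illustration only.
class LatestPerReference {

    // Returns one line per reference: the line with the latest timestamp.
    // Assumes each line looks like HH:mm:ss,<reference>,<error message>.
    static List<String> latestEntries(List<String> allData) {
        // LinkedHashMap keeps references in order of first appearance.
        Map<String, String> latestLine = new LinkedHashMap<>();
        Map<String, LocalTime> latestTime = new HashMap<>();
        for (String line : allData) {
            String[] fields = line.split(",", 3);
            if (fields.length < 3) {
                continue; // not in the assumed format
            }
            LocalTime time;
            try {
                time = LocalTime.parse(fields[0]); // ISO time, e.g. 08:47:22
            } catch (DateTimeParseException e) {
                continue; // skip lines with an unparseable timestamp
            }
            LocalTime best = latestTime.get(fields[1]);
            if (best == null || time.isAfter(best)) {
                latestTime.put(fields[1], time);
                latestLine.put(fields[1], line);
            }
        }
        return new ArrayList<>(latestLine.values());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "08:47:22,95847170050,Problem inputting data.",
            "08:48:28,96672540040,More problems inputting data.",
            "08:49:29,95847170050,Problem inputting data.",
            "08:55:28,106622510040,Extra issues inputting data.",
            "08:56:35,95847170050,Problem inputting data.",
            "08:57:35,106622510040,Extra issues inputting data.",
            "09:02:35,96672540040,More problems inputting data.",
            "09:03:41,96672540040,More problems inputting data.",
            "09:04:41,106622510040,Extra issues inputting data.");
        // Prints the 08:56:35, 09:03:41 and 09:04:41 lines, one per reference.
        latestEntries(sample).forEach(System.out::println);
    }
}
```

The same shape should carry over to C# with a Dictionary keyed on the reference and TimeSpan.Parse or DateTime.ParseExact for the timestamp, though that port is untested here.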
I know the code will be more or less similar, with some capitalisation
changes and System.out.println becoming Console.WriteLine, but I am
struggling with the Date to DateTime conversion.
Can someone help?
Thank you in advance.