How to separate records from a CSV file

Amol

13y
I want to separate the fields in a CSV file, but one of the fields itself contains commas. How should I go about it?
For example:
supid,  suppliername,         city,      state,              phone no
1,      Orton Eng,  mumbai, maharashtra,    02228695877,02228692356

Here I have two phone numbers separated by a comma.

Thanks in advance.
Answers (8)
0
Vulpes

NA 98.3k 1.5m 13y
Usually, when you have embedded commas in a CSV file, then the fields concerned are placed within quotes. For example:

dave,1,"red,blue"
bert,2,"red,green"
fred,3,"red,blue,green"

The trick then is to replace the embedded commas with some character that you know won't occur (say character 127, DEL), do the split on the remaining commas, replace the substitute characters with commas again and remove the quotes. Here's some code for doing that:

using System;
using System.IO;

class Test
{
    static void Main()
    {
       string[] lines = File.ReadAllLines("rakesh.txt");
       for(int i = 0; i < lines.Length; i++)
       {
          // Hide the commas inside quotes so that Split only sees the real separators.
          ProcessEmbeddedCommas(ref lines[i]);
          string[] items = lines[i].Split(',');
          // Restore the embedded commas and strip the quotes.
          for(int j = 0; j < items.Length; j++) items[j] = items[j].Replace('\x7f', ',').Replace("\"", "");
          // The sample file has exactly three fields per line.
          Console.WriteLine("{0}   {1}    {2}", items[0], items[1], items[2]);
       }
       Console.ReadKey();
    }

    // Replaces every comma that lies between a pair of quotes with DEL (character 127).
    static void ProcessEmbeddedCommas(ref string text)
    {
       bool embedded = false;   // true while we are inside a quoted field
       bool changed = false;    // avoids building a new string when nothing was replaced
       char subst = '\x7f';
       char[] chars = text.ToCharArray();
       for(int i = 0; i < chars.Length; i++)
       {
           if(chars[i] == '\"')
           {
               embedded = !embedded;
           }
           else if (chars[i] == ',' && embedded)
           {
               chars[i] = subst;
               changed = true;
           }
       }
       if (changed) text = new string(chars);
    }
}

The output should be:

dave   1    red,blue
bert   2    red,green
fred   3    red,blue,green

If you don't use quotes or some other delimiter to mark the fields with embedded commas, then the task is much more difficult as you don't know which commas are separators and which are embedded in the field.

In your particular example, if you can assume that there will always be five fields and that the only embedded commas will occur in the phone field which is the last one, then you could do the split in the normal way and any items above 5 must then be additional phone numbers.
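
A minimal sketch of that idea, assuming every record starts with exactly four comma-free fields (supid, suppliername, city, state) and that everything after them is a phone number. The class and method names are made up for illustration.

```csharp
using System;
using System.Linq;

static class TrailingPhones
{
    // Hypothetical helper: assumes the first four items are supid, suppliername,
    // city and state, and that every remaining item is a phone number.
    public static string[] PhonesOf(string line)
    {
        string[] items = line.Split(',');
        return items.Skip(4).Select(p => p.Trim()).ToArray();
    }

    static void Main()
    {
        string line = "1,      Orton Eng,  mumbai, maharashtra,    02228695877,02228692356";
        // Prints each phone number on its own line.
        foreach (string phone in PhonesOf(line))
            Console.WriteLine(phone);
    }
}
```

This breaks down as soon as any of the first four fields can contain a comma, which is why quoting is the safer convention.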

Accepted
0
Sam Hobbs

NA 28.7k 1.3m 13y
I was answering the question that was asked: how to process data that lacks the quotes it would normally have. Whether the format of the input data can be changed to a more common one was not stated; if it can, there are articles on this web site that could help, such as my Using CSV and Other Delimited Files in C#, though that is not the only relevant article.

If the format of the input data can be modified, then I suggest using a delimiter other than a comma. The most common alternative is the tab ('\t'), which will likely work here since the data won't contain tabs. If the data might contain tabs, then there are many other possibilities. See Escape Sequences and the ASCII Character Codes Chart for other possibilities. A backslash ('\\') is another character that is not likely to be in most text. In other words, just separate the fields with a delimiter (such as a tab character) instead of a comma and then you don't need to add the quotes; the data can be parsed with just a simple split.
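
As a rough illustration, here is the record from the question rewritten with tabs between the fields. The sample string and the class name are assumptions for the sketch, not data from a real file.

```csharp
using System;

static class TabDelimited
{
    // With tabs as the delimiter, commas inside a field need no special handling.
    public static string[] Parse(string record)
    {
        return record.Split('\t');
    }

    static void Main()
    {
        // Assumed sample: the question's record with tabs between fields.
        string record = "1\tOrton Eng\tmumbai\tmaharashtra\t02228695877,02228692356";
        string[] items = Parse(record);
        Console.WriteLine(items.Length);   // 5
        Console.WriteLine(items[4]);       // 02228695877,02228692356
    }
}
```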
0
Vulpes

NA 98.3k 1.5m 13y
That's more or less what I suggested in the final paragraph of my first post.

It works as long as there are the same number of fields in each line and the embedded commas can only occur in the final field.

Otherwise you have a problem distinguishing which commas are delimiters and which are embedded.
0
Sam Hobbs

NA 28.7k 1.3m 13y
You could split using the comma as the delimiter and then the first four items will be supid, suppliername, city and state. Then you could simply put the "phone no" back together using String.Join(",", pieces), where pieces is an array beginning with the fifth item from the split. However, Split can do that last part for you: given a maximum count, it returns an array of at most five items (exactly five for a complete record), with the remainder of the line left intact in the last item. So that reduces the solution to one line.

String[] items = Record.Split(new char[]{','}, 5);
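
For illustration, here is that call applied to the record from the question, with the padding around each field trimmed. The wrapper class and method names are hypothetical.

```csharp
using System;
using System.Linq;

static class SplitWithCount
{
    // Split stops after the fifth item, so the last item keeps any further commas.
    public static string[] SplitRecord(string record)
    {
        return record.Split(new char[] { ',' }, 5)
                     .Select(item => item.Trim())
                     .ToArray();
    }

    static void Main()
    {
        string record = "1,      Orton Eng,  mumbai, maharashtra,    02228695877,02228692356";
        string[] items = SplitRecord(record);
        Console.WriteLine(items.Length);   // 5
        Console.WriteLine(items[4]);       // 02228695877,02228692356
    }
}
```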
0
Vulpes

NA 98.3k 1.5m 13y
You can do this more easily by substituting the delimiter rather than the embedded commas and then splitting on the delimiter, thereby avoiding the need to change the delimiter back to a comma. However, whether this is more efficient or not depends on how many embedded commas you actually have. Sometimes with this approach you might be changing lines unnecessarily.
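
A sketch of that variant, assuming quoted fields as in the dave/bert/fred sample and that character 127 never occurs in the data. The names are made up for illustration.

```csharp
using System;

static class DelimiterSwap
{
    // Replaces the commas OUTSIDE quotes with DEL and splits on DEL,
    // so the embedded commas never need to be restored.
    public static string[] Split(string text)
    {
        const char delim = '\x7f';
        bool embedded = false;
        char[] chars = text.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (chars[i] == '"') embedded = !embedded;
            else if (chars[i] == ',' && !embedded) chars[i] = delim;
        }
        string[] items = new string(chars).Split(delim);
        // Strip the quotes that marked the embedded fields.
        for (int j = 0; j < items.Length; j++) items[j] = items[j].Replace("\"", "");
        return items;
    }

    static void Main()
    {
        string[] items = Split("fred,3,\"red,blue,green\"");
        Console.WriteLine(items[2]);   // red,blue,green
    }
}
```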

You can also do the split whilst you're looking for the commas though, personally, I prefer to keep methods simple when I know I have another method which can do the split efficiently. 
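
For comparison, a single-pass version that splits while scanning might look like this. It is only a sketch under the same assumption of quoted fields, and it does not treat doubled quotes as an escaped quote character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ScanningSplitter
{
    // Builds each field character by character, ending a field
    // only at a comma that lies outside quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;          // quotes are dropped from the output
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());        // the last field has no trailing comma
        return fields;
    }

    static void Main()
    {
        foreach (string field in Split("bert,2,\"red,green\""))
            Console.WriteLine(field);
    }
}
```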

Another approach would be to use regular expressions but this is relatively slow for simple string parsing such as this.
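
A possible regular-expression version, again assuming quoted fields with balanced quotes. The pattern is one common way to express "a comma followed by an even number of quotes" and is not the only option.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexSplitter
{
    // A comma is a separator only if an even number of quotes follows it,
    // i.e. it is not inside a quoted field.
    static readonly Regex Separator =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    public static string[] Split(string line)
    {
        return Separator.Split(line)
                        .Select(field => field.Trim('"'))   // remove the surrounding quotes
                        .ToArray();
    }

    static void Main()
    {
        string[] items = Split("dave,1,\"red,blue\"");
        Console.WriteLine(items[2]);   // red,blue
    }
}
```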

I've no idea what Sam has in mind but, like you, I'd be interested to know.
0
Amol

NA 241 150.6k 13y
What is your way of solving this one? Please share.
0
Sam Hobbs

NA 28.7k 1.3m 13y
There is a much easier way to do it, but if you are happy with the first way then that is good.
0
Amol

NA 241 150.6k 13y
Nice explanation, really appreciated.
Keep it up.