열에,가 포함될 수있는 CSV를 분할하는 방법
주어진
2,1016,7 / 31 / 2008 14 : 22, Geoff Dalgas, 6 / 5 / 2011 22:21, http://stackoverflow.com , "Corvallis, OR", 7679,351,81, b437f461b3fd27387c5d8ab47a293d35,34
C #을 사용하여 위의 정보를 다음과 같이 문자열로 분할하는 방법 :
2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34
보시다시피 열 중 하나에, <= (Corvallis, OR)
// 업데이트 // C # Regex Split 기반 -따옴표 외부의 쉼표
string[] result = Regex.Split(samplestring, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
Microsoft.VisualBasic.FileIO.TextFieldParser
수업을 사용하십시오 . 이것은 구분 된 파일 TextReader
또는 Stream
일부 필드가 따옴표로 묶여 있고 일부는 그렇지 않은 경우 구문 분석을 처리 합니다.
예를 들면 :
using Microsoft.VisualBasic.FileIO;
string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
TextFieldParser parser = new TextFieldParser(new StringReader(csv));
// You can also read from a file
// TextFieldParser parser = new TextFieldParser("mycsvfile.csv");
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
string[] fields;
while (!parser.EndOfData)
{
fields = parser.ReadFields();
foreach (string field in fields)
{
Console.WriteLine(field);
}
}
parser.Close();
결과는 다음과 같습니다.
2 1016 년 2008/7/31 14:22 제프 달 가스 2011/6/5 22:21 http://stackoverflow.com 코발리스, OR 7679 351 81 b437f461b3fd27387c5d8ab47a293d35 34
자세한 내용은 Microsoft.VisualBasic.FileIO.TextFieldParser 를 참조하십시오.
참조 Microsoft.VisualBasic
추가 .NET 탭에서에 참조를 추가해야합니다 .
너무 늦었지만 누군가에게 도움이 될 수 있습니다. RegEx를 다음과 같이 사용할 수 있습니다.
Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = CSVParser.Split(Test);
뒤에 짝수 개의 따옴표가있는 모든 쉼표로 분할 할 수 있습니다.
또한 specf
쉼표 처리에 대한 CSV 형식 을보고 싶습니다 .
유용한 링크 : C# Regex Split - commas outside quotes
I see that if you paste csv delimited text in Excel and do a "Text to Columns", it asks you for a "text qualifier". It's defaulted to a double quote so that it treats text within double quotes as literal. I imagine that Excel implements this by going one character at a time, if it encounters a "text qualifier", it keeps going to the next "qualifier". You can probably implement this yourself with a for loop and a boolean to denote if you're inside literal text.
public string[] CsvParser(string csvText)
{
List<string> tokens = new List<string>();
int last = -1;
int current = 0;
bool inText = false;
while(current < csvText.Length)
{
switch(csvText[current])
{
case '"':
inText = !inText; break;
case ',':
if (!inText)
{
tokens.Add(csvText.Substring(last + 1, (current - last)).Trim(' ', ','));
last = current;
}
break;
default:
break;
}
current++;
}
if (last != csvText.Length - 1)
{
tokens.Add(csvText.Substring(last+1).Trim());
}
return tokens.ToArray();
}
Use a library like LumenWorks to do your CSV reading. It'll handle fields with quotes in them and will likely overall be more robust than your custom solution by virtue of having been around for a long time.
It is a tricky matter to parse .csv files when the .csv file could be either comma separated strings, comma separated quoted strings, or a chaotic combination of the two. The solution I came up with allows for any of the three possibilities.
I created a method, ParseCsvRow() which returns an array from a csv string. I first deal with double quotes in the string by splitting the string on double quotes into an array called quotesArray. Quoted string .csv files are only valid if there is an even number of double quotes. Double quotes in a column value should be replaced with a pair of double quotes (This is Excel's approach). As long as the .csv file meets these requirements, you can expect the delimiter commas to appear only outside of pairs of double quotes. Commas inside of pairs of double quotes are part of the column value and should be ignored when splitting the .csv into an array.
My method will test for commas outside of double quote pairs by looking only at even indexes of the quotesArray. It also removes double quotes from the start and end of column values.
public static string[] ParseCsvRow(string csvrow)
{
const string obscureCharacter = "ᖳ";
if (csvrow.Contains(obscureCharacter)) throw new Exception("Error: csv row may not contain the " + obscureCharacter + " character");
var unicodeSeparatedString = "";
var quotesArray = csvrow.Split('"'); // Split string on double quote character
if (quotesArray.Length > 1)
{
for (var i = 0; i < quotesArray.Length; i++)
{
// CSV must use double quotes to represent a quote inside a quoted cell
// Quotes must be paired up
// Test if a comma lays outside a pair of quotes. If so, replace the comma with an obscure unicode character
if (Math.Round(Math.Round((decimal) i/2)*2) == i)
{
var s = quotesArray[i].Trim();
switch (s)
{
case ",":
quotesArray[i] = obscureCharacter; // Change quoted comma seperated string to quoted "obscure character" seperated string
break;
}
}
// Build string and Replace quotes where quotes were expected.
unicodeSeparatedString += (i > 0 ? "\"" : "") + quotesArray[i].Trim();
}
}
else
{
// String does not have any pairs of double quotes. It should be safe to just replace the commas with the obscure character
unicodeSeparatedString = csvrow.Replace(",", obscureCharacter);
}
var csvRowArray = unicodeSeparatedString.Split(obscureCharacter[0]);
for (var i = 0; i < csvRowArray.Length; i++)
{
var s = csvRowArray[i].Trim();
if (s.StartsWith("\"") && s.EndsWith("\""))
{
csvRowArray[i] = s.Length > 2 ? s.Substring(1, s.Length - 2) : ""; // Remove start and end quotes.
}
}
return csvRowArray;
}
One downside of my approach is the way I temporarily replace delimiter commas with an obscure unicode character. This character needs to be so obscure, it would never show up in your .csv file. You may want to put more handling around this.
I had a problem with a CSV that contains fields with a quote character in them, so using the TextFieldParser, I came up with the following:
private static string[] parseCSVLine(string csvLine)
{
using (TextFieldParser TFP = new TextFieldParser(new MemoryStream(Encoding.UTF8.GetBytes(csvLine))))
{
TFP.HasFieldsEnclosedInQuotes = true;
TFP.SetDelimiters(",");
try
{
return TFP.ReadFields();
}
catch (MalformedLineException)
{
StringBuilder m_sbLine = new StringBuilder();
for (int i = 0; i < TFP.ErrorLine.Length; i++)
{
if (i > 0 && TFP.ErrorLine[i]== '"' &&(TFP.ErrorLine[i + 1] != ',' && TFP.ErrorLine[i - 1] != ','))
m_sbLine.Append("\"\"");
else
m_sbLine.Append(TFP.ErrorLine[i]);
}
return parseCSVLine(m_sbLine.ToString());
}
}
}
A StreamReader is still used to read the CSV line by line, as follows:
using(StreamReader SR = new StreamReader(FileName))
{
while (SR.Peek() >-1)
myStringArray = parseCSVLine(SR.ReadLine());
}
With Cinchoo ETL - an open source library, it can automatically handles columns values containing separators.
string csv = @"2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,""Corvallis, OR"",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
using (var p = ChoCSVReader.LoadText(csv)
)
{
Console.WriteLine(p.Dump());
}
Output:
Key: Column1 [Type: String]
Value: 2
Key: Column2 [Type: String]
Value: 1016
Key: Column3 [Type: String]
Value: 7/31/2008 14:22
Key: Column4 [Type: String]
Value: Geoff Dalgas
Key: Column5 [Type: String]
Value: 6/5/2011 22:21
Key: Column6 [Type: String]
Value: http://stackoverflow.com
Key: Column7 [Type: String]
Value: Corvallis, OR
Key: Column8 [Type: String]
Value: 7679
Key: Column9 [Type: String]
Value: 351
Key: Column10 [Type: String]
Value: 81
Key: Column11 [Type: String]
Value: b437f461b3fd27387c5d8ab47a293d35
Key: Column12 [Type: String]
Value: 34
For more information, please visit codeproject article.
Hope it helps.
참고URL : https://stackoverflow.com/questions/6542996/how-to-split-csv-whose-columns-may-contain
'program tip' 카테고리의 다른 글
Rails에서 다른 Rails 예외처럼 작동하도록 어떻게 예외를 발생 시키나요? (0) | 2020.09.07 |
---|---|
개체 요약의 NewLine (0) | 2020.09.07 |
Firefox 5 '캐싱'301 리디렉션 (0) | 2020.09.07 |
Debug.Assert 대 예외 발생 (0) | 2020.09.07 |
프로토 타입을 사용하는 경우 자바 스크립트 (0) | 2020.09.07 |