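A few days ago I needed to process some CSV files (comma-separated files, familiar to anyone who uses Excel regularly), so I went looking to see whether .NET already had a class library for this. It turned out that it does. The interesting part is that the class (Microsoft.VisualBasic.FileIO.TextFieldParser) only ships as part of the VB.NET library.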

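But we are on the .NET platform, and the benefits of the CLR are not just talk. A class written for VB.NET can be used from a C# program just the same: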

1. First, use Add Reference to reference the Microsoft.VisualBasic assembly:

2. Then declare the namespace to use:

using Microsoft.VisualBasic.FileIO;

3. Now the code can be written. I added some very simple schema support, so that fields can be selected by name:

using Microsoft.VisualBasic.FileIO;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

namespace MS.Live.Cumulus.Autopilot.Watchdog
{
    public class CsvParser
    {
        /* fields */
        TextFieldParser csvParser = null;
        Dictionary<string, int> textSchema = null;   // field name -> column index

        /* methods */
        public CsvParser(string inputFile)
        {
            csvParser = new TextFieldParser(inputFile);
            csvParser.TextFieldType = FieldType.Delimited;
            csvParser.Delimiters = new string[] { "," };
        }

        public CsvParser(Stream inputStream)
        {
            csvParser = new TextFieldParser(inputStream);
            csvParser.TextFieldType = FieldType.Delimited;
            csvParser.Delimiters = new string[] { "," };
        }

        public void SetSchema(string[] schema)
        {
            textSchema = new Dictionary<string, int>(schema.Length);
            int index = 0;
            foreach (string field in schema)
                textSchema.Add(field, index++);   // may throw an exception if duplicated field names exist in schema
        }

        // Returns the next line unparsed, exactly as it appears in the file.
        public string ReadLineRaw()
        {
            return csvParser.ReadLine();
        }

        // Returns all fields of the next row.
        public string[] ReadLine()
        {
            return csvParser.ReadFields();
        }

        // Returns only the named fields of the next row, in the order given by filter.
        public string[] ReadFields(string[] filter)
        {
            if (null == textSchema)
                throw new Exception("The parser has no schema defined.");

            string[] allFields = ReadLine();
            string[] result = new string[filter.Length];
            for (int i = 0; filter.Length > i; i++)
                result[i] = allFields[textSchema[filter[i]]];
            return result;
        }

        /* properties */
        public bool IsEndOfData
        {
            get { return csvParser.EndOfData; }
        }
    }
}
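SetSchema takes the field names from the caller. If a file carries its own header row (an assumption; the test input further down does not have one), the same name-to-index dictionary could be built from the first row instead. Below is a minimal sketch of that idea written directly against TextFieldParser. The names HeaderSchemaDemo and SelectColumns are made up for illustration and are not part of the class above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper for illustration only.
public static class HeaderSchemaDemo
{
    // Treats the first row as the schema and returns the requested columns
    // of every later row, in the order given by 'wanted'.
    public static List<string[]> SelectColumns(TextReader input, string[] wanted)
    {
        var result = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.Delimiters = new string[] { "," };
            if (parser.EndOfData)
                return result;   // no header row, nothing to select

            // Same mapping as SetSchema: field name -> column index.
            var schema = new Dictionary<string, int>();
            string[] header = parser.ReadFields();
            for (int i = 0; i < header.Length; i++)
                schema.Add(header[i], i);   // throws ArgumentException on duplicate names

            while (!parser.EndOfData)
            {
                string[] all = parser.ReadFields();
                var picked = new string[wanted.Length];
                for (int i = 0; i < wanted.Length; i++)
                    picked[i] = all[schema[wanted[i]]];   // KeyNotFoundException if the name is unknown
                result.Add(picked);
            }
        }
        return result;
    }

    public static void Main()
    {
        string text = "Name,PhoneNumber,Address\nFakeID1,FakePhoneNumber1,FakeAddress1\n";
        string[] wanted = new string[] { "Address", "Name" };
        foreach (string[] row in SelectColumns(new StringReader(text), wanted))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

The trade-off is that a header-derived schema follows whatever column order the file uses, while an explicit SetSchema call keeps working on files that have no header at all.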

4. Finally, test it:

public void TestCsvParser()
{
    CsvParser parser = new CsvParser("input.csv");

    // Without a schema, ReadFields must throw. A flag is used here because
    // Assert.Fail inside the try block would itself be swallowed by catch (Exception).
    bool threw = false;
    try
    {
        parser.ReadFields(new string[] { "Address" });
    }
    catch (Exception)
    {
        threw = true;
    }
    Assert.IsTrue(threw, "ReadFields should throw when no schema is defined");

    parser.SetSchema(new string[] { "Name", "PhoneNumber", "Address" });

    // Row 1: all fields, in file order.
    string[] row = parser.ReadLine();
    Assert.IsTrue("FakeID1" == row[0] && "FakePhoneNumber1" == row[1] && "FakeAddress1" == row[2]);

    // Row 2: the raw, unparsed line.
    Assert.IsTrue("FakeID2,FakePhoneNumber2,FakeAddress2" == parser.ReadLineRaw());

    // Row 3: a subset of fields, reordered; surrounding whitespace is trimmed.
    row = parser.ReadFields(new string[] { "Address", "Name" });
    Assert.IsTrue(2 == row.Length && "FakeAddress3" == row[0] && "FakeID3" == row[1]);

    // Row 4 is read and discarded (the blank line before it is skipped); row 5 contains a quoted field.
    row = parser.ReadFields(new string[] { "Name", "PhoneNumber", "Address" });
    row = parser.ReadFields(new string[] { "Name", "PhoneNumber", "Address" });
    Assert.IsTrue(3 == row.Length && "FakeID5" == row[0] && "hello,Fake\"PhoneNumber5" == row[1] && "FakeAddress5" == row[2]);

    Assert.IsTrue(true == parser.IsEndOfData);
}

The input file for the test is as follows:

FakeID1,FakePhoneNumber1,FakeAddress1
FakeID2,FakePhoneNumber2,FakeAddress2
FakeID3 , FakePhoneNumber3,FakeAddress3

FakeID4, FakePhoneNumber4 ,FakeAddress4
FakeID5, "hello,Fake""PhoneNumber5", FakeAddress5
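The values the test expects depend on a few TextFieldParser defaults: ReadFields skips blank lines, TrimWhiteSpace is true, and HasFieldsEnclosedInQuotes is true, so a doubled quote inside a quoted field is read as a single quote character. A minimal sketch for checking this without a file on disk feeds some of the same lines through a StringReader. The names CsvDefaultsDemo and ParseAll are hypothetical and not part of the CsvParser class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper for illustration only.
public static class CsvDefaultsDemo
{
    // Parses comma-delimited text held in memory and returns one string[] per row.
    public static List<string[]> ParseAll(string text)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.Delimiters = new string[] { "," };
            // TrimWhiteSpace and HasFieldsEnclosedInQuotes are left at their defaults.
            while (!parser.EndOfData)
                rows.Add(parser.ReadFields());
        }
        return rows;
    }

    public static void Main()
    {
        // Rows 3 and 5 of the test input, with a blank line between them.
        string text =
            "FakeID3 , FakePhoneNumber3,FakeAddress3\n" +
            "\n" +
            "FakeID5, \"hello,Fake\"\"PhoneNumber5\", FakeAddress5\n";

        foreach (string[] row in ParseAll(text))
            Console.WriteLine("[" + string.Join("] [", row) + "]");
    }
}
```

With those defaults this should yield two rows rather than three, with the same trimmed and unquoted field values that the test above asserts for rows 3 and 5. Setting TrimWhiteSpace to false on the parser would be one way to see the padding preserved.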

Posted on 2007-06-25 17:17:00 | Tags: none

Comments

# Re: CsvParser in C#
This gets used quite a lot in data collection.
2007-06-26 09:24:00 | [Anonymous user: 梁广永]