程序义诊开张几天, 无人问津. 终于邻国 Koye Li 送来了第一笔生意. 感激不尽.

Koye Li's email and program fit perfectly to my submission requirement. The code is quite well written, but I still need to find problems as I promised to. So you can say it's 鸡蛋里挑骨头.

Email from Koye Li

Attached is a simple CSV parser class that I wrote for work a few days ago.  The problem that I had to deal with was to read in a simple CVS formatted file and then using those data and convert it to a special file format. 

Here's a simple explanation of my work, a line of input for the parser would be a standard CSV line, where strings are quoted, and numeric values aren't, so here's a few example input lines:

"F1_NAME",         "Time",        "Speed",   "Engine throttle SP",
"F11_UNIT",        "seconds",     "rev/min", "counts",
"F1_DATA_1",       2345.323,      0.523,     0.937,


The parser will take the input one line at a time and parse it. and here's the sample code that does it...

    ifstream infile("c:\\lalalal.txt", ios::in);
    if (infile)
    {
        string sep = ",";
        CsvParser* csv = new CsvParser(sep);

        string textline;
        while ( getline(infile, textline, '\n') )
        {
            // parse input line
            csv->SplitLine(textline);

            // add fields
            // pass to the converter class,
... purposely omitted**/
        }

        infile.close();
    }
    else
    {
          // calls to our error handling routines, omitted here
    }

Comments:

  1. When dealing with parsing, define the syntax of input data in something like BNF first. Although this case is very simple, there are still places which could generate confusion. The first issue is how to embed '“' within a string. If you do not think about it when designing it, most likely it will not be handled and not tested. The second issue is Europe countries may use comma instead of period in decimal numbers, then the places of command and period may be switched. Also, add syntax description and sample data to the code as comments, to keep it together with source code.
  2. infile is an object, not a pointer. So “if (infile)” is either wrong or not needed.
  3. There is no allocation failure check for csv = new CsvParser(sep). It will generate access violation unless new throws exception on failure.
  4. SplitLine returns number of items parsed. It should be important to the caller.
  5. csv is not freed. Memory leak.

csv.h

#ifndef MY_CSV_PARSER

#define MY_CSV_PARSER

 

#pragma once

#pragma warning(disable:4786) // disable vc STL complains

 

#include <iostream>

using namespace std;

 

#include <vector>

#include <string>

 

#include "stdafx.h"

 

class CsvParser

{

public:

CsvParser(string& fieldSep) :

m_sFieldSep(fieldSep), m_nField(0) {}

 

virtual ~CsvParser() {};

 

int GetNumberOfFields();

 

string GetField(int n);

 

int SplitLine(string& str);

 

protected:

 

ExtractStringField(string& source, string& field, int start_pos);

ExtractNumericField(string& source, string& field, int istart_pos);

int Split();

 

string m_sLine;

vector<string> m_vsField;

string m_sFieldSep;

int m_nField;

};

#endif

Comments:

  1. It looks like other places of your program uses STL too, so put <vector>, <string> and <iostream> into stdafx.h to avoid compiling them multiple times.
  2. Normally stdafx.h should be included by every .cpp file, then there is no need to include it in header files.
  3. Virtual destructor is not need unless CsvParser has derived classes. The cost of having a virtual destructor is a pointer for each object allocated and a virtual function table.
  4. GetNumberOfFields and GetFields can be const function.
  5. GetField should return const reference. 
  6. SplitLine parameter can be const reference.
  7. ExtractStringField and ExtraNumericField, first parameter can be const reference, add something out OUT to second parameter, 3rd number needs to be consistent in name. Both functions should return int, they can all be const function.
  8. m_sLine does not need to be member variable as it's always passed to ExtractStringField and ExtraNumericField, as currently designed.

 

csv.cpp

// CSV.cpp: implementation of the CSV class.
//
//////////////////////////////////////////////////////////////////////

#include "stdafx.h"
#include "CSV.h"

#ifdef _DEBUG
#undef THIS_FILE
static char THIS_FILE[]=__FILE__;
#define new DEBUG_NEW
#endif

//
// SplitLine()
//
int CsvParser::SplitLine(string& str)
{
    m_sLine = str;

    // clear old buffer
    m_vsField.erase(m_vsField.begin(), m_vsField.end());

    return Split();
}

//
// CsvParser::GetField(int n)
//
string CsvParser::GetField(int n)
{
    if (n<0 || n > m_vsField.size())
        return "";
    else
        return m_vsField[n];
}

//
// GetNumberOfFields()
//
int CsvParser::GetNumberOfFields()
{
    return m_nField;
}

//
// CsvParser::Split()
//
int CsvParser::Split()
{
    m_nField = 0;

    int line_length = m_sLine.length();
    if (line_length == 0)
        return 0;

    string field;

    int cur_pos = 0;
    int next_pos = 0;
    do {
        // skip spaces
        for (; isspace(m_sLine[cur_pos]); cur_pos++);
        
        if (m_sLine[cur_pos] == '"')  
            next_pos = ExtractStringField(m_sLine, field, cur_pos);
        else
            next_pos = ExtractNumericField(m_sLine, field, cur_pos);

        m_vsField.push_back(field);
        m_nField++;

        cur_pos = next_pos + 1; // skip current ',' and start looking for the next

    } while (cur_pos < line_length);

    return m_nField;

}

int CsvParser::ExtractStringField(string& source, string& field, int start_pos)
{
    field = "";

    int cur_pos = 0;
    int line_length = source.length();
        
    start_pos++;   // skip first '"'

    // find next '"'
    cur_pos = source.find_first_of("\"", start_pos);
    if (cur_pos < 0 )
    {
        cur_pos = line_length;
    }

    field = string(source, start_pos, cur_pos-start_pos);

    // find next ','
    cur_pos = source.find_first_of(m_sFieldSep, cur_pos);
    if (cur_pos < 0 )
    {
        cur_pos = line_length;
    }
            
    return cur_pos;
}


int CsvParser::ExtractNumericField(string& source, string& field, int start_pos)
{
    field = "";

    int cur_pos = 0;
    int line_length = source.length();
    
    // find first ','
    cur_pos = source.find_first_of(m_sFieldSep, start_pos);
    if (cur_pos < 0 )
    {
        cur_pos = line_length;
    }
    
    field = string(source, start_pos, cur_pos-start_pos);

    return cur_pos;
}   

Comments:

  1. GetField method, the out of bound condition should be (n < 0 || (n >= m_vsField.Size()). Add a debug assert to out of bound case to catch possible bugs in caller's code.
  2. Split method, field and next_pos can be moved inside the loop.
  3. Split mehod, what is a field is neither string field nor numeric field?
  4. ExtractStringField method, file = ““ is not needed.
  5. Embedding '“' within string is not supported. Document it.
  6. Anything after a string and before a seperator is skipped. If this is intended, document it.
  7. find next ',' does not match code which fields the next seperator.
  8. What if m_sFieldSep is null, or not single character?
  9. ExtractNumericField, source.length is calcualted first, but only used when cur_pos is less than 0.
  10. The routines does not really find numeric numbers, it just returns anything before the next seperator.
  11. source.length is calculated multiple times. It may be worthwhile to store it as member variable and do not pass m_sLine to two extraction functions.

Thanks Koye Li for sharing the code.