30. 8. 2012

Sail Forms XML/CSV Convertor

Na Androidu jsem si oblíbil aplikaci Sail Forms, což je velmi jednoduchá databáze, která umožňuje dělat pěkné formuláře. Používám ji především pro evidenci různých věcí, jako je seznam všech mých her (s datem, cenou a místem nákupu, plus další info o hře typu platforma, žánr, vydavatel, PEGI a tak dále), nebo evidence plateb přes PayPal a podobně. 

Na občasné zadávání a prohlížení dat je to dobré, ale čas od času potřebuji udělat nějaké hromadné změny, doplnění nebo výpočty a na to je jedinečný tabulkový kalkulátor Calc nebo Excel s funkcemi automatického doplňování, najdi a nahraď a podobně. A proto jsem řešil problém, jak dostat data z Sail Forms do tabulkového kalkulátoru a po úpravě zpět do Sail Forms.
 

Program sice má export do CSV, ale jen v palcené verzi, což je pro mě problém, když Google obchod nepodporuje PayPal. I kdyby to problém nebyl, obávám se, že by si ten program stejně neporadil s českým CSV, tj. českým kódováním, českým oddělovačem sloupců i českou desetinnou čárkou na místo tečky.
A proto jsem se rozhodl udělat si vlastní konvertor s využitím exportu dat z Sail Forms pro zálohování. Vyexportovaná data jsou uložena v XML souboru, takže konverze z XML do CSV a zpět je poměrně jednoduchá. Konvertor jsem si napsal v Pythonu a ovládá se jednoduše z příkazového řádku.

Zdrojový kód dávám veřejně k dispozici, takže každý kdo umí s Pythonem může tento konvertor používat. Kdyby k tomu někdo chtěl doplnit okenní uživatelské rozhranní, budu jen rád. Zdrojový kód je upraven tak, aby celá aplikace mohla být jen v jednom souboru.

Velkou výhodou tohoto konvertoru je, že používá nestandardní formát CSV souboru, díky čemuž je možno v Excelu používat Unicode kódování a Excel nemá problém s určením hranic sloupců ani čísel.

Obsah souboru "sfconvert.py":
==============================================================================
# encoding: utf-8

##############################################################################
# Sail Forms Convertor / Python 2.5
# (c) Petr Mach 2012
#
# Convert XML backup data from SailForms to CSV and vice versa.
#
# CSV FORMAT:
#     value separator: TAB
#     text delimiter:  "
#     encoding:        Unicode (UTF-16 LE)
#     decimal point:   depending on locale or optional parameter
#
# HOW TO USE:
#     python sfconvert MODE FILENAME [DP]
#         FILENAME - name of XML backup file, for example data.xml
#         MODE     - GET or STORE
#         [DP]     - optional, decimal point in CSV file  
#
# EXAMPLES:
#     python sfconvert GET data.xml
#         - need:   data.xml
#         - create: data.xml.csv
#     python sfconvert STORE data.xml  
#         - need:   data.xml, data.xml.csv
#         - create: data.xml.csv.xml               
#
# WARNING: 
#     Do not delete data.xml after creation of data.xml.csv.
#     This file is needed to create data.xml.csv.xml.
#
# TESTED WITH:
#     Python 2.5 on WindowsXP CZ.
#

import sys, locale
from   xml.dom import minidom as md
from   xml.dom import Node

class SFConvertor(object):
    """
    user methods:
    SFConvert.run(mode)
    
    main methods:
    SFConvert.makeCsvData()          # make CSV data from XML file
    SFConvert.writeCsvData(csvData)
    SFConvert.readCsvData()          # read CSV data from CSV file
    SFConvert.makeRecData(csvData)   # make XML data from CSV data
    SFConvert.writeRacData(recData)  # create new XML file from old XML file with new data
    
    other methods:
    SFConvert.getColData()           # get columns data from XML file
    SFConvert.getColFormat()         # make index [col name] = data format from colData 
    SFConvert.getXmlUText(node)      # get unicode text from DOM/XML node
    
    """
    def __init__(self, fName, dPoint):
        self.sfIFName  = fName
        self.csvFName  = self.sfIFName + '.csv'
        self.sfOFName  = self.csvFName + '.xml'
        self.dPoint    = dPoint
        
        try:
            self.domData   = md.parse(self.sfIFName)
        except:
            print "ERROR: Parsing XML file '%s' failed!" % self.sfIFName
            print makeTextFromException()
            sys.exit(1)
        self.colData   = self.getColData()
        self.colFormat = self.getColFormat()
        
        self.csvData   = None
        self.recData   = None
        
    def run(self, mode):
        if  mode in ('GET', 'LOAD'):
            print 'MODE: GET CSV'
            print 'decimal="%s"' % self.dPoint
            csvData = self.makeCsvData()
            self.writeCsvData(csvData)
        elif mode in ('STORE', 'SAVE'):
            print 'MODE: STORE CSV' 
            print 'decimal="%s"' % self.dPoint
            csvData = self.readCsvData()
            if  len(csvData) == 0:
                print "ERROR: CSV data in file '%s' not found. Empty file?" % self.csvFName
                sys.exit(1)
            recData = self.makeRecData(csvData)
            self.writeRecData(recData)
        else:
            return 'info'
        return 'done'     

    def getColData(self):
        fields = self.domData.getElementsByTagName("fields")[0]
        fields = fields.getElementsByTagName("field")
        colsData = []
        for field in fields:
            colData = {}
            xmlCols = field.getElementsByTagName("col")
    
            for xmlCol in xmlCols:
                xmlColName = xmlCol.getAttribute('name')
                xmlColData = xmlCol.firstChild.wholeText
                if  xmlColName == 'col_order': 
                    xmlColData = int(xmlColData)
                colData[xmlColName] = xmlColData
                
            if  'col_order' not in colData:
                # this is label, not data column
                continue
                
            if  colData['format'] == 'picture':
                # format picture has 2 data columns
                colData1 = colData.copy() # shallow copy
                colData2 = colData.copy()
                colData1['dbname'] = colData1['dbname'] + '_file' 
                colData2['dbname'] = colData2['dbname'] + '_data'
                colsData.append(colData1) 
                colsData.append(colData2) 
            else:    
                colsData.append(colData)
    
        colsData.sort(cmp = lambda x,y: cmp(x['col_order'], y['col_order']))
        return colsData

    def getColFormat(self):
        data = {}
        for item in self.colData:
            data[item['dbname']] = item['format']
        return data
    
    def getXmlUText(self, node):
        data = []
        node = node.firstChild;
        while(True):
            if  not node:
                break
            if  node.nodeType == Node.ELEMENT_NODE:
                data.append(self.getXmlText(node))
            if  node.nodeType == Node.TEXT_NODE:
                data.append(node.wholeText)
            node = node.nextSibling
        return u''.join(data)
    
    def makeCsvData(self):
         csv = []
    
        #header
        row = []
        for column in self.colData:
            row.append(column['dbname'])
        csv.append(row)
        
        #data
        xmlRecords = self.domData.getElementsByTagName('rec')
        for i, xmlRecord in enumerate(xmlRecords):
            xmlFields = xmlRecord.getElementsByTagName('field')
            uData = {}
            uRow  = []
            for xmlField in xmlFields:
                dbname = xmlField.getAttribute('name')
                uText  = self.getXmlUText(xmlField).strip() # return unicode text
                format = self.colFormat[dbname]
                if  uText == '' or format in ('text', 'int', 'picture'):
                    pass
                elif format in ('float', 'float_d1', 'float_d2', 'float_d3', 'float_d4'):
                    uText = uText.replace(' ', '')
                    uText = uText.replace('.', self.dPoint)
                elif format == 'date':
                    uText = "%s.%s.%s" % (uText[6:8], uText[4:6], uText[0:4])
                elif format == 'time':
                    uText = "%s:%s" % (uText[:2], uText[2:])
                else:
                    print "UNKNOWN FORMAT:", format 
    
                uData[dbname] = uText
    
            for column in self.colData:
                dbname = column['dbname']
                if  dbname in uData:
                    uRow.append(uData[dbname])
                else:
                    uRow.append('')
            csv.append(uRow)
        return csv
        
    def writeCsvData(self, data, encoding='utf-16', delimiter='\t', **kwds):
        try:
            fHandler  = open(self.csvFName, 'wb')
        except:
            print "ERROR: open file failed - '%s'" % self.csvFName
            print makeTextFromException()
            sys.exit(1)
        try:
            csvWriter = UnicodeCsvWriter(fHandler, encoding=encoding, delimiter=delimiter)
            csvWriter.writerows(data)  
        except:
            print "ERROR: write to file failed - '%s'" % self.csvFName
            print makeTextFromException()
            sys.exit(1)
        try:
            fHandler.close()
        except:
            print "ERROR: close file failed - '%s'" % self.csvFName
            print makeTextFromException()
            sys.exit(1)
        
    def readCsvData(self, encoding='utf-16', delimiter='\t', **kwds):
        try:
            fHandler  = open(self.csvFName, 'rb')
        except:
            print "ERROR: open file failed - '%s'" % self.csvFName
            print makeTextFromException()
            sys.exit(1)
        try:
            csvReader = UnicodeCsvReader(fHandler, encoding=encoding, delimiter=delimiter)
            data = []
            for row in csvReader:
                data.append(row)
        except:
            print "ERROR: read from file failed - '%s'" % self.csvFName
            print makeTextFromException()
            sys.exit(1)
        try:
            fHandler.close()
        except:
            print "ERROR: close file failed - '%s'" % self.csvFName
            print makeTextFromException()
            sys.exit(1)
        return data         
    
    def makeRecData(self, csvData):
        #cut header from csv data
        header  = csvData[0][:]
        csvData = csvData[1:]
        
        #make xml records from csv data
        recData = []
        for row in csvData:
            recData.append('    ')
            for i, cell in enumerate(row):
                dbname = header[i]
                try:
                    format = self.colFormat[dbname]
                except:
                    print "V SailForms XML souboru neexistuje sloupec '%s'" % dbname
                    print "V CSV souboru se nesmi menit nazvy sloupcu!"
                    print "Opravte prosim CSV soubor a provedte konverzi znova."
                    sys.exit(1) 
                if  format in ('text', 'int', 'picture'):
                    value  = cell.strip()
                elif format == 'date':
                    value  = cell.strip()
                    if  len(value) > 0:
                        tmp = value.split('.')
                        value = "%s%02d%02d" % (tmp[2], int(tmp[1]), int(tmp[0]))
                elif format == 'time':
                    value  = cell.strip()
                    if  len(value) > 0:
                        tmp = value.split(':')
                        value = "%02d%02d" % (int(tmp[0]), int(tmp[1]))
                elif format == 'float':
                    value  = cell.strip()
                    if  len(value) > 0:
                        value = value.replace(' ', '')
                        value = value.replace(self.dPoint, '.')
                        value = "%f" % float(value)
                elif format == 'float_d1':
                    value  = cell.strip()
                    if  len(value) > 0:
                        value = value.replace(' ', '')
                        value = value.replace(self.dPoint, '.')
                        value = "%.1f" % float(value)
                elif format == 'float_d2':
                    value  = cell.strip()
                    if  len(value) > 0:
                        value = value.replace(' ', '')
                        value = value.replace(self.dPoint, '.')
                        value = "%.2f" % float(value)
                elif format == 'float_d3':
                    value  = cell.strip()
                    if  len(value) > 0:
                        value = value.replace(' ', '')
                        value = value.replace(self.dPoint, '.')
                        value = "%.3f" % float(value)
                elif format == 'float_d4':
                    value  = cell.strip()
                    if  len(value) > 0:
                        value = value.replace(' ', '')
                        value = value.replace(self.dPoint, '.')
                        value = "%.4f" % float(value)
                else:
                    print "Neznamy format '%s'. Data budou formatovany jako text."
                    value  = cell.strip()
                if  value != '':
                    recData.append('      %s' % (dbname, value))
            recData.append('   
')
        return '\n'.join(recData)
    
    def writeRecData(self, recData):
        try:
            fi = open(self.sfIFName, 'rt')
        except:
            print "ERROR: open file failed - '%s'" % self.sfIFName
            print makeTextFromException()
            sys.exit(1)
        try:
            fo = open(self.sfOFName, 'wt')
        except:
            print "ERROR: open file failed - '%s'" % self.sfOFName
            print makeTextFromException()
            sys.exit(1)
        try:    
            mode = 1
            for line in fi:
                if  mode == 1: # copy header
                    fo.write(line)
                    if  line.strip() == '': # save new data
                        fo.write(recData.encode('utf-8'))
                        mode = 2
                elif mode == 2: # skip old data
                     if  line.strip() == '
':
                         fo.write(line)
                         mode = 3
                elif mode == 3: # copy footer
                    fo.write(line)
        except:
            print "ERROR: Create new XML file '%s' failed!" % self.sfOFName
            print makeTextFromException()
            sys.exit(1)
        try:
            fi.close()
        except:
            print "ERROR: close file failed - '%s'" % self.sfIFName
            print makeTextFromException()
            sys.exit(1)
        try:             
            fo.close()
        except:
            print "ERROR: close file failed - '%s'" % self.sfOFName
            print makeTextFromException()
            sys.exit(1)
           
           
#############################################################################
# unicode wrappers for CSV module
# Taken from example in Python 2.5 Documentation / CSV module
#

import csv, codecs, cStringIO

class UTF8Recoder:
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)
    def __iter__(self):
        return self
    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeCsvReader:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)
    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]
    def __iter__(self):
        return self

class UnicodeCsvWriter:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.queue   = cStringIO.StringIO()
        self.writer  = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream  = f
        self.encoder = codecs.getincrementalencoder(encoding)()
    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        data = self.queue.getvalue()
        data = data.decode("utf-8") 
        data = self.encoder.encode(data)
        self.stream.write(data)
        self.queue.truncate(0)
    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

#############################################################################
# universal error handler
# (c) Petr Mach 2012, petr.mach.cz@gmail.com
#

import sys, inspect

def getTextFromCode(code, offset, linenu):
    ''' Print relevant part source code. '''
    code = code[:offset+1]
    code.reverse()

    shortCode = []
    for i, line in enumerate(code):
        sLine = line.strip()
        if  sLine[:3] == 'def':
            shortCode.insert(0, line)
            break
        if  sLine[:5] == 'class':
            shortCode.insert(0, line)
            break
        if  sLine == '' and i > 10:
            break
        shortCode.insert(0, line)

    offset = len(shortCode)-1
    TEXT = []
    for i, line in enumerate(shortCode):
        number = linenu-offset+i
        line   = line[:-1]
        TEXT.append("%5d %s" % (number, line))
    return '\n'.join(TEXT)

def getTextFromDict(d, title=None):
    ''' Pretty print of dict. '''
    TEXT = []
    if  title:
        TEXT.append(title)
    keys = d.keys()
    keys.sort()

    keyMaxLen = 0
    for key in keys:
        if  len(key) > keyMaxLen:
            keyMaxLen = len(key)

    keyMaxLen += 1
    pattern = '%%%ds = %%-17s %%s' % keyMaxLen
    for key in keys:
        TEXT.append(pattern % (key, type(d[key]), unicode(d[key])[:60].__repr__()))
    return '\n'.join(TEXT)    

def getTextFromStrerr(strerr):
    TEXT = []
    TEXT.append('\nERROR:')
    TEXT.append(str(strerr))
    return ' '.join(TEXT)

def isIter(obj):
    try:
        iter(obj)
    except:
        return False
    if  isinstance(obj, type('')):
        return False
    return True

def makeTextFromException(size='full'):
    ''' Print information about exception. '''
    TEXT = []

    typerr,strerr,c = sys.exc_info()
    TEXT.append('\nEXCEPTION: ' + str(typerr))
    traceback = inspect.trace(100)

    last = len(traceback) - 1
    for i, (frame, file, linenu, object, code, offset) in enumerate(traceback):
        if  code and code[offset]:
            example = code[offset].strip()
        else:
            example = ''

        if  size == 'full' or i == last:    
            TEXT.append('%s/%d %s: %s' % (file, linenu, object, example))

        if(i == last):
            if  code and size == 'full':
                TEXT.append(getTextFromCode(code, offset, linenu))
            TEXT.append(getTextFromStrerr(strerr))
            if  size == 'full':
                TEXT.append(getTextFromDict(frame.f_locals, 'LOCAL VARS:'))

    TEXT.append('----------')
    return '\n'.join(TEXT)

           
           
if __name__ == '__main__':
    print sys.argv
    prog = sys.argv[0]
    info = """SailForms Converter 1.0 / Python 2.5
(c) Petr Mach 2012, petr.mach.cz@gmail.com        
Convert XML backup data from SailForms to CSV and vice versa.
CSV FORMAT:
    value separator: TAB
    text delimiter:  "
    encoding:        Unicode (UTF-16 LE)
    decimal point:   depending on locale or optional parameter
HOW TO USE:
    python %s MODE FILENAME [DP]
       FILENAME - name of XML backup file, for example data.xml
       MODE     - GET or STORE
       [DP]     - optional, decimal point in CSV file  
EXAMPLES:
    python %s GET data.xml
       - need:   data.xml
       - create: data.xml.csv
    python %s STORE data.xml  
       - need:   data.xml, data.xml.csv
       - create: data.xml.csv.xml               
WARNING: 
    Do not delete data.xml after creation of data.xml.csv.
    This file is needed to create data.xml.csv.xml."""
    info = info % (prog, prog, prog)

    if len(sys.argv) < 3:
        print info
        sys.exit()
    
    mode     = sys.argv[1]  # GET, STORE
    sfIFName = sys.argv[2]  # sf_backup.xml
    
    locale.setlocale(locale.LC_ALL, '')
    cLocale = locale.localeconv()
    dPoint  = cLocale['decimal_point']
    if  len(sys.argv) > 3:
        dPoint = sys.argv[3]
       
    convertor = SFConvertor(sfIFName, dPoint)
    res = convertor.run(mode)
    if  res == 'info':
        print info
    else:
        print 'DONE'

Žádné komentáře:

Okomentovat