Search This Blog

Wednesday, February 13, 2013

Symfony Project sfservermon - Post 2 Data (awk digression)

This is the second post in the series, the first is here

The intention of this project is work with existing data - the schema of the database will be unchanged.
For development I wanted some test data - also the configuration data was required.

The aim of this post is to have some development data which can be loaded as required into the database while developing the system.
The format of the data files is yml, loaded using the excellent bundle from Khepin - , I discuss using his bundle here.


This post - which is a digression into the world of sed and awk, discusses generating yml fixtures from csv files, however as I only wanted a sample 1 - 5 rows, I decided to see if I would read parse the data from MySQL statements, particular those formatted with the \G separator, an example:

mysql> select * from monitor_log limit 3\G
*************************** 1. row ***************************
   log_id: 1
server_id: 4
     type: status
  message: Server 'press release' is DOWN: ip=http://www.spec-net.com.au/presstest/1009/spec_211009.htm, port=80. Error=404 Not Found
 datetime: 2012-11-15 15:26:07
  user_id: 
*************************** 2. row ***************************
   log_id: 2
server_id: 4
     type: status
  message: Server 'press release' is RUNNING: ip=http://www.spec-net.com.au/presstest/1009/spec_211009.htm, port=80
 datetime: 2012-11-15 15:30:21
  user_id: 

Note that the table name has a space after it, so if not using limit or order by:

mysql> select * from monitor_config \G

To me the data looks very much like the needed yml.
Awk is the perfect tool for the editing required.

Strictly speaking this development was not required at all,  I could export and reimport the database and only keep the required rows.
This post is a (perhaps) useful documentation of how awk can be used.

The first version:


BEGIN { } 
/^mysql> select/ { table = $5;sub(/\\G/, "", table);
printf "model: %s\n", table;
printf "persistence: orm\n"
printf "fixtures:\n";
        primaryKey= substr(table "_id", prefixLen+1) ":";
}
/\*+.* row/ { printf "  %s_%0.4d:\n",table,++fixture ; }
/^[^\*]*: NULL$/    { printf "    %s ~\n",$1; next; }

/^[^\*]*: .*/    {s = ""; for (i = 2; i <= NF; i++) s = s $i " "; 
                        printf "    %s %s\n",$1,s; }


END { }
Notes:
  1. The begin and empty actions are currently empty.
  2. The table select statement is matched and parsed to get the table name.
  3. The model name and header lines are printed.
  4. The end of each row is matched "**** row" and then printed. All of the subsequence fields are concatenated and printed. Printf is used to get the space indented formatting required.
  5. NULL fields are matched and a tilde '~' printed.



This generates this output:

johnreidy$ awk -f datayml.awk.1 < log.list 

model: monitor_log
persistence: orm
fixtures:

  monitor_log_0001:
    log_id: 1 
    server_id: 4 
    type: status 
    message: Server 'press release' is DOWN: ip=http://www.spec-net.com.au/presstest/1009/spec_211009.htm, port=80. Error=404 Not Found 
    datetime: 2012-11-15 15:26:07 
    user_id: 
  monitor_log_0002:
    log_id: 2 
    server_id: 4 
    type: status 
    message: Server 'press release' is RUNNING: ip=http://www.spec-net.com.au/presstest/1009/spec_211009.htm, port=80 
    datetime: 2012-11-15 15:30:21 
    user_id: 


As a first pass this is not bad.
Second Pass - remove primary keys and format dates.
The import files cannot have a primary key fields - log_id, doctrine will report an error if they are present, this makes sense as the entities themselves do not have s setId method.
The problem is that the id fields are not id and not even <table_name>_id, e.g. monitor_log_id.
These tables all have a prefix of monitor_. This is passed to the awk script as an environment variable - prefix.
BEGIN { prefix=ENVIRON["prefix"]== "" ? "": ENVIRON["prefix"] "_"; prefixLen= length(prefix); }
/^mysql> select/ { table = $5;sub(/\\G/, "", table);
        printf "model: %s\n", table;
        printf "persistence: orm\n"
        printf "fixtures:\n";
        primaryKey= substr(table "_id", prefixLen+1) ":";
}       
/\*+.* row/ { printf "  %s_%0.4d:\n",table,++fixture ; }
$1 == primaryKey { next; }
/^[^\*]*: NULL$/    { printf "    %s ~\n",$1; next; }
/^[^\*]*: [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].*/ {
                s = ""; for (i = 2; i < NF; i++) s = s $i " "; s = s $i;
                printf "    %s \"%s\"\n",$1,s; next; }
/^[^\*]*: .*/    {s = ""; for (i = 2; i < NF; i++) s = s $i " "; s = s $i;
                        printf "    %s %s\n",$1,s; }
        
END { } 
This now generates:
model: monitor_log
persistence: orm
fixtures:
  monitor_log_0001:
    server_id: 4 
    type: status 
    message: Server 'press release' is DOWN: ip=http://www.spec-net.com.au/presstest/1009/spec_211009.htm, port=80. Error=404 Not Found 
    datetime: "2012-11-15 15:26:07"
    user_id: 
  monitor_log_0002:
    server_id: 4 
    type: status 
    message: Server 'press release' is RUNNING: ip=http://www.spec-net.com.au/presstest/1009/spec_211009.htm, port=80 
    datetime: "2012-11-15 15:30:21"
    user_id: 

I also edited the loop to not append the space after the last field.
Also quotes are put around the date fields.

Plural Table Names.

This is good but there was a problem with another table - monitor_users:
model: monitor_users
persistence: orm
fixtures:
  monitor_users_0001:
    user_id: 1 
    server_id: 4,3,5,6,9,10,15 
    name: John Reidy 
    mobile: 61415552311 
    email: johnr@fakemonitor.net.au 
  monitor_users_0002:
    user_id: 2 
    server_id: 4,3,5,6,9,10 
    name: Fred James 
    mobile: 61912555231 
    email: fredj@fakemonitor.net.au 
The table name is a plural - users but the primary key is user_id.
But this stage I wasn't sure if using awk was a good approach but I pushed on.
The solution is to add an extra statement:
/^mysql> select/ { table = $5;sub(/\\G/, "", table);
printf "model: %s\n", table;
printf "persistence: orm\n"
printf "fixtures:\n";
        sub(/s$/, "", table);
        primaryKey= substr(table "_id", prefixLen+1) ":";
        next;
}

One possibility is to add an extra match which would be more awkish.
One more hack to the awk was needed, the data has not null fields, but with a single space in them, this was not being picked up by awk I think it is not handled by the tee statement (used to capture the output from mysql).
/^[^\*]*: $/    { printf "    %s \" \"\n",$1; next; }

This prints a " " sequence if there is no data.

Adding the Correct Model Statement.

This is now pretty close. However the model statement needs correction, it should read:
model: JMPR\ServerMonBundle\Entity\MonitorUsers
rather than
model: monitor_user

As this requires a change to title case for the entities, sed appears to be a better tool.
So I run the generated file through a sed postprocessor script:

sed  -e '/^model/{s/[^ _\\]\+/\L\u&/g}' \
     -e '/^Model/{s/_//g}' \
     -e "/^Model/{s#Model: #model: ${model}#}" 

This does the following:
  1. Change the table name to title case.
  2. Remove the underscores.
  3. Revert the model tag back to model from Model (as changed in step 1) and prepend the contends of the model environment variable.
Wrapped in a script this becomes:

export model='JMPR\\ServerMonBundle\\Entity\\'
export prefix='monitor'
file=${file:-10_config.yml}

destpath=$bundle/DataFixtures
scriptdir=/Users/johnreidy/sfprojects/sfservermon/util
sourcepath=$scriptdir/input
cat $sourcepath/$file |awk -f $scriptdir/data4.awk  | \
sed  -e '/^model/{s/[^ _\\]\+/\L\u&/g}' \
     -e '/^Model/{s/_//g}' \
     -e "/^Model/{s#Model: #model: ${model}#}" \
> ${destpath}/${file}
echo "output to: ${destpath}/${file}"


With the data load, the next post covers the controllers and templates to view the server list and logs.


No comments:

Post a Comment