Here is an interesting demo by Doug Engelbart from 1968, showing how a computer could be used in an office setting...
The quality isn't the best, but the demo is very interesting. This would have been very exciting to work on.
peace
Tuesday, June 30, 2009
Thursday, June 18, 2009
Tab Delimited Renderer in SSRS 2005
I was recently tasked with coming up with a tab delimited renderer in SSRS 2005.
My first thought was to use the CSV renderer and add a config entry in the ReportServer.config file to use a tab as a field delimiter.
However, in SSRS 2005, Microsoft clears any whitespace when reading the config file for the CSV renderer, meaning that the tab character does not work.
So I set about researching how to create a tab renderer.
The first article I read stated that you would have to visit Redmond, WA and sit with an SSRS developer to even think about developing a rendering extension. I didn't buy this and kept searching.
There isn't a lot out there on rendering extensions, but I did find several references to the following assemblies:
So with that, I started to develop a prototype solution to the tab rendering problem.
My first decision was to only render tables and matrices. The SSRS CSV renderer tries to render textboxes, which only leads to malformed CSV files. And since most of the time the tab delimited export is going to be loaded into Excel, I decided to just focus on the table and the matrix control.
More to come in the next post...
My first thought was to use the CSV renderer and add a config entry in the ReportServer.config file to use a tab as a field delimiter.
However, in SSRS 2005, Microsoft clears any whitespace when reading the config file for the CSV renderer, meaning that the tab character does not work.
So I set about researching how to create a tab renderer.
The first article I read stated that you would have to visit Redmond, WA and sit with an SSRS developer to even think about developing a rendering extension. I didn't buy this and kept searching.
There isn't a lot out there on rendering extensions, but I did find several references to the following assemblies:
Microsoft.ReportingServices.Interfaces
Microsoft.ReportingServices.ReportRendering
So with that, I started to develop a prototype solution to the tab rendering problem.
My first decision was to only render tables and matrices. The SSRS CSV renderer tries to render textboxes, which only leads to malformed CSV files. And since most of the time the tab delimited export is going to be loaded into Excel, I decided to just focus on the table and the matrix control.
More to come in the next post...
Friday, March 6, 2009
Last Update Dates
It's been a while... We are well into deployment and finalization of this version of our ETL and reporting system. Things are going well.
Recently I ran into an interesting ETL problem while using a source system "last update" field. Let me give you some background.
We have an ETL process that reads from a source system that was developed in-house. The queries were all based on the last update field in all of the tables.
While in UAT, several reports were reported as missing rows. After investigating, it appeared that the rows had never made it to the data mart. Needless to say, this was very worrisome.
I researched and tried to find out why these rows were excluded. There seemed to be no pattern, just random rows.
While looking at my morning logs, I noticed something strange. The ETL Last Update table showed times from about 5 hours after the ETL had run. It should have been when the ETL had run.
I looked in the source system, and there were 3 rows that had update dates in the future! 5 hours to be exact!
It turns out that under certain circumstances, the source system was using the wrong date/time to update the last update field. And this date was GMT, so it was 5 hours in the "future" when it was applied to the last update field.
The result of this was missed records on the ETL. It would miss 5 hours worth of updates anytime this occured in the source system.
So my recommendation, which I am now kicking myself for not implementing to begin with, is this: Always use a date range, not just a "Greater than" for last update fields. For example:
And the other rule... Never trust the source system to be accurate 100% of the time. Anticipate issues like this.
Anyway, that's all for now.
peace
Recently I ran into an interesting ETL problem while using a source system "last update" field. Let me give you some background.
We have an ETL process that reads from a source system that was developed in-house. The queries were all based on the last update field in all of the tables.
While in UAT, several reports were reported as missing rows. After investigating, it appeared that the rows had never made it to the data mart. Needless to say, this was very worrisome.
I researched and tried to find out why these rows were excluded. There seemed to be no pattern, just random rows.
While looking at my morning logs, I noticed something strange. The ETL Last Update table showed times from about 5 hours after the ETL had run. It should have been when the ETL had run.
I looked in the source system, and there were 3 rows that had update dates in the future! 5 hours to be exact!
It turns out that under certain circumstances, the source system was using the wrong date/time to update the last update field. And this date was GMT, so it was 5 hours in the "future" when it was applied to the last update field.
The result of this was missed records on the ETL. It would miss 5 hours worth of updates anytime this occured in the source system.
So my recommendation, which I am now kicking myself for not implementing to begin with, is this: Always use a date range, not just a "Greater than" for last update fields. For example:
Where LastUpdate Between '3/5/09 11:00:00' and GetDate()
And the other rule... Never trust the source system to be accurate 100% of the time. Anticipate issues like this.
Anyway, that's all for now.
peace
Thursday, December 18, 2008
PerformancePoint Server 2007
We have begun the task of implementing PerformancePoint Server 2007 (PPS) on top of our existing data marts. Right now we are in the early stages of the requirements gathering. However, I have had a good chance to play with PPS and it is very slick!
I would highly recommend anyone needing a dashboard application to download the evaluation version and spend some time on it. You can download it here.
You will see some more blogs about it in the very near future because some of the stuff this can do is worth writing about. Also, some of the pitfalls that we have hit are interesting as well!
peace
I would highly recommend anyone needing a dashboard application to download the evaluation version and spend some time on it. You can download it here.
You will see some more blogs about it in the very near future because some of the stuff this can do is worth writing about. Also, some of the pitfalls that we have hit are interesting as well!
peace
Wednesday, December 10, 2008
DateTime Columns in Slowly Changing Dimension Component
We had an interesting error when implementing a slowly changing dimension in SSIS this week.
We had a date column that we were passing thru to the database from the source. However, the source was a script task because it came in from a multi-resultset stored procedure. In other data flows, where we were using an OLE DB destination, we typed datetime columns as DT_DATE. However, when we used DT_DATE with the slowly changing dimension component, it threw an error:
peace
We had a date column that we were passing thru to the database from the source. However, the source was a script task because it came in from a multi-resultset stored procedure. In other data flows, where we were using an OLE DB destination, we typed datetime columns as DT_DATE. However, when we used DT_DATE with the slowly changing dimension component, it threw an error:
Error at Import Data [Slowly Changing Dimension]:The input column "input column "ALDATE (15157)" cannot be mapped to external column "external column "ALDATE (15160)" because they have different data types. The Slowly Changing Dimension transformation does not allow mapping between column of different types except for DT_STR and DT_WDTR.After much digging and research, we determined that we had to set the output column type on the script component to DT_DBTIMESTAMP. Once we changed that, the SCD worked just fine!
peace
Friday, December 5, 2008
Importing CSV Files with SSIS
One of the shortcomings of the comma separated value (CSV) file import in SSIS is the inability for SSIS to determine if there are too few fields in a record.
According to Microsoft, they only check for the end of record indicator (Ususally a CR/LF) when processing the last column in the file. If you are not on the last column, and a CR/LF is encountered, it is ignored and the first field of the new line is added to the last column of the previous line.
Here is an example of what you see:
So what I have been doing to gracefully catch and report on this is to quickly pre-process the file to count the columns. Since the processes that create the CSV files for my app are automated, I can assume that every record in the file has the same number of fields, and I only need to count 1 row. If you cannot assume this, it would be just as easy to loop thru each row and determine the field count.
I would also like to point out that my code does not yet handle commas embedded within quote delimited fields. For example, if your record looked like this:
Ok, so here is what I am doing.
I set up a script task in the control flow, before I try to open the file with ssis. This is the code I use:
When this code runs, it generates an error if the field count is not correct.
If you need to check multiple file layouts, you can put the script task into a foreach container in SSIS and use database lookups to get the field length based on the filename, making this code totally reusable!
Also note... I actually converted this code into a C# dll and put it in the GAC on the SSIS server so all I have to do is call that dll in my script task. I will blog about that soon.
Hope this helps!
peace
According to Microsoft, they only check for the end of record indicator (Ususally a CR/LF) when processing the last column in the file. If you are not on the last column, and a CR/LF is encountered, it is ignored and the first field of the new line is added to the last column of the previous line.
Here is an example of what you see:
Now, when troubleshooting this error, you can be looking in the wrong place, since the error usually shows up only at the end of the file.
SSIS is set up to accept 3 columns in a CSV file: Col1, Col2 and Col3.
File comes in like this:
Col1,Col2
A1,A2
B1,B2
C1,C2
When ssis imports this file it produces this result:
Record 1: A1, A2B1, B2
Record 2: C1,C2 Error - Column Delimiter Not Found
So what I have been doing to gracefully catch and report on this is to quickly pre-process the file to count the columns. Since the processes that create the CSV files for my app are automated, I can assume that every record in the file has the same number of fields, and I only need to count 1 row. If you cannot assume this, it would be just as easy to loop thru each row and determine the field count.
I would also like to point out that my code does not yet handle commas embedded within quote delimited fields. For example, if your record looked like this:
1,"this is um, the field", 1,1I will be posting code for that next week sometime.
Ok, so here is what I am doing.
I set up a script task in the control flow, before I try to open the file with ssis. This is the code I use:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Public Class ScriptMain
Public Sub Main()
Dim sr As StreamReader
Dim Textline As String
Dim Fields As String()
sr = File.OpenText("C:\FileName.csv")
If Not sr.EndOfStream Then
sr.ReadLine() 'Skip 1 row (Header)
End If
If Not sr.EndOfStream Then
Textline = sr.ReadLine() 'Read in the entire line of text.
Fields = Textline.Split(Convert.ToChar(",")) ' Split the line of text into a string array.
Else
sr.Close()
sr.Dispose()
Dts.TaskResult = Dts.Results.Failure 'There were no records after the header. This might not be a failure in all implementations.
Exit Sub
End If
sr.Close()
sr.Dispose()
If Fields.Length <> 25 Then ' If there are not 25 fields in the record, raise error.
Dts.TaskResult = Dts.Results.Failure
Err.Raise(vbObjectError + 513, Nothing, "Field Count is Invalid! 25 expected, " + Fields.Length.ToString() + " received.")
Exit Sub
End If
Dts.TaskResult = Dts.Results.Success
End Sub
End Class
When this code runs, it generates an error if the field count is not correct.
If you need to check multiple file layouts, you can put the script task into a foreach container in SSIS and use database lookups to get the field length based on the filename, making this code totally reusable!
Also note... I actually converted this code into a C# dll and put it in the GAC on the SSIS server so all I have to do is call that dll in my script task. I will blog about that soon.
Hope this helps!
peace
Tuesday, December 2, 2008
Using GUID Data Type in SQL Command Task
I have now run into this problem several times, so I thought I would write it down.
Just had a colleague come to me with an issue of the SQL Command task not returning the data it should have.
Looking at his stored proc, everything seemed fine. The proc took in a UniqueIdentifier and returned a SELECT from a table.
The SQL Command task in SSIS passed a GUID and put the result set into an object variable.
However, by watching Profiler, we captured that the value being sent to the stored procedure parameter from the SQL Command task was not the GUID that was being processed in SSIS.
So on the parameter mapping tab we changed the data type of the parameter from GUID to VARCHAR(70).
After that the SQL Command returned the rows that were expected.
I hope this helps someone out there
Just had a colleague come to me with an issue of the SQL Command task not returning the data it should have.
Looking at his stored proc, everything seemed fine. The proc took in a UniqueIdentifier and returned a SELECT from a table.
The SQL Command task in SSIS passed a GUID and put the result set into an object variable.
However, by watching Profiler, we captured that the value being sent to the stored procedure parameter from the SQL Command task was not the GUID that was being processed in SSIS.
So on the parameter mapping tab we changed the data type of the parameter from GUID to VARCHAR(70).
After that the SQL Command returned the rows that were expected.
I hope this helps someone out there
Subscribe to:
Posts (Atom)