library('RODBC')
ch <-odbcConnect("XE",uid="HR",pwd="HR")
sqlTables(ch,schema='HR')
The HR schema contains Human Resources data for a fictional company. The data consists of employee data, geographic areas, organizational departments and job position. An entity relationship diagram (ERD) of the schema is as follows:
Employees have salary data, so lets start with some simple univariate analysis.
It is possible to retrieve an entire table using sqlFetch. This seems to be the rather R way of thinking about work - a table corresponds with a data frame, get a data frame into R and manipulate the data from there.
r=sqlFetch(ch,'EMPLOYEES')
plot(sort(r2$SALARY, decreasing=TRUE))

Of course, this means that a large amount of unused data was accessed and retrieved over the network. This data was stored in memory by R. For simple examples using demonstration schemas, this is not an issue. In real world situations where the amount of data is growing significantly, it makes sense to reduce the amount of data being accessed and returned. It is better to request only the data that is of specific interest using an SQL query.
sql='select salary from employees order by 1 desc'
r=sqlQuery(ch, sql)
plot(r)
plot(r$SALARY)
The class of r$SALARY is a class of type integer. The class of r itself is a data frame. R functions often handle objects of various types without a problem - they simply do what you would expect. This is one small example of the benefit of paying closer attention to the object that you are manipulating. You also need to pay attention to the function in use. Consider the ggplot2 library qplot function.
qplot(r$SALARY)
This call results in a histogram. The bottom line on this: Many of the examples of R that I see - my own included - consist of elaborate function calls and rather obscure parameter settings. This can lead to a misconception about how the various functions actually work.
This post introduced the sqlFetch and sqlQuery functions from the RODBC package.



0 comments:
Post a Comment