Sunday, 29 May 2016

Sqoop & SQL Server #8 - Incremental Imports

So far the examples shown have imported whole tables or subsets of tables. But if the data load is a regular occurrence you would need to reload data already transmitted and overwrite the existing target, which is inefficient. The Sqoop import command allows incremental data loads, using either an incrementing integer column or a datetime column as a reference at the point of load.

Caution: Both methods assume the table is being populated with new data but that existing data is not updated or deleted.

For both methods an argument --incremental is required but that argument requires a different parameter depending on the type of load. Let's start with using an incrementing integer column which uses the parameter append.

--incremental append

Two other arguments must be supplied: --check-column, which defines the column in which the incrementing integer is stored, and --last-value, which unsurprisingly holds the previous maximum value for that column. The syntax is,

--check-column [column name]
--last-value [n]

An example of these arguments in an import command is,

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --incremental append --check-column CustomerID --last-value 1000

The second option is to use a datetime column. That requires the parameter lastmodified for the argument --incremental like this,
--incremental lastmodified

We continue to use --check-column and --last-value but in this case the last value has to be a date contained within speech marks. Here is an example,

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --incremental lastmodified --check-column CreatedDateTime --last-value "2016-05-23 21:11:00"

For both methods Sqoop merges the data together using a MapReduce job.

Recurring Incremental Data Loads
An extension to this functionality allows Sqoop to remember the last value in a local configuration database called the Sqoop Metastore. This database can store jobs which can be executed by any cluster user or via an automated script. SQL Server pros can compare this to the MSDB database which stores SQL Agent jobs and configuration. Sqoop has no scheduler but another Apache tool named Oozie can be used to create scheduled jobs. Oozie is also installed on the Cloudera Quickstart VM.

The main benefit of creating a job is that you don't need to remember the last integer value or date. Sqoop handles that for you.

To create a recurring job you need to define it. To do this use the Sqoop job command along with the argument --create and give the job a name. Here is the syntax,

sqoop job --create [my job name]

After this create statement we define the job with the import command and the arguments required, separated from the job name by a standalone --. This saves a job in the Sqoop metastore with the name you've defined.
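For example, a job named testjob wrapping the incremental import from earlier could be created like this (a sketch reusing the connection details from the examples above):

sqoop job --create testjob -- import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --incremental append --check-column CustomerID --last-value 1000

The job can then be run using the exec argument. For a job named testjob the example is as follows,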

sqoop job --exec testjob

To list the jobs saved in the metastore use the following argument,

sqoop job --list

You can display the definition of a job with the --show argument, which takes the name of the job you wish to display as a parameter. For example,

sqoop job --show testjob

And if a job is no longer needed you can remove it from the Sqoop metastore using the --delete argument which also requires the name of the job as a parameter. For example,

sqoop job --delete testjob

More about the Sqoop Metastore
Sqoop uses a small-footprint HyperSQL database (HSQLDB) to store its metadata. This database can be shared so other users can use the jobs stored within it. However, if you're going to share jobs across multiple users you may want to move the metastore to a more familiar database like MySQL, because the metastore can become corrupted and in a production environment you'll want good maintenance procedures.

Other users can connect to the metastore using the argument --meta-connect with the IP address of the virtual machine. It uses port number 16000 by default. For example,

sqoop job --create testjob \
--meta-connect jdbc:hsqldb:hsql://[VM IP Address]:16000/sqoop \
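For the metastore to be shared in this way it needs to be running as a service. A minimal sketch, using the metastore tool that appears in the command list from the Getting Help post:

sqoop metastore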

A better model for incremental loads
In reality if you have administrative access over the source SQL Server or a good relationship with your DBAs you're better off pushing the logic for identifying the new data back into a SQL Server view and using Sqoop to import it.
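As a rough sketch (the view name NewCustomers is an assumption, not part of the sample database), the Sqoop side then becomes a plain import against the view. Because a view has no primary key, either supply --split-by or run with a single mapper:

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table NewCustomers --target-dir /tmp/newcustomers -m 1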



This is all well and good for tables with only insert transactions, but what about tables that contain update or delete transactions? How do you reflect those changes in HDFS? For that you can use the Sqoop merge command.

Saturday, 28 May 2016

Sqoop & SQL Server #7 - Reusability with Options Files

Using an Options File for Reusability
An Options File is a text file containing code to be called as part of a Sqoop job. They are useful when you use the same code repeatedly. You can share them between developers and they interact easily with the rest of the Sqoop command you're running.

To create an Options File simply create a text file with the elements of Sqoop syntax you wish to repeatedly call. Functionality you need to be aware of includes;
  • Each line identifies an option in the order that it appears otherwise on the command line
  • You can specify a single option on multiple lines by using the back-slash character at the end of intermediate lines.
  • Comments are supported when the line is prefixed with the hash character. Comments must be specified on a new line and may not be mixed with option text.
  • All comments and empty lines are ignored.
  • Unless options appear as quoted strings, any leading or trailing spaces are ignored.
  • Quoted strings if used must not extend beyond the line on which they are specified.
  • Option files can be specified anywhere in the command line as long as the options within them follow the otherwise prescribed rules of options ordering.

A simple example,
import
--connect
jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility
--username
ClouderaAccess
--password
GiveMeData

An example with comments,
#
# Options File for Sqoop Import
#

# Command
import

# Connection String
--connect
jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility

# Credentials
--username
ClouderaAccess
--password
GiveMeData

# End

Using an Options File
To call the file use the argument --options-file plus any other arguments relevant to the command being called, in this case the table to be imported. We'll use the Options File defined in the examples above.

sqoop --options-file /users/cloudera/sqlserver/optionsfiles/StandardImportConnection.txt --table Customers

And this is equivalent to,



sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers
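Any other import arguments can be combined with an options file in the same way. For instance, a sketch adding the --target-dir and -m arguments covered in the post on import functionality:

sqoop --options-file /users/cloudera/sqlserver/optionsfiles/StandardImportConnection.txt --table Customers --target-dir /tmp/optionsfileexample -m 1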

Friday, 27 May 2016

Sqoop & SQL Server #6 - Sqoop Import Functionality

The standard import syntax is quite simple. All statements start with sqoop followed by a command. In this example the command is import. 

sqoop import --connect "jdbc:sqlserver://10.88.12.0:1433;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers

Import has many possible arguments but for this simple example we use just two.

--connect specifies the connection string, in which the server, port, database, username and password need to be supplied.

--connect "jdbc:sqlserver://[server]:[port];database=[database];username=[username];password=[password]"

--table is just the name of the table or view to be loaded. Sqoop works with views in the same way it works with tables.

--table [tablename]

When the job is submitted Sqoop connects via JDBC to the source table to retrieve the columns and their datatypes. The SQL datatypes are mapped to Java datatypes at which point there may be some datatype mapping differences.

You also see the following warning.

Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.

Under The Hood
Don't worry about this warning. It is highlighting that you haven't got Apache Accumulo installed. You don't need it. It's a Key Value store tool like Apache HBase. For more information see the Apache Accumulo project.

A MapReduce job is then initiated to get the data and write it to HDFS. Within the terminal session you'll see log entries prefixed INFO mapreduce.Job displaying the percentage completion of the process.

Simple queries don't take long. For larger tables it's possible to increase performance by splitting the job across multiple nodes. Sqoop will use the Primary Key to judge how to split the table. If no primary key exists Sqoop will make a guess. You can control the column used with the --split-by argument. But be careful to select a column whose values are uniformly distributed otherwise some nodes will have little work to do whilst others will be overworked.

When the process is complete the terminal session log will display a message similar to the following.

16/05/27 06:17:14 INFO mapreduce.ImportJobBase: Transferred 201.083 KB in 77.9396 seconds (2.58 KB/sec)
16/05/27 06:17:14 INFO mapreduce.ImportJobBase: Retrieved 1823 records.

Sqoop import has many possible arguments which cater for various use cases. The Sqoop User Guide has excellent documentation for the import syntax.

In the next posts we'll look at some of these more advanced import scenarios.

Transactional consistency
During a data load the source table must be consistent otherwise the data imported may not match the data at source. For example, in a transactional system if data is being inserted, updated or deleted during a long running data load some processes may extract inconsistent data or get blocked by other processes.

This is an important consideration to address with the administrators of the source system. It may be that you can arrange to have the data required replicated to a static source or maybe there is a maintenance window during which transactions cease.

Import from a Named Instance
When connecting to a SQL Server instance which is not the default instance the connection string is formatted differently.

This can be important if you have multiple SQL Server instances installed on a single server.

SQL Server instances which are not the default instance need to be named. A named instance is not connected to on the standard 1433 port. That is reserved for the default instance.

When connecting to a named instance you drop the port number and add the name. The format also changes slightly: in place of :[port number] we add \\[instance name] at the end of the host name or IP address.

"jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData"

For example, here I reference a named instance called SQL2014.

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers

Import from a defined schema
By default SQL Server tables are created in the dbo schema. But there are many instances when other schemas are used for security or administrative reasons. 

Sqoop assumes the default dbo schema has been used. If not, you need to define the schema being referenced.

To reference a schema add the following at the end of the list of arguments

-- --schema [SchemaName]

For example, here I reference a schema called TestSchema.

sqoop import --connect "jdbc:sqlserver://10.88.12.0:1433;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers -- --schema TestSchema

Import all tables from a database
The import-all-tables command loads an entire database into HDFS in one go. However, depending on the size of the database that might not be the best solution. A one shot process which fails could require significant clean up before retrying and could take a long time to complete.

The command works through the source tables sequentially to avoid unnecessary load on the RDBMS.

There are some other limitations of this command.
  • Each table must have a single-column primary key.
  • You must intend to import all columns of each table.
  • You must not intend to use non-default splitting column, nor impose any conditions via a WHERE clause.

If, despite these limitations, import all tables is the right approach for your use case the syntax is as follows,

sqoop import-all-tables --connect "jdbc:sqlserver://10.88.12.0:1433;database=Utility;username=ClouderaAccess;password=GiveMeData"

If you need to import all but a few tables, you can use the parameter --exclude-tables, which accepts a comma-separated list of table names. For example,

sqoop import-all-tables --connect "jdbc:sqlserver://10.88.12.0:1433;database=Utility;username=ClouderaAccess;password=GiveMeData" --exclude-tables cities,countries

Import into a defined folder
Sqoop will load your table into a directory with the same name in your HDFS home directory. If the folder already exists the job will fail to ensure existing data is not overwritten.

There are two options for specifying the target location. When loading a single table use the argument --target-dir. The table's files are written directly into the directory specified.

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --target-dir /tmp/definedfolderexample -m 1
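To confirm where the files landed you can list the directory from a terminal session. A quick check, assuming the HDFS client available on the Cloudera VM:

hdfs dfs -ls /tmp/definedfolderexample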

When sqooping a number of tables you can use the argument --warehouse-dir. In this case a sub folder named after each table is created under the root folder specified.

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --warehouse-dir /tmp/definedfolderexample -m 1
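The same argument combines naturally with the import-all-tables command covered earlier. A sketch reusing that connection string:

sqoop import-all-tables --connect "jdbc:sqlserver://10.88.12.0:1433;database=Utility;username=ClouderaAccess;password=GiveMeData" --warehouse-dir /tmp/definedfolderexample

Each table then lands in its own sub folder under /tmp/definedfolderexample.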

Import a subset of data
There are occasions when you don't want to import an entire table. Sqoop defaults to taking all columns and all rows. You may want to restrict the columns or rows loaded. 

Selecting a subset of columns
Sqoop ships with an argument for the import command named --columns. To use this argument you must also supply a comma-separated list of column names enclosed in speech marks. Here is the syntax,

--columns "column1,column2,column3"

And an example,

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --columns "FirstName,Surname,DateOfBirth" -m 1

Selecting a subset of rows
To restrict the rows loaded Sqoop uses a --where argument. The syntax is simple,

--where "[ColumnName]=n" or for string values --where "[ColumnName]='stringtext'"

And an example,

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --where "MarketingOptIn = 1" -m 1

The text within the speech marks can include anything which the source RDBMS can handle. For example, you can use UDFs and T-SQL specific syntax. The clause is passed to the source system as part of each query submitted. Be careful not to define a clause so complex that one process locks the table and the other mappers cannot continue to work in parallel.
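For instance, here is a sketch that pushes a T-SQL function into the clause; the year filter is purely illustrative and assumes the CreatedDateTime column from the sample Customers table:

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --where "YEAR(CreatedDateTime) = 2012" -m 1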

Using a free-form query
It is also possible to supply a free-form T-SQL query to Sqoop although there are limitations to the power of this method.

Limitations
  • You must specify a destination directory as Sqoop cannot simply use the table name
  • To use parallelism your query must include the token $CONDITIONS in the WHERE clause, written as \$CONDITIONS if the query is wrapped in double quotes
  • You must use --split-by as Sqoop cannot identify a primary key

The syntax is as follows,

sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/results
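Applied to the sample Customers table the same pattern looks like this (a sketch; note the escaped \$CONDITIONS because the query is wrapped in double quotes, and the explicit --split-by and --target-dir required by the limitations above):

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --query "SELECT CustomerID, FirstName, Surname FROM Customers WHERE MarketingOptIn = 1 AND \$CONDITIONS" --split-by CustomerID --target-dir /tmp/freeformexample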

If you have control over the source database it may be preferable to construct a view which contains logic to present the table as required and thus remove the need for both the --columns and --where arguments.

Import using multiple nodes
For large loads it is possible to increase performance by splitting the job across multiple nodes.

Sqoop will use the Primary Key to judge how to split the table. If no primary key exists Sqoop will make a guess. You can control the column used with the import argument --split-by. But be careful to select a column whose values are uniformly distributed otherwise some nodes will have little work to do whilst others will be overworked.

The syntax for using --split-by is as follows.

--split-by [column name]

And an example of this argument in action is,

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --split-by CreatedDateTime

Remember Sqoop will examine the column chosen to split by in order to divide the job across multiple nodes. Therefore choosing a column with good distribution of data will allow Sqoop to spread the load across nodes. Choosing a column with lots of repeated data or one value which is very common in the data will skew the import leading to some nodes finishing early and others taking a long time to complete.

You can control the number of mappers used with the import argument -m [n] where n is an integer. The syntax is very simple,

-m 4

And an example of this argument

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers -m 4

By default 4 tasks are used. During testing you may see better performance by adding more mappers. However, if you increase the number of mappers beyond the number of nodes in your cluster the mappers will run in series not parallel, which defeats the objective. Equally there comes a point where splitting the work across nodes becomes self-defeating, so testing to identify optimal performance is advised.

Into a defined file type
With Sqoop 1.4.6 there are four different file types available in the Sqoop import command. The formats each have different advantages, but plain text is useful for most scenarios and can be used by various other Apache tools. Avro is commonly used when streaming data, which is not Sqoop's purpose. Parquet is commonly used to create efficient storage suited to fast analysis. SequenceFiles are used extensively within Hadoop behind the scenes.

  1. Plain text - Standard CSV format is the default for Sqoop
  2. Avro - A fast binary data format - see the Apache Avro project for details
  3. Parquet - A columnar store format - see the Apache Parquet project for details
  4. SequenceFile - supports binary data types - see the Hadoop SequenceFile documentation for details

The syntax for the additional argument is similar for all four options.

--as-avrodatafile
--as-sequencefile
--as-textfile
--as-parquetfile

In this example we'll use Parquet,

sqoop import --connect "jdbc:sqlserver://10.88.12.0\\SQL2014;database=Utility;username=ClouderaAccess;password=GiveMeData" --table Customers --as-parquetfile

Wednesday, 25 May 2016

Sqoop & SQL Server #5 - Basic Sqoop Commands

Version
To check that sqoop is running you can execute a version command. All this does is return the currently installed version of sqoop but by returning that information you are confirming that sqoop is running.

sqoop version

The command has no arguments and returns the following;

16/05/26 08:37:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
Sqoop 1.4.6-cdh5.7.0
git commit id
Compiled by jenkins on Wed Mar 23 11:31:09 PDT 2016

List Databases
A simple connectivity check is to run a list-databases command. This command simply lists the databases your user is allowed to see.

sqoop list-databases --connect "jdbc:sqlserver://10.88.12.0:1433;username=ClouderaAccess;password=GiveMeData"

The syntax is very simple. You only need to use the connect argument.

Connect specifies the connection string in which the server, port, username and password need to be supplied.

--connect "jdbc:sqlserver://[server]:[port];username=[username];password=[password]"

The output is a simple list of databases

master
tempdb
model
msdb
Utility

List Tables
Similar to the list databases command is list-tables. 

sqoop list-tables --connect "jdbc:sqlserver://10.88.12.0:1433;database=Utility;username=ClouderaAccess;password=GiveMeData"

Again the syntax is pretty simple requiring just a connect argument. But in this case you need to specify the database in the connection string as well.

--connect "jdbc:sqlserver://[server]:[port];database=[database];username=[username];password=[password]"
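Against the sample Utility database built by the scripts at the end of this series, the only user table is Customers, so the output is a one-line list:

Customers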

Tuesday, 24 May 2016

Sqoop & SQL Server #4 - Setting up FTP between Windows host and Cloudera VM


By default the Cloudera VM is not configured to allow easy transfer of files between the Windows host and the VM. But this is easily fixed.

You can amend the Oracle VirtualBox settings to allow drag and drop from the host to the VM but that isn't going to help if you're connecting to a VM on your network or in the cloud. I prefer to use the more flexible method of using an FTP client.

First you'll need to amend the Oracle VirtualBox VM settings. Follow these steps;
  1. Shutdown the virtual machine
  2. Open Oracle VirtualBox Manager
  3. Select the Virtual Machine you intend to work with from the list
  4. Click Settings
  5. Navigate to the Network option
  6. Select the tab named Adaptor 2
  7. For the option Attached To select Host-only Adaptor
  8. For the option Name select VirtualBox Host-only Ethernet Adaptor
  9. Click OK to save the config
  10. Start the Virtual Machine

Next we need to identify the IP address the Virtual Machine has been given. To do so follow these steps,
  1. When the VM has started open a terminal session
  2. Type su and Enter
  3. Then enter the password 'cloudera'. You now have admin rights
  4. Type ifconfig
  5. Maximise the window if you haven't already. You'll see the network details listed.
  6. One of the blocks of text is prefixed Eth2 for Ethernet Adaptor 2. Make a note of the inet address within this text block. It ought to remain the same even after restarting the virtual machine.

The following video may help if my explanation isn't clear.

Finally we'll set-up an FTP client to connect between the host and VM. I prefer to use WinSCP because it not only functions well as an FTP client for simple tasks but it's also well-suited to more complex tasks and has command line options which can be used as part of a wider process. I've used it in conjunction with SSIS in the past. For more details see this post.

Follow these instructions to create a connection and send a file,
  1. Open WinSCP. It will display a menu of pre-existing connections. If this is the first time you're using it there will be only one item in the list called New Site.
  2. Click on New Site. This opens up a dialog box in which we will configure the connection.
  3. For File Protocol select SFTP
  4. In the Host Name type the IP Address you noted down from previous section.
  5. The Port Number should be set to 22 as that is the SFTP default; if not, change it to 22.
  6. For the username type 'cloudera'
  7. And for the password type 'cloudera'
  8. Nothing needs to be changed in the Advanced tab so Save the connection and give it a meaningful name.
  9. The named connection now appears in the list.
  10. Ensure your VM is started and double click on the connection.
  11. It may take a few moments to establish a connection then you'll see a file explorer window with two panes, Windows on the left and the Cloudera VM on the right.



Sunday, 22 May 2016

Sqoop & SQL Server #3 - Getting Help with Sqoop

Getting Help
Sqoop commands are issued in a Terminal session. A number of commands exist. To access help and see details of commands use the following;
     
sqoop help

Remember Linux is case sensitive so be careful to type commands in the correct case.

The following table is returned.

Available commands:
     codegen                     Generate code to interact with database records
     create-hive-table           Import a table definition into Hive
     eval                        Evaluate a SQL statement and display the results
     export                      Export an HDFS directory to a database table
     help                        List available commands
     import                      Import a table from a database to HDFS
     import-all-tables           Import tables from a database to HDFS
     job                         Work with saved jobs
     list-databases              List available databases on a server
     list-tables                 List available tables in a database
     merge                       Merge results of incremental imports
     metastore                   Run a standalone Sqoop metastore
     version                     Display version information

Each command has specific arguments. The help command can display these arguments when you pass a command name as a parameter. For example, here is the syntax for displaying the arguments for the list-databases command.

sqoop help list-databases


Common arguments:
  --connect <jdbc-uri>                       Specify JDBC connect  string
  --connection-manager <class-name>          Specify connection manager  class name
  --connection-param-file <properties-file>  Specify connection parameters file
  --driver <class-name>                      Manually specify JDBC  driver class to use
  --hadoop-home <hdir>                       Override $HADOOP_MAPRED_HOME_ARG
  --hadoop-mapred-home <dir>                 Override $HADOOP_MAPRED_HOME_ARG
  --help                                     Print usage instructions
  -P                                         Read password from console
  --password <password>                      Set authentication  password
  --password-alias <password-alias>          Credential provider password alias
  --password-file <password-file>            Set authentication password file path
  --relaxed-isolation                        Use read-uncommitted isolation for imports
  --skip-dist-cache                          Skip copying jars to distributed cache
  --username <username>                      Set authentication username
  --verbose                                  Print more information while working

Saturday, 21 May 2016

Sqoop & SQL Server # 2 - Configuring Sqoop to work with SQL Server

If you've been using SQL Server for many years you'll be used to having all the tools you need to hand packaged together by Microsoft.

The Big Data world isn't so easy to navigate. Often solutions are constructed from different software which needs to be configured to work together. This is true for Sqoop as well.

In these posts I will use a Cloudera VM which has many tools already setup saving hours of administration. However, there are still some elements that need configuration.

To get the free Cloudera Quickstart VM follow this link. I used the version for Oracle VirtualBox which takes less than 5 minutes to import and then you're off and running.

JDBC Configuration
Download The Driver from the VM 

You can also download the driver on the Host machine and FTP the file into the VM. For instructions see this link.

Extract and copy to location
  • Open a terminal session and assume super user permissions by typing
su
  • You will be asked for a password. The password is "cloudera". You now have admin permissions on the VM
  • We need to change the working directory. Type
cd /home/cloudera/Downloads

  • This is case sensitive so beware the capital D on Downloads
  • We now need to extract the contents of the download. Type the following (don't forget the dot in front of the /)
tar -zxvf ./sqljdbc_4.0.2206.100_enu.tar.gz

  • Navigate to the location of the driver by typing

cd ./sqljdbc_4.0/enu

  • Copy the driver to the location required by typing
cp sqljdbc4.jar /usr/lib/sqoop/lib
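  • Optionally, confirm the jar is now in place by typing the following (a quick check rather than part of the official driver instructions)
ls /usr/lib/sqoop/lib | grep sqljdbc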

At this point a restart is required so that the new driver can be picked up by the OS

Authentication 
Mixed Mode Authentication needs to be enabled on the SQL Server
  • In management studio connect to the server
  • Right click on server name and select properties
  • Select security tab
  • In the top ‘Server authentication’ section, ensure the ‘SQL Server and Windows Authentication mode’ is selected
  • The server will need to restart before these changes take effect

Security
The SQL Authenticated account used to access the data must have db_datareader permissions in the source database.

Friday, 20 May 2016

Sqoop & SQL Server #1 - An Overview




Apache Sqoop is an open source tool that allows users to extract data from a structured data store into Hadoop for further processing. Sqoop can also export data back to the RDBMS for consumption by  other clients.

It is a command line tool with a set of tools and commands from which you can construct complex import and export scripts.

Within a SQL Server context it is analogous to BCP. 

Why is it important
  • It allows existing RDBMS data to be loaded into an HDFS cluster
  • Many existing analytic tools cannot address HDFS directly so Big Data analysts can be required to export data back to traditional data platforms for analysis with existing tools

The following blog posts are focused on using Sqoop with SQL Server data. Most online tutorials focus on MySQL. When learning Sqoop I found it difficult to find examples for SQL Server. In fact it took days and many grey hairs figuring out how to connect to a named instance. 

I use the Cloudera Quickstart VM which is easily hosted in Oracle VirtualBox.

Posts in this series include;
Sqoop & SQL Server #1 - An Overview
Sqoop & SQL Server #2 - Configuring Sqoop to work with SQL Server
Sqoop & SQL Server #3 - Getting Help with Sqoop
Sqoop & SQL Server #4 - Setting up FTP between Windows host and Cloudera VM
Sqoop & SQL Server #5 - Basic Sqoop Commands
Sqoop & SQL Server #6 - Sqoop Import Functionality
Sqoop & SQL Server #7 - Reusability with Options Files
Sqoop & SQL Server #8 - Incremental Imports

Resources
Sqoop User Guide
Sqoop Wiki
Cloudera Quickstart VM
Microsoft JDBC Drivers
WinSCP
O'Reilly Publishing Sqoop Cookbook
Cloudera Community - for submitting questions and searching existing answers

For convenience, I've supplied T-SQL scripts to create and populate a database I'll be using throughout these examples.

Create Database
USE [master]
GO

CREATE DATABASE [Utility]
 CONTAINMENT = NONE
 ON PRIMARY
( NAME = N'Utility', FILENAME = N'C:\Data\Utility.mdf' , SIZE = 65536KB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB )
 LOG ON
( NAME = N'Utility_log', FILENAME = N'C:\Logs\Utility_log.ldf' , SIZE = 8192KB , MAXSIZE = UNLIMITED , FILEGROWTH = 65536)
GO

Create Login
USE [master]
GO

CREATE LOGIN [ClouderaAccess] WITH PASSWORD=N'GiveMeData', DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
GO

Create Table
USE [Utility]
GO

CREATE TABLE [dbo].[Customers]
(
      [CustomerID] [int] IDENTITY(1,1) NOT NULL,
      [FirstName] [varchar](50) NOT NULL,
      [Surname] [varchar](50) NOT NULL,
      [DateOfBirth] [date] NOT NULL,
      [EmailAddress] [varchar](100) NULL,
      [PostCode] [varchar](8) NULL,
      [MarketingOptIn] [bit] NULL,
      [AutoRenewalOptIn] [bit] NULL,
      [CreatedDateTime] [datetime] NOT NULL,
      [ModifiedDateTime] [datetime] NULL,
      CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED
      ([CustomerID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

Populate Table - 10 test rows

USE [Utility]
GO
SET IDENTITY_INSERT [dbo].[Customers] ON

GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (1, N'G', N'THOMAS', CAST(N'1974-07-12' AS Date), N'G.THOMAS@ntlworld.com', N'RM6 5TT', 0, 1, CAST(N'2010-06-16 00:00:00.000' AS DateTime), CAST(N'2010-06-16 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (2, N'STEPHANIE', N'CARTER', CAST(N'1968-04-17' AS Date), N'S.CARTER@hotmail.co.uk', N'GU21 3JY', 0, 0, CAST(N'2014-07-11 00:00:00.000' AS DateTime), CAST(N'2014-07-11 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (3, N'COLIN', N'BLACKWOOD', CAST(N'1979-10-19' AS Date), N'C.BLACKWOOD@googlemail.com', N'SE3 7JY ', 0, 1, CAST(N'2011-11-27 00:00:00.000' AS DateTime), CAST(N'2011-11-27 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (4, N'L', N'BLACKWOOD', CAST(N'1996-01-01' AS Date), N'L.BLACKWOOD@inbox.com', N'SE3 7JY ', 0, 0, CAST(N'2011-11-27 00:00:00.000' AS DateTime), CAST(N'2011-11-27 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (5, N'DANIEL', N'SENNITT', CAST(N'1975-01-01' AS Date), N'D.SENNITT@virginmedia.com', N'RG4 6HL ', 0, 0, CAST(N'2012-01-01 00:00:00.000' AS DateTime), CAST(N'2012-01-01 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (6, N'M', N'ROSSITER', CAST(N'1982-04-18' AS Date), N'M.ROSSITER@live.com', N'CH49 9AP', 0, 0, CAST(N'2011-02-04 00:00:00.000' AS DateTime), CAST(N'2011-02-04 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (7, N'JULIE', N'FLEMING', CAST(N'1978-07-12' AS Date), N'J.FLEMING@mail.com', N'IV2 3TX ', 1, 0, CAST(N'2012-08-02 00:00:00.000' AS DateTime), CAST(N'2012-08-02 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (8, N'MICHAEL', N'FLEMING', CAST(N'1934-02-10' AS Date), N'M.FLEMING@live.com', N'IV2 3TX ', 0, 0, CAST(N'2012-08-02 00:00:00.000' AS DateTime), CAST(N'2012-08-02 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (9, N'H A', N'PATEL', CAST(N'1972-07-14' AS Date), N'H.PATEL@btinternet.com', N'G52 3LD', 0, 1, CAST(N'2011-02-24 00:00:00.000' AS DateTime), CAST(N'2011-02-24 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (10, N'CHRISTOPHER', N'KEECH', CAST(N'1993-12-14' AS Date), N'C.KEECH@hotmail.co.uk', N'TS269LN ', 1, 0, CAST(N'2013-08-22 00:00:00.000' AS DateTime), CAST(N'2013-08-22 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Customers] ([CustomerID], [FirstName], [Surname], [DateOfBirth], [EmailAddress], [PostCode], [MarketingOptIn], [AutoRenewalOptIn], [CreatedDateTime], [ModifiedDateTime]) VALUES (11, N'HANNAH', N'KEECH-REEVES', CAST(N'1988-07-01' AS Date), N'H.KEECH-REEVES@yahoo.co.uk', N'TS269LN ', 0, 0, CAST(N'2013-08-22 00:00:00.000' AS DateTime), CAST(N'2013-08-22 00:00:00.000' AS DateTime))
GO
SET IDENTITY_INSERT [dbo].[Customers] OFF
GO


Detect Local Connection Properties
SELECT
  CONNECTIONPROPERTY('net_transport') AS net_transport,
  CONNECTIONPROPERTY('protocol_type') AS protocol_type,
  CONNECTIONPROPERTY('auth_scheme') AS auth_scheme,
  CONNECTIONPROPERTY('local_net_address') AS local_net_address,
  CONNECTIONPROPERTY('local_tcp_port') AS local_tcp_port,
  CONNECTIONPROPERTY('client_net_address') AS client_net_address