Friday, May 21, 2010

MySQL's auto_increment in Oracle

MySQL has a reserved keyword "auto_increment" that can be used on the ID/primary key column of a new table. Whenever a new record is added to the table, this field is automatically set to a unique, incremented integer value.

The Oracle database (as of 11g) has no equivalent keyword. But we can simulate the feature using the SEQUENCE and TRIGGER functionality provided by Oracle.

A sequence is a database object that generates sequential values. It starts with an initial user-supplied value and is incremented by a fixed step up to a given maximum value. The statement to create a sequence for table ATABLE:

create sequence ATABLE_SEQ
minvalue 1
maxvalue 999999999999999999999999999
start with 1
increment by 1
cache 20;

Now, we want to hook up a trigger on an INSERT event of the table. Whenever a record is inserted into the table, the current value of the sequence is incremented and set in the primary key field of the inserted record.

create or replace trigger ATABLE_TRIG
before insert on ATABLE
for each row
begin
select atable_seq.nextval into :new.id from dual;
end;
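
To make the mechanics of the sequence and trigger above a bit more concrete, here is a toy in-memory model in Java. It is not Oracle code and does not talk to a database; the class names (AutoIncrementSketch, Sequence, Table, Row) are hypothetical and only mirror the roles of ATABLE_SEQ, ATABLE and ATABLE_TRIG.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Toy in-memory model of the sequence-plus-trigger idea; NOT Oracle code.
// All class and method names here are hypothetical.
final class AutoIncrementSketch {

    // Plays the role of ATABLE_SEQ: "start with 1, increment by 1".
    static final class Sequence {
        private final AtomicLong next = new AtomicLong(1);

        long nextVal() {
            return next.getAndIncrement();
        }
    }

    // A row of ATABLE; id stays null until the "trigger" fills it in.
    static final class Row {
        Long id;
        final String name;

        Row(String name) {
            this.name = name;
        }
    }

    // Plays the role of ATABLE together with its BEFORE INSERT trigger.
    static final class Table {
        private final Sequence seq = new Sequence();
        private final List<Row> rows = new ArrayList<>();

        Row insert(Row row) {
            // Equivalent of: select atable_seq.nextval into :new.id from dual;
            row.id = seq.nextVal();
            rows.add(row);
            return row;
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        System.out.println(table.insert(new Row("first")).id);
        System.out.println(table.insert(new Row("second")).id);
    }
}
```

Note that, like the trigger as written above, this sketch overwrites any id the caller may have set on the row before inserting it.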

Thursday, May 20, 2010

Accessing MS Active Directory using LDAP

In this example, we'll implement a simple client that performs two simple lookups in Microsoft Active Directory to retrieve groups and users through the LDAP interface. The LDAP interface can be accessed using the standard JNDI API.

By default, Active Directory returns at most 1000 results per LDAP query. To handle more than 1000 results, we have to maintain a cookie that stores the current position in the result set and pass it back to the server, so that the query continues from that position. This is called a paged query.

We start by creating an LdapContext object, which we'll use to access Active Directory through the LDAP interface. The context is configured using a Hashtable.

Hashtable<String, String> env = new Hashtable<String, String>();
env.put(Context.INITIAL_CONTEXT_FACTORY,
"com.sun.jndi.ldap.LdapCtxFactory");
env.put(Context.SECURITY_AUTHENTICATION, "simple");
env.put(Context.SECURITY_PRINCIPAL, "<user>");
env.put(Context.SECURITY_CREDENTIALS, "<password>");
env.put(Context.PROVIDER_URL, "ldap://<host>:389");
env.put(Context.REFERRAL, "follow");

// We want to use a connection pool to
// improve connection reuse and performance.
env.put("com.sun.jndi.ldap.connect.pool", "true");

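// The maximum time in milliseconds we are going to wait
// to establish a connection (or to get one from the pool).
env.put("com.sun.jndi.ldap.connect.timeout", "300000");

// Create the context that the rest of this example uses as "ctx".
LdapContext ctx = new InitialLdapContext(env, null);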

Next, we specify how to search the directory. A SearchControls object configures the search, a search base string says where in the tree to start, and a filter string narrows the results.

SearchControls searchCtls = new SearchControls();
// We start our search from the search base.
// Be careful! The string needs to be escaped!
String searchBase = "dc=company,dc=com";

// There are different search scopes:
// - OBJECT_SCOPE, to search the named object
// - ONELEVEL_SCOPE, to search in only one level of the tree
// - SUBTREE_SCOPE, to search the entire subtree
searchCtls.setSearchScope(SearchControls.SUBTREE_SCOPE);

// We only want groups, so we filter the search for
// objectClass=group. We use wildcards to find all groups with the
// string "<group>" in its name. If we want to find users, we can
// use: "(&(objectClass=person)(cn=*<username>*))".
String searchFilter = "(&(objectClass=group)(cn=*<group>*))";

// We want all results.
searchCtls.setCountLimit(0);

// We want to wait to get all results.
searchCtls.setTimeLimit(0);

// Active Directory limits our results, so we need multiple
// requests to retrieve all results. The cookie is used to
// save the current position.
byte[] cookie = null;

// We want 500 results per request.
ctx.setRequestControls(
new Control[] {
new PagedResultsControl(500, Control.CRITICAL)
});

// We only want to retrieve the "distinguishedName" attribute.
// You can specify other attributes/properties if you want here.
String returnedAtts[] = { "distinguishedName" };
searchCtls.setReturningAttributes(returnedAtts);

// The request loop starts here.
do {
// Start the search with our configuration.
NamingEnumeration<SearchResult> answer = ctx.search(
searchBase, searchFilter, searchCtls);

// Loop through the search results.
while (answer.hasMoreElements()) {
SearchResult sr = answer.next();
Attributes attr = sr.getAttributes();
Attribute a = attr.get("distinguishedName");
// Print our wanted attribute value.
System.out.println((String) a.get());
}

// Find the cookie in our response and save it.
// Reset it first, so the loop ends if no paging control is returned.
cookie = null;
Control[] controls = ctx.getResponseControls();
if (controls != null) {
for (int i = 0; i < controls.length; i++) {
if (controls[i] instanceof
PagedResultsResponseControl) {
PagedResultsResponseControl prrc =
(PagedResultsResponseControl) controls[i];
cookie = prrc.getCookie();
}
}
}

// Use the cookie to configure our new request
// to start from the saved position in the cookie.
ctx.setRequestControls(new Control[] {
new PagedResultsControl(500,
cookie,
Control.CRITICAL) });
} while (cookie != null);

// We are done, so close the Context object.
ctx.close();
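
The filter above pastes the group name straight into the filter string. If that name comes from user input, characters such as "*", "(" and ")" change the meaning of the filter. Below is a minimal sketch of escaping a value according to RFC 4515 before it is put into a filter. The class LdapFilterEscaper and its method are hypothetical helpers, not part of JNDI, and the group name is made up.

```java
// Sketch of RFC 4515 escaping for values placed inside an LDAP search filter.
// The class and method names are hypothetical, not part of JNDI.
final class LdapFilterEscaper {

    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\5c"); break; // backslash
                case '*':  sb.append("\\2a"); break; // wildcard
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break; // NUL character
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical group name containing filter metacharacters.
        String group = "Sales (EMEA)*";
        // The surrounding wildcards are added on purpose, after escaping.
        String filter = "(&(objectClass=group)(cn=*" + escape(group) + "*))";
        System.out.println(filter);
    }
}
```

The wildcards we want to keep are added after escaping, so only the characters inside the supplied name are neutralized.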

We can also print all attributes of an object, given its full distinguished name, with the following code:

String dN = "CN=JohnDoe,OU=Users,DC=company,DC=com";
Attributes answer = ctx.getAttributes(dN);

for (NamingEnumeration<?> ae = answer.getAll(); ae.hasMore();) {
Attribute attr = (Attribute) ae.next();
String attributeName = attr.getID();
for (NamingEnumeration<?> e = attr.getAll(); e.hasMore();) {
System.out.println(attributeName + "=" + e.next());
}
}
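
The distinguished names returned by the search can also be taken apart with the standard javax.naming.ldap.LdapName class, for example to get just the common name of an entry. The following is a small sketch; the DnParts class and its methods are hypothetical helpers, and the DN is the same sample value used above.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

// Hypothetical helper that splits a distinguished name into its parts.
final class DnParts {

    // Parses the DN, turning the checked exception into an unchecked one.
    static LdapName parse(String dn) {
        try {
            return new LdapName(dn);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a valid DN: " + dn, e);
        }
    }

    // Returns the value of the left-most RDN, e.g. "JohnDoe" for "CN=JohnDoe,...".
    // Assumes the DN has at least one component.
    static String leafValue(String dn) {
        LdapName name = parse(dn);
        // LdapName indexes RDNs from right to left, so the leaf is the last index.
        Rdn leaf = name.getRdn(name.size() - 1);
        return String.valueOf(leaf.getValue());
    }

    public static void main(String[] args) {
        String dn = "CN=JohnDoe,OU=Users,DC=company,DC=com";
        System.out.println(leafValue(dn));
        // Iterates from the right-most RDN (DC=com) to the left-most (CN=JohnDoe).
        for (Rdn rdn : parse(dn).getRdns()) {
            System.out.println(rdn.getType() + " = " + rdn.getValue());
        }
    }
}
```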

Monday, May 17, 2010

Search implementation strategies

In this blog post, I will briefly discuss two approaches to providing search functionality in a JEE application, focusing on how content is offered to a core search engine like Lucene, which is responsible for indexing, storing and retrieving. Lucene is a library for building search engines; it does not provide a way to retrieve or accept content from sources for indexing. Third-party vendors offer complete search engine products based on Lucene to fill this gap. How the indexing itself works is outside the scope of this post. Interesting sites powered by Lucene can be found here.

The two approaches I'm going to discuss:

  • Web crawler/spider
  • Enterprise search


The web crawler is an automated bot that starts by indexing a webpage supplied by a user or read from an internal list. Every hyperlink found in the page is also scheduled to be indexed. In this way, the web crawler hops from page to page, indexing the content after every visit. The web crawler conforms to a pull model, as it initiates the request for content to index.

Enterprise search conforms to a push model: it is an API or service that waits for clients to provide content for indexing. A typical scenario is a CMS that calls the search engine's webservice and passes the content as a parameter or as the request payload. In this case, the client initiates the indexing process.
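
The difference between the two models can be sketched with a few lines of Java. This is only a toy illustration with an in-memory "site" and "index"; nothing here is Lucene API, and all names (IndexingModels, Page, Index, crawl, sampleSite) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration of pull (crawler) versus push (indexing API).
// Nothing here is Lucene API; all names are hypothetical.
final class IndexingModels {

    // A page's text plus the hyperlinks found in it.
    static final class Page {
        final String text;
        final List<String> links;

        Page(String text, List<String> links) {
            this.text = text;
            this.links = links;
        }
    }

    // Stand-in for the search engine's index: document id -> content.
    static final class Index {
        final Map<String, String> docs = new HashMap<>();

        // Push model: the client calls this whenever content changes.
        void add(String id, String content) {
            docs.put(id, content);
        }
    }

    // Pull model: start at a seed URL and follow every link found.
    static void crawl(String seed, Map<String, Page> site, Index index) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(seed);
        seen.add(seed);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            Page page = site.get(url);
            if (page == null) {
                continue; // dead link, nothing to index
            }
            index.add(url, page.text);
            for (String link : page.links) {
                if (seen.add(link)) { // schedule each link only once
                    queue.add(link);
                }
            }
        }
    }

    // A tiny made-up site: two linked pages and one page nobody links to.
    static Map<String, Page> sampleSite() {
        Map<String, Page> site = new HashMap<>();
        site.put("/home", new Page("welcome", Arrays.asList("/news")));
        site.put("/news", new Page("headlines", Arrays.asList("/home")));
        site.put("/orphan", new Page("not linked anywhere", Collections.<String>emptyList()));
        return site;
    }

    public static void main(String[] args) {
        Index index = new Index();
        crawl("/home", sampleSite(), index);
        System.out.println(index.docs.containsKey("/orphan")); // crawler never reaches it
        index.add("/orphan", "not linked anywhere");            // the application pushes it
        System.out.println(index.docs.containsKey("/orphan"));
    }
}
```

The crawler only finds what is reachable through links from the seed page, while the push call can index anything the application decides to submit.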

Advantages of web crawling:

  • No modification needed in existing application
  • Easy to implement search on multiple websites


Disadvantages of web crawling:

  • Pages not reachable by hyperlinks cannot be indexed
  • New content only visible in search engine after a crawler visit
  • Crawling impacts performance due to crawler traffic
  • Only public HTML-pages can be indexed by default


Advantages of enterprise search:

  • Finer control of search results (authorization, meta-data, keywords)
  • New content explicitly added/removed/updated by application
  • Support for all kinds of content (even non-public content or database content)
  • Minimal performance impact


Disadvantages of enterprise search:

  • Integration code needed in every application to facilitate search functionality


There is no single best search strategy. Choosing between the two approaches depends heavily on the context and requirements of the system.

Friday, May 14, 2010

DeMilitarized Zone (DMZ)

Most of the time, your webserver or application server is protected by a firewall that limits external traffic to your server. To improve security even more, an additional firewall can be installed between the webserver and the internal network, which includes the data being served. In most cases, this will be the database server.

The area between the two firewalls is called the DeMilitarized Zone or DMZ. When an intruder manages to compromise the web server, he still has to find a way to circumvent the inner firewall to actually get to the internal network or data.

Here is a PDF article that explains how to design a DMZ for your network. More information about DMZs can be found here at Wikipedia.

Saturday, May 8, 2010

JavaServer Faces Tutorial

An excellent tutorial about JavaServer Faces can be found at RoseIndia.net: JSF Tutorial. I use this link regularly as a reference. When I search for information about how to do something in JSF, I often end up on this site. It's an excellent resource about everything related to software engineering.

Monday, May 3, 2010

Version control with GIT on Windows

GIT is a BitKeeper-inspired version control system created by Linus Torvalds. It offers fast, distributed, decentralized version control. Originally developed for Linux, the application is now also available for Windows under the name MsysGIT.

If you don't like the standard integrated Windows Explorer interface, you can also download the TortoiseGit interface here: http://code.google.com/p/tortoisegit. It still needs MsysGIT to work.

Quick reference of commands I regularly use:

Basic commands:
git init - place the current directory under version control
git add . - stage all files in the current directory for commit
git commit - commit all staged files
git commit -a - commit all changed tracked files, staging them automatically (new untracked files are not included)
git log - view history of changes
git diff - show changes in the working directory that are not yet staged

Clone commands:
git clone <path_to_existing_repos> <clone_name> - clone an existing repository
git pull - fetch the latest changes from the remote and merge them into the current branch
git push - push the latest local commits to the remote
git pull <existing clone> [<branch>] - fetch and merge the latest changes from a clone
git push <existing clone> [<branch>] - push the latest changes to a clone


Branch commands:
git branch - list all branches
git branch <branch_name> - create a new branch in the current repository
git checkout <branch_name> - switch to the given branch in the current repository
git merge <branch_name> - merge the given branch into the current branch

There is also a GIT plugin available for Eclipse:
http://www.eclipse.org/egit/
http://wiki.eclipse.org/EGit/User_Guide

Next, I will describe a real-life scenario of a team working on the same project.

Installation
John starts a new project and creates a local repository from his working directory:

cd project
git init
git add .
git commit

Now he wants to share this project with his team members. To do that, John creates a central empty "bare" project that is going to hold all the project files:

cd <path_to_central_directory>
mkdir project
cd project
git init --bare

Once the central repository is ready, John pushes the content from his local working directory to the central repository:

git remote add origin <path_to_central_directory>/project
git push origin master

Now Karen can join this project by cloning the central repository to her local directory:

git clone <path_to_central_directory>/project

She can now edit some files and push the changes to the central repository:

git add file
git commit
git push [origin master]

John can fetch the changes from the central repository and merge them into his own repository with:

git fetch origin
git merge origin/master

or

git pull origin master

Now, John wants to start a new version in a branch:

git branch new_version

The currently checked-out branch can be viewed using this command:

git branch

John can start editing in the new branch by checking it out:

git checkout new_version

When John is done, he can switch back to the master branch and merge the changes from the new branch with:

git checkout master
git merge new_version
