Puneet Wadhwa's BIOINFORMATICS BLOG

Monday, November 21, 2005

An introduction to Data Mining

Hey Readers,

This post is about one of the most exciting fields of research in Computer Science, one that is opening a plethora of opportunities for scientists and researchers, especially in the field of Bioinformatics.

Data Mining refers to the extraction of "knowledge" from large amounts of data. It is also commonly referred to as "Knowledge Mining" or KDD (short for Knowledge Discovery in Databases). Common applications of Data Mining include using predictive techniques to unearth interesting patterns in large amounts of data; using previous sales and research data to predict future sales; improving the response rates of direct mail campaigns; and using previous sales data to recommend products to returning or new customers.
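To make the last application a little more concrete, here is a minimal sketch of recommending products from previous sales data by counting which products were bought together. The order data, the product names and the function names (`co_purchase_counts`, `recommend`) are all made up for illustration; real recommender systems use far more sophisticated techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past orders; each order is the set of products bought together.
orders = [
    {"primer kit", "pipette tips", "gloves"},
    {"primer kit", "pipette tips"},
    {"pipette tips", "gloves"},
    {"primer kit", "agarose"},
]

def co_purchase_counts(orders):
    """Count how often each ordered pair of products appears in the same order."""
    counts = Counter()
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(product, orders, top_n=2):
    """Return up to top_n products most often bought together with `product`."""
    counts = co_purchase_counts(orders)
    partners = Counter({b: n for (a, b), n in counts.items() if a == product})
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

print(recommend("primer kit", orders))
```

A customer who buys the "primer kit" would be shown the products that most often shared an order with it in the past.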

Look forward to more information on Data Mining coming soon.

Thursday, November 03, 2005

Effect of BLAST Low complexity filter on BLAST Results

This post is about one of my recent experiences with BLAST. I was trying to BLAST a big FASTA file consisting of sequences over 4000 base pairs in length, and had the low complexity filter turned on (it is ON by default).

Ideally, when you BLAST a sequence against itself, it should give you a match over the entire length, but in this case, for a sequence of 4200 bp, it only gave me a match over 1500 bp. This perplexed me for some time, so I decided to investigate the problem using NCBI's "blast2seq" tool, with the same 4200 bp sequence as both inputs. This gave me 3 hits with lengths ranging from 1200 to 1500 bp. The filter masks low complexity regions of the query, so an alignment cannot extend through them and gets broken into separate hits. That gave me the clue to turn the filter "off", and then I found the hit to be over the entire region.
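Here is a small sketch of why masking splits a self-hit into pieces. It is NOT the algorithm BLAST actually uses (BLAST relies on the DUST and SEG programs for nucleotide and protein queries respectively); it is a toy masker, assuming a sliding window and a Shannon entropy cutoff, and the window size, the cutoff and the example sequence are all values I picked for illustration.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy (bits per base) of the base composition of `window`."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def mask_low_complexity(seq, window=12, min_entropy=1.0):
    """Replace every base covered by a low-entropy window with 'N'.

    Toy stand-in for a low complexity filter; window and min_entropy
    are arbitrary illustrative values, not BLAST defaults.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = "N"
    return "".join(masked)

def unmasked_segments(masked):
    """Return (start, end) half-open intervals of the runs that were not masked."""
    segments, start = [], None
    for i, base in enumerate(masked):
        if base != "N" and start is None:
            start = i
        elif base == "N" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(masked)))
    return segments

# Made-up sequence: two ordinary flanks around a 30-base poly-A run.
seq = "ACGTTGCAAGCTTAGC" + "A" * 30 + "GATCCGTAGCTAGGCT"
masked = mask_low_complexity(seq)
print(masked)
print(unmasked_segments(masked))
```

The poly-A stretch in the middle gets masked, leaving two separate unmasked segments, which is the same effect I saw as multiple shorter hits. To turn the filter off on the command line, the option is `-F F` for the legacy `blastall` program and `-dust no` for `blastn` in the newer BLAST+ package; check the documentation for the version you have installed.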

I hope someone facing the same problem will benefit from this post :)

Best.
Puneet Wadhwa

Wednesday, November 02, 2005

RNA Interference and Gene silencing

RNA interference, or RNAi, is a way for cells to regulate which genes are expressed. This amazing phenomenon was first observed in petunias, when a scientist named Rich Jorgensen introduced a pigment-producing gene under the control of a powerful promoter. Instead of the expected deep purple color in the petunia, the result was a mixture of variegated and white petunias.

RNAi was named the breakthrough of the year in 2002, yielding a new potential for disease treatment and unraveling the mysteries of the functioning of human genes.

So, why do we need to shut down the expression of some genes?
  • Scientists have long been interested in the ability to shut down genes, because observing the effect of turning a gene off in an organism gives clues about the function of that gene.
  • The ability to shut off genes may also lead to new treatments for diseases, by turning down a gene that produces a harmful protein.

A very beautiful article about RNA interference, or RNAi, recently appeared on pbs.org, and can be found at http://www.pbs.org/wgbh/nova/sciencenow/3210/02.html.

Article about careers in Bioinformatics

Hey Readers,

A warm welcome again, and I wish you all a Happy Diwali! (An Indian festival of lights and of Goddess Lakshmi, the goddess of wealth)

I found a pretty encouraging article about career opportunities in Bioinformatics and it is attached below. It is located at http://sciencecareers.sciencemag.org/feature/cperspec/biosci.shl

Without further ado, here you go:
-----------------------------------------------------------------------

Bioinformatics, the use of computer technology to manage biological information, made its spectacular debut a few years ago, as the first trickles of gene sequence information from the Human Genome Program (HGP) and other sequencing projects grew into a deluge. Individuals with the skills to work on the interface between molecular biology and computer science instantly became some of the most sought-after job applicants in the biopharma world. With about 3 billion base pairs on its agenda, and a target completion date of 2005, HGP alone should foster a continuing explosion of data and a robust job market for computational biologists.
"Career opportunities in bioinformatics are very, very good," said John M. Greene, senior staff scientist, bioinformatics research, at Gene Logic Inc., Gaithersburg, Maryland. "It seems that every time you turn around a company has decided to set up a bioinformatics group, or expand an existing group. Many scientists are turning their careers in this direction."
But Greene notes that breaking into the field may not be as simple as all the talk about a feeding frenzy for personnel suggests. He cites the common misperception that a person can take a course in C, the programming language, acquire some database knowledge, and be deluged with high-paying job offers. Salaries around the six-figure mark are possible in bioinformatics, but getting them or even an entry-level position requires more planning than was common in the past.

Not many of today's bioinformatics people planned it. Many started out doing something else, entered the field before it had a name, and learned key skills on the job. Some were computer scientists who learned biology. Others were life scientists who learned computing.
After getting a Ph.D. in genetics from Harvard University, Greene did a postdoc, and worked for almost a year at a start-up antisense company. His career path led to Human Genome Sciences (HGS) and a job that involved substantial Basic Local Alignment Search Tool (BLAST) analysis on expressed sequence tags (ESTs) to identify genes with possible medical applications. BLAST programs are basic tools for searching DNA and protein databases for sequence similarities. Greene liked the work and finally switched into bioinformatics full time at HGS. He recently moved up to Gene Logic, which offers pharmaceutical companies technology to speed up development of drug targets. Gene Logic has a proprietary technology that identifies changes in gene expression associated with disease. It is developing a flow-through DNA chip to gauge drug efficacy and toxicity by analyzing gene changes, and an object-oriented database of gene expression patterns to identify new drug targets.

Strongest demand today exists for individuals with degrees in the life sciences and computer sciences, and multiple years of programming and database development experience, Greene says. Typical combinations include a Ph.D. in molecular biology, cell biology, or biochemistry and a B.S. in computer sciences. Life science Ph.D.s, largely self-taught in key computer skills, with industry experience, have good opportunities. People who emerge from the few doctoral programs in bioinformatics also will be "incredibly marketable," especially those with industry experience. This range of individuals, very difficult to find, often wind up heading bioinformatics departments or programs.

At the staff scientist and senior staff scientist levels, biopharma companies now tend to place emphasis on applicants with computer science skills. That's largely because databases and search tools are still being developed. Greene thinks that emphasis will shift in a few years to interpreting information in databases. Companies will then look for individuals who first and foremost are biologists but have key computational skills.

What are those skills? Greene's list includes knowledge of UNIX, the operating system used for many computational biology programs; a good grasp of the concept of relational databases, which are the heart of bioinformatics; and skill with Structured Query Language (SQL), a language used to query databases. In the future, knowledge of object-oriented databases may be increasingly important. Programming skills also are essential. Skills with C, the programming language, will help individuals learn Perl, the scripting language widely used in bioinformatics. Object-oriented languages, such as Java, will be increasingly important. Expert knowledge of sequence-analysis programs like BLAST and FASTA is critical. Web skills, of course, are necessary, including the ability to write some Hypertext Markup Language (HTML).

What gives one applicant an edge over another? Recruiters get excited over applicants who have applied computational biology skills in a practical way. The individual who wrote a program, for instance, and used it in thesis or postdoctoral work, might have an advantage over a similar individual who just took programming courses.

"For individuals who thrill at being on the cutting edge of science, with the skills to excel in two very different worlds, bioinformatics can be an extraordinarily good career," Greene said. "For me, the switch was the best step I've taken in the last decade."

Tuesday, November 01, 2005

Syndication of articles from my Bioinformatics Blog

Dear Friends,

I would like to offer syndication of my articles to any website that may want to display Bioinformatics related articles, tutorials and career information pertaining to Life Sciences.

If you like the content you see here, you can also show it on your website, provided you link back to my Blog and quote the original source of information. If you are interested in link exchange possibilities, feel free to send me an email at pwadhwa@gmail.com