What is the scope of beginning a career in big data using Hadoop in India?
Point 1
We (my organization) ran multiple POCs (proofs of concept) with various vendors of data integration tools and big data platforms, such as
Talend, Syncsort, Informatica, Microsoft Parallel Data Warehouse, Vertica, HDP and Cloudera. Every vendor seems to be offering either a solution built on Hadoop or integration with it, which suggests that things are moving towards big data.
We finally bought Talend, the Cloudera distribution and Vertica.
Point 2
The major distributions like Cloudera and HDP (Hortonworks) are receiving millions of dollars in funding, which suggests the future is good.
Point 3
I recently had the chance to interview many people in India who were technically good but had no experience with the Hadoop ecosystem, so the demand is there.
How to begin and learn?
My main experience was in MSBI (Microsoft business intelligence tools), so I knew only SQL, OLAP and a bit of C#.
That actually became the foundation for learning major parts of the Hadoop ecosystem,
like Hive and Impala (part of the Cloudera distribution).
Note – I started by learning on a single-node Hadoop setup (Cloudera distribution) on my Ubuntu machine. Cloudera currently provides a QuickStart VM with the Hadoop ecosystem completely pre-installed.
I can easily divide the job opportunities into the following categories, each with its own learning curve, tools and technologies.
Hadoop Developer – The one who works on data integration between Hadoop and other systems. Knows Java or Python and the major parts of the Hadoop ecosystem.
Hadoop Administrator – The infrastructure people who look after maintenance of the Hadoop cluster, upgrades, etc.
Data Scientist – My team comprises data engineers and data scientists. The major difference is that data scientists know tools like R, Python (the major machine learning libraries) and Spark.
Hadoop is bleeding edge and changes every few months. It is also tough to keep all the other tools that depend on or are based on Hadoop in sync, because of the changes between release versions.
I wouldn’t be surprised if whole OLTP or OLAP systems are soon based on Hadoop. That would surely mean tough competition for all the other database vendors.