Monday, January 21, 2013

Notes on "What is Hadoop? Other big data terms like MapReduce? Cloudera's CEO talks" video

  • MapReduce - spreads the processing of data over many computers so it can be processed in parallel on cheaper hardware.  No transactions or schema.  Aimed at batch analysis (mostly reads) rather than full CRUD.
  • Relational databases are not a good fit for unstructured free text
  • Hadoop - open source, consists of (1) a distributed file system to spread out the data (HDFS), (2) a way to push code down to the machines holding the data so the analysis runs there (MapReduce)
  • Scalable because you can just drop in more servers
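  • Memcached - distributed in-memory key-value cache, typically placed in front of a relational database to speed up reads; writes get pushed through "incrementally"? (not sure exactly what was meant here)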
  • NoSQL - described as distributed hash tables, i.e. key-value data spread across many machines
  • Sharding (not discussed in video) - taking the rows of a relational table and distributing them across computers (horizontal partitioning)
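
To make the MapReduce bullet more concrete, here is a minimal single-process sketch of the idea in Python, using word counting as the example. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce calls on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all the values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = []
    # Each map_phase call only looks at its own document, so in a real
    # cluster every call could run on a different machine in parallel.
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    # Each key is reduced independently, so this step parallelises too.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

The point of the split is that neither the map nor the reduce step needs shared state, which is why the work can be spread over cheap hardware.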

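The sharding and "distributed hash table" bullets both come down to using a hash of the key to decide which machine holds a row. A rough sketch, assuming a simple hash-modulo scheme (the names shard_for and distribute are hypothetical, and the shards here are just in-memory lists standing in for servers):

```python
import hashlib


def shard_for(key, num_shards):
    """Map a row key to a shard number using a stable hash of the key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(rows, key_field, num_shards):
    """Split rows into num_shards lists, one list per pretend server."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_field], num_shards)].append(row)
    return shards


if __name__ == "__main__":
    users = [{"id": i, "name": "user%d" % i} for i in range(10)]
    for number, shard in enumerate(distribute(users, "id", 3)):
        print(number, [row["id"] for row in shard])
```

One caveat with plain hash-modulo: changing the number of shards moves most keys to a different shard, which is why real systems tend to use something like consistent hashing instead.
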