Installing Apache Thrift On Windows
May 15, 2010. I've been doing some of the Facebook Engineering Puzzles recently, and in a previous blog post I described how to solve the puzzle User Bin Crash. User Bin Crash is what I'd call a batch problem: one that requires a program that takes some input and produces some output, with no long-running service involved.
The linked example appears to be outdated. When I add this to /etc/profile:

```
export PYTHONPATH=$PYTHONPATH:/usr/lib/hive/lib/py
```

I can then do the imports as listed in the link, with the exception of `from hive import ThriftHive`, which actually needs to be `from hive_service import ThriftHive`. Next, the port in the example was 10000, which caused the program to hang when I tried it. Switching to port 9083 (the Thrift port of the Hive metastore) stopped the hanging.
I believe the easiest way is to use PyHive. To install it you'll need these libraries:

```
pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive
```

Please note that although you install the library as PyHive, you import the module as pyhive, all lower-case.

If you're on Linux, you may need to install SASL separately before running the above: install the package libsasl2-dev using apt-get or yum or whatever package manager your distribution uses. For Windows there are some options on GNU.org; you can download a binary installer. On a Mac, SASL should be available if you've installed the Xcode developer tools (`xcode-select --install` in Terminal).

After installation, you can connect to Hive like this:

```
from pyhive import hive
conn = hive.Connection(host='YOUR_HIVE_HOST', port=PORT, username='YOU')
```

Now that you have the hive connection, you have options for how to use it.
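Clusters differ in how HiveServer2 authenticates, and `hive.Connection` takes an `auth` argument for that. As a minimal sketch (a hypothetical helper, assuming PyHive's documented `auth` values `'NONE'`, `'NOSASL'`, `'KERBEROS'`, `'LDAP'` and its `kerberos_service_name` parameter — verify against your PyHive version):

```python
# Sketch: assemble keyword arguments for pyhive's hive.Connection based on
# the auth mechanism in use. hive_conn_kwargs is a hypothetical helper, not
# part of PyHive; the auth values follow PyHive's documented options.

def hive_conn_kwargs(host, username, auth='NONE', port=10000, database='default'):
    kwargs = {'host': host, 'port': port, 'username': username,
              'database': database, 'auth': auth}
    if auth == 'KERBEROS':
        # With Kerberos, PyHive needs the Hive service principal's name
        # ('hive' is the usual default) and relies on your kinit ticket.
        kwargs['kerberos_service_name'] = 'hive'
    return kwargs

# Then, against a live HiveServer2:
# from pyhive import hive
# conn = hive.Connection(**hive_conn_kwargs('YOUR_HIVE_HOST', 'YOU'))
```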
You can just straight-up query:

```
cursor = conn.cursor()
cursor.execute('SELECT cool_stuff FROM hive_table')
for result in cursor.fetchall():
    use_result(result)
```

...or use the connection to make a pandas DataFrame:

```
import pandas as pd
df = pd.read_sql('SELECT cool_stuff FROM hive_table', conn)
```

I assume you are using HiveServer2, which is why the older code doesn't work. You can use pyhs2 to access Hive, with example code like this:

```
import pyhs2

with pyhs2.connect(host='localhost',
                   port=10000,
                   authMechanism='PLAIN',
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Show databases
        print(cur.getDatabases())
        # Execute query
        cur.execute('select * from table')
        # Return column info from query
        print(cur.getSchema())
        # Fetch table results
        for i in cur.fetch():
            print(i)
```

Note that you may need to install python-devel.x86_64 and cyrus-sasl-devel.x86_64 before installing pyhs2 with pip.
Hope this helps. The examples above are a bit out of date; pyhs2 is no longer maintained. A better alternative is impyla. It has many more features than pyhs2; for example, it has Kerberos authentication, which is a must for us.

```
from impala.dbapi import connect

conn = connect(host='my.host.com', port=10000)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()

# Alternatively, iterate over the cursor directly:
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
```

Cloudera is now putting more effort into hs2client, which is a C/C++ HiveServer2/Impala client. It might be a better option if you push a lot of data to/from Python.
(hs2client has a Python binding too.) Some more information on impyla: don't be confused that some of the above examples talk about Impala; just change the port to 10000 (the default) for HiveServer2, and it'll work the same way as with the Impala examples. It's the same protocol (Thrift) that is used for both Impala and Hive.

You don't have to do a global INVALIDATE METADATA; you can just do a table-level one: INVALIDATE METADATA schema.table. Even then, I don't understand the downvote, because my code above connects to port 10000, which is the Thrift service of HiveServer2, so you don't have to do any invalidates, as your SQL commands are run directly in Hive.
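To make the Hive-vs-Impala point concrete: with impyla, usually the only thing that changes between the two is the port. A minimal sketch (`connect_args` is a hypothetical helper; 10000 and 21050 are the stock defaults for HiveServer2 and the Impala daemon respectively, and your cluster may differ):

```python
# Sketch: impyla speaks the same Thrift protocol to HiveServer2 and Impala,
# so only the port (and possibly the auth mechanism) differs between them.

DEFAULT_PORTS = {'hive': 10000, 'impala': 21050}

def connect_args(service, host, auth_mechanism='PLAIN'):
    if service not in DEFAULT_PORTS:
        raise ValueError("expected 'hive' or 'impala'")
    return {'host': host,
            'port': DEFAULT_PORTS[service],
            'auth_mechanism': auth_mechanism}

# Then, against a live server:
# from impala.dbapi import connect
# conn = connect(**connect_args('hive', 'my.host.com'))
```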
You can use the Python JayDeBeApi package to create a DB-API connection from the Hive or Impala JDBC driver and then pass the connection to the pandas.read_sql function to return data in a pandas DataFrame.

Similar to eycheu's solution, but a little more detailed, here is an alternative solution specifically for HiveServer2 that does not require PyHive or installing system-wide packages. I am working in a Linux environment where I do not have root access, so installing the SASL dependencies as mentioned in Tristin's post was not an option for me:

If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev using apt-get or yum or whatever package manager for your distribution.

Specifically, this solution focuses on leveraging the Python package JayDeBeApi.
In my experience, installing this one extra package on top of a Python Anaconda 2.7 install was all I needed. This package leverages Java (JDK), which I am assuming is already set up.

Step 1: Install JayDeBeApi

```
pip install jaydebeapi
```

Step 2: Download the appropriate drivers for your environment:
• a download required for an enterprise CDH environment
• a page that talks about where to find JDBC drivers for Apache Hive

Store all .jar files in a directory. I will refer to this directory as /path/to/jar/files/.

Step 3: Identify your system's authentication mechanism. In the PyHive solutions listed, I've seen PLAIN listed as the authentication mechanism as well as Kerberos.
Note that your JDBC connection URL will depend on the authentication mechanism you are using. I will explain the Kerberos solution, which does not involve passing a username/password. Create a Kerberos ticket if one is not already created:

```
$ kinit
```

Tickets can be viewed via klist.
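The steps above can be sketched as a small helper that builds the hive2 JDBC URL for either mechanism. The `principal=` URL parameter and the driver class name `org.apache.hive.jdbc.HiveDriver` follow the usual Apache Hive JDBC driver conventions, and `hive2_jdbc_url` is a hypothetical helper; check your driver's documentation for the exact URL syntax it accepts:

```python
# Sketch: build a HiveServer2 JDBC URL for JayDeBeApi, assuming the usual
# Apache Hive JDBC driver conventions (verify against your driver's docs).

def hive2_jdbc_url(host, port=10000, database='default', principal=None):
    url = 'jdbc:hive2://{}:{}/{}'.format(host, port, database)
    if principal:
        # Kerberos: the server's principal goes in the URL, and no
        # user/password is passed; the ticket from kinit is used instead.
        url += ';principal={}'.format(principal)
    return url

# Then, with Java, the driver jars, and a live server available:
# import jaydebeapi
# import pandas as pd
# conn = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver',
#                           hive2_jdbc_url('my.host.com',
#                                          principal='hive/my.host.com@MY.REALM'),
#                           jars='/path/to/jar/files/hive-jdbc.jar')
# df = pd.read_sql('SELECT * FROM my_table LIMIT 10', conn)
```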
README.md

Apache Thrift

Last Modified: 2017-11-10

License

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the 'License'); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Introduction

Thrift is a lightweight, language-independent software stack with an associated code generation mechanism for RPC. Thrift provides clean abstractions for data transport, data serialization, and application-level processing. The code generation system takes a simple definition language as its input and generates code across programming languages that uses the abstracted stack to build interoperable RPC clients and servers. Thrift makes it easy for programs written in different programming languages to share data and call remote procedures.
With support for many programming languages, chances are Thrift supports the ones that you currently use. Thrift is specifically designed to support non-atomic version changes across client and server code.
For more details on Thrift's design and implementation, take a gander at the Thrift whitepaper included in this distribution or at the README.md file in your particular subdirectory of interest.

Project Hierarchy

```
thrift/
  compiler/   Contains the Thrift compiler, implemented in C++.
  lib/        Contains the Thrift software library implementation,
              subdivided by language of implementation:
    cpp/
    go/
    java/
    php/
    py/
    rb/
  test/       Contains sample Thrift files and test code across the
              target programming languages.
  tutorial/   Contains a basic tutorial that will teach you how to
              develop software using Thrift.
```

Requirements

See the requirements page for an up-to-date list of build requirements.

Resources

More information about Thrift can be obtained on the Thrift webpage.

Acknowledgments

Thrift was inspired by pillar, a lightweight RPC tool written by Adam D'Angelo, and also by Google's protocol buffers.
Installation

If you are building from the source repository for the first time, you will need to generate the configure scripts. (This is not necessary if you downloaded a tarball.) From the top directory, do:

```
./bootstrap.sh
```

Once the configure scripts are generated, thrift can be configured. From the top directory, do:

```
./configure
```

You may need to specify the location of the boost files explicitly. If you installed boost in /usr/local, you would run configure as follows:

```
./configure --with-boost=/usr/local
```

Note that by default the thrift C++ library is typically built with debugging symbols included. If you want to customize these options, you should use the CXXFLAGS, CFLAGS, or CPPFLAGS options to configure, as such:

```
./configure CXXFLAGS='-g -O2'
./configure CFLAGS='-g -O2'
./configure CPPFLAGS='-DDEBUG_MY_FEATURE'
```

To enable gcov, which requires the options -fprofile-arcs and -ftest-coverage, do:

```
./configure --enable-coverage
```

Run ./configure --help to see other configuration options.

Please be aware that the Python library will ignore the --prefix option and just install wherever Python's distutils puts it (usually along the lines of /usr/lib/pythonX.Y/site-packages/). If you need to control where the Python modules are installed, set the PY_PREFIX variable. (DESTDIR is respected for Python and C++.)

Make thrift:

```
make
```

From the top directory, become superuser and do:

```
make install
```

Note that some language packages must be installed manually using build tools better suited to those languages (at the time of this writing, this applies to Java, Ruby, and PHP).
Look for the README.md file in the lib/<language>/ folder for more details on the installation of each language library package.

Testing

There are a large number of client library tests that can all be run from the top-level directory:

```
make -k check
```

This will make all of the libraries (as necessary) and run through the unit tests defined in each of the client libraries. If a single language fails, make check will continue on and provide a synopsis at the end.
To run the cross-language test suite, please run:

```
make cross
```

This will run a set of tests that use different language clients and servers.

Development

To build the same way Travis CI builds the project, you should use Docker.